
NAPRALERT is a leading database covering more than 40 years of natural products literature.


NAPRALERT, an acronym of NAtural PRoducts ALERT, is a database created at UIC in 1975 by the late Norman R. Farnsworth. It aims to be a systematic, curated record of all research related to Natural Products and Pharmacognosy in general. Its coverage includes, but is not limited to:

  • Traditional use of plants and Natural Products
  • Distribution of compounds produced/present in organisms
  • Distribution of organisms producing/presenting a compound
  • In vitro and in vivo biological assays of organisms, extracts, or purified compounds

With its thousands of users and more than 200,000 scientific papers manually annotated, NAPRALERT is a unique trove of information.


My role

Until 2015, the database was running on an aging IT infrastructure that was both costly and difficult to maintain and enhance. This is why Guido Pauli, recently appointed director of NAPRALERT, James Graham, its current editor, and I decided that it would greatly benefit from a complete rewrite.

When I was given this project, it was running as a .NET application querying an MSSQL database on a contractor’s premises. As the cost of maintaining this resource was prohibitive and limited our ability to evolve it, it was decided to upgrade it.

As the application is mostly a simple CRUD application with advanced data entry (autocompletion, multiple levels of data) and advanced queries, I decided to investigate Ruby on Rails. But after the initial prototype, I went back to a Django-based system, as it proved much more efficient for the task at hand (to be fair, I knew much more Python than Ruby).
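
To give an idea of what autocompletion in data entry boils down to, here is a minimal sketch of a case-insensitive prefix lookup over a sorted list of names. The function names and the sample organism list are made up for illustration; this is not how NAPRALERT or any particular autocomplete library implements it.

```python
from bisect import bisect_left


def build_index(names):
    """Return (lowercased key, original name) pairs sorted by key."""
    return sorted((name.lower(), name) for name in names)


def complete(index, prefix, limit=5):
    """Return up to `limit` names whose lowercase form starts with `prefix`."""
    key = prefix.lower()
    # (key,) sorts before every (key..., name) pair, so this finds the
    # first candidate that could match the prefix.
    start = bisect_left(index, (key,))
    matches = []
    for lowered, original in index[start:start + limit]:
        if not lowered.startswith(key):
            break  # sorted order: no later entry can match either
        matches.append(original)
    return matches


# Hypothetical sample data, only for the example.
index = build_index(
    ["Panax ginseng", "Panax quinquefolius", "Piper methysticum", "Ginkgo biloba"]
)
print(complete(index, "pan"))
```

In a real data-entry form the list would come from the database and the lookup would be exposed through a small HTTP endpoint, but the matching logic stays about this simple.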

My role in that project was really broad (all of it took around 9 months, not full-time):

  • Choose a back-end framework
  • Choose the database
  • Convert the data from MSSQL to the selected database
  • Make sense of how the data was stored. This part was pretty interesting, as there are a lot of idiosyncrasies. I realized that Conway’s law really applies.
  • Work with the internal users for a new data-entry approach
  • Write the new application
  • Write the UI
  • Order the infrastructure
  • Configure and secure it
  • Work with the network teams of the university to get our server public-facing and get an SMTP server our system can talk to

From January 2015 to August 2015, I wrote the new NAPRALERT from scratch using the Free (as in Free Speech) technologies I was most familiar with.

During that journey, I learned, or got better with, a lot of other tools and technologies:

  • Ansible, for easy deployment and testing. (thanks kuroishi for introducing me to it)
  • Packer, for creating new Virtual Machines in a reliable manner. (thanks hef for introducing me to it)
  • Docker, for creating easy-to-toss containers and spinning up instances quickly.
  • Django, as a web framework. (thanks Carl and Sheila for showing me the way)
  • Django autocomplete-light, as an autocomplete solution that really works.
  • Celery, as a task queuing and scheduling system.
  • Semantic UI, as a really nice and easy-to-use web UI framework.
  • Gunicorn, as a pre-fork server reducing load and allowing much better response times (yes, we can now handle 1000 times our average user load…)
  • And many other things that would be too long to list.
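
As an illustration of the pre-fork model mentioned above, here is what a small Gunicorn configuration file can look like. Gunicorn reads its configuration from a plain Python file; the values below are assumptions picked for the example, not the settings NAPRALERT actually runs with.

```python
# gunicorn.conf.py: hypothetical values, not NAPRALERT's real configuration
import multiprocessing

# Address the master process listens on; a reverse proxy usually sits in front.
bind = "127.0.0.1:8000"

# Number of worker processes forked ahead of time. (2 x cores) + 1 is the
# starting point suggested in the Gunicorn documentation.
workers = multiprocessing.cpu_count() * 2 + 1

# Load the application once in the master before forking the workers.
preload_app = True

# Seconds a silent worker may run before it is killed and restarted.
timeout = 30
```

It would be used with something like `gunicorn -c gunicorn.conf.py myproject.wsgi`, where `myproject` is a placeholder for the Django project name.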

Where it is now

The service went public in October 2015 and is now happily serving thousands of users. There is still a lot to do: now that we have a nice and shiny new platform, we can start implementing all the nice ideas that we compiled. Currently it mostly gets security and admin-side feature upgrades, as most of our resources are focused on the next project that could make this one obsolete.

You can find an example of what can be done with it in my academic profile.

What is its future

Currently, I am working on a different project that should both complement and embrace NAPRALERT. It is based on a new ontology, the Pharmacognosy Ontology, connected to existing ontologies, with a text-mining component based on word embeddings and Bayesian models. The idea is to pre-annotate the literature to reduce the burden on our data-entry people. But more on that later, as it deserves a few pages of its own.
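
To make the pre-annotation idea a bit more concrete, here is a toy sketch of the Bayesian half only: a naive Bayes classifier over bags of words that suggests a category for a sentence. It is an assumption-laden illustration, not the project's actual model; it leaves out word embeddings and the ontology entirely, and the labels, function names, and training sentences are invented for the example.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labelled_docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior for `text`."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are skipped
                # Laplace (add-one) smoothing avoids log(0).
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented training sentences and labels, only for the example.
TOY_ABSTRACTS = [
    ("leaves used traditionally as a tea for fever", "ethnobotany"),
    ("decoction of bark used by healers for cough", "ethnobotany"),
    ("extract inhibited tumor cell growth in vitro", "bioassay"),
    ("compound showed cytotoxic activity in mice", "bioassay"),
]

model = train(TOY_ABSTRACTS)
print(predict(model, "bark tea used for fever"))
```

A suggestion like this would only ever be a starting point for a human annotator to confirm or correct, which is the point of pre-annotation.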

Posters / NAPRALERT, from an historical information silo to a linked resource able to address the new challenges in Natural Products Chemistry and Pharmacognosy. >

Abstract from conference: NAPRALERT is a database on natural products, including data on ethnobotany, chemistry, pharmacology, toxicology, and clinical trials from literature dating back to the 19th century. Established in 1975 by Norman R. Farnsworth, it became a web-accessible resource in 2005 but soon became stagnant while literature grew exponentially. After a complete rewrite of the platform, the focus is now on connecting this resource to the rest of the existing databases and expanding its usability.
category posters

Posters / Reviving NAPRALERT and Making It Ready For Improvement and New Challenges In Natural Products Chemistry and Pharmacognosy >

NAPRALERT is a database on natural products, including data on the ethnobotany, chemistry, pharmacology, toxicology, and clinical trials. It was established in 1975 by the late Norman R. Farnsworth, at a time when computerized databases were just starting. It became web-accessible in 2005. Due to resource constraints, few enhancements were made to the existing database structure. Now, 10 years later, NAPRALERT faces the challenge of catching-up with other well-established resources.
category posters

authors Jonathan Bisson, James McAlpine, J. Brent Friesen, Shao-Nong Chen, James Graham, Guido F. Pauli
journal Journal of Medicinal Chemistry (RoMEO status: White)
subjects Pharmacognosy Phytochemistry Perspectives Fundamental research IMP bioactivity data mining NAPRALERT
High-throughput biology has contributed a wealth of data on chemicals, including natural products (NPs). Recently, attention was drawn to certain, predominantly synthetic, compounds that are responsible for disproportionate percentages of hits but are false actives. Spurious bioassay interference led to their designation as pan-assay interference compounds (PAINS). NPs lack comparable scrutiny, which this study aims to rectify. Systematic mining of 80+ years of the phytochemistry and biology literature, using the NAPRALERT database, revealed that only 39 compounds represent the NPs most reported by occurrence, activity, and distinct activity.
categories publications science

Posters / Minimizing the problems with “PIMPs” >

A recent article by Baell(1) on the problems experienced by medicinal chemists with pan-assay interference compounds (PAINS) and Shoichet’s work(2) on the impact of aggregation occurring in high throughput screening libraries, prompts a consideration of how these and other similar problems are experienced by pharmacognosists with promiscuous invalid metabolites as panaceas (PIMPs). Contrary to the classical definition of secondary metabolites as being species specific (or near specific), several natural products, particularly in the more extensively investigated plant kingdom, are common across species, genera, and even families (e.
category posters