Authors: Jonathan Bisson, James McAlpine, J. Brent Friesen, Shao-Nong Chen, James Graham, Guido F. Pauli
Journal: Journal of Medicinal Chemistry (RoMEO status: White) 59(5), 1671-1690 (2016)
Subjects: Pharmacognosy; Phytochemistry; Perspectives; Fundamental research; IMP; bioactivity; data mining; NAPRALERT
High-throughput biology has contributed a wealth of data on chemicals, including natural products (NPs). Recently, attention was drawn to certain, predominantly synthetic, compounds that are responsible for disproportionate percentages of hits but are false actives. Spurious bioassay interference led to their designation as pan-assay interference compounds (PAINS). NPs lack comparable scrutiny, which this study aims to rectify. Systematic mining of 80+ years of the phytochemistry and biology literature, using the NAPRALERT database, revealed that only 39 compounds represent the NPs most reported by occurrence, activity, and distinct activity. Over 50% are not explained by phenomena known for synthetic libraries, and all had manifold ascribed bioactivities, designating them as invalid metabolic panaceas (IMPs). Cumulative distributions of ∼200,000 NPs uncovered that NP research follows power-law characteristics typical for behavioral phenomena. Projection into occurrence–bioactivity–effort space produces the hyperbolic black hole of NPs, where IMPs populate the high-effort base.