[Bjonnh.net]# _



category projects techniques Dash Lotus Wikidata Python RDKit

Lotus search

Lotus search is a tool to search for compounds and taxa in the LOTUS database.

A view of the search interface for compounds

What is LOTUS?

LOTUS is a database of natural products, the organisms that produce them, and the bibliographic references in which their discovery or identification was reported. It is hosted on Wikidata, which serves as the main source of truth.
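To make that data model a bit more concrete, here is a minimal sketch of how compound, taxon and reference triples could be listed from Wikidata's public SPARQL endpoint. The property IDs are Wikidata's own (P235 InChIKey, P703 found in taxon, P225 taxon name, P248 stated in), but the query shape, the function names and the User-Agent string are assumptions made for illustration, not necessarily what Lotus search runs.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_pairs_query(limit: int = 10) -> str:
    """Return a SPARQL query listing compound / taxon / reference triples.

    Hypothetical helper: the real LOTUS import may use a different query.
    """
    return f"""
SELECT ?compound ?inchikey ?taxon ?taxon_name ?reference WHERE {{
  ?compound wdt:P235 ?inchikey ;   # P235: InChIKey
            p:P703 ?statement .    # P703: found in taxon
  ?statement ps:P703 ?taxon ;
             prov:wasDerivedFrom/pr:P248 ?reference .  # P248: stated in
  ?taxon wdt:P225 ?taxon_name .    # P225: taxon name
}}
LIMIT {int(limit)}
"""


def fetch_pairs(limit: int = 10) -> list[dict]:
    """Run the query against the public endpoint (needs network access)."""
    url = WIKIDATA_SPARQL_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": build_pairs_query(limit), "format": "json"}
    )
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "lotus-search-example/0.1 (illustrative sketch)"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        rows = json.load(response)["results"]["bindings"]
    # Flatten the SPARQL JSON bindings into plain {column: value} dicts.
    return [{name: cell["value"] for name, cell in row.items()} for row in rows]
```

Calling `fetch_pairs(5)` would return up to five rows, each with the compound, its InChIKey, the producing taxon and the reference backing the statement.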

It is the result of a pretty large collaboration within the LOTUS initiative.

How was the search made?

Initially I used Streamlit for the prototype. It worked quite well, but it was becoming tricky to make a nice interface.

So I switched to Dash and Dash Bootstrap Components.

To draw molecules, I had to create a Dash component wrapping EPAM’s Ketcher, which can be found here: https://github.com/lotusnprod/plotly-dash-ketcher.

The chemical search itself is handled by RDKit.

Currently everything is done in memory. New data is gathered from Wikidata every night and the database is rebuilt, which takes only a few minutes. There is a brief downtime of a couple of seconds when the switch is made, but that’s the price to pay for a simple and cheap deployment.
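One general way to structure such a nightly rebuild is to build the new dataset off to the side and only then swap it in. The sketch below shows that build-then-swap pattern within a single process. It is not a description of the deployed service, where the brief downtime suggests the switch happens differently (for example through a restart), and every name in it (`InMemoryDatabase`, `rebuild`, the row layout) is an assumption for illustration.

```python
import threading


class InMemoryDatabase:
    """Holds the current dataset and lets a rebuilt one replace it."""

    def __init__(self, initial: dict):
        self._data = initial
        self._lock = threading.Lock()

    def get(self) -> dict:
        # Readers grab whatever dataset is current at this moment.
        return self._data

    def swap(self, new_data: dict) -> dict:
        # The lock only serializes concurrent swaps; the swap itself
        # is a single reference assignment.
        with self._lock:
            old, self._data = self._data, new_data
        return old


def rebuild(fetch) -> dict:
    """Build a fresh dataset indexed by id; the live one is left untouched."""
    return {row["id"]: row for row in fetch()}


# Usage with a stand-in for the nightly Wikidata download.
db = InMemoryDatabase({"Q0": {"id": "Q0", "name": "old entry"}})
fresh = rebuild(lambda: [{"id": "Q1", "name": "new entry"}])
previous = db.swap(fresh)
```

The trade-off is memory: during the rebuild both the old and the new dataset are held at once, which matters on a small machine.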

It is hosted at Hetzner on a tiny ARM machine (4 cores, 8 GB RAM). It works surprisingly well and could run on only half of that.

Contributors

The only direct contributor to the search project is Adriano Rutz. But really, the whole initiative is a huge collaboration between a lot of people. Go see our website: LOTUS initiative

What’s next?

Two major projects:

  • An API, so that the data can be used in other applications and we can decouple the UI from the data part.
  • Display the references and more information about the compounds and taxa.