Acknowledgement
Repuragent relies on several open-source scientific tools, datasets, and guidelines. This page summarizes these resources and links to the original sources for proper acknowledgement.
REMEDi4ALL Standard Operating Procedures
The Standard Operating Procedures (SOPs) used in our system are provided by REMEDi4ALL. Visit the REMEDi4ALL home page for more details.
REMEDi4ALL Chemical Annotator
For compound annotations, we use the REMEDi4ALL Chemical Annotator, which takes SMILES or InChI inputs and queries ChEMBL, UniChem, PubChem, and KEGG.
Knowledge Graph Generator (KGG)
We rely on the Knowledge Graph Generator (KGG) from Fraunhofer ITMP to create disease-specific knowledge graphs and extract information from them.
LitSense
Literature grounding relies on LitSense, a PubMed semantic search engine. LitSense indexes titles, abstracts, and full text where available, and combines their semantic representations.
Hugging Face Local Python Executor
The python_executor tool in the data agent is built on the Smolagents Python executor. It restricts executed code to a curated list of allowed imports, which helps keep code execution safely scoped.
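The idea of scoping code to a curated import list can be illustrated with a minimal standard-library sketch. This is not the Smolagents implementation or its API; the `ALLOWED_IMPORTS` set and the `find_disallowed_imports` helper are hypothetical names introduced here only for illustration.

```python
import ast

# Hypothetical allow-list for illustration; the actual curated list
# used by the python_executor tool is not reproduced here.
ALLOWED_IMPORTS = {"math", "statistics", "json"}


def find_disallowed_imports(source, allowed=None):
    """Return top-level module names imported in `source` that are not allowed."""
    allowed = ALLOWED_IMPORTS if allowed is None else allowed
    disallowed = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have no absolute module name to check,
            # so this sketch flags them conservatively.
            if node.module and node.level == 0:
                names = [node.module]
            else:
                names = ["<relative import>"]
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "os.path" is checked as "os"
            if root not in allowed and root not in disallowed:
                disallowed.append(root)
    return disallowed


snippet = "import math\nimport os\nfrom subprocess import run\n"
print(find_disallowed_imports(snippet))  # ['os', 'subprocess']
```

A static check like this only inspects `import` statements, so it would miss dynamic imports; it is meant to show the allow-list concept, not to serve as a sandbox.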
Unstructured
Our SOP RAG system relies on Unstructured to parse PDF files. Unstructured processes text, tables, and images; the parsed content is then chunked and embedded with an OpenAI embedding model, and the resulting embeddings are stored in ChromaDB for later semantic search.
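The chunk, embed, store, and search flow described above can be sketched with toy stand-ins. Nothing below calls Unstructured, OpenAI, or ChromaDB: `chunk_text`, `embed`, and `InMemoryStore` are hypothetical helpers, and the word-count "embedding" is a deliberately simple substitute for a real embedding model.

```python
import math
from collections import Counter


def chunk_text(text, max_words=40):
    """Split already-parsed text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding model: a lowercase word-count vector."""
    return Counter(word.strip(".,;:()?!").lower() for word in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Toy stand-in for a vector database: keeps (chunk, vector) pairs in a list."""

    def __init__(self):
        self.records = []

    def add(self, chunks):
        for chunk in chunks:
            self.records.append((chunk, embed(chunk)))

    def query(self, question, k=1):
        question_vector = embed(question)
        ranked = sorted(
            self.records,
            key=lambda record: cosine(question_vector, record[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:k]]


# Made-up example sentences, not taken from any real SOP.
store = InMemoryStore()
store.add([
    "Dilute the compound stock in DMSO before plating.",
    "Record plate barcodes in the laboratory notebook.",
])
print(store.query("How should the compound be diluted?"))
```

In a real pipeline the embedding function and the store would be replaced by the embedding model and vector database named above; the sketch only shows how the stages connect.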
CPSign package and TDC data
Predictive models were trained and evaluated using CPSign. The training data was downloaded from Therapeutics Data Commons (TDC).