---
title: MCP Res
emoji: 📊
colorFrom: red
colorTo: gray
sdk: docker
sdk_version: 1.46.0
app_file: app.py
pinned: false
short_description: Research
---

MedGenesis AI
MedGenesis AI is a biomedical literature discovery workbench that unifies live data from PubMed, arXiv, MyGene.info, ClinicalTrials.gov v2, DisGeNET, openFDA, Open Targets, DrugCentral, UMLS and more, then lets you explore the evidence in a rich Streamlit interface powered by OpenAI or Gemini LLMs.


┌────────────────────────────────────────────┐
│ Streamlit UI  (app.py)                     │
│  • Results / Genes / Trials / Graph tabs   │
│  • PDF / CSV export & follow-up Q&A        │
└──────────────┬─────────────────────────────┘
               │ async calls
┌──────────────▼─────────────────────────────┐
│ Orchestrator (mcp/orchestrator.py)         │
│  • pulls PubMed, arXiv                     │
│  • keyword extraction (spaCy)              │
│  • fans-out to MyGene, CT.gov v2, UMLS…    │
│  • merges & summarises with LLM            │
└──────────────┬─────────────────────────────┘
               │ helpers (mcp/*.py)
               ▼
┌────────────────────────────────────────────┐
│ External APIs + local TSV (DrugBank)       │
└────────────────────────────────────────────┘
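
The fan-out and merge step in the diagram can be sketched with asyncio. This is a minimal illustration, not the code in mcp/orchestrator.py: the fetcher functions, their names and the shape of the merged result are all hypothetical stand-ins for the real helpers in mcp/*.py.

```python
import asyncio

# Hypothetical stand-ins for helpers in mcp/*.py; the real function names
# and return shapes are not shown in this README.
async def fetch_pubmed(query: str) -> list[dict]:
    await asyncio.sleep(0)  # placeholder for a real HTTP call
    return [{"source": "pubmed", "title": f"{query} (PubMed hit)"}]

async def fetch_arxiv(query: str) -> list[dict]:
    await asyncio.sleep(0)  # placeholder for a real HTTP call
    return [{"source": "arxiv", "title": f"{query} (arXiv hit)"}]

async def fetch_broken(query: str) -> list[dict]:
    # Simulates one upstream API being down.
    raise RuntimeError("upstream API unavailable")

async def gather_sources(query: str, fetchers) -> dict:
    """Run all fetchers concurrently; keep successes, record failures."""
    results = await asyncio.gather(
        *(fetcher(query) for fetcher in fetchers), return_exceptions=True
    )
    merged = {"records": [], "errors": []}
    for fetcher, result in zip(fetchers, results):
        if isinstance(result, Exception):
            merged["errors"].append(f"{fetcher.__name__}: {result}")
        else:
            merged["records"].extend(result)
    return merged

if __name__ == "__main__":
    fetchers = [fetch_pubmed, fetch_arxiv, fetch_broken]
    out = asyncio.run(gather_sources("CRISPR glioblastoma", fetchers))
    print(len(out["records"]), "records;", out["errors"])
```

Passing return_exceptions=True means one failing source is reported in the errors list instead of cancelling the whole search, which is one plausible way to keep the UI usable when a single API is unavailable.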



🔑 Features

| Domain | Source / API | What you get |
|---|---|---|
| Literature | PubMed + arXiv | titles, abstracts, authors, year |
| Gene info | MyGene.info + NCBI Gene | symbol, name, GO, ClinVar, MeSH definitions |
| Trials | ClinicalTrials.gov v2 | NCT ID, phase, status, start date |
| Disease ↔ gene | DisGeNET | top associations & scores |
| Drug safety | openFDA, DrugCentral | adverse events, approvals, MoA |
| Graph edges | Open Targets GraphQL | gene–disease–drug links (+ OT score) |
| Ontology | UMLS, HPO, Wikidata | concept CUI, phenotype look-ups |

🚀 Quick start

```bash
# clone repo
git clone https://github.com/your-org/medgenesis.git
cd medgenesis

# build & run locally
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python -m spacy download en_core_web_sm
streamlit run app.py
```

The last command starts a Streamlit server on localhost:8501.
Enter a biomedical question (e.g. “CRISPR glioblastoma therapy”) and press Run Search 🚀.

🐳 Docker / Hugging Face Space
The included Dockerfile is CPU-only and downloads the spaCy model at build time:

```bash
docker build -t medgenesis .
docker run -p 7860:7860 -e OPENAI_API_KEY=sk-... medgenesis
```

HF Spaces: push the repo, set the environment secrets below, and Spaces will pick up the Dockerfile.

πŸ” Environment variables
Variable	Description
OPENAI_API_KEY	OpenAI account key (GPT-4o, GPT-4o-mini …)
GEMINI_KEY	Google Generative AI key (Gemini 1.5 Flash)
UMLS_KEY	UMLS Licensing key (ticket auth)
DISGENET_KEY	DisGeNET Bearer token (optional)
PUB_KEY	NCBI E-utils key (optional, boosts quota)
BIO_KEY	NCBI E-utils key for Gene/MeSH (optional)

Set them in .env, your shell, or HF Secrets.
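
A start-up check for these variables might look like the sketch below. The variable names come from the table; the load_settings function and the rule that at least one LLM key must be present are assumptions of this example, not documented behaviour of the app.

```python
import os

# Names taken from the table above. Treating every non-LLM key as optional
# is an assumption of this sketch.
LLM_KEYS = ("OPENAI_API_KEY", "GEMINI_KEY")
OTHER_KEYS = ("UMLS_KEY", "DISGENET_KEY", "PUB_KEY", "BIO_KEY")

def load_settings(env=None) -> dict:
    """Collect known keys; unset or empty values become None."""
    env = os.environ if env is None else env
    settings = {name: env.get(name) or None for name in LLM_KEYS + OTHER_KEYS}
    if not any(settings[name] for name in LLM_KEYS):
        raise RuntimeError("Set OPENAI_API_KEY or GEMINI_KEY before starting the app")
    return settings
```

Calling load_settings() with no argument reads the process environment, so it works the same whether the values come from your shell or from HF Secrets. Values in a .env file need to be loaded into the environment first.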

πŸ—ƒοΈ Local data
mcp/data/drugbank_open_structured_drug_links.tsv – DrugBank Open Data
Download from the DrugBank Open-Data page and place it here.

The file is lazy-loaded and cached; the app still works without it.
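
Lazy loading with a cache and a graceful fallback can be sketched as follows. The path is the one given above; the load_drug_links function and its return shape are hypothetical, and the real loader may use a different mechanism.

```python
import csv
from functools import lru_cache
from pathlib import Path

# Path taken from this README; everything else here is illustrative.
DRUGBANK_TSV = Path("mcp/data/drugbank_open_structured_drug_links.tsv")

@lru_cache(maxsize=None)
def load_drug_links(path: str = str(DRUGBANK_TSV)) -> tuple:
    """Parse the TSV on first call; return an empty tuple if it is absent."""
    file = Path(path)
    if not file.exists():
        return ()  # the app keeps working without DrugBank data
    with file.open(newline="", encoding="utf-8") as handle:
        # One dict per row, keyed by the TSV header line.
        return tuple(csv.DictReader(handle, delimiter="\t"))
```

Nothing is read until the first call, and later calls reuse the cached rows. Note that the empty result is cached too, so in this sketch you would call load_drug_links.cache_clear() after adding the file to a running process.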

🧪 Tests

```bash
pytest tests/
```

Unit tests mock external APIs and verify parsing, caching and orchestrator merges.
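
As an illustration of the mocking approach, a test in this style could look like the sketch below. The fetch_trial_ids helper, the example.invalid URL, the payload shape and the NCT ID are all made up for the example. They do not reflect the real helpers in mcp/*.py or the ClinicalTrials.gov response schema.

```python
from unittest.mock import Mock

# Hypothetical helper: takes the HTTP getter as an argument so that tests
# can inject a fake instead of touching the network.
def fetch_trial_ids(http_get, term: str) -> list[str]:
    """Call the injected getter and pull trial IDs out of its JSON payload."""
    response = http_get("https://example.invalid/trials", params={"term": term})
    payload = response.json()
    return [study["nct_id"] for study in payload.get("studies", [])]

def test_fetch_trial_ids_parses_mocked_response():
    # Fake response object whose .json() returns a made-up payload.
    fake_response = Mock()
    fake_response.json.return_value = {"studies": [{"nct_id": "NCT00000001"}]}
    http_get = Mock(return_value=fake_response)

    assert fetch_trial_ids(http_get, "glioblastoma") == ["NCT00000001"]
    http_get.assert_called_once_with(
        "https://example.invalid/trials", params={"term": "glioblastoma"}
    )
```

Injecting the getter keeps the test offline and deterministic; patching a module-level HTTP client with unittest.mock.patch would be an equally reasonable choice.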

πŸ› οΈ Contributing
Fork & create a feature branch.

Follow Conventional Commits for PR titles.

Run pre-commit install to auto-format with black & ruff.

Submit a PR; GitHub Actions will run lint + tests.

📄 License
Apache 2.0 – free for research and commercial use.
API terms of each external provider still apply.

Happy discovering!