mgbam committed on
Commit 702623e · verified · 1 Parent(s): 590f907

Update README.md

Files changed (1)
  1. README.md +91 -109
README.md CHANGED
@@ -10,120 +10,102 @@ pinned: false
  short_description: Research
  ---

- MedGenesis AI 🧬
- Unified Biomedical Research Assistant: AI, Literature, Safety & Knowledge Graphs
-
- 🚀 Overview
- MedGenesis AI is the next-generation biomedical research assistant that unifies PubMed, arXiv, UMLS, OpenFDA, and advanced AI (OpenAI GPT-4o) into one seamless platform.
- Discover, synthesize, and visualize research, drugs, concepts, and safety, all in a single app.
-
- Semantic search across top biomedical databases
-
- UMLS concept mapping and standardization
-
- Drug safety insights powered by OpenFDA
-
- AI-powered summaries and follow-up Q&A
-
- Interactive knowledge graph for instant relationship exploration
-
- Export results as PDF or CSV
-
- Workspace to save and revisit your research sessions
-
- ✨ Key Features
- Unified Search: Find and aggregate results from PubMed and arXiv, ranked by semantic similarity.
-
- Biomedical Concept Augmentation: Map results to UMLS concepts for standardized knowledge.
-
- Drug & Safety Insights: Connect literature with the latest FDA safety data.
-
- AI-Powered Synthesis: Summarize, answer questions, and suggest next research steps using GPT-4o.
-
- Knowledge Graph: Visual, interactive network of papers, drugs, and concepts.
-
- Export & Workspace: Download data, save sessions, and organize your findings.
-
- 🖥️ Live Demo
- Try the app on Hugging Face Spaces
-
- 🐳 Docker-Based Deployment
- MedGenesis AI uses a Dockerfile for guaranteed reproducibility and robust dependency management.
-
- Run Locally:
  bash
- git clone https://huggingface.co/spaces/MCP_Res
- cd YOUR_REPO
- docker build -t medgenesis-ai .
- docker run -p 7860:7860 medgenesis-ai
- Then open http://localhost:7860 in your browser.
-
- On Hugging Face Spaces:
- Make sure your Dockerfile, requirements.txt, and app.py are present in your repo root.
-
- Push all code to your Space.
-
- In the Space settings, select "Docker" as the runtime.
-
- Your Space will build and launch automatically.
 
- ⚡ Quick Start (Streamlit UI)
- Enter a biomedical research question.
-
- Click Run Search to query all sources.
-
- Explore:
-
- Results: Detailed, AI-synthesized findings.
-
- Knowledge Graph: Interactive visualization.
-
- Visualizations: Year histogram and more.
-
- Download results or save to your workspace.
-
- Ask follow-up questions with the integrated AI assistant.
-
- 🛠️ Customization
- Add new data sources by extending mcp/ utilities.
-
- Update color/theme in .streamlit/config.toml.
-
- Want to run in SDK mode? Simply remove the Dockerfile and deploy with requirements.txt as usual.
-
- Use the built-in workspace to organize your research projects.
-
- 🤝 Contributing
- Pull requests, bug reports, and feature suggestions are welcome!
- For large features, open an issue first to discuss your ideas.
-
- 🧑‍💻 Author
- Oluwafemi Idiakhoa
-
- 📝 License
- MIT License
-
- ❤️ Acknowledgments
- Hugging Face Spaces
-
- PubMed, arXiv, OpenFDA, UMLS
-
- OpenAI GPT-4o
-
- The open-source biomedical AI community

- 🔗 Links
- MedGenesis AI Space
- OpenFDA
- UMLS
- arXiv
- PubMed

- Note on Docker
- If you're using Docker, all dependencies (including Jinja2 and pyvis) are installed as specified in the Dockerfile and requirements.txt; no extra setup is needed.
- If deploying with Hugging Face SDK mode, ensure your requirements.txt matches your dependencies.
 
+ MedGenesis AI
+ MedGenesis AI is a biomedical literature discovery workbench that unifies live data from PubMed, arXiv, MyGene.info, ClinicalTrials.gov v2, DisGeNET, openFDA, Open Targets, DrugCentral, UMLS, and more, and then lets you explore the evidence in a rich Streamlit interface powered by OpenAI or Gemini LLMs.
+
+ ┌────────────────────────────────────────────┐
+ │ Streamlit UI (app.py)                      │
+ │ • Results / Genes / Trials / Graph tabs    │
+ │ • PDF / CSV export & follow-up Q&A         │
+ └──────────────┬─────────────────────────────┘
+                │ async calls
+ ┌──────────────▼─────────────────────────────┐
+ │ Orchestrator (mcp/orchestrator.py)         │
+ │ • pulls PubMed, arXiv                      │
+ │ • keyword extraction (spaCy)               │
+ │ • fans out to MyGene, CT.gov v2, UMLS…     │
+ │ • merges & summarises with LLM             │
+ └──────────────┬─────────────────────────────┘
+                │ helpers (mcp/*.py)
+                ▼
+ ┌────────────────────────────────────────────┐
+ │ External APIs + local TSV (DrugBank)       │
+ └────────────────────────────────────────────┘
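The "fans out … merges" step in the orchestrator box can be pictured with the sketch below. It is an illustration only, not the code in mcp/orchestrator.py: the fetcher coroutines (`fetch_pubmed`, `fetch_trials`, `fetch_genes`), their return shapes, the placeholder trial ID, and the `orchestrate` name are all hypothetical, and the LLM summarisation step is left out.

```python
import asyncio
from typing import Any


# Hypothetical stand-ins for helpers in mcp/*.py; real ones would call
# PubMed, ClinicalTrials.gov, MyGene.info, etc. over HTTP.
async def fetch_pubmed(query: str) -> list[dict[str, Any]]:
    await asyncio.sleep(0)  # simulate non-blocking network I/O
    return [{"source": "pubmed", "title": f"Paper about {query}"}]


async def fetch_trials(query: str) -> list[dict[str, Any]]:
    await asyncio.sleep(0)
    return [{"source": "ctgov", "nct_id": "NCT00000000"}]  # placeholder ID


async def fetch_genes(query: str) -> list[dict[str, Any]]:
    raise TimeoutError("simulated upstream timeout")


async def orchestrate(query: str) -> dict[str, Any]:
    """Query every source concurrently and merge whatever succeeds."""
    fetchers = {
        "papers": fetch_pubmed,
        "trials": fetch_trials,
        "genes": fetch_genes,
    }
    # return_exceptions=True keeps one failing API from cancelling the rest;
    # results come back in the same order as the awaitables passed in.
    results = await asyncio.gather(
        *(fn(query) for fn in fetchers.values()), return_exceptions=True
    )
    merged: dict[str, Any] = {"query": query, "errors": {}}
    for name, result in zip(fetchers, results):
        if isinstance(result, Exception):
            merged[name] = []  # degrade gracefully to an empty section
            merged["errors"][name] = repr(result)
        else:
            merged[name] = result
    return merged


# Usage: asyncio.run(orchestrate("CRISPR glioblastoma therapy"))
```

With this design a slow or failing provider shows up as an empty section plus an error entry instead of aborting the whole search.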
+
+ 🔑 Features
+ | Domain | Source / API | What you get |
+ | --- | --- | --- |
+ | Literature | PubMed + arXiv | titles, abstracts, authors, year |
+ | Gene info | MyGene.info + NCBI Gene | symbol, name, GO, ClinVar, MeSH definitions |
+ | Trials | ClinicalTrials.gov v2 | NCT ID, phase, status, start date |
+ | Disease ↔ gene | DisGeNET | top associations & scores |
+ | Drug safety | openFDA, DrugCentral | adverse events, approvals, MoA |
+ | Graph edges | Open Targets GraphQL | gene–disease–drug links (+ OT score) |
+ | Ontology | UMLS, HPO, Wikidata | concept CUI, phenotype look-ups |
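To make the "Graph edges" row concrete, here is a minimal sketch of collapsing scored gene–disease–drug associations into a weighted edge list for a graph view. The record shape (`gene`, `disease`, optional `drug`, `score` keys), the `build_edges` name, and the 0.5 cut-off are illustrative assumptions; the actual Open Targets GraphQL response is shaped differently and would need its own parsing.

```python
from typing import Iterable, Mapping


def build_edges(
    records: Iterable[Mapping], min_score: float = 0.5
) -> list[tuple[str, str, float]]:
    """Collapse scored association records into de-duplicated graph edges.

    Each record is assumed (hypothetically) to look like
    {"gene": "GENE_A", "disease": "DISEASE_X", "drug": "DRUG_1", "score": 0.8},
    with "drug" optional. Each node pair keeps its highest score.
    """
    best: dict[tuple[str, str], float] = {}
    for rec in records:
        score = float(rec.get("score", 0.0))
        if score < min_score:
            continue  # skip weak associations to keep the graph readable
        pairs = [(rec["gene"], rec["disease"])]
        if rec.get("drug"):
            pairs.append((rec["drug"], rec["gene"]))  # drug -> target gene
        for pair in pairs:
            best[pair] = max(best.get(pair, score), score)
    # Strongest edges first; ties broken alphabetically for stable output.
    return sorted(
        ((src, dst, s) for (src, dst), s in best.items()),
        key=lambda edge: (-edge[2], edge[0], edge[1]),
    )
```

The resulting (source, target, weight) triples could then be handed to a graph-drawing library such as pyvis, which the previous README lists as a dependency.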
+
+ 🚀 Quick start
  bash
+ # clone the repo
+ git clone https://github.com/your-org/medgenesis.git
+ cd medgenesis
+
+ # create a virtual environment, install dependencies, and run locally
+ python -m venv .venv && source .venv/bin/activate
+ pip install -r requirements.txt
+ python -m spacy download en_core_web_sm
+ streamlit run app.py
+ This starts a Streamlit server on localhost:8501 (Streamlit's default port).
+ Enter a biomedical question (e.g. "CRISPR glioblastoma therapy") and press Run Search 🚀.
+
+ 🐳 Docker / Hugging Face Space
+ The included Dockerfile is CPU-only and downloads the spaCy model at build time:
+ bash
+ docker build -t medgenesis .
+ docker run -p 7860:7860 -e OPENAI_API_KEY=sk-... medgenesis
+ On Hugging Face Spaces: push the repo and set the environment variables listed below as Space secrets; Spaces will then build from the Dockerfile.
+
+ 🔐 Environment variables
+ | Variable | Description |
+ | --- | --- |
+ | OPENAI_API_KEY | OpenAI API key (GPT-4o, GPT-4o-mini …) |
+ | GEMINI_KEY | Google Generative AI key (Gemini 1.5 Flash) |
+ | UMLS_KEY | UMLS license key (ticket-based auth) |
+ | DISGENET_KEY | DisGeNET Bearer token (optional) |
+ | PUB_KEY | NCBI E-utilities key (optional; raises the request-rate limit) |
+ | BIO_KEY | NCBI E-utilities key for Gene/MeSH lookups (optional) |
+
+ Set them in a .env file, in your shell, or as Hugging Face Space secrets.
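A sketch of how these variables might be read at startup and how the optional NCBI key could be attached to an E-utilities request. Only the variable names come from the README; the `Settings` dataclass, the `load_settings` and `esearch_url` helpers, and the rule that at least one of OPENAI_API_KEY or GEMINI_KEY must be set are assumptions for illustration. NCBI's E-utilities do accept an `api_key` query parameter.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


@dataclass(frozen=True)
class Settings:
    openai_key: Optional[str]
    gemini_key: Optional[str]
    umls_key: Optional[str]
    disgenet_key: Optional[str]
    pub_key: Optional[str]
    bio_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read keys from the environment; empty strings count as unset."""

    def get(name: str) -> Optional[str]:
        return env.get(name) or None

    settings = Settings(
        openai_key=get("OPENAI_API_KEY"),
        gemini_key=get("GEMINI_KEY"),
        umls_key=get("UMLS_KEY"),
        disgenet_key=get("DISGENET_KEY"),
        pub_key=get("PUB_KEY"),
        bio_key=get("BIO_KEY"),
    )
    # Assumption: at least one LLM backend is needed for summaries and Q&A.
    if not (settings.openai_key or settings.gemini_key):
        raise RuntimeError("Set OPENAI_API_KEY or GEMINI_KEY")
    return settings


def esearch_url(term: str, settings: Settings) -> str:
    """Build a PubMed esearch URL, adding api_key only when PUB_KEY is set."""
    params = {"db": "pubmed", "term": term, "retmode": "json"}
    if settings.pub_key:
        params["api_key"] = settings.pub_key
    return f"{ESEARCH}?{urlencode(params)}"
```

Passing the environment in as a mapping keeps the loader easy to test without touching the real process environment.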
+
+ 🗃️ Local data
+ mcp/data/drugbank_open_structured_drug_links.tsv – DrugBank Open Data
+ Download the file from the DrugBank Open Data page and place it at this path.
+ The file is lazy-loaded and cached; the app still works without it.
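"Lazy-loaded and cached" could be implemented roughly as follows: read the TSV on first use, memoise the parsed result with `functools.lru_cache`, and fall back to an empty mapping when the file is absent. The function names and the `Name` column used as the lookup key are assumptions; check the header row of the file you actually download.

```python
import csv
from functools import lru_cache
from pathlib import Path
from typing import Dict, Optional

# Path taken from the README; the "Name" column is an assumption.
DEFAULT_TSV = "mcp/data/drugbank_open_structured_drug_links.tsv"


@lru_cache(maxsize=None)
def load_drug_links(path: str = DEFAULT_TSV) -> Dict[str, Dict[str, str]]:
    """Parse the TSV on the first call for a given path; later calls hit the cache.

    Returns {lower-cased drug name: row}, or {} when the file is missing,
    so callers keep working without the optional data.
    """
    tsv = Path(path)
    if not tsv.is_file():
        return {}
    with tsv.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return {
            row["Name"].strip().lower(): row
            for row in reader
            if row.get("Name")
        }


def lookup_drug(name: str) -> Optional[Dict[str, str]]:
    """Case-insensitive lookup that returns None instead of raising."""
    return load_drug_links().get(name.strip().lower())
```

Note that a "file missing" result is cached too, so a file added while the app is running is only picked up after `load_drug_links.cache_clear()` or a restart.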
+
+ 🧪 Tests
+ bash
+ pytest tests/
+ Unit tests mock the external APIs and verify parsing, caching, and orchestrator merges.
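A minimal, standard-library illustration of that mocking pattern: the network call is injected as a parameter so a test can swap in a `unittest.mock.Mock` and exercise the parser offline. `fetch_json`, `parse_esearch`, and `search_pubmed_ids` are hypothetical names, not functions from the repo; the `esearchresult.idlist` shape follows NCBI's esearch JSON output. Saved as a `test_*.py` file, pytest would collect the two `test_` functions.

```python
import json
import urllib.request
from typing import Callable
from unittest import mock
from urllib.parse import quote_plus

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def fetch_json(url: str) -> dict:
    """Thin network wrapper; tests replace it so no request is sent."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def parse_esearch(payload: dict) -> list[str]:
    """Pull PubMed IDs out of an esearch JSON payload."""
    return list(payload.get("esearchresult", {}).get("idlist", []))


def search_pubmed_ids(
    term: str, fetch: Callable[[str], dict] = fetch_json
) -> list[str]:
    """Build the esearch URL, fetch it, and parse the ID list."""
    url = f"{ESEARCH}?db=pubmed&retmode=json&term={quote_plus(term)}"
    return parse_esearch(fetch(url))


# --- tests: pytest collects functions named test_* ---

def test_search_uses_mocked_fetch():
    fake_fetch = mock.Mock(
        return_value={"esearchresult": {"count": "2", "idlist": ["111", "222"]}}
    )
    assert search_pubmed_ids("CRISPR glioblastoma", fetch=fake_fetch) == ["111", "222"]
    fake_fetch.assert_called_once()
    assert "term=CRISPR+glioblastoma" in fake_fetch.call_args.args[0]


def test_parse_handles_empty_payload():
    assert parse_esearch({}) == []
```

Injecting the fetcher avoids patching module paths, which keeps the test independent of how the helper module is imported.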

+ 🛠️ Contributing
+ Fork the repo and create a feature branch.
+ Follow Conventional Commits for PR titles.
+ Run pre-commit install to set up automatic formatting with black and ruff.
+ Submit a PR; GitHub Actions will run linting and tests.
+
+ 📄 License
+ Apache 2.0 – free for research and commercial use.
+ The terms of each external API provider still apply.
+
+ Happy discovering!