MooseML committed
Commit 7c9c550 · 1 Parent(s): e5868f6

Updated README

Files changed (1)
  1. README.md +68 -49
README.md CHANGED
@@ -4,91 +4,110 @@ emoji: 🧬
  colorFrom: indigo
  colorTo: purple
  sdk: docker
- app_file: app.py
  pinned: false
  ---

- This web app uses a trained Graph Neural Network (GNN) to predict HOMO–LUMO energy gaps from molecular SMILES strings. Built with [Streamlit](https://streamlit.io), it enables fast single or batch predictions with visualization.
-
- ### Live App
-
- [Click here to launch the app](https://www.willfillinoncedeployed.com)

  ---

  ## Features

- - Predict HOMO–LUMO gap for one or many molecules
- - Accepts comma-separated SMILES or CSV uploads
- - RDKit rendering of molecule structures
- - Downloadable CSV of predictions
- - Powered by a trained hybrid GNN model with RDKit descriptors

  ---

- ## Usage

- 1. **Input Options**:
- - Type one or more SMILES strings separated by commas
- - OR upload a `.csv` file with a single column of SMILES

- 2. **Example SMILES**: CC(=O)Oc1ccccc1C(=O)O, C1=CC=CC=C1

- 3. **CSV Format**:
- - One column
- - No header
- - Each row contains a SMILES string

- 4. **Output**:
- - Predictions displayed in-browser (up to 10 molecules shown)
- - Full results available for download as CSV

- ---

- ## Project Structure

- streamlit-app/
-
- ├── app.py # Main Streamlit app
- ├── model.py # Hybrid GNN architecture and model loader
- ├── utils.py # RDKit and SMILES processing
- ├── requirements.txt # Python dependencies
- └── predictions.db # SQLite log of predictions

  ---

- ## Requirements

- To run locally:
- ```
- pip install -r requirements.txt
- streamlit run app.py

- ```

- ## Model Info
-
- The app uses a trained hybrid GNN model combining:

- * AtomEncoder and BondEncoder from OGB
- * GINEConv layers from PyTorch Geometric
- * Global mean pooling
- * RDKit-based physicochemical descriptors

- Trained on the [OGB PCQM4Mv2 dataset](https://ogb.stanford.edu/docs/lsc/pcqm4mv2/), optimized using Optuna

- ## Author

- Developed by [Matthew Graham](https://github.com/MooseML)
- For inquiries, collaborations, or ideas, feel free to reach out!

  colorFrom: indigo
  colorTo: purple
  sdk: docker
+ app_file: app.py
  pinned: false
  ---

+ # HOMO‑LUMO Gap Predictor — Streamlit + Hybrid GNN

+ > **Live demo:** [huggingface.co/spaces/MooseML/homo-lumo-gap-predictor](https://huggingface.co/spaces/MooseML/homo-lumo-gap-predictor)  • 
+ > **Code:** <https://github.com/MooseML/homo-lumo-gap-predictor>

+ This web app predicts HOMO–LUMO energy gaps from molecular **SMILES** using a trained hybrid Graph Neural Network (PyTorch Geometric and RDKit descriptors).
+ It runs on Hugging Face Spaces via Docker and on any local machine with Docker or a Python ≥ 3.10 environment.

  ---

  ## Features

+ * Predict gaps for **single or batch** inputs (comma / newline SMILES or CSV upload)
+ * Shows **up to 20 molecules** per run with RDKit 2‑D depictions
+ * Download full predictions as CSV
+ * Logs all predictions to a lightweight SQLite DB (`/data/predictions.db`)
+ * Containerised environment identical to the public Space

  ---

+ ## Quick start

+ ### 1  Use the hosted Space

+ 1. Open the [live URL](https://huggingface.co/spaces/MooseML/homo-lumo-gap-predictor).
+ 2. Paste SMILES *or* upload a CSV (1 column, no header).
+ 3. Click **Run Prediction** → results and structures appear; a CSV is downloadable.

+ > **Heads‑up:** on the free HF tier, large files (> ~5 MB) can take 10–30 s to upload because of proxy buffering.
+ > Local Docker runs do not have this delay; see below.

+ ### 2  Run locally with Docker (mirrors the Space)

+ ```bash
+ git clone https://github.com/MooseML/homo-lumo-gap-predictor.git
+ cd homo-lumo-gap-predictor
+ docker build -t homolumo .
+ docker run -p 7860:7860 homolumo
+ # open http://localhost:7860
+ ```

+ ### 3  Run locally with Python (no Docker)

+ ```bash
+ git clone https://github.com/MooseML/homo-lumo-gap-predictor.git
+ cd homo-lumo-gap-predictor
+ pip install -r requirements.txt
+ streamlit run app.py
+ # app on http://localhost:8501
+ ```

  ---

+ ## Input guidelines

+ | Format | Example |
+ | ------------ | ---------------------------------------------- |
+ | **Textarea** | `O=C(C)Oc1ccccc1C(=O)O, C1=CC=CC=C1` |
+ | **CSV** | One column, no header:<br>`CCO`<br>`Cc1ccccc1` |

+ Invalid or exotic SMILES are skipped and listed in the terminal log (as RDKit warnings).
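
The two input formats in the table could be normalised into a single list of SMILES strings before prediction. The sketch below is a minimal illustration that uses only the Python standard library; the helper names `parse_textarea` and `parse_csv` are hypothetical and are not taken from `app.py` or `utils.py`.

```python
import csv
import io
import re


def parse_textarea(text: str) -> list[str]:
    """Split free-form input on commas and newlines, dropping empty entries."""
    return [tok.strip() for tok in re.split(r"[,\n]", text) if tok.strip()]


def parse_csv(raw: str) -> list[str]:
    """Read a headerless, single-column CSV and return the first cell of each row."""
    rows = csv.reader(io.StringIO(raw))
    return [row[0].strip() for row in rows if row and row[0].strip()]


if __name__ == "__main__":
    # Same examples as the table above.
    print(parse_textarea("O=C(C)Oc1ccccc1C(=O)O, C1=CC=CC=C1"))
    print(parse_csv("CCO\nCc1ccccc1\n"))
```

These helpers only split the text; they do not check chemical validity. One common way to do that is RDKit's `Chem.MolFromSmiles`, which returns `None` for a string it cannot parse, though whether the app filters inputs this way is an assumption.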

+ ---

+ ## Project files

+ ```
+ .
+ ├── app.py – Streamlit front‑end
+ ├── model.py – Hybrid GNN loader (PyTorch Geometric)
+ ├── utils.py – RDKit helpers & SMILES→graph
+ ├── Dockerfile – identical to the Hugging Face Space
+ └── requirements.txt
+ ```

+ The Docker image creates `/data` (writable, mode 775) to hold the SQLite DB; the DB persists between runs only when a volume is attached.
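
How predictions might be written to the SQLite file can be sketched with the standard-library `sqlite3` module. The table name, the column names, and the helpers `open_log` and `log_prediction` are assumptions for illustration, since the README does not describe the real schema. The example uses an in-memory database so that it runs anywhere; inside the container the path would be `/data/predictions.db`.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path: str) -> sqlite3.Connection:
    """Open (or create) the log database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "smiles TEXT NOT NULL, "
        "gap REAL NOT NULL, "
        "created_at TEXT NOT NULL)"
    )
    return conn


def log_prediction(conn: sqlite3.Connection, smiles: str, gap: float) -> None:
    """Append one prediction together with a UTC timestamp."""
    conn.execute(
        "INSERT INTO predictions (smiles, gap, created_at) VALUES (?, ?, ?)",
        (smiles, gap, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


if __name__ == "__main__":
    conn = open_log(":memory:")          # hypothetical stand-in for /data/predictions.db
    log_prediction(conn, "CCO", 0.0)     # placeholder value, not a model output
    print(conn.execute("SELECT smiles, gap FROM predictions").fetchall())
    conn.close()
```

To keep the file across container restarts, a volume would need to be mounted at `/data`, for example with Docker's `-v` option: `docker run -p 7860:7860 -v homolumo-data:/data homolumo`, where the volume name `homolumo-data` is arbitrary.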

+ ---

+ ## Model in brief

+ * **Architecture:** AtomEncoder and BondEncoder → GINEConv layers → global mean pooling → dense head
+ * **Descriptors:** six RDKit physico‑chemical features per molecule
+ * **Training set:** [OGB PCQM4Mv2](https://ogb.stanford.edu/docs/lsc/pcqm4mv2/)
+ * **Optimiser / search:** Optuna hyperparameter sweep
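
The readout step of this pipeline can be illustrated without any deep-learning dependency. Global mean pooling averages the per-atom embeddings of one molecule into a single fixed-length vector. The sketch below shows that step in plain Python and then appends a descriptor vector before the dense head. The concatenation is an assumption about how the hybrid model combines its two inputs, the function names are hypothetical, and the numbers are toy values rather than model outputs.

```python
def mean_pool(node_embeddings: list[list[float]]) -> list[float]:
    """Average node (atom) embeddings column-wise into one graph-level vector."""
    if not node_embeddings:
        raise ValueError("a molecule needs at least one atom")
    n = len(node_embeddings)
    return [sum(column) / n for column in zip(*node_embeddings)]


def hybrid_features(node_embeddings: list[list[float]],
                    descriptors: list[float]) -> list[float]:
    """Assumed combination: pooled graph vector followed by molecule descriptors."""
    return mean_pool(node_embeddings) + list(descriptors)


if __name__ == "__main__":
    atoms = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 atoms, 2-dim toy embeddings
    desc = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]          # six toy descriptor values
    print(hybrid_features(atoms, desc))
```

In PyTorch Geometric the pooling step corresponds to `torch_geometric.nn.global_mean_pool`, which performs the same averaging per graph over a batched tensor.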

+ ---

+ ## Roadmap

+ * Stream chunked CSV parsing to improve upload speed on the public Space
+ * Toggle between 2‑D and 3‑D (3Dmol.js) molecule renderings
+ * Serve the model weights from the HF Hub instead of bundling in the image

+ ---

+ ## Author

+ **Matthew Graham** — [@MooseML](https://github.com/MooseML)
+ Feel free to open issues or contact me for collaborations.