Updated README
README.md
CHANGED
@@ -4,91 +4,110 @@ emoji: 🧬
 colorFrom: indigo
 colorTo: purple
 sdk: docker
-app_file: app.py
 pinned: false
 ---


-
-
-### Live App
-
-[Click here to launch the app](https://www.willfillinoncedeployed.com)


 ---

 ## Features

-
-
-
-
-

 ---

-##

-1
-- Type one or more SMILES strings separated by commas
-- OR upload a `.csv` file with a single column of SMILES

-

-
-
-- No header
-- Each row contains a SMILES string

-
-- Predictions displayed in-browser (up to 10 molecules shown)
-- Full results available for download as CSV

-

-

-
-
-
-
-
-
-

 ---

-##

-
-
-
-

-


-##
-
-The app uses a trained hybrid GNN model combining:

-
-
-
-

-


-##

-
-

 colorFrom: indigo
 colorTo: purple
 sdk: docker
+app_file: app.py
 pinned: false
 ---

+# HOMO‑LUMO Gap Predictor — Streamlit + Hybrid GNN

+> **Live demo:** [huggingface.co/spaces/MooseML/homo-lumo-gap-predictor](https://huggingface.co/spaces/MooseML/homo-lumo-gap-predictor) •
+> **Code:** <https://github.com/MooseML/homo-lumo-gap-predictor>

+This web app predicts HOMO–LUMO energy gaps from molecular **SMILES** using a trained hybrid Graph Neural Network (PyTorch Geometric and RDKit descriptors).
+It runs on Hugging Face Spaces via Docker and on any local machine with Docker or a Python ≥ 3.10 environment.

 ---

 ## Features

+* Predict gaps for **single or batch** inputs (comma / newline SMILES or CSV upload)
+* Shows **up to 20 molecules** per run with RDKit 2‑D depictions
+* Download full predictions as CSV
+* Logs all predictions to a lightweight SQLite DB (`/data/predictions.db`)
+* Containerised environment identical to the public Space

 ---

+## Quick start

+### 1 Use the hosted Space

+1. Open the [live URL](https://huggingface.co/spaces/MooseML/homo-lumo-gap-predictor).
+2. Paste SMILES *or* upload a CSV (1 column, no header).
+3. Click **Run Prediction** → results and structures appear; a CSV is downloadable.

+> **Heads‑up:** on the free HF tier, large files (> ~5 MB) can take 10–30 s to upload because of proxy buffering.
+> Local Docker runs avoid this delay; see below.

+### 2 Run locally with Docker (mirrors the Space)

+```bash
+git clone https://github.com/MooseML/homo-lumo-gap-predictor.git
+cd homo-lumo-gap-predictor
+docker build -t homolumo .
+docker run -p 7860:7860 homolumo
+# open http://localhost:7860
+```

+### 3 Run locally with Python (no Docker)

+```bash
+git clone https://github.com/MooseML/homo-lumo-gap-predictor.git
+cd homo-lumo-gap-predictor
+pip install -r requirements.txt
+streamlit run app.py
+# app on http://localhost:8501
+```

 ---

+## Input guidelines

+| Format       | Example                                        |
+| ------------ | ---------------------------------------------- |
+| **Textarea** | `O=C(C)Oc1ccccc1C(=O)O, C1=CC=CC=C1`           |
+| **CSV**      | One column, no header:<br>`CCO`<br>`Cc1ccccc1` |

+Invalid or exotic SMILES are skipped and listed in the terminal log (RDKit warnings).
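
The two input formats above could be normalised into a list of SMILES strings roughly as follows. This is a minimal sketch, not the code in `app.py`: the function names `parse_smiles_text` and `parse_smiles_csv` are hypothetical, and validation is left out. In the real app, a check such as RDKit's `Chem.MolFromSmiles` (which returns `None` for unparsable input) would presumably follow this step.

```python
import csv
import io


def parse_smiles_text(raw: str) -> list[str]:
    """Split textarea input on commas and newlines, dropping empty tokens."""
    tokens = raw.replace("\n", ",").split(",")
    return [t.strip() for t in tokens if t.strip()]


def parse_smiles_csv(data: str) -> list[str]:
    """Read a headerless, single-column CSV; keep the first cell of each non-empty row."""
    rows = csv.reader(io.StringIO(data))
    return [row[0].strip() for row in rows if row and row[0].strip()]


# Examples taken from the table above
print(parse_smiles_text("O=C(C)Oc1ccccc1C(=O)O, C1=CC=CC=C1"))
print(parse_smiles_csv("CCO\nCc1ccccc1\n"))
```

Both helpers return plain Python lists, so the same downstream prediction code can serve either input path.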

+---

+## Project files

+```
+.
+├── app.py            – Streamlit front‑end
+├── model.py          – Hybrid GNN loader (PyTorch Geometric)
+├── utils.py          – RDKit helpers & SMILES→graph
+├── Dockerfile        – identical to the Hugging Face Space
+└── requirements.txt
+```

+The Docker image creates `/data` (writable, 775) for the persistent SQLite DB when a volume is attached.
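
For orientation, logging predictions to SQLite might look like the sketch below. The table name, column names, and helper functions are assumptions made for illustration, since the diff does not show the app's schema. The example uses an in-memory database and placeholder gap values; the app's file would live at `/data/predictions.db`.

```python
import sqlite3
from datetime import datetime, timezone


def open_db(path: str) -> sqlite3.Connection:
    """Open the database and create the (assumed) predictions table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "smiles TEXT NOT NULL, "
        "gap REAL NOT NULL, "
        "created_at TEXT NOT NULL)"
    )
    return conn


def log_predictions(conn: sqlite3.Connection, rows: list[tuple[str, float]]) -> None:
    """Insert (smiles, gap) pairs with a shared UTC timestamp."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO predictions (smiles, gap, created_at) VALUES (?, ?, ?)",
            [(smiles, gap, now) for smiles, gap in rows],
        )


conn = open_db(":memory:")  # stand-in for /data/predictions.db
log_predictions(conn, [("CCO", 7.1), ("c1ccccc1", 6.3)])  # placeholder values, not model output
print(conn.execute("SELECT COUNT(*) FROM predictions").fetchone()[0])
```

Using the connection as a context manager keeps each batch atomic, which matters if a run is interrupted partway through a large CSV.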

+---

+## Model in brief

+* **Architecture:** AtomEncoder and BondEncoder → GINEConv layers → global mean pooling → dense head
+* **Descriptors:** six RDKit physico‑chemical features per molecule
+* **Training set:** [OGB PCQM4Mv2](https://ogb.stanford.edu/docs/lsc/pcqm4mv2/)
+* **Optimiser / search:** Optuna hyperparameter sweep
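
To make the "hybrid" idea in the bullets above concrete, here is a toy, pure-Python illustration of global mean pooling followed by joining the pooled vector with molecule-level descriptors. It assumes the two are combined by simple concatenation, which the diff does not confirm, and all numbers are placeholders. In the actual model the pooling step would be done by PyTorch Geometric (for example `torch_geometric.nn.global_mean_pool`) on learned node embeddings.

```python
def global_mean_pool(node_embeddings: list[list[float]]) -> list[float]:
    """Average per-atom embeddings column-wise into one graph-level vector."""
    n = len(node_embeddings)
    return [sum(column) / n for column in zip(*node_embeddings)]


def hybrid_features(node_embeddings: list[list[float]], descriptors: list[float]) -> list[float]:
    """Join the pooled graph vector with descriptors (assumed: concatenation)."""
    return global_mean_pool(node_embeddings) + list(descriptors)


nodes = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 atoms, toy 2-dimensional embeddings
desc = [0.5, 0.1, 0.0, 2.0, 1.0, 3.0]          # six placeholder descriptor values
features = hybrid_features(nodes, desc)
print(len(features), features[:2])
```

The resulting vector (pooled embedding plus six descriptors) is what a dense head would consume under this assumption.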

+---

+## Roadmap

+* Stream chunked CSV parsing to improve upload speed on the public Space
+* Toggle between 2‑D and 3‑D (3Dmol.js) molecule renderings
+* Serve the model weights from the HF Hub instead of bundling in the image

+---

+## Author

+**Matthew Graham** — [@MooseML](https://github.com/MooseML)
+Feel free to open issues or contact me for collaborations.

+```
+```