File size: 3,475 Bytes
0f22579
 
 
 
 
 
7c9c550
0f22579
 
 
7c9c550
e3eae4d
8aec66d
7c9c550
e3eae4d
7c9c550
 
e3eae4d
 
 
 
 
7c9c550
 
 
 
 
e3eae4d
6568ddd
e3eae4d
7c9c550
e3eae4d
7c9c550
e3eae4d
7c9c550
 
 
e3eae4d
7c9c550
 
e3eae4d
7c9c550
e3eae4d
7c9c550
 
 
 
 
 
 
e3eae4d
7c9c550
e3eae4d
7c9c550
 
 
 
 
 
 
e3eae4d
6568ddd
 
7c9c550
e3eae4d
7c9c550
 
8aec66d
7c9c550
e3eae4d
7c9c550
e3eae4d
7c9c550
e3eae4d
7c9c550
e3eae4d
7c9c550
 
 
 
 
 
 
e3eae4d
7c9c550
e3eae4d
7c9c550
e3eae4d
7c9c550
e3eae4d
7c9c550
 
 
 
e3eae4d
7c9c550
e3eae4d
7c9c550
e3eae4d
7c9c550
 
 
e3eae4d
7c9c550
e3eae4d
7c9c550
e3eae4d
7c9c550
 
e3eae4d
64cc0d5
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
---
title: 🧪 HOMO‑LUMO GapPredictor
emoji: 🧬
colorFrom: indigo
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
---

# HOMO‑LUMO Gap Predictor — Streamlit + Hybrid GNN

> **Live demo:** [huggingface.co/spaces/MooseML/homo-lumo-gap-predictor](https://huggingface.co/spaces/MooseML/homo-lumo-gap-predictor)    
> **Code:** <https://github.com/MooseML/homo-lumo-gap-predictor>

This web app predicts HOMO–LUMO energy gaps from molecular **SMILES** using a trained hybrid Graph Neural Network (PyTorch Geometric and RDKit descriptors).  
It runs on Hugging Face Spaces via Docker and on any local machine with Docker or a Python ≥ 3.10 environment.

---

## Features

* Predict gaps for **single or batch** inputs (comma / newline SMILES or CSV upload)
* Shows **up to 20 molecules** per run with RDKit 2‑D depictions
* Download full predictions as CSV
* Logs all predictions to a lightweight SQLite DB (`/data/predictions.db`)
* Containerised environment identical to the public Space

---

## Quick start

### 1  Use the hosted Space

1. Open the [live URL](https://huggingface.co/spaces/MooseML/homo-lumo-gap-predictor).  
2. Paste SMILES *or* upload a CSV (1 column, no header).  
3. Click **Run Prediction** → results and structures appear; a CSV is downloadable.

> **Heads‑up:** on the free HF tier large files (> ~5 MB) can take 10–30 s to upload because of proxy buffering.  
> Local Docker runs are instant, see below:

### 2  Run locally with Docker (mirrors the Space)

```bash
git clone https://github.com/MooseML/homo-lumo-gap-predictor.git
cd homo-lumo-gap-predictor
docker build -t homolumo .
docker run -p 7860:7860 homolumo
# open http://localhost:7860
````

### 3  Run locally with Python (no Docker)

```bash
git clone https://github.com/MooseML/homo-lumo-gap-predictor.git
cd homo-lumo-gap-predictor
pip install -r requirements.txt
streamlit run app.py 
# app on http://localhost:8501
```

---

## Input guidelines

| Format       | Example                                        |
| ------------ | ---------------------------------------------- |
| **Text area** | `O=C(C)Oc1ccccc1C(=O)O`           |
| **CSV**      | One column, no header:<br>`CCO`<br>`Cc1ccccc1` |

Invalid or exotic SMILES are skipped and listed in the terminal log (RDKit warnings)

---

## Project files

```
├── app.py          – Streamlit front‑end
├── model.py        – Hybrid GNN loader (PyTorch Geometric)
├── utils.py        – RDKit helpers & SMILES→graph
├── Dockerfile      – identical to the Hugging Face Space
└── requirements.txt
```

The Docker image creates `/data` (writable, 775) for the persistent SQLite DB when a volume is attached.

---

## Model in brief

* **Architecture:** AtomEncoder and BondEncoder → GINEConv layers → global mean pooling → dense head
* **Descriptors:** six RDKit physico‑chemical features per molecule
* **Training set:** [OGB PCQM4Mv2](https://ogb.stanford.edu/docs/lsc/pcqm4mv2/)
* **Optimiser / search:** Optuna hyperparameter sweep

---

## Roadmap

* Stream chunked CSV parsing to improve upload speed on the public Space
* Toggle between 2‑D and 3‑D (3Dmol.js) molecule renderings
* Serve the model weights from the HF Hub instead of bundling in the image

---

## Author

**Matthew Graham** — [@MooseML](https://github.com/MooseML)
Feel free to open issues or contact me for collaborations.