title: 🧪 HOMO‑LUMO Gap Predictor
emoji: 🧬
colorFrom: indigo
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
HOMO‑LUMO Gap Predictor — Streamlit + Hybrid GNN
Live demo: huggingface.co/spaces/MooseML/homo-lumo-gap-predictor
Code: https://github.com/MooseML/homo-lumo-gap-predictor
This web app predicts HOMO–LUMO energy gaps from molecular SMILES using a trained hybrid Graph Neural Network (PyTorch Geometric and RDKit descriptors).
It runs on Hugging Face Spaces via Docker and on any local machine with Docker or a Python ≥ 3.10 environment.
Features
- Predict gaps for single or batch inputs (comma- or newline-separated SMILES, or a CSV upload)
- Shows up to 20 molecules per run with RDKit 2‑D depictions
- Download full predictions as CSV
- Logs all predictions to a lightweight SQLite DB (/data/predictions.db)
- Containerised environment identical to the public Space
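The display cap and the full CSV download listed above could be sketched roughly as follows. This is an illustration only: the function names `split_for_display` and `predictions_to_csv` and the column names are assumptions, not taken from app.py.

```python
import csv
import io

MAX_DISPLAYED = 20  # the feature list states that up to 20 molecules are shown per run


def split_for_display(rows, limit=MAX_DISPLAYED):
    """Return (rows to render on screen, number of rows that appear only in the CSV)."""
    shown = rows[:limit]
    return shown, len(rows) - len(shown)


def predictions_to_csv(rows):
    """Serialise every (smiles, predicted_gap) pair, not just the displayed ones."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["smiles", "predicted_gap"])  # hypothetical column names
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values, not real model output.
    demo = [("C" * (i + 1), 0.0) for i in range(25)]
    shown, hidden = split_for_display(demo)
    print(f"showing {len(shown)} molecules, {hidden} more in the CSV")
```

In a Streamlit app, the returned string could be handed to `st.download_button` as its `data` argument; whether app.py does it this way is not confirmed here.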
Quick start
1. Use the hosted Space
- Open the live URL.
- Paste SMILES or upload a CSV (1 column, no header).
- Click Run Prediction → results and structures appear; a CSV is downloadable.
Heads‑up: on the free HF tier, large files (> ~5 MB) can take 10–30 s to upload because of proxy buffering.
Local Docker runs avoid this delay; see the Docker instructions below.
2. Run locally with Docker (mirrors the Space)
git clone https://github.com/MooseML/homo-lumo-gap-predictor.git
cd homo-lumo-gap-predictor
docker build -t homolumo .
docker run -p 7860:7860 homolumo
# open http://localhost:7860
3. Run locally with Python (no Docker)
git clone https://github.com/MooseML/homo-lumo-gap-predictor.git
cd homo-lumo-gap-predictor
pip install -r requirements.txt
streamlit run app.py
# app on http://localhost:8501
Input guidelines
Format | Example |
---|---|
Text area | O=C(C)Oc1ccccc1C(=O)O |
CSV | One column, no header, one SMILES per row (e.g. row 1: CCO, row 2: Cc1ccccc1) |
Invalid or unsupported SMILES are skipped and listed in the terminal log (as RDKit warnings).
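The two input formats could be normalised into a plain list of SMILES strings along these lines. This is a minimal sketch using only the standard library; `parse_smiles_text` and `parse_smiles_csv` are hypothetical names, and the parsing in app.py may differ.

```python
import csv
import io
import re


def parse_smiles_text(text):
    """Split text-area input on commas and newlines, dropping empty entries."""
    return [token.strip() for token in re.split(r"[,\n]", text) if token.strip()]


def parse_smiles_csv(csv_text):
    """Read a one-column, header-less CSV and return the first cell of each non-empty row."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in reader if row and row[0].strip()]


if __name__ == "__main__":
    print(parse_smiles_text("CCO, Cc1ccccc1\nO=C(C)Oc1ccccc1C(=O)O"))
    print(parse_smiles_csv("CCO\nCc1ccccc1\n"))
```

Each string could then be passed to RDKit's `Chem.MolFromSmiles`, which returns `None` for input it cannot parse, so that unparseable entries are skipped. That this is how the app filters its input is an assumption.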
Project files
├── app.py – Streamlit front‑end
├── model.py – Hybrid GNN loader (PyTorch Geometric)
├── utils.py – RDKit helpers & SMILES→graph
├── Dockerfile – identical to the Hugging Face Space
└── requirements.txt
The Docker image creates /data (writable, mode 775) for the SQLite DB, which persists across container restarts when a volume is attached.
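Logging to the SQLite file might look something like the sketch below. The table name, column names, and the helpers `open_log` and `log_predictions` are all assumptions made for illustration; only the /data/predictions.db path comes from this README.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(db_path="/data/predictions.db"):
    """Open the database and create the (hypothetical) predictions table if missing."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               smiles TEXT NOT NULL,
               predicted_gap REAL NOT NULL,
               created_at TEXT NOT NULL
           )"""
    )
    return conn


def log_predictions(conn, rows):
    """Insert (smiles, predicted_gap) pairs with a shared UTC timestamp."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO predictions (smiles, predicted_gap, created_at) VALUES (?, ?, ?)",
            [(smiles, gap, now) for smiles, gap in rows],
        )


if __name__ == "__main__":
    conn = open_log(":memory:")  # in-memory DB so the demo leaves no file behind
    log_predictions(conn, [("CCO", 0.0)])  # placeholder value, not model output
    print(conn.execute("SELECT COUNT(*) FROM predictions").fetchone()[0])
```

Without an attached volume, the file lives in the container's own filesystem and is lost when the container is removed.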
Model in brief
- Architecture: AtomEncoder and BondEncoder → GINEConv layers → global mean pooling → dense head
- Descriptors: six RDKit physico‑chemical features per molecule
- Training set: OGB PCQM4Mv2
- Optimiser / search: Optuna hyperparameter sweep
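To make the "global mean pooling" and "hybrid" steps concrete, here is a small plain-Python sketch of what they compute. It is not the model code: in PyTorch Geometric the pooling is done by `global_mean_pool(x, batch)` on tensors, and the point at which the six descriptors join the graph embedding (concatenation right after pooling) is an assumption of this sketch.

```python
def global_mean_pool(node_embeddings, batch):
    """Average node vectors per graph; batch[i] is the graph index of node i.

    Assumes every graph index from 0 to max(batch) owns at least one node.
    """
    num_graphs = max(batch) + 1
    dim = len(node_embeddings[0])
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for vector, graph_id in zip(node_embeddings, batch):
        counts[graph_id] += 1
        for k, value in enumerate(vector):
            sums[graph_id][k] += value
    return [[s / counts[g] for s in sums[g]] for g in range(num_graphs)]


def hybrid_features(pooled, descriptors):
    """Concatenate each pooled graph vector with that molecule's descriptor vector."""
    return [list(graph_vec) + list(desc) for graph_vec, desc in zip(pooled, descriptors)]


if __name__ == "__main__":
    # Two molecules: nodes 0-1 belong to graph 0, node 2 to graph 1 (toy numbers).
    pooled = global_mean_pool([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]], [0, 0, 1])
    print(hybrid_features(pooled, [[0.0] * 6, [0.0] * 6]))
```

The concatenated vector is what a dense head would consume, giving one predicted gap per molecule.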
Roadmap
- Stream chunked CSV parsing to improve upload speed on the public Space
- Toggle between 2‑D and 3‑D (3Dmol.js) molecule renderings
- Serve the model weights from the HF Hub instead of bundling in the image
Author
Matthew Graham (@MooseML)
Feel free to open issues or contact me about collaborations.