MooseML's picture
Update README
c7ecac8
metadata
title: 🧪 HOMO‑LUMO GapPredictor
emoji: 🧬
colorFrom: indigo
colorTo: purple
sdk: docker
app_file: app.py
pinned: false

HOMO‑LUMO Gap Predictor — Streamlit + Hybrid GNN

Live demo: huggingface.co/spaces/MooseML/homo-lumo-gap-predictor  
Code: https://github.com/MooseML/homo-lumo-gap-predictor

This web app predicts HOMO–LUMO energy gaps from molecular SMILES using a trained hybrid Graph Neural Network (PyTorch Geometric and RDKit descriptors).
It runs on Hugging Face Spaces via Docker and on any local machine with Docker or a Python ≥ 3.10 environment.


Features

  • Predict gaps for single or batch inputs (comma / newline SMILES or CSV upload)
  • Shows up to 20 molecules per run with RDKit 2‑D depictions
  • Download full predictions as CSV
  • Logs all predictions to a lightweight SQLite DB (/data/predictions.db)
  • Containerised environment identical to the public Space

Quick start

1  Use the hosted Space

  1. Open the live URL.
  2. Paste SMILES or upload a CSV (1 column, no header).
  3. Click Run Prediction → results and structures appear; a CSV is downloadable.

Heads‑up: on the free HF tier large files (> ~5 MB) can take 10–30 s to upload because of proxy buffering.
Local Docker runs are instant, see below:

2  Run locally with Docker (mirrors the Space)

git clone https://github.com/MooseML/homo-lumo-gap-predictor.git
cd homo-lumo-gap-predictor
docker build -t homolumo .
docker run -p 7860:7860 homolumo
# open http://localhost:7860

3  Run locally with Python (no Docker)

git clone https://github.com/MooseML/homo-lumo-gap-predictor.git
cd homo-lumo-gap-predictor
pip install -r requirements.txt
streamlit run app.py 
# app on http://localhost:8501

Input guidelines

Format Example
Text area O=C(C)Oc1ccccc1C(=O)O
CSV One column, no header:
CCO
Cc1ccccc1

Invalid or exotic SMILES are skipped and listed in the terminal log (RDKit warnings)


Project files

├── app.py          – Streamlit front‑end
├── model.py        – Hybrid GNN loader (PyTorch Geometric)
├── utils.py        – RDKit helpers & SMILES→graph
├── Dockerfile      – identical to the Hugging Face Space
└── requirements.txt

The Docker image creates /data (writable, 775) for the persistent SQLite DB when a volume is attached.


Model in brief

  • Architecture: AtomEncoder and BondEncoder → GINEConv layers → global mean pooling → dense head
  • Descriptors: six RDKit physico‑chemical features per molecule
  • Training set: OGB PCQM4Mv2
  • Optimiser / search: Optuna hyperparameter sweep

Roadmap

  • Stream chunked CSV parsing to improve upload speed on the public Space
  • Toggle between 2‑D and 3‑D (3Dmol.js) molecule renderings
  • Serve the model weights from the HF Hub instead of bundling in the image

Author

Matthew Graham@MooseML Feel free to open issues or contact me for collaborations.