	---
	title: MHNfs
	emoji: 🔬
	short_description: Activity prediction for low-data scenarios
	colorFrom: gray
	colorTo: gray
	sdk: streamlit
	sdk_version: 1.29.0
	app_file: app.py
	pinned: true
	---

	# Activity Predictions with MHNfs for low-data scenarios

	## ⚙️ Under the hood
	<div style="text-align: justify">
	The predictive model (MHNfs) used in this application was specifically designed and
	trained for low-data scenarios. The model predicts whether a molecule is active or
	inactive. The predicted activity value is a continuous value between 0 and 1.
	Similar to a probability, the higher the value, the more confident the model is
	that the molecule is active, and the lower the value, the more confident it is that
	the molecule is inactive.<br>
	<br>
	The model was trained on the FS-Mol dataset, which
	includes 5,120 tasks (roughly 5,000 tasks were used for training and the rest for evaluation).
	The training tasks are listed here:
	<a href="https://github.com/microsoft/FS-Mol/tree/main/datasets/targets"
	target="_blank">https://github.com/microsoft/FS-Mol/tree/main/datasets/targets</a>.
	</div>
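Because the activity values behave like confidence scores, one natural way to use them in a screening is to rank the query molecules and inspect the highest-scoring ones first. The function `rank_by_activity` below is a hypothetical helper (not part of this repository), and the SMILES and scores in the example are made up for illustration; it only assumes that you have one score per query molecule. No fixed decision threshold is implied; where to cut the ranked list is up to you.

```python
from typing import List, Sequence, Tuple


def rank_by_activity(
    smiles: Sequence[str], scores: Sequence[float], top_k: int = 10
) -> List[Tuple[str, float]]:
    """Return the top_k (smiles, score) pairs, most likely active first.

    Hypothetical helper: assumes one score in [0, 1] per query molecule.
    """
    if len(smiles) != len(scores):
        raise ValueError("smiles and scores must have the same length")
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score {s} is outside [0, 1]")
    # Sort by score, highest (most confidently active) first
    ranked = sorted(zip(smiles, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Made-up scores, purely for illustration
hits = rank_by_activity(["CCO", "c1ccccc1", "CC(=O)O"], [0.12, 0.87, 0.55], top_k=2)
print(hits)  # [('c1ccccc1', 0.87), ('CC(=O)O', 0.55)]
```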

	## 🎯 About few-shot learning and the model MHNfs
	<div style="text-align: justify">
	<b>Few-shot learning</b> is a subfield of machine learning that aims to provide
	predictive models for scenarios in which only little data is available.<br>
	<br>
	<b>MHNfs</b> is a few-shot learning model specifically designed for drug
	discovery applications. It uses the input prompt in such a way that the
	available knowledge, i.e., the known active and inactive molecules, serves
	as context for predicting the activity of the queried molecules.
	Specifically, the provided active and inactive molecules are associated with a
	large set of general molecules, called context molecules, to enrich the
	provided information and to remove spurious correlations arising from the
	decoration of molecules. This is analogous to a Large Language Model that would
	not only use the information in the current prompt as context but would
	also have access to much more information, e.g., a prompting history.
	</div>
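The MHNfs paper (cited below) realizes this context enrichment with a modern Hopfield network: the representations of the support and query molecules act as queries that retrieve information from the stored context-molecule representations. The NumPy sketch below shows only a single, simplified Hopfield retrieval step (a softmax-weighted average over stored patterns). The embedding size, `beta`, and the random embeddings are assumptions for illustration, and the learned projections and other details of the actual MHNfs context module are omitted.

```python
import numpy as np


def hopfield_retrieve(queries: np.ndarray, memory: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """One simplified modern Hopfield retrieval step.

    queries: (n, d) embeddings, e.g. of support or query molecules
    memory:  (m, d) stored patterns, e.g. context-molecule embeddings
    Returns (n, d): each query replaced by a softmax-weighted average of memory rows.
    """
    logits = beta * queries @ memory.T             # (n, m) scaled dot-product similarities
    logits -= logits.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over the stored patterns
    return weights @ memory                        # (n, d) retrieved representations


rng = np.random.default_rng(0)
context = rng.normal(size=(1000, 16))  # hypothetical context-molecule embeddings
support = rng.normal(size=(4, 16))     # hypothetical support-set embeddings
enriched = hopfield_retrieve(support, context, beta=0.25)
print(enriched.shape)  # (4, 16)
```

With a small `beta`, the retrieved vector is a broad average over many context molecules; with a large `beta`, it approaches the single most similar context molecule.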

	## 💻 Run the prediction pipeline locally for larger screening chunks

	### Get started:
	```bash
	# Copied from Hugging Face
	# Make sure you have git-lfs installed (https://git-lfs.com)
	git lfs install

	# Clone repo
	git clone https://huggingface.co/spaces/tschouis/mhnfs

	# Alternatively, if you want to clone without large files
	GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/spaces/tschouis/mhnfs

	# Move into the cloned repository (the following steps are run from here)
	cd mhnfs
	```

	### Install requirements
	```bash
	pip install -r requirements.txt
	```
	Notably, this command was tested inside a conda environment with Python 3.7.

	### Run the prediction pipeline:
	For your screening, load the model, i.e., the `ActivityPredictor`, in your Python script or notebook and run it:
	```python
	from src.prediction_pipeline import ActivityPredictor

	# Load the model (assumes the default constructor loads the trained MHNfs weights;
	# run from the repository root so that `src` is importable)
	predictor = ActivityPredictor()

	# Define inputs
	query_smiles = ["C1CCCCC1", "C1CCCCC1", "C1CCCCC1", "C1CCCCC1"]  # Replace with your data
	support_actives_smiles = ["C1CCCCC1", "C1CCCCC1"]  # Replace with your data
	support_inactives_smiles = ["C1CCCCC1", "C1CCCCC1"]  # Replace with your data

	# Make predictions (an activity value between 0 and 1 for each query molecule)
	predictions = predictor.predict(query_smiles, support_actives_smiles, support_inactives_smiles)
	```
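For screening libraries too large to handle in a single call, one option is to split the query molecules into chunks while keeping the support set fixed. The sketch below assumes that `predictor.predict` takes the same three arguments as above, returns one score per query molecule, and scores each query molecule independently of the other queries. `predict_in_chunks`, `chunk_size`, and the `dummy_predict` stand-in are hypothetical and only there to make the example runnable. Whether chunking is needed at all depends on your hardware and on how the predictor handles large inputs internally.

```python
from typing import Callable, List, Sequence


def predict_in_chunks(
    predict_fn: Callable[[List[str], List[str], List[str]], Sequence[float]],
    query_smiles: Sequence[str],
    support_actives_smiles: List[str],
    support_inactives_smiles: List[str],
    chunk_size: int = 1000,
) -> List[float]:
    """Score a large query library chunk by chunk with a fixed support set (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    scores: List[float] = []
    for start in range(0, len(query_smiles), chunk_size):
        chunk = list(query_smiles[start:start + chunk_size])
        # Assumes predict_fn returns one score per molecule in `chunk`
        chunk_scores = predict_fn(chunk, support_actives_smiles, support_inactives_smiles)
        scores.extend(float(s) for s in chunk_scores)
    return scores


def dummy_predict(queries, actives, inactives):
    # Stand-in for predictor.predict so the example runs without the model
    return [0.5] * len(queries)


scores = predict_in_chunks(dummy_predict, ["CCO"] * 2500, ["CCN"], ["CCC"], chunk_size=1000)
print(len(scores))  # 2500
```

With the real model, you would pass `predictor.predict` as `predict_fn`, e.g. `predict_in_chunks(predictor.predict, query_smiles, support_actives_smiles, support_inactives_smiles, chunk_size=500)`.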

	* Provide molecules in SMILES notation.
	* Make sure that the inputs to the `ActivityPredictor` are Python lists, flattened (1-D) NumPy arrays, or pandas DataFrames. For DataFrames, there must be a SMILES column; both "smiles" and "SMILES" are accepted as the column name. All other columns are ignored.
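If your molecules come from different sources, you could normalize them to a plain list of SMILES strings before calling the predictor. The helper `to_smiles_list` below is hypothetical (it is not part of this repository) and simply mirrors the input rules listed above; the `ActivityPredictor` may handle these formats differently internally.

```python
from typing import List, Sequence, Union

import numpy as np
import pandas as pd

SmilesInput = Union[Sequence[str], np.ndarray, pd.DataFrame]


def to_smiles_list(data: SmilesInput) -> List[str]:
    """Convert a list, NumPy array, or DataFrame with a 'smiles'/'SMILES' column to a list of SMILES."""
    if isinstance(data, pd.DataFrame):
        for col in ("smiles", "SMILES"):
            if col in data.columns:
                # Only the SMILES column is used; other columns are ignored
                return data[col].astype(str).tolist()
        raise KeyError("DataFrame needs a 'smiles' or 'SMILES' column")
    if isinstance(data, np.ndarray):
        # Flatten to 1-D to be safe, then convert elements to plain Python strings
        return [str(s) for s in data.ravel()]
    return [str(s) for s in data]


df = pd.DataFrame({"SMILES": ["CCO", "c1ccccc1"], "compound_id": [1, 2]})
print(to_smiles_list(df))  # ['CCO', 'c1ccccc1']
```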



	### Run the app locally with Streamlit:
	```bash
	# Navigate into root directory of this project
	cd .../whatever_your_dir_name_is/ # Replace with your path

	# Run the Streamlit app (app.py is the app_file declared in the front matter)
	python -m streamlit run app.py
	```

	## 📚 Cite us

	```bibtex
	@inproceedings{
	schimunek2023contextenriched,
	title={Context-enriched molecule representations improve few-shot drug discovery},
	author={Johannes Schimunek and Philipp Seidl and Lukas Friedrich and Daniel Kuhn and Friedrich Rippmann and Sepp Hochreiter and Günter Klambauer},
	booktitle={The Eleventh International Conference on Learning Representations},
	year={2023},
	url={https://openreview.net/forum?id=XrMWUuEevr}
	}
	```