---
title: MHNfs
emoji: 🔬
short_description: Activity prediction for low-data scenarios
colorFrom: gray
colorTo: gray
sdk: streamlit
sdk_version: 1.29.0
app_file: app.py
pinned: true
---

# Activity Predictions with MHNfs for low-data scenarios

					
						
## ⚙️ Under the hood
<div style="text-align: justify">
The predictive model (MHNfs) used in this application was specifically designed and
trained for low-data scenarios. The model predicts whether a molecule is active or
inactive. The predicted activity is a continuous value between 0 and 1. Similar to
a probability, a value closer to 1 means the model is more confident that the
molecule is active, and a value closer to 0 means it is more confident that the
molecule is inactive.<br>
<br>
The model was trained on the FS-Mol dataset, which
includes 5120 tasks (roughly 5000 tasks were used for training, the rest for evaluation).
The training tasks are listed here:
<a href="https://github.com/microsoft/FS-Mol/tree/main/datasets/targets"
target="_blank">https://github.com/microsoft/FS-Mol/tree/main/datasets/targets</a>.
</div>
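
Because the output is a score between 0 and 1 rather than a hard label, a screening workflow will usually rank the query molecules by score or apply a cut-off. The following is a minimal sketch of that post-processing step. The helper name `rank_predictions`, the 0.5 cut-off, and the example scores are illustrative assumptions and are not part of the MHNfs code base.

```python
from typing import List, Tuple


def rank_predictions(
    smiles: List[str], scores: List[float], threshold: float = 0.5
) -> List[Tuple[str, float, str]]:
    """Label each molecule by a cut-off and sort by score, highest first.

    `threshold` is an assumed cut-off; pick one that suits your screening.
    """
    if len(smiles) != len(scores):
        raise ValueError("smiles and scores must have the same length")
    labelled = [
        (smi, score, "active" if score >= threshold else "inactive")
        for smi, score in zip(smiles, scores)
    ]
    return sorted(labelled, key=lambda row: row[1], reverse=True)


# Made-up scores for illustration only (not real model output)
example_smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
example_scores = [0.12, 0.91, 0.55]
ranked = rank_predictions(example_smiles, example_scores)
```

Sorting first and thresholding second keeps the full ranking available, which is helpful when the cut-off is tuned later.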
					
						
## 🎯 About few-shot learning and the model MHNfs
<div style="text-align: justify">
<b>Few-shot learning</b> is a machine learning sub-field that aims to provide
predictive models for scenarios in which only little data is available.<br>
<br>
<b>MHNfs</b> is a few-shot learning model specifically designed for drug
discovery applications. It uses the input prompt such that the provided
knowledge, i.e. the known active and inactive molecules,
serves as context for predicting the activity of the new query molecules.
More precisely, the provided active and inactive molecules are associated with a
large set of general molecules, called context molecules, to enrich the
provided information and to remove spurious correlations arising from the
decoration of molecules. This is analogous to a Large Language Model that
not only uses the information in the current prompt as context but
also has access to much more information, e.g., a prompting history.
</div>
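
One common way to picture such an association is a soft, attention-style lookup: a molecule embedding is compared against all context-molecule embeddings, and a softmax-weighted average of the context embeddings is added back to it. The sketch below illustrates this idea in plain Python with toy 2-D vectors. The function names, the scaling factor `beta`, and the residual update are assumptions for illustration only; this is not the MHNfs implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def enrich(embedding: Vector, context: List[Vector], beta: float = 1.0) -> Vector:
    """Add a similarity-weighted average of the context vectors to `embedding`."""
    # Dot-product similarity between the molecule and each context molecule
    similarities = [
        beta * sum(a * b for a, b in zip(embedding, ctx)) for ctx in context
    ]
    weights = softmax(similarities)
    # Weighted average of the context vectors, one dimension at a time
    retrieved = [
        sum(w * ctx[i] for w, ctx in zip(weights, context))
        for i in range(len(embedding))
    ]
    # Residual update: original embedding plus retrieved context information
    return [e + r for e, r in zip(embedding, retrieved)]


# Toy context set and molecule embedding (hypothetical values)
context_set = [[1.0, 0.0], [0.0, 1.0]]
enriched = enrich([2.0, 0.0], context_set)
```

In this toy case the first context vector is more similar to the molecule, so it dominates the retrieved average.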
					
						
## 💻 Run the prediction pipeline locally for larger screening chunks

### Get started:
```bash
# Copied from Hugging Face
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install

# Clone repo
git clone https://huggingface.co/spaces/tschouis/mhnfs

# Alternatively, if you want to clone without large files
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/spaces/tschouis/mhnfs
```

### Install requirements
```bash
pip install -r requirements.txt
```
Notably, this command was tested inside a conda environment with Python 3.7.
					
						
### Run the prediction pipeline:
For your screening, load the model, i.e. the **Activity Predictor**, into your Python file or notebook and run it:
```python
from src.prediction_pipeline import ActivityPredictor

# Create the predictor (assumes the constructor needs no arguments)
predictor = ActivityPredictor()

# Define inputs
query_smiles = ["C1CCCCC1", "C1CCCCC1", "C1CCCCC1", "C1CCCCC1"]  # Replace with your data
support_actives_smiles = ["C1CCCCC1", "C1CCCCC1"]  # Replace with your data
support_inactives_smiles = ["C1CCCCC1", "C1CCCCC1"]  # Replace with your data

# Make predictions
predictions = predictor.predict(query_smiles, support_actives_smiles, support_inactives_smiles)
```

* Provide molecules in SMILES notation.
* Make sure that the inputs to the Activity Predictor are either Python lists, flattened NumPy arrays, or pandas DataFrames. In the latter case, there should be a "smiles" column (both upper and lower case "SMILES" are accepted). All other columns are ignored.
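
For larger screening libraries, it can be convenient to read the SMILES column from a CSV file into a plain list and process it in chunks. The sketch below uses only the standard library. The helper names `read_smiles_column` and `chunked`, the chunk size, and the inline example data are hypothetical; the column lookup here is case-insensitive, which is slightly more permissive than the rule stated above.

```python
import csv
import io
from typing import Iterator, List, TextIO


def read_smiles_column(handle: TextIO) -> List[str]:
    """Return the non-empty entries of the 'smiles' column of a CSV file."""
    reader = csv.DictReader(handle)
    column = next(
        (name for name in (reader.fieldnames or []) if name.lower() == "smiles"),
        None,
    )
    if column is None:
        raise ValueError("no 'smiles' column found")
    # Skip rows where the SMILES cell is missing or blank
    return [
        row[column].strip()
        for row in reader
        if row[column] and row[column].strip()
    ]


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Inline example data standing in for a real library file
library_csv = io.StringIO("id,SMILES\n1,CCO\n2,c1ccccc1\n3,CC(=O)O\n")
library = read_smiles_column(library_csv)
batches = list(chunked(library, 2))
```

Each batch is a plain list of SMILES strings, so it could be passed as the query input to `predictor.predict` together with the same support sets.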
					
						
### Run the app locally with Streamlit:
```bash
# Navigate into root directory of this project
cd .../whatever_your_dir_name_is/ # Replace with your path

# Run streamlit app
python -m streamlit run app.py
```
					
						
## 📚 Cite us

```
@inproceedings{
schimunek2023contextenriched,
title={Context-enriched molecule representations improve few-shot drug discovery},
author={Johannes Schimunek and Philipp Seidl and Lukas Friedrich and Daniel Kuhn and Friedrich Rippmann and Sepp Hochreiter and Günter Klambauer},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=XrMWUuEevr}
}
```