Spaces:
Sleeping
Sleeping
| license: apache-2.0 | |
| library_name: transformers | |
| pipeline_tag: feature-extraction | |
| tags: | |
| - chemistry | |
| # selfies-ted | |
| selfies-ted is a project for encoding SMILES (Simplified Molecular Input Line Entry System) into SELFIES (SELF-referencing Embedded Strings) and generating embeddings for molecular representations. | |
|  | |
| ## Model Architecture | |
| Configuration details | |
| Encoder and Decoder FFN dimensions: 256 | |
| Number of attention heads: 4 | |
| Number of encoder and decoder layers: 2 | |
| Total number of hidden layers: 6 | |
| Maximum position embeddings: 128 | |
| Model dimension (d_model): 256 | |
| ## Pretrained Models and Training Logs | |
| We provide checkpoints of the selfies-ted model pre-trained on a dataset of molecules curated from PubChem. The pre-trained model shows competitive performance on molecular representation tasks. For model weights: "HuggingFace link". | |
| To install and use the pre-trained model: | |
| Download the selfies_ted_model.pkl file from the "HuggingFace link". | |
| Add the selfies-ted selfies_ted_model.pkl to the models/ directory. The directory structure should look like the following: | |
| ``` | |
| models/ | |
| └── selfies_ted_model.pkl | |
| ``` | |
| ## Installation | |
| To use this project, you'll need to install the required dependencies. We recommend using a virtual environment: | |
| ```bash | |
| python -m venv venv | |
| source venv/bin/activate # On Windows use `venv\Scripts\activate` | |
| ``` | |
| Install the required dependencies | |
| ``` | |
| pip install -r requirements.txt | |
| ``` | |
| ## Usage | |
| ### Import | |
| ``` | |
| import load | |
| ``` | |
| ### Training the Model | |
| To train the model, use the train.py script: | |
| ``` | |
| python train.py -f <path_to_your_data_file> | |
| ``` | |
| Note: The actual usage may depend on the specific implementation in load.py. Please refer to the source code for detailed functionality. | |
| ### Load the model and tokenizer | |
| ``` | |
| load.load("path/to/checkpoint.pkl") | |
| ``` | |
| ### Encode SMILES strings | |
| ``` | |
| smiles_list = ["COC", "CCO"] | |
| ``` | |
| ``` | |
| embeddings = load.encode(smiles_list) | |
| ``` | |
| ## Example Notebook | |
| Example notebook of this project is `selfies-ted-example.ipynb`. | |