---
title: ColiFormer - E. coli Codon Optimization
emoji: 🧬
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.1
app_file: app.py
pinned: false
license: mit
short_description: E. coli codon optimization with fine-tuned transformers
tags:
- biology
- codon-optimization
- e-coli
- protein-synthesis
- bioinformatics
- synthetic-biology
- transformers
- streamlit
---

# 🧬 ColiFormer - E. coli Codon Optimization

**ColiFormer** is a specialized codon optimization tool fine-tuned specifically for *Escherichia coli* sequences, achieving **6.2% better CAI scores** compared to the base CodonTransformer model.

## 🚀 Features

- **🎯 E. coli Specialized**: Fine-tuned on 4,300 high-CAI E. coli sequences
- **📊 Advanced Metrics**: CAI, tAI, GC content, and codon frequency analysis
- **🤖 Auto-Loading**: Automatically downloads model and reference data from Hugging Face
- **⚡ Real-time**: Interactive sequence optimization with live metrics
- **🔬 Research-Grade**: Based on BigBird Transformer architecture
- **📈 Performance**: Significant improvement over base models for E. coli

## 📊 Model Performance

| Metric | Base Model | ColiFormer | Improvement |
|--------|------------|------------|-------------|
| CAI Score | 0.742 | 0.788 | **+6.2%** |
| tAI Score | 0.451 | 0.478 | **+6.0%** |
| GC Content | 52.1% | 51.8% | Optimized |

## 🔗 Related Resources

- **Model**: [saketh11/ColiFormer](https://huggingface.co/saketh11/ColiFormer)
- **Dataset**: [saketh11/ColiFormer-Data](https://huggingface.co/datasets/saketh11/ColiFormer-Data)
- **Base Model**: [adibvafa/CodonTransformer](https://huggingface.co/adibvafa/CodonTransformer)
- **Paper**: [CodonTransformer: The Global Translation of Genetic Code by Transformer](https://www.biorxiv.org/content/10.1101/2023.09.09.556981v1)

## 💡 How to Use

1. **Enter your protein sequence** in single-letter amino acid format
2. **Select optimization parameters** (temperature, max length, etc.)
3. **Click "Optimize Sequence"** to generate the optimized DNA sequence
4. **View comprehensive metrics** including CAI, tAI, GC content, and codon usage
5. **Download results** as FASTA or Excel files

## 🧪 Example

**Input Protein**: `MKRISTTITTTITITTGNGAG`

**Optimized DNA**: `ATGAAACGTATTAGT...` (optimized for E. coli expression)

**Metrics**:

- CAI: 0.85 (High)
- tAI: 0.52 (Good)
- GC Content: 51.2% (Optimal)

## 🔬 Technical Details

- **Architecture**: BigBird Transformer with 12 layers
- **Training**: Adaptive Learning Methods (ALM) enhanced
- **Context Length**: Up to 4096 tokens
- **Fine-tuning**: 4,300 high-CAI E. coli sequences
- **Reference Data**: 50,000+ E. coli gene sequences for metrics

## 📜 Citation

If you use ColiFormer in your research, please cite:

```bibtex
@article{codon_transformer_2023,
  title={CodonTransformer: The Global Translation of Genetic Code by Transformer},
  author={Adibvafa Fallahpour and Bartosz Grzybowski and Bogdan Gliwa and Bartosz Michalak},
  journal={bioRxiv},
  year={2023},
  doi={10.1101/2023.09.09.556981}
}
```

## 📄 License

This project is licensed under the MIT License.

---

**Built with ❤️ for the synthetic biology community**
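
## 🧮 Metric Sketch

For readers who want to sanity-check the numbers reported in this README, the following is a minimal sketch of how GC content, a CAI-style score, and the relative improvement figures can be computed. It is not the app's own implementation. The function names and the `TOY_WEIGHTS` table are hypothetical; real CAI weights would come from a reference set of E. coli genes, and the app may handle edge cases (stop codons, unknown codons) differently.

```python
from math import exp, log

# Hypothetical relative-adaptiveness weights, for illustration only.
# Real weights are derived from codon usage in a reference gene set.
TOY_WEIGHTS = {"ATG": 1.0, "AAA": 1.0, "AAG": 0.25, "CGT": 1.0, "ATT": 1.0, "AGT": 0.5}


def gc_content(dna: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    dna = dna.upper()
    if not dna:
        raise ValueError("sequence is empty")
    return 100.0 * sum(base in "GC" for base in dna) / len(dna)


def split_codons(dna: str) -> list[str]:
    """Split a DNA sequence into consecutive 3-base codons."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def cai(dna: str, weights: dict[str, float]) -> float:
    """Geometric mean of codon weights; codons missing from `weights` are skipped."""
    logs = [log(weights[c]) for c in split_codons(dna.upper()) if c in weights]
    if not logs:
        raise ValueError("no codon in the sequence has a weight")
    return exp(sum(logs) / len(logs))


def relative_improvement(base: float, tuned: float) -> float:
    """Percentage change of `tuned` relative to `base`."""
    return 100.0 * (tuned - base) / base


if __name__ == "__main__":
    # Only the visible prefix of the example DNA above; the full sequence is elided.
    prefix = "ATGAAACGTATTAGT"
    print(split_codons(prefix))
    print(round(gc_content(prefix), 1))
    print(round(cai(prefix, TOY_WEIGHTS), 3))
    # Figures from the Model Performance table.
    print(round(relative_improvement(0.742, 0.788), 1))  # 6.2
    print(round(relative_improvement(0.451, 0.478), 1))  # 6.0
```

The improvement column in the performance table is a relative change, so `relative_improvement(0.742, 0.788)` reproduces the +6.2% CAI figure. The GC content printed for the prefix describes only those 15 bases and is not comparable to the 51.2% reported for the full optimized sequence.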