---
title: ColiFormer - E. coli Codon Optimization
emoji: 🧬
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.28.1
app_file: app.py
pinned: false
license: mit
short_description: E. coli codon optimization with fine-tuned transformers
tags:
- biology
- codon-optimization
- e-coli
- protein-synthesis
- bioinformatics
- synthetic-biology
- transformers
- streamlit
---
# 🧬 ColiFormer - E. coli Codon Optimization
ColiFormer is a codon optimization tool fine-tuned specifically for Escherichia coli sequences, achieving CAI scores 6.2% higher than the base CodonTransformer model.
## 🚀 Features
- 🎯 E. coli Specialized: Fine-tuned on 4,300 high-CAI E. coli sequences
- 📊 Advanced Metrics: CAI, tAI, GC content, and codon frequency analysis
- 🤖 Auto-Loading: Automatically downloads the model and reference data from Hugging Face
- ⚡ Real-time: Interactive sequence optimization with live metrics
- 🔬 Research-Grade: Based on the BigBird Transformer architecture
- 📈 Performance: Consistent gains over the base model for E. coli (see the table below)
## 📊 Model Performance

| Metric | Base Model | ColiFormer | Improvement |
|---|---|---|---|
| CAI Score | 0.742 | 0.788 | +6.2% |
| tAI Score | 0.451 | 0.478 | +6.0% |
| GC Content | 52.1% | 51.8% | Optimized |
## 🔗 Related Resources
- Model: saketh11/ColiFormer
- Dataset: saketh11/ColiFormer-Data
- Base Model: adibvafa/CodonTransformer
- Paper: CodonTransformer: The Global Translation of Genetic Code by Transformer
## 💡 How to Use
1. Enter your protein sequence in single-letter amino acid format
2. Select optimization parameters (temperature, max length, etc.)
3. Click "Optimize Sequence" to generate the optimized DNA sequence
4. View comprehensive metrics including CAI, tAI, GC content, and codon usage
5. Download the results as FASTA or Excel files (or run the model programmatically, as sketched below)
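The steps above describe the Streamlit UI. For a programmatic workflow, a minimal sketch along the lines of the upstream CodonTransformer API is shown below. The model id `saketh11/ColiFormer`, its drop-in compatibility with `predict_dna_sequence`, and the exact parameter names are assumptions to verify against the package version you install; this is not the app's own code.

```python
# Minimal sketch, not the app's exact code. Assumes:
#   - `pip install CodonTransformer transformers torch`
#   - the fine-tuned weights at saketh11/ColiFormer load like the base checkpoint
import torch
from transformers import AutoTokenizer, BigBirdForMaskedLM
from CodonTransformer.CodonPrediction import predict_dna_sequence

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tokenizer from the base model; fine-tuned weights assumed to live at saketh11/ColiFormer.
tokenizer = AutoTokenizer.from_pretrained("adibvafa/CodonTransformer")
model = BigBirdForMaskedLM.from_pretrained("saketh11/ColiFormer").to(device)

protein = "MKRISTTITTTITITTGNGAG"  # example protein from this README

output = predict_dna_sequence(
    protein=protein,
    organism="Escherichia coli general",  # organism label used by CodonTransformer
    device=device,
    tokenizer=tokenizer,
    model=model,
    deterministic=True,  # greedy decoding instead of temperature sampling
)
print(output.predicted_dna)
```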
## 🧪 Example
Input Protein: `MKRISTTITTTITITTGNGAG`

Optimized DNA: `ATGAAACGTATTAGT...` (optimized for E. coli expression)

Metrics:
- CAI: 0.85 (High)
- tAI: 0.52 (Good)
- GC Content: 51.2% (Optimal)
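For reference, GC content and CAI of an optimized sequence can be recomputed independently of the app. The sketch below assumes Biopython and the third-party `CAI` package, which are illustrative choices rather than the libraries the app necessarily uses, and the reference set is a placeholder you would replace with high-expression E. coli CDSs (e.g. from `saketh11/ColiFormer-Data`).

```python
# Minimal sketch of recomputing two of the metrics above (GC content and CAI).
# Assumes `pip install biopython CAI`; illustrative only.
from Bio.SeqUtils import gc_fraction
from CAI import CAI

# Illustrative coding sequence (16 codons); not actual ColiFormer output.
optimized_dna = "ATGAAACGTATTAGTACCACCATTACCACCACCATTACCATTACCACC"

# Placeholder reference set: in practice, supply highly expressed E. coli CDSs,
# e.g. drawn from the saketh11/ColiFormer-Data dataset.
reference_genes = [optimized_dna]

print(f"GC content: {100 * gc_fraction(optimized_dna):.1f}%")
print(f"CAI: {CAI(optimized_dna, reference=reference_genes):.2f}")
```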
## 🔬 Technical Details
- Architecture: BigBird Transformer with 12 layers
- Training: Enhanced with Adaptive Learning Methods (ALM)
- Context Length: Up to 4096 tokens
- Fine-tuning: 4,300 high-CAI E. coli sequences
- Reference Data: 50,000+ E. coli gene sequences used for metrics
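These details can be sanity-checked from the published checkpoint's configuration. The sketch below assumes `saketh11/ColiFormer` ships a standard Hugging Face `config.json` with the usual BigBird field names.

```python
# Minimal sketch: inspect the checkpoint config to confirm the details listed above.
# Assumes saketh11/ColiFormer hosts a standard Hugging Face config.json.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("saketh11/ColiFormer")
print(config.model_type)               # expected: "big_bird"
print(config.num_hidden_layers)        # expected: 12
print(config.max_position_embeddings)  # expected: around 4096 (context length)
```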
## 📖 Citation
If you use ColiFormer in your research, please cite:

```bibtex
@article{codon_transformer_2023,
  title={CodonTransformer: The Global Translation of Genetic Code by Transformer},
  author={Adibvafa Fallahpour and Bartosz Grzybowski and Bogdan Gliwa and Bartosz Michalak},
  journal={bioRxiv},
  year={2023},
  doi={10.1101/2023.09.09.556981}
}
```
## 📄 License
This project is licensed under the MIT License.
Built with ❤️ for the synthetic biology community