takara-ai/SwarmFormer-Sentiment-Small

takarajordan committed on Jan 24

Commit adc1dd4 · verified · 1 Parent(s): 922282b

Create README.md

Files changed (1)

README.md +143 -0

README.md ADDED

	@@ -0,0 +1,143 @@

+---
+datasets:
+- stanfordnlp/imdb
+language:
+- en
+---
+# Model Card for SwarmFormer-Small
+SwarmFormer-Small is a lightweight variant of the SwarmFormer architecture, designed for efficient text classification with minimal computational requirements.
+## Model Details
+### Model Description
+Compact version of SwarmFormer with:
+- Token embedding layer with dropout (0.3)
+- Two SwarmFormer layers
+- Mean pooling and classification
+- Optimized for shorter sequences
+- **Developed by**: Jordan Legg, Mikus Sturmanis, Takara.ai
+- **Funded by**: Takara.ai
+- **Shared by**: Takara.ai
+- **Model type**: Hierarchical transformer
+- **Language(s)**: English
+- **License**: Not specified
+- **Finetuned from model**: Trained from scratch
+### Model Sources
+- **Repository**: https://github.com/takara-ai/SwarmFormer
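+- **Paper**: [SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations](https://takara.ai/papers/SwarmFormer-Local-Global-Hierarchical-Attention-via-Swarming-Token-Representations.pdf) (Takara.ai Research)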
+- **Demo**: Not available
+## Uses
+### Direct Use
+- Text classification
+- Sentiment analysis
+- Resource-constrained environments
+### Out-of-Scope Use
+- Text generation
+- Machine translation
+- Inputs longer than 256 tokens (the sequence length used in training)
+- Tasks requiring high precision (the model reaches 83.46% precision on IMDB)
+## Training Details
+### Training Data
+- Dataset: IMDB movie reviews (`stanfordnlp/imdb`)
+- Size: 50,000 samples
+- Augmentation techniques applied
+### Training Procedure
+#### Model Architecture Details
+1. **Token Embedding Layer**:
+   - Embedding layer (vocab_size → 128)
+   - Dropout rate: 0.3
+2. **Local Swarm Aggregator**:
+   - Input dropout: 0.3
+   - Local MLP:
+     - Linear(128 → 128)
+     - GELU
+     - Dropout(0.3)
+     - Linear(128 → 128)
+   - Gate network with GELU
+3. **Clustering Mechanism**:
+   - Cluster size: 8 tokens
+   - Mean pooling per cluster
+4. **Global Cluster Attention**:
+   - Q/K/V projections: Linear(128 → 128)
+   - Attention dropout: 0.3
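The clustering step (item 3) can be illustrated in plain Python. This is a minimal sketch, not the repository's implementation: `cluster_mean_pool` is a hypothetical helper, and it assumes the sequence length is an exact multiple of the cluster size.

```python
from typing import List

Vector = List[float]


def cluster_mean_pool(tokens: List[Vector], cluster_size: int = 8) -> List[Vector]:
    """Average consecutive groups of `cluster_size` token vectors.

    Hypothetical helper for illustration; assumes len(tokens) is a
    multiple of cluster_size.
    """
    if cluster_size <= 0 or len(tokens) % cluster_size != 0:
        raise ValueError("sequence length must be a positive multiple of cluster_size")
    pooled = []
    for start in range(0, len(tokens), cluster_size):
        group = tokens[start:start + cluster_size]
        dim = len(group[0])
        # Mean of each embedding dimension across the tokens in this cluster.
        pooled.append([sum(vec[d] for vec in group) / cluster_size for d in range(dim)])
    return pooled


# 256 toy token vectors of width 128, where token i holds the constant value i.
tokens = [[float(i)] * 128 for i in range(256)]
clusters = cluster_mean_pool(tokens, cluster_size=8)
```

With the settings in this card (256 tokens, clusters of 8), pooling yields 32 cluster vectors, so a global attention stage applied after pooling would work over 32 positions rather than 256.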
+#### Training Hyperparameters
+- Embedding dimension: 128
+- Number of layers: 2
+- Local update steps: 3
+- Cluster size: 8
+- Sequence length: 256
+- Batch size: 96
+- Learning rate: 4.76 × 10⁻⁴
+- Weight decay: 0.0541
+- Dropout: 0.30
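The hyperparameters above can be gathered into a single configuration object. The sketch below is illustrative only: `SwarmFormerSmallConfig` and `model_kwargs` are hypothetical names, and the divisibility check is an assumption about how sequences are split into clusters. The keyword names returned by `model_kwargs` follow the constructor call shown under "How to Get Started".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwarmFormerSmallConfig:
    """Hyperparameters listed in this card (hypothetical container class)."""
    d_model: int = 128          # embedding dimension
    num_layers: int = 2
    T_local: int = 3            # local update steps
    cluster_size: int = 8
    seq_len: int = 256
    batch_size: int = 96
    learning_rate: float = 4.76e-4
    weight_decay: float = 0.0541
    dropout: float = 0.30

    def __post_init__(self) -> None:
        # Assumption: the sequence is split into equal-sized clusters.
        if self.seq_len % self.cluster_size != 0:
            raise ValueError("seq_len must be divisible by cluster_size")

    def model_kwargs(self, vocab_size: int) -> dict:
        """Subset of fields matching the constructor arguments shown in this card."""
        keys = ("d_model", "seq_len", "cluster_size", "num_layers", "T_local")
        return {"vocab_size": vocab_size, **{k: getattr(self, k) for k in keys}}


config = SwarmFormerSmallConfig()
kwargs = config.model_kwargs(vocab_size=30000)
```

Keeping optimizer settings (batch size, learning rate, weight decay) in the same object as the architecture settings makes a run reproducible from one place, while `model_kwargs` passes only the architecture fields to the model.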
+## Evaluation
+### Results
+- Accuracy: 86.20%
+- Precision: 83.46%
+- Recall: 90.31%
+- F1: 86.75%
+- Inference time: 0.36s (25k samples)
+- Mean batch latency: 3.67ms
+- Throughput: 45k samples/s
+- Peak memory: 8GB
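The reported F1 is consistent with the precision and recall above, since F1 is their harmonic mean. A quick check (the function name is illustrative):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.8346, 0.9031)
print(f"F1: {f1:.2%}")  # prints "F1: 86.75%"
```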
+## Technical Specifications
+### Compute Infrastructure
+- GPU: NVIDIA RTX 2080 Ti
+- VRAM: 8GB minimum
+- Training time: 3.6 minutes
+### How to Get Started
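+```python
+# Assumes the `swarmformer` package from the repository above is importable.
+from swarmformer import SwarmFormerModel
+
+model = SwarmFormerModel(
+    vocab_size=30000,  # tokenizer vocabulary size
+    d_model=128,       # embedding dimension
+    seq_len=256,       # sequence length
+    cluster_size=8,    # tokens per cluster
+    num_layers=2,      # number of SwarmFormer layers
+    T_local=3,         # local update steps
+)
+```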
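The model is configured with `seq_len=256`, so token ID sequences would typically be brought to that length before batching. A minimal sketch, assuming a padding ID of 0 and simple right-truncation; neither is confirmed by this card, and `pad_or_truncate` is a hypothetical helper rather than part of the `swarmformer` package:

```python
from typing import List


def pad_or_truncate(token_ids: List[int], seq_len: int = 256, pad_id: int = 0) -> List[int]:
    """Return exactly `seq_len` IDs: cut long inputs, right-pad short ones.

    pad_id=0 is an assumption; use whatever ID the tokenizer reserves for padding.
    """
    clipped = token_ids[:seq_len]
    return clipped + [pad_id] * (seq_len - len(clipped))


short = pad_or_truncate([101, 2023, 3185, 102])   # padded up to 256 IDs
long = pad_or_truncate(list(range(1000)))         # cut down to the first 256 IDs
```

Truncation discards everything past the first 256 tokens, which is one reason longer inputs are listed as out of scope.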
+## Citation
+```bibtex
+@article{legg2025swarmformer,
+  title={SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations},
+  author={Legg, Jordan and Sturmanis, Mikus and {Takara.ai}},
+  journal={Takara.ai Research},
+  year={2025},
+  url={https://takara.ai/papers/SwarmFormer-Local-Global-Hierarchical-Attention-via-Swarming-Token-Representations.pdf}
+}
+```
+## Model Card Authors
+Jordan Legg, Mikus Sturmanis, Takara.ai Research Team
+## Model Card Contact
+[email protected]