---
license: mit
language:
- en
metrics:
- precision
- recall
- f1
- accuracy
new_version: v1.0
datasets:
- BookCorpus
- Wikipedia
tags:
- BERT
- MNLI
- NLI
- transformer
- pre-training
- NLP
- MIT-NLP-v1
base_model:
- google/bert-base-uncased
library_name: transformers
---
# Model Card for boltuix/bert-tinyplus
The `boltuix/bert-tinyplus` model is an ultra-compact BERT variant for natural language processing tasks that need lightweight inference, with slightly more capacity than smaller models such as `boltuix/bert-mini`. Pretrained on English text with masked language modeling (MLM) and next sentence prediction (NSP) objectives, it is intended for fine-tuning on lightweight NLP tasks such as sequence classification and token classification. At ~20 MB, it targets resource-constrained environments that need a modest accuracy gain over the smallest variants.
## Model Details
### Model Description
The `boltuix/bert-tinyplus` model is a PyTorch-based transformer model derived from TensorFlow checkpoints in the Google BERT repository. It builds on research from *On the Importance of Pre-training Compact Models* ([arXiv](https://arxiv.org/abs/1908.08962)) and *Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics* ([arXiv](https://arxiv.org/abs/2110.01518)). Ported to Hugging Face, this uncased model (~20 MB) is built for lightweight NLP applications such as sentiment analysis, named entity recognition, and basic natural language inference, which makes it a good fit for developers and researchers targeting highly resource-constrained deployments that need more capacity than the minimal models.
- **Developed by:** BoltUIX
- **Funded by:** BoltUIX Research Fund
- **Shared by:** Hugging Face
- **Model type:** Transformer (BERT)
- **Language(s) (NLP):** English (`en`)
- **License:** MIT
- **Finetuned from model:** google-bert/bert-base-uncased
### Model Sources
- **Repository:** [Hugging Face Model Hub](https://huggingface.co/boltuix/bert-tinyplus)
- **Paper:** [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](http://arxiv.org/abs/1810.04805)
- **Demo:** [Hugging Face Spaces Demo](https://huggingface.co/spaces/boltuix/bert-tinyplus-demo)
## Model Variants
BoltUIX offers a range of BERT-based models tailored to different performance and resource requirements. The `boltuix/bert-tinyplus` model is an ultra-compact option, offering slightly better capacity than `boltuix/bert-mini`, ideal for lightweight applications with modest performance needs. Below is a summary of available models:
| Tier       | Model ID                | Size      | Notes                                              |
|------------|-------------------------|-----------|----------------------------------------------------|
| Micro | boltuix/bert-micro | ~15 MB | Smallest, blazing-fast, moderate accuracy |
| Mini | boltuix/bert-mini | ~17 MB | Ultra-compact, fast, slightly better accuracy |
| Tinyplus | boltuix/bert-tinyplus | ~20 MB | Slightly bigger, better capacity |
| Small | boltuix/bert-small | ~45 MB | Good compact/accuracy balance |
| Mid | boltuix/bert-mid | ~50 MB | Well-rounded mid-tier performance |
| Medium | boltuix/bert-medium | ~160 MB | Strong general-purpose model |
| Large | boltuix/bert-large | ~365 MB | Top performer below full-BERT |
| Pro | boltuix/bert-pro | ~420 MB | Use only if max accuracy is mandatory |
| Mobile | boltuix/bert-mobile | ~140 MB | Mobile-optimized; quantize to ~25 MB with no major loss |
For more details on each variant, visit the [BoltUIX Model Hub](https://huggingface.co/boltuix).
## Uses
### Direct Use
The model can be used directly for masked language modeling or next sentence prediction tasks, such as predicting missing words in sentences or determining sentence coherence, delivering modest accuracy in these core tasks.
### Downstream Use
The model is designed for fine-tuning on lightweight downstream NLP tasks, including:
- Sequence classification (e.g., basic sentiment analysis, intent detection)
- Token classification (e.g., named entity recognition, part-of-speech tagging)
- Simple question answering (e.g., extractive QA)
It is recommended for developers and researchers working on resource-constrained devices, such as mobile or edge applications, where slightly better capacity than minimal models is desired.
### Out-of-Scope Use
The model is not suitable for:
- Text generation tasks (use generative models like GPT-3 instead).
- Non-English language tasks without significant fine-tuning.
- High-performance applications requiring robust accuracy (use `boltuix/bert-mid`, `boltuix/bert-large`, or `boltuix/bert-pro` instead).
## Bias, Risks, and Limitations
The model may inherit biases from its training data (BookCorpus and English Wikipedia), potentially reinforcing stereotypes, such as gender or occupational biases. For example:
```python
from transformers import pipeline
unmasker = pipeline('fill-mask', model='boltuix/bert-tinyplus')
unmasker("The man worked as a [MASK].")
```
**Output**:
```python
[
{'sequence': '[CLS] the man worked as a engineer. [SEP]', 'token_str': 'engineer'},
{'sequence': '[CLS] the man worked as a doctor. [SEP]', 'token_str': 'doctor'},
...
]
```
```python
unmasker("The woman worked as a [MASK].")
```
**Output**:
```python
[
{'sequence': '[CLS] the woman worked as a teacher. [SEP]', 'token_str': 'teacher'},
{'sequence': '[CLS] the woman worked as a nurse. [SEP]', 'token_str': 'nurse'},
...
]
```
These biases may propagate to downstream tasks. Due to its small size (~20 MB), the model is suitable for resource-constrained environments but may have limited capacity for complex tasks compared to larger variants.
### Recommendations
Users should:
- Conduct bias audits tailored to their application.
- Fine-tune with diverse, representative datasets to reduce bias.
- Apply model compression techniques (e.g., quantization) for deployment on ultra-constrained devices.
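
A minimal sketch of the idea behind quantization, in plain Python with no framework: symmetric per-tensor int8 quantization stores each weight as a signed 8-bit code plus one shared float scale, so each weight takes one byte instead of four. The function names `quantize_int8` and `dequantize_int8` and the weight values are hypothetical; for an actual deployment you would use your framework's quantization tooling rather than this code.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude onto 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the signed 8-bit range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the int8 codes and shared scale."""
    return [scale * c for c in codes]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 0.01, 1.0]  # hypothetical weights
    codes, scale = quantize_int8(weights)
    restored = dequantize_int8(codes, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(codes, scale, worst)
```

Rounding to the nearest code bounds each weight's error by half a quantization step (`scale / 2`); finer-grained schemes such as per-channel scales can reduce that error at the cost of extra bookkeeping.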
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import pipeline, BertTokenizer, BertModel
# Masked Language Modeling
unmasker = pipeline('fill-mask', model='boltuix/bert-tinyplus')
result = unmasker("Hello I'm a [MASK] model.")
print(result)
# Feature Extraction (PyTorch)
tokenizer = BertTokenizer.from_pretrained('boltuix/bert-tinyplus')
model = BertModel.from_pretrained('boltuix/bert-tinyplus')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
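output = model(**encoded_input)
# last_hidden_state has shape (batch_size, sequence_length, hidden_size)
print(output.last_hidden_state.shape)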
```
## Training Details
### Training Data
The model was pretrained on:
- **BookCorpus**: ~11,038 unpublished books, providing diverse narrative text.
- **English Wikipedia**: Excluding lists, tables, and headers for clean, factual content.
See the [BoltUIX Dataset Card](https://huggingface.co/boltuix/datasets) for more details.
### Training Procedure
#### Preprocessing
- Texts are lowercased and tokenized using WordPiece with a vocabulary size of 30,000.
- Inputs are formatted as: `[CLS] Sentence A [SEP] Sentence B [SEP]`.
- 50% of the time, Sentence A and B are consecutive; otherwise, Sentence B is random.
- Masking:
  - 15% of tokens are selected for masking.
  - 80% of the selected tokens are replaced with `[MASK]`.
  - 10% are replaced with a random token.
  - 10% are left unchanged.
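
The preprocessing steps above can be sketched in plain Python on already-tokenized text. This is an illustrative sketch, not the original pretraining code: the helper names are hypothetical, each token is selected for masking independently with probability 0.15 (a common simplification), and the negative Sentence B is drawn from the same toy list of sentences rather than from elsewhere in a large corpus.

```python
import random
from typing import List, Optional, Tuple

CLS, SEP, MASK = "[CLS]", "[SEP]", "[MASK]"


def make_nsp_pair(sentences: List[List[str]], i: int,
                  rng: random.Random) -> Tuple[List[str], List[str], bool]:
    """Return (sentence_a, sentence_b, is_next) for next sentence prediction."""
    a = sentences[i]
    if i + 1 < len(sentences) and rng.random() < 0.5:
        return a, sentences[i + 1], True  # positive: the true next sentence
    # Negative: any sentence other than the true successor.
    candidates = [k for k in range(len(sentences)) if k != i + 1]
    return a, sentences[rng.choice(candidates)], False


def build_input(a: List[str], b: List[str]) -> Tuple[List[str], List[int]]:
    """Format [CLS] A [SEP] B [SEP] with segment id 0 for A and 1 for B."""
    tokens = [CLS] + a + [SEP] + b + [SEP]
    segment_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    return tokens, segment_ids


def mask_tokens(tokens: List[str], vocab: List[str], rng: random.Random,
                mask_prob: float = 0.15) -> Tuple[List[str], List[Optional[str]]]:
    """Apply the 80/10/10 rule; labels hold the original token or None."""
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for pos, tok in enumerate(tokens):
        if tok in (CLS, SEP) or rng.random() >= mask_prob:
            continue  # special and unselected tokens are not prediction targets
        labels[pos] = tok
        r = rng.random()
        if r < 0.8:
            inputs[pos] = MASK  # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[pos] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: leave the token unchanged
    return inputs, labels


if __name__ == "__main__":
    rng = random.Random(0)
    corpus = [["the", "cat", "sat"], ["it", "was", "tired"], ["stocks", "fell"]]
    a, b, is_next = make_nsp_pair(corpus, 0, rng)
    tokens, segments = build_input(a, b)
    inputs, labels = mask_tokens(tokens, ["dog", "ran", "blue"], rng)
    print(is_next, tokens, segments, inputs, labels, sep="\n")
```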
#### Training Hyperparameters
- **Training regime:** fp16 mixed precision
- **Optimizer**: Adam (learning rate 1e-4, β1=0.9, β2=0.999, weight decay 0.01)
- **Batch size**: 64
- **Steps**: 500,000
- **Sequence length**: 128 tokens (98% of steps), 512 tokens (2% of steps)
- **Warmup**: 5,000 warmup steps, followed by linear learning rate decay
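
The warmup-plus-linear-decay schedule can be written as a small function. This sketch assumes a linear ramp from 0 to the peak rate of 1e-4 over the first 5,000 steps, then linear decay to 0 at step 500,000; the card does not state the decay endpoint, so reaching exactly 0 at the final step is an assumption. This is the same shape as `get_linear_schedule_with_warmup` in `transformers`.

```python
def learning_rate(step: int, peak_lr: float = 1e-4,
                  warmup_steps: int = 5_000, total_steps: int = 500_000) -> float:
    """Linear warmup to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: fraction of the post-warmup budget still remaining (clamped at 0).
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 2_500, 5_000, 252_500, 500_000):
        print(f"step {s:>7,}: lr = {learning_rate(s):.2e}")
```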
#### Speeds, Sizes, Times
- **Training time**: Approximately 60 hours
- **Checkpoint size**: ~20 MB
- **Throughput**: ~150 sentences/second on TPU infrastructure
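
As a rough consistency check, the step count, batch size, and training time listed in this card can be multiplied out. This sketch treats each batch element (one sentence pair) as a single "sentence" and ignores any speed difference between 128-token and 512-token steps, so it is an approximation under those assumptions.

```python
steps = 500_000        # from Training Hyperparameters
batch_size = 64        # from Training Hyperparameters
training_hours = 60    # from Speeds, Sizes, Times

sequences_seen = steps * batch_size    # total training examples processed
seconds = training_hours * 3_600
throughput = sequences_seen / seconds  # examples per second

print(f"{sequences_seen:,} examples / {seconds:,} s = {throughput:.1f} per second")
```

This prints `32,000,000 examples / 216,000 s = 148.1 per second`, in line with the ~150 sentences/second quoted above.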
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
Evaluated on the GLUE benchmark, including tasks like MNLI, QQP, QNLI, SST-2, CoLA, STS-B, MRPC, and RTE.
#### Factors
- **Subpopulations**: General English text, academic, and professional domains
- **Domains**: News, books, Wikipedia, scientific articles
#### Metrics
- **Accuracy**: For classification tasks (e.g., MNLI, SST-2)
- **F1 Score**: For tasks like QQP, MRPC
- **Pearson/Spearman Correlation**: For STS-B
- **Matthews Correlation Coefficient**: For CoLA
### Results
GLUE test results (fine-tuned):
| Task | MNLI-(m/mm) | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | Average |
|------------|-------------|------|------|-------|------|-------|------|------|---------|
| Score | 81.2/80.1 | 69.5 | 87.3 | 90.2 | 47.8 | 82.4 | 85.2 | 63.1 | 76.3 |
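
The card does not say how the Average column is computed. A quick arithmetic check on the table: treating MNLI-m and MNLI-mm as two separate entries (nine numbers) reproduces the reported 76.3, while collapsing MNLI into one averaged score (eight tasks) gives a slightly lower figure. This checks the numbers only and is not a statement of the authors' method.

```python
# Scores copied from the GLUE results table above.
scores = {
    "MNLI-m": 81.2, "MNLI-mm": 80.1, "QQP": 69.5, "QNLI": 87.3, "SST-2": 90.2,
    "CoLA": 47.8, "STS-B": 82.4, "MRPC": 85.2, "RTE": 63.1,
}

# Convention 1: MNLI-m and MNLI-mm count as two separate entries (nine numbers).
avg_nine = sum(scores.values()) / len(scores)

# Convention 2: collapse MNLI into a single mean, giving eight task scores.
mnli = (scores["MNLI-m"] + scores["MNLI-mm"]) / 2
others = [v for k, v in scores.items() if not k.startswith("MNLI")]
avg_eight = (mnli + sum(others)) / (len(others) + 1)

print(f"nine entries: {avg_nine:.1f}, eight tasks: {avg_eight:.1f}")
```

With these numbers the nine-entry mean is 76.3 and the eight-task mean is 75.8, so the reported Average appears to follow the nine-entry convention.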
#### Summary
The model provides modest performance across GLUE tasks, with reasonable results in SST-2 and QNLI. It outperforms `boltuix/bert-micro` and `boltuix/bert-mini` in tasks like RTE and CoLA, offering slightly better capacity for lightweight applications.
## Model Examination
The model’s attention mechanisms were analyzed to ensure basic contextual understanding, with no significant overfitting observed during pretraining. Ablation studies validated the training configuration for lightweight, efficient performance.
## Environmental Impact
Carbon emissions were estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) from [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type**: 1 cloud TPU (4 TPU chips)
- **Hours used**: 60 hours
- **Cloud Provider**: Google Cloud
- **Compute Region**: us-central1
- **Carbon Emitted**: ~40 kg CO2eq (estimated based on TPU energy consumption and regional grid carbon intensity)
## Technical Specifications
### Model Architecture and Objective
- **Architecture**: BERT (transformer-based, bidirectional)
- **Objective**: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP)
- **Layers**: 2
- **Hidden Size**: 256
- **Attention Heads**: 4
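
The listed dimensions can be turned into an approximate parameter count. This sketch assumes the standard BERT layout with a feed-forward size of 4 × hidden (1,024), 512 position embeddings, 2 segment types, a pooler layer, and the 30,522-token `bert-base-uncased` vocabulary (the Preprocessing section quotes 30,000); none of these values are stated in this section, and the MLM/NSP heads are excluded. The number of attention heads does not change the count, because the heads split the same hidden dimension.

```python
def bert_param_count(layers: int = 2, hidden: int = 256, intermediate: int = 1024,
                     vocab: int = 30_522, max_positions: int = 512,
                     type_vocab: int = 2) -> int:
    """Approximate parameter count of a BERT encoder plus pooler (no task heads)."""
    # Word, position, and segment embeddings plus one LayerNorm (weight and bias).
    embeddings = (vocab + max_positions + type_vocab) * hidden + 2 * hidden
    per_layer = (
        4 * (hidden * hidden + hidden)          # Q, K, V and attention-output projections
        + 2 * hidden                            # LayerNorm after attention
        + hidden * intermediate + intermediate  # feed-forward up-projection
        + intermediate * hidden + hidden        # feed-forward down-projection
        + 2 * hidden                            # LayerNorm after feed-forward
    )
    pooler = hidden * hidden + hidden           # dense layer applied to the [CLS] vector
    return embeddings + layers * per_layer + pooler


if __name__ == "__main__":
    n = bert_param_count()
    print(f"{n:,} parameters")
    print(f"~{n * 4 / 1e6:.1f} MB at 4 bytes/param, ~{n * 2 / 1e6:.1f} MB at 2 bytes/param")
```

Under these assumptions the model has about 9.6 million parameters (roughly 38 MB in 32-bit floats or 19 MB in 16-bit floats), and the embedding matrices account for most of them (about 7.9 million). The card does not state the storage dtype of the ~20 MB checkpoint.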
### Compute Infrastructure
#### Hardware
- 1 cloud TPU (4 TPU chips total)
#### Software
- PyTorch
- Transformers library (Hugging Face)
## Citation
**BibTeX:**
```bibtex
@article{DBLP:journals/corr/abs-1810-04805,
author = {Jacob Devlin and Ming{-}Wei Chang and Kenton Lee and Kristina Toutanova},
title = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language Understanding},
journal = {CoRR},
volume = {abs/1810.04805},
year = {2018},
url = {http://arxiv.org/abs/1810.04805},
archivePrefix = {arXiv},
eprint = {1810.04805}
}
```
**APA:**
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. *CoRR, abs/1810.04805*. http://arxiv.org/abs/1810.04805
## Glossary
- **MLM**: Masked Language Modeling, where 15% of tokens are masked for prediction.
- **NSP**: Next Sentence Prediction, determining if two sentences are consecutive.
- **WordPiece**: Tokenization method splitting words into subword units.
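
WordPiece splitting can be illustrated with the greedy longest-match-first procedure used by BERT's tokenizer: starting at the left of a word, take the longest prefix present in the vocabulary, mark every later piece with `##`, and fall back to `[UNK]` if no split exists. The tiny vocabulary and the function name are hypothetical, and the sketch omits the basic tokenization step (lowercasing and punctuation splitting) that runs before WordPiece.

```python
from typing import List, Set


def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]",
              max_chars: int = 100) -> List[str]:
    """Greedy longest-match-first WordPiece split of a single lowercased word."""
    if len(word) > max_chars:
        return [unk]
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until it is in the vocab.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no valid segmentation: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


if __name__ == "__main__":
    toy_vocab = {"un", "##aff", "##able", "play", "##ing", "##s"}  # hypothetical
    for w in ("unaffable", "playing", "plays", "xyz"):
        print(w, wordpiece(w, toy_vocab))
```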
## More Information
- See the [Hugging Face documentation](https://huggingface.co/docs/transformers/model_doc/bert) for advanced usage details.
- Contact: [email protected]
## Model Card Authors
- Hugging Face team
- BoltUIX contributors
## Model Card Contact
For questions, please contact [email protected] or open an issue on the [model repository](https://huggingface.co/boltuix/bert-tinyplus).