---
datasets:
- nhull/tripadvisor-split-dataset-v2
base_model:
- huawei-noah/TinyBERT_General_4L_312D
license: apache-2.0
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
- confusion_matrix
---

# TinyBERT Sentiment Analysis Model

This is a TinyBERT model fine-tuned for five-class sentiment analysis of Tripadvisor reviews.

## Model Details

- **Base Model**: `huawei-noah/TinyBERT_General_4L_312D`
- **Dataset**: `nhull/tripadvisor-split-dataset-v2`
- **Task**: Multiclass sentiment analysis (5 classes)

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
model_id = "elo4/TinyBERT-sentiment-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # disable dropout for inference

# Predict sentiment for a single review
text = "The hotel was amazing and had great service!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits has shape (1, 5); argmax over the class dimension gives a 0-based class index
predicted_class = outputs.logits.argmax(dim=-1).item()
print(f"Predicted class: {predicted_class} ({model.config.id2label[predicted_class]})")
```
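
The model returns raw logits rather than probabilities. The sketch below turns one row of logits into class probabilities and a 1 to 5 rating. It assumes, as the 1 to 5 labels in the confusion matrix suggest, that class index `i` corresponds to rating `i + 1`; that mapping is an assumption, so check `model.config.id2label` before relying on it. The helper names `softmax` and `logits_to_rating` are hypothetical and not part of `transformers`.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_rating(logits: List[float]) -> Tuple[int, float]:
    """Return (rating, probability) for one row of 5 logits.

    ASSUMPTION (hypothetical mapping): class index i corresponds to rating i + 1,
    matching the 1-5 labels used in the confusion matrix.
    """
    probs = softmax(logits)
    index = max(range(len(probs)), key=probs.__getitem__)
    return index + 1, probs[index]


# Example with made-up logits for one review (not real model output)
rating, confidence = logits_to_rating([-2.1, -1.3, 0.2, 1.9, 2.4])
print(f"Rating: {rating}, confidence: {confidence:.2f}")
```

With the usage snippet, `logits_to_rating(outputs.logits[0].tolist())` applies the same logic to real model output.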

## Testing results

- **Evaluation accuracy**: 0.6535
- **Precision** (macro-averaged): 0.635
- **Recall** (macro-averaged): 0.641
- **F1 score** (macro-averaged): 0.636
- **Confusion matrix** (rows: actual class, columns: predicted class):

| Actual ↓ / Predicted → | 1    | 2   | 3   | 4   | 5    |
|------------------------|------|-----|-----|-----|------|
| 1 (Very negative)      | 1219 | 318 | 48  | 6   | 9    |
| 2 (Negative)           | 432  | 826 | 294 | 32  | 16   |
| 3 (Neutral)            | 51   | 306 | 928 | 275 | 40   |
| 4 (Positive)           | 3    | 22  | 223 | 833 | 519  |
| 5 (Very positive)      | 9    | 6   | 16  | 247 | 1322 |
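
As a consistency check, the summary metrics can be recomputed from the confusion matrix. The sketch below uses plain Python with the counts copied from the table (each actual class has 1,600 examples, 8,000 in total). Macro-averaging the per-class scores reproduces the reported precision, recall and F1, and the accuracy on these counts is 5128 / 8000. The function name `metrics_from_confusion` is hypothetical.

```python
from typing import Dict, List

# Counts from the confusion matrix table: rows = actual class 1..5, columns = predicted class 1..5
CONFUSION: List[List[int]] = [
    [1219, 318, 48, 6, 9],
    [432, 826, 294, 32, 16],
    [51, 306, 928, 275, 40],
    [3, 22, 223, 833, 519],
    [9, 6, 16, 247, 1322],
]


def metrics_from_confusion(cm: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and macro-averaged precision, recall and F1 from a square confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    true_pos = [cm[i][i] for i in range(n)]
    actual = [sum(cm[i]) for i in range(n)]                          # row sums: support per class
    predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # column sums: predictions per class

    precision = [tp / p if p else 0.0 for tp, p in zip(true_pos, predicted)]
    recall = [tp / a if a else 0.0 for tp, a in zip(true_pos, actual)]
    # Per-class F1 = 2TP / (2TP + FP + FN) = 2TP / (actual + predicted)
    f1 = [2 * tp / (a + p) if (a + p) else 0.0 for tp, a, p in zip(true_pos, actual, predicted)]

    return {
        "accuracy": sum(true_pos) / total,
        "macro_precision": sum(precision) / n,
        "macro_recall": sum(recall) / n,
        "macro_f1": sum(f1) / n,
    }


for name, value in metrics_from_confusion(CONFUSION).items():
    print(f"{name}: {value:.3f}")
```

Running it prints accuracy 0.641, macro precision 0.635, macro recall 0.641 and macro F1 0.636. Because every class has the same support, the weighted averages equal the macro averages here.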