---
license: mit
datasets:
- Yelp/yelp_review_full
metrics:
- accuracy
base_model:
- distilbert/distilbert-base-uncased
library_name: transformers
tags:
- Sentiment Analysis
- Text Classification
- BERT
- Yelp Reviews
- Fine-tuned
---

# Yelp Review Classifier

This model classifies the sentiment of Yelp reviews by predicting each review's **star rating (1 to 5 stars)** from its text. It was fine-tuned from [`distilbert-base-uncased`](https://huggingface.co/distilbert/distilbert-base-uncased), the uncased DistilBERT checkpoint on the Hugging Face Hub, using a Yelp reviews dataset.

## Model Details

- **Model Type**: DistilBERT-based model for sequence classification
- **Model Architecture**: `distilbert-base-uncased`
- **Number of Parameters**: Approximately 66M
- **Training Dataset**: A curated Yelp reviews dataset labeled with star ratings (1 to 5 stars)
- **Fine-Tuning Task**: Five-class classification that predicts a review's star rating (1 to 5 stars) from its text

## Training Data

- **Dataset**: Custom Yelp reviews dataset
- **Data Description**: Yelp reviews labeled with star ratings (1 to 5 stars)
- **Preprocessing**: Reviews were cleaned to remove unwanted characters and URLs
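
The exact cleaning rules are not documented in this card. The following is a minimal sketch that assumes URLs start with `http://`, `https://`, or `www.`, and that "unwanted characters" means anything other than ASCII letters, digits, whitespace, and basic punctuation. The function `clean_review` and all three regular expressions are hypothetical and are not the author's actual preprocessing code.

```python
import re

# Hypothetical patterns: the card does not document the actual cleaning rules.
URL_PATTERN = re.compile(r"(https?://\S+|www\.\S+)")
# Keep ASCII letters, digits, whitespace, and basic punctuation; drop everything else.
UNWANTED_PATTERN = re.compile(r"[^A-Za-z0-9\s.,!?'-]")
WHITESPACE_PATTERN = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Remove URLs and unwanted characters, then collapse runs of whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = UNWANTED_PATTERN.sub("", text)
    return WHITESPACE_PATTERN.sub(" ", text).strip()


print(clean_review("Best brunch in town!!! ★★★★★ More at www.example.com"))
# → Best brunch in town!!! More at
```

This whitelist also strips accented letters and emoji, so you may want to widen it for multilingual reviews. The sketch does not lowercase text because the uncased tokenizer already does so.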

## Training Details

- **Training Framework**: Hugging Face Transformers and PyTorch
- **Learning Rate**: 2e-5
- **Epochs**: 6
- **Batch Size**: 16
- **Optimizer**: AdamW
- **Training Time**: Approximately 2 hours on a GPU
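
The card reports the optimizer and hyperparameters, but it does not give the learning-rate schedule or the training-set size. The sketch below shows how the reported values translate into optimizer steps and a per-step learning rate. It assumes the linear decay with no warmup that the Transformers `Trainer` uses by default. `FineTuneConfig`, `total_steps`, and `lr_at` are illustrative names and are not part of any library. The 10,000-review dataset size is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters reported in this card; warmup_steps is an assumption."""
    learning_rate: float = 2e-5
    num_epochs: int = 6
    batch_size: int = 16
    warmup_steps: int = 0  # assumption: no warmup (Trainer's default)

    def total_steps(self, num_examples: int) -> int:
        """Optimizer steps for the whole run, counting the final partial batch."""
        steps_per_epoch = math.ceil(num_examples / self.batch_size)
        return steps_per_epoch * self.num_epochs

    def lr_at(self, step: int, total: int) -> float:
        """Linear warmup to learning_rate, then linear decay to 0 (assumed schedule)."""
        if step < self.warmup_steps:
            return self.learning_rate * step / max(1, self.warmup_steps)
        remaining = max(0, total - step)
        return self.learning_rate * remaining / max(1, total - self.warmup_steps)


config = FineTuneConfig()
total = config.total_steps(num_examples=10_000)  # hypothetical dataset size
print(total)  # 3750
print(config.lr_at(total // 2, total))  # halfway through: half the peak rate
```

Using `math.ceil` for the step count mirrors a data loader that keeps the last incomplete batch instead of dropping it.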

## Usage

To run inference with the model, use the following code:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model_name = "kmack/YELP-Review_Classifier"  # Replace with your model name if different
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Reviews to classify
reviews = [
    "The food was absolutely delicious, and the atmosphere was perfect for a family gathering. The staff was friendly, and we had a great time. Definitely coming back!",
    "It was decent, but nothing special. The food was okay, but the service was a bit slow. I think there are better places around.",
    "I had a terrible experience. The waiter was rude, and the food was cold when it arrived. I won't be returning anytime soon.",
]

# Map class indices (0 to 4) to star ratings
label_map = {
    0: "1 Star",
    1: "2 Stars",
    2: "3 Stars",
    3: "4 Stars",
    4: "5 Stars",
}

# Classify each review
for review in reviews:
    # Tokenize the input text, truncating to the model's maximum length
    inputs = tokenizer(review, return_tensors="pt", truncation=True)

    # Run the model without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)

    # Pick the highest-scoring class (0 to 4)
    prediction = torch.argmax(outputs.logits, dim=-1).item()

    # Map the class index to a star rating
    predicted_rating = label_map[prediction]

    print(f"Review: {review}\nRating: {predicted_rating}\n")
```
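
The loop above keeps only the top-scoring class. If you also want a confidence score, you can apply a softmax to the logits. The following is a minimal, dependency-free sketch: `softmax` and `logits_to_rating` are hypothetical helpers, and the example logits are made up for illustration, not produced by this model. In the script above, you would pass `outputs.logits[0].tolist()`.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_rating(logits: List[float]) -> Tuple[int, float]:
    """Return (star rating 1-5, probability of that class) for one review."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best + 1, probs[best]  # class index 0-4 maps to 1-5 stars


stars, confidence = logits_to_rating([-2.1, -1.3, 0.2, 1.9, 3.4])  # made-up logits
print(f"{stars} stars (p={confidence:.2f})")
```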

## Citation

If you use this model in your research, please cite it as follows:

```bibtex
@misc{YELP-Review_Classifier,
  author = {Kmack},
  title = {YELP-Review\_Classifier},
  year = {2024},
  url = {https://huggingface.co/kmack/YELP-Review_Classifier}
}
```