---
license: mit
datasets:
- Yelp/yelp_review_full
metrics:
- accuracy
base_model:
- distilbert/distilbert-base-uncased
library_name: transformers
tags:
- Sentiment Analysis
- Text Classification
- BERT
- Yelp Reviews
- Fine-tuned
---
# Yelp Review Classifier
This model classifies the sentiment of Yelp reviews by predicting a **star rating (1 to 5 stars)** for each review. It was fine-tuned from [`distilbert-base-uncased`](https://huggingface.co/distilbert/distilbert-base-uncased), the DistilBERT base model on Hugging Face, on a Yelp reviews dataset.
## Model Details
- **Model Type**: DistilBERT-based model for sequence classification
- **Model Architecture**: `distilbert-base-uncased`
- **Number of Parameters**: Approximately 66M parameters
- **Training Dataset**: The model was trained on a curated Yelp reviews dataset, labeled for star ratings (1 to 5 stars).
- **Fine-Tuning Task**: Multi-class classification for Yelp reviews, predicting the star rating (from 1 to 5 stars) based on the content of the review.
## Training Data
- **Dataset**: Custom Yelp reviews dataset
- **Data Description**: The dataset consists of Yelp reviews, labeled for star ratings (1 to 5 stars).
- **Preprocessing**: The dataset was preprocessed by cleaning the reviews to remove unwanted characters and URLs.
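The exact cleaning rules are not published with this card. As a minimal sketch of the preprocessing described above, the function below strips URLs and control characters and collapses whitespace. The function name `clean_review` and the regular expressions are assumptions for illustration, not the author's actual pipeline.

```python
import re

# Assumption: URLs are anything starting with http(s):// or www.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumption: "unwanted characters" means ASCII control characters
# (tabs, newlines and carriage returns are handled as whitespace below).
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Remove URLs and control characters, then collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    text = CONTROL_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


cleaned = clean_review("Great food! See https://example.com/menu \n\n now")
print(cleaned)
```

If reviews are cleaned this way before training, the same function should be applied to inputs at inference time so the model sees text in the same form.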
## Training Details
- **Training Framework**: Hugging Face Transformers and PyTorch
- **Learning Rate**: 2e-5
- **Epochs**: 6
- **Batch Size**: 16
- **Optimizer**: AdamW
- **Training Time**: Approximately 2 hours on a GPU
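The hyperparameters above could be passed to the Transformers `Trainer` roughly as follows. This is a sketch under assumptions: the original training script is not part of this card, the batch size is taken to be per device on a single GPU, and `output_dir` is a hypothetical path. The `Trainer` uses an AdamW optimizer by default, so no optimizer is set explicitly here.

```python
# Hyperparameters as listed in this card.
# Assumption: "Batch Size: 16" refers to the per-device training batch size.
HYPERPARAMS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 6,
    "per_device_train_batch_size": 16,
}


def build_training_args(output_dir: str = "yelp-distilbert"):
    """Build TrainingArguments from the listed values.

    `output_dir` is a hypothetical path chosen for illustration.
    """
    # Imported lazily so the hyperparameter dict can be inspected
    # without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

The returned object can then be passed as `args=` to a `Trainer` together with the model and tokenized datasets.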
## Usage
To run inference with the model, use the following code:
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer from Hugging Face
model_name = "kmack/YELP-Review_Classifier"  # Replace with your model name if different
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# List of reviews for prediction
reviews = [
    "The food was absolutely delicious, and the atmosphere was perfect for a family gathering. The staff was friendly, and we had a great time. Definitely coming back!",
    "It was decent, but nothing special. The food was okay, but the service was a bit slow. I think there are better places around.",
    "I had a terrible experience. The waiter was rude, and the food was cold when it arrived. I won't be returning anytime soon.",
]

# Map prediction to star ratings
label_map = {
    0: "1 Star",
    1: "2 Stars",
    2: "3 Stars",
    3: "4 Stars",
    4: "5 Stars",
}

# Iterate over each review and get the prediction
for review in reviews:
    # Tokenize the input text
    inputs = tokenizer(review, return_tensors="pt", padding=True, truncation=True)

    # Get predictions
    with torch.no_grad():
        outputs = model(**inputs)

    # Get the predicted label (0 to 4 for star ratings)
    prediction = torch.argmax(outputs.logits, dim=-1).item()

    # Map prediction to star rating
    predicted_rating = label_map[prediction]
    print(f"Rating: {predicted_rating}\n")
```
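The snippet above reports only the top label. If a confidence score is also useful, the logits can be turned into probabilities with a softmax. The sketch below does this in plain Python so it runs without the model; the helper names `softmax` and `rate_with_confidence` and the example logits are hypothetical. With the real model, the input would be `outputs.logits[0].tolist()`.

```python
import math

# Same index-to-label mapping as label_map in the usage example.
LABELS = ["1 Star", "2 Stars", "3 Stars", "4 Stars", "5 Stars"]


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rate_with_confidence(logits):
    """Return the most likely star label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits for one review, for illustration only.
label, confidence = rate_with_confidence([0.1, 0.2, 0.3, 2.0, 0.5])
print(f"{label} ({confidence:.1%})")
```

A low top probability can signal a borderline review, for example one that sits between 3 and 4 stars.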
## Citation
If you use this model in your research, please cite the following:
```bibtex
@misc{YELP-Review_Classifier,
  author = {Kmack},
  title = {YELP-Review_Classifier},
  year = {2024},
  url = {https://huggingface.co/kmack/YELP-Review_Classifier}
}
```