---
license: mit
datasets:
- Yelp/yelp_review_full
metrics:
- accuracy
base_model:
- distilbert/distilbert-base-uncased
library_name: transformers
tags:
- Sentiment Analysis
- Text Classification
- BERT
- Yelp Reviews
- Fine-tuned
---
# Yelp Review Classifier

This model classifies the sentiment of Yelp reviews by predicting a **star rating (1 to 5 stars)** for each review. It was fine-tuned from [`distilbert-base-uncased`](https://huggingface.co/distilbert/distilbert-base-uncased), the DistilBERT base model on Hugging Face, using a Yelp reviews dataset.

## Model Details
- **Model Type**: DistilBERT-based model for sequence classification
- **Model Architecture**: `distilbert-base-uncased`
- **Number of Parameters**: Approximately 66M
- **Training Dataset**: A curated Yelp reviews dataset in which each review is labeled with a star rating (1 to 5 stars).
- **Fine-Tuning Task**: Multi-class classification that predicts a review's star rating (1 to 5 stars) from its text.

## Training Data
- **Dataset**: Custom Yelp reviews dataset (the card metadata lists `Yelp/yelp_review_full`)
- **Data Description**: Yelp reviews, each labeled with a star rating (1 to 5 stars).
- **Preprocessing**: Reviews were cleaned to remove URLs and unwanted characters.
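
The exact cleaning rules are not documented here. The following is a minimal sketch of what such a step might look like, assuming "unwanted characters" means anything other than letters, digits, whitespace, and basic punctuation. The function name `clean_review` and both regular expressions are illustrative assumptions, not the code used to train this model.

```python
import re

# Assumption: URLs start with http(s):// or www.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumption: keep word characters, whitespace, and basic punctuation only
UNWANTED_RE = re.compile(r"[^\w\s.,!?'-]")


def clean_review(text: str) -> str:
    """Strip URLs and unwanted characters, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = UNWANTED_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


print(clean_review("Great tacos!!! See http://example.com/menu \u2605\u2605 #yum"))
```

If inputs at inference time are cleaned differently from the training data, predictions may shift, so it is worth applying comparable cleaning before calling the model.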

## Training Details
- **Training Framework**: Hugging Face Transformers and PyTorch
- **Learning Rate**: 2e-5
- **Epochs**: 6
- **Batch Size**: 16
- **Optimizer**: AdamW
- **Training Time**: Approximately 2 hours on a GPU
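
The batch size and epoch count above determine how many optimizer updates a run performs. A minimal sketch of that arithmetic follows, assuming a single device, no gradient accumulation, and that the final partial batch is kept. The helper `training_steps` and the dataset size of 10,000 are hypothetical, since this card does not state the number of training examples.

```python
import math


def training_steps(num_examples: int, batch_size: int = 16, epochs: int = 6) -> int:
    """Total optimizer updates for a run with the given batch size and epochs."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


# Hypothetical dataset size, for illustration only
print(training_steps(10_000))  # 625 steps per epoch * 6 epochs = 3750
```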

## Usage
The following code loads the model and tokenizer, then predicts a star rating for each of a few sample reviews:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer from Hugging Face
model_name = "kmack/YELP-Review_Classifier"  # Replace with your model name if different
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# List of reviews for prediction
reviews = [
    "The food was absolutely delicious, and the atmosphere was perfect for a family gathering. The staff was friendly, and we had a great time. Definitely coming back!",
    "It was decent, but nothing special. The food was okay, but the service was a bit slow. I think there are better places around.",
    "I had a terrible experience. The waiter was rude, and the food was cold when it arrived. I won't be returning anytime soon."
]

# Map prediction to star ratings
label_map = {
    0: "1 Star",
    1: "2 Stars",
    2: "3 Stars",
    3: "4 Stars",
    4: "5 Stars"
}

# Iterate over each review and get the prediction
for review in reviews:
    # Tokenize the input text
    inputs = tokenizer(review, return_tensors="pt", padding=True, truncation=True)

    # Get predictions
    with torch.no_grad():
        outputs = model(**inputs)

    # Get the predicted label (0 to 4 for star ratings)
    prediction = torch.argmax(outputs.logits, dim=-1).item()

    # Map prediction to star rating
    predicted_rating = label_map[prediction]

    print(f"Rating: {predicted_rating}\n")
```
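
The example above reports only the top class. When a confidence score is also useful, the logits can be passed through a softmax. The sketch below shows the computation in plain Python on made-up logits; `rate` and `example_logits` are illustrative names, and with real model output `torch.softmax(outputs.logits, dim=-1)` would play the same role.

```python
import math

LABEL_MAP = {0: "1 Star", 1: "2 Stars", 2: "3 Stars", 3: "4 Stars", 4: "5 Stars"}


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rate(logits):
    """Return the predicted star label and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_MAP[best], probs[best]


# Made-up logits for one review, for illustration only
example_logits = [-1.2, -0.3, 0.4, 2.1, 1.0]
label, confidence = rate(example_logits)
print(f"Rating: {label} (p={confidence:.2f})")
```

Softmax probabilities from a fine-tuned classifier are not necessarily well calibrated, so they are better read as a relative score than as a true likelihood.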

## Citation

If you use this model in your research, please cite the following:

```bibtex
@misc{YELP-Review_Classifier,
  author = {Kmack},
  title = {YELP-Review_Classifier},
  year = {2024},
  url = {https://huggingface.co/kmack/YELP-Review_Classifier}
}
```