	---
	license: mit
	datasets:
	- Yelp/yelp_review_full
	metrics:
	- accuracy
	base_model:
	- distilbert/distilbert-base-uncased
	library_name: transformers
	tags:
	- Sentiment Analysis
	- Text Classification
	- BERT
	- Yelp Reviews
	- Fine-tuned
	---
	# Yelp Review Classifier

	This model classifies Yelp reviews by sentiment, predicting a star rating from 1 to 5 for each review. It was fine-tuned from [`distilbert-base-uncased`](https://huggingface.co/distilbert/distilbert-base-uncased), the DistilBERT base model on Hugging Face, on a Yelp reviews dataset.

	## Model Details
	- Model Type: DistilBERT-based model for sequence classification
	- Model Architecture: `distilbert-base-uncased`
	- Number of Parameters: Approximately 66M parameters
	- Training Dataset: The model was trained on a curated Yelp reviews dataset, labeled for star ratings (1 to 5 stars).
	- Fine-Tuning Task: Multi-class classification for Yelp reviews, predicting the star rating (from 1 to 5 stars) based on the content of the review.

	## Training Data
	- Dataset: Custom Yelp reviews dataset
	- Data Description: The dataset consists of Yelp reviews, labeled for star ratings (1 to 5 stars).
	- Preprocessing: The dataset was preprocessed by cleaning the reviews to remove unwanted characters and URLs.
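
The exact cleaning rules are not listed in this card. As a minimal sketch of what removing URLs and unwanted characters could look like, the regular expressions and the `clean_review` helper below are assumptions chosen for illustration, not the preprocessing actually used for training:

```python
import re

# Hypothetical cleaning rules: the card does not state the exact patterns used.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Keep word characters, whitespace, and basic punctuation; drop everything else.
UNWANTED_PATTERN = re.compile(r"[^\w\s.,!?'-]")
WHITESPACE_PATTERN = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Remove URLs and unwanted characters, then collapse repeated whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = UNWANTED_PATTERN.sub(" ", text)
    return WHITESPACE_PATTERN.sub(" ", text).strip()


print(clean_review("Great tacos!!! See https://example.com/menu \u2605\u2605 #yum"))
```

If reviews are cleaned this way before training, the same function should also be applied to text at inference time so that both see the same kind of input.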

	## Training Details
	- Training Framework: Hugging Face Transformers and PyTorch
	- Learning Rate: 2e-5
	- Epochs: 6
	- Batch Size: 16
	- Optimizer: AdamW
	- Training Time: Approximately 2 hours on a GPU
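
The training script is not included in this card. The sketch below shows one way the listed hyperparameters could be passed to the Transformers `Trainer` API. It assumes `Trainer` was used, that the batch size of 16 is per device, and that `adamw_torch` corresponds to the AdamW optimizer mentioned above. The output directory name is hypothetical.

```python
import math

# Hyperparameters as listed in this card; all other settings are assumptions.
HYPERPARAMETERS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 6,
    "per_device_train_batch_size": 16,  # assumes "Batch Size: 16" is per device
    "optim": "adamw_torch",  # AdamW, PyTorch implementation
}


def build_training_args(output_dir: str = "yelp-review-classifier"):
    """Create TrainingArguments from the hyperparameters above.

    The default output directory name is hypothetical.
    """
    # Imported lazily so the settings can be inspected without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)


def optimizer_steps(num_examples: int) -> int:
    """Total optimizer steps on one device with no gradient accumulation."""
    batch_size = HYPERPARAMETERS["per_device_train_batch_size"]
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * HYPERPARAMETERS["num_train_epochs"]
```

`optimizer_steps` gives a rough sense of run length for a given training set size, which can help when choosing logging and checkpoint intervals.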

	## Usage
	To run inference with the model, use the following code:

	```python
	from transformers import AutoModelForSequenceClassification, AutoTokenizer
	import torch

	# Load the fine-tuned model and tokenizer from Hugging Face
	model_name = "kmack/YELP-Review_Classifier" # Replace with your model name if different
	model = AutoModelForSequenceClassification.from_pretrained(model_name)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	# List of reviews for prediction
	reviews = [
	    "The food was absolutely delicious, and the atmosphere was perfect for a family gathering. The staff was friendly, and we had a great time. Definitely coming back!",
	    "It was decent, but nothing special. The food was okay, but the service was a bit slow. I think there are better places around.",
	    "I had a terrible experience. The waiter was rude, and the food was cold when it arrived. I won't be returning anytime soon.",
	]

	# Map predicted class indices (0 to 4) to star ratings
	label_map = {
	    0: "1 Star",
	    1: "2 Stars",
	    2: "3 Stars",
	    3: "4 Stars",
	    4: "5 Stars",
	}

	# Iterate over each review and get the prediction
	for review in reviews:
	    # Tokenize the input text, truncating reviews longer than the model's maximum length
	    inputs = tokenizer(review, return_tensors="pt", padding=True, truncation=True)

	    # Run the model without tracking gradients
	    with torch.no_grad():
	        outputs = model(**inputs)

	    # Get the predicted label (0 to 4 for star ratings)
	    prediction = torch.argmax(outputs.logits, dim=-1).item()

	    # Map prediction to star rating
	    predicted_rating = label_map[prediction]

	    print(f"Rating: {predicted_rating}\n")
	```
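
The loop above reports only the most likely class. To see how confident the model is, the logits can be turned into probabilities with a softmax. The following is a minimal pure-Python sketch: `rating_with_confidence` is a hypothetical helper, and the example logits are made up for illustration. In practice the input would come from `outputs.logits[0].tolist()`.

```python
import math


def rating_with_confidence(logits):
    """Return (star rating, probability) for one review's class logits."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Class index i corresponds to i + 1 stars, matching label_map above.
    best = max(range(len(probs)), key=probs.__getitem__)
    return best + 1, probs[best]


# Made-up logits for illustration only
stars, confidence = rating_with_confidence([-1.2, -0.3, 0.4, 2.1, 1.0])
print(f"Predicted {stars} stars with probability {confidence:.2f}")
```

A low top probability suggests the review falls between adjacent ratings, which is common for mixed or neutral reviews.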

	## Citation

	If you use this model in your research, please cite the following:

	```bibtex
	@misc{YELP-Review_Classifier,
	author = {Kmack},
	title = {YELP-Review_Classifier},
	year = {2024},
	url = {https://huggingface.co/kmack/YELP-Review_Classifier}
	}
	```