|
# Contract Sentiment Classifier (BERT)
|
|
|
A fine-tuned BERT model for contract sentiment analysis, classifying legal or contractual text into positive, negative, or neutral sentiments. |
|
|
|
|
|
## Model Details
|
|
|
- **Base Model**: `bert-base-uncased`

- **Task**: Sentiment Classification (Contractual Text)

- **Labels**: `Negative (0)`, `Neutral (1)`, `Positive (2)`

- **Quantized version available** for faster inference

- **Framework**: PyTorch, Transformers (Hugging Face)
|
|
|
|
|
|
|
|
|
## Intended Uses
|
|
|
- Classifying the sentiment of clauses in contracts and other legal documents

- Flagging potentially unfavorable or contentious contractual language during review

- Monitoring sentiment in contract negotiations and related correspondence
|
|
|
--- |
|
|
|
## Limitations
|
|
|
- Designed for English texts only

- Needs further tuning and evaluation on larger, more diverse contract datasets

- Not suitable for production use without robustness checks
|
|
|
--- |
|
|
|
## Training Details
|
|
|
- **Base Model**: `bert-base-uncased` |
|
- **Dataset**: Custom labeled Contract Sentiment dataset |
|
- **Epochs**: 3 |
|
- **Batch Size**: 5 |
|
- **Optimizer**: AdamW (see the training sketch below)
|
- **Hardware**: Trained on NVIDIA GPU (CUDA-enabled) |
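
For reference, here is a hypothetical sketch of how the settings above could map onto the Hugging Face `Trainer` API. The toy examples stand in for the custom contract sentiment dataset, which is not published, and `output_dir` is a placeholder:

```python
from datasets import Dataset
from transformers import (BertForSequenceClassification, BertTokenizer,
                          Trainer, TrainingArguments)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

# Placeholder examples standing in for the custom contract sentiment dataset
train_data = Dataset.from_dict({
    "text": ["Payment terms are favorable.", "The penalty clause is severe."],
    "label": [2, 0],
})
train_data = train_data.map(
    lambda ex: tokenizer(ex["text"], padding="max_length", truncation=True),
    batched=True,
)

training_args = TrainingArguments(
    output_dir="./results",          # placeholder output path
    num_train_epochs=3,              # epochs listed above
    per_device_train_batch_size=5,   # batch size listed above
    # Trainer uses AdamW by default; the learning rate for this model is
    # not documented, so the library default is kept here.
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_data)
trainer.train()
```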
|
|
|
--- |
|
|
|
## Evaluation Metrics
|
|
|
| Metric    | Score |
|-----------|-------|
| Accuracy  | 0.98  |
| F1        | 0.99  |
| Precision | 0.99  |
| Recall    | 0.97  |
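
The evaluation split and script are not published; as a rough illustration, scores like those above could be computed with scikit-learn as follows, assuming `y_true` and `y_pred` hold the test-set label IDs:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder label IDs; in practice these come from the held-out test set
y_true = [0, 1, 2, 2, 1]   # ground-truth labels
y_pred = [0, 1, 2, 1, 1]   # model predictions

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted"
)
print(f"Accuracy: {accuracy:.2f}  Precision: {precision:.2f}  "
      f"Recall: {recall:.2f}  F1: {f1:.2f}")
```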
|
|
|
--- |
|
|
|
## Label Mapping
|
|
|
| Label ID | Sentiment |
|----------|-----------|
| 0        | Negative  |
| 1        | Neutral   |
| 2        | Positive  |
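
If the repo's `config.json` stores this mapping in `id2label` (an assumption; otherwise the library defaults `LABEL_0`..`LABEL_2` are returned), it can be read programmatically instead of hard-coded:

```python
from transformers import AutoConfig

# Read the label mapping from the model config
config = AutoConfig.from_pretrained("AventIQ-AI/Sentiment-Analysis-for-Contract-Sentiment")
print(config.id2label)  # e.g. {0: "Negative", 1: "Neutral", 2: "Positive"}
```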
|
|
|
--- |
|
|
|
## Usage Example
|
|
|
```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load the fine-tuned model and tokenizer
model_name = "AventIQ-AI/Sentiment-Analysis-for-Contract-Sentiment"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=3)
model.eval()

# Map label IDs back to sentiment names (see Label Mapping above)
id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}


# Inference
def predict_sentiment(user_text):
    # Ensure input is a list for batch processing
    if isinstance(user_text, str):
        user_text = [user_text]

    # Tokenize input text
    inputs = tokenizer(user_text, return_tensors="pt", padding=True, truncation=True)

    # Predict using the model
    with torch.no_grad():
        outputs = model(**inputs)
    preds = torch.argmax(outputs.logits, dim=1)

    # Decode predictions back to sentiment labels and print them
    for text, pred in zip(user_text, preds.tolist()):
        print(f"Text: '{text}' => Sentiment: {id2label[pred]}")


# Example
predict_sentiment("The delivery scheduled")
```
|
|
|
--- |
|
|
|
## Quantization
|
|
|
- Applied **post-training dynamic quantization** using PyTorch to reduce model size and speed up inference (see the sketch after this list).
|
- Quantized model supports CPU-based deployments. |
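
A minimal sketch of how such a quantized variant can be produced with PyTorch's dynamic quantization API; the exact settings used for the released files are an assumption:

```python
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "AventIQ-AI/Sentiment-Analysis-for-Contract-Sentiment", num_labels=3
)
model.eval()

# Quantize the Linear layers to int8; activations are quantized on the fly
# at inference time, which is why this variant targets CPU deployment
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Persist the quantized weights (path is a placeholder)
torch.save(quantized_model.state_dict(), "quantized_model.pt")
```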
|
|
|
--- |
|
|
|
## Repository Structure
|
|
|
```
.
├── model/               # Quantized model files
├── tokenizer/           # Tokenizer config and vocabulary
├── model.safetensors    # Fine-tuned full-precision model
└── README.md            # Model documentation
```
|
|
|
--- |
|
|
|
|
|
|
|
## Contributing
|
|
|
We welcome contributions! Please feel free to raise an issue or submit a pull request if you find a bug or have a suggestion. |