# πŸ“„ Contract Sentiment Classifier (BERT)
A fine-tuned BERT model for contract sentiment analysis, classifying legal or contractual text into positive, negative, or neutral sentiments.
## 🧠 Model Details
- πŸ“Œ **Base Model**: `bert-base-uncased`
- πŸ”§ **Task**: Sentiment Classification (Contractual Text)
- πŸ” **Labels**: `Negative (0)`, `Neutral (1)`, `Positive (2)`
- πŸ’Ύ **Quantized version**: available for faster inference
- 🧠 **Framework**: PyTorch, Transformers (πŸ€— Hugging Face)
## 🧠 Intended Uses
- βœ… Classifying sentiment in contract clauses and other legal or contractual text
- βœ… Flagging negative or unfavorable language during contract review
- βœ… Monitoring sentiment in vendor and client communications
---
## 🚫 Limitations
- ❌ Designed for English text only
- ❌ Needs further tuning and evaluation on larger, more diverse contract datasets
- ❌ Not suitable for production use without robustness checks
---
## πŸ‹οΈβ€β™‚οΈ Training Details
- **Base Model**: `bert-base-uncased`
- **Dataset**: Custom labeled Contract Sentiment dataset
- **Epochs**: 3
- **Batch Size**: 5
- **Optimizer**: AdamW
- **Hardware**: Trained on NVIDIA GPU (CUDA-enabled)
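
The exact training script is not included in this repository; the sketch below shows how a comparable run could be set up with the πŸ€— `Trainer` API using the hyperparameters above. The two in-memory example rows are hypothetical placeholders for the custom Contract Sentiment dataset, which is not distributed here.

```python
from transformers import (
    BertTokenizer,
    BertForSequenceClassification,
    Trainer,
    TrainingArguments,
)
from datasets import Dataset

# Hypothetical placeholder rows; the actual custom Contract
# Sentiment dataset is not distributed with this repository.
train_data = Dataset.from_dict({
    "text": [
        "Payment shall be made within 30 days of invoice.",
        "The supplier repeatedly failed to meet delivery deadlines.",
    ],
    "label": [1, 0],
})

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3
)

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

train_data = train_data.map(tokenize_function, batched=True)

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,              # epochs reported above
    per_device_train_batch_size=5,   # batch size reported above
)

# Trainer defaults to the AdamW optimizer, matching the setup above.
trainer = Trainer(model=model, args=training_args, train_dataset=train_data)
trainer.train()
```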
---
## πŸ“Š Evaluation Metrics
| Metric | Score |
|------------|-------|
| Accuracy | 0.98 |
| F1 | 0.99 |
| Precision | 0.99 |
| Recall | 0.97 |
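
These scores can be recomputed on a held-out split with scikit-learn. In the sketch below, `y_true` and `y_pred` are hypothetical placeholder arrays standing in for real evaluation outputs.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical label-ID arrays (0=Negative, 1=Neutral, 2=Positive)
# standing in for predictions on a held-out evaluation split.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted"
)
print(f"Accuracy: {accuracy:.2f}  Precision: {precision:.2f}  "
      f"Recall: {recall:.2f}  F1: {f1:.2f}")
```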
---
## πŸ”Ž Label Mapping
| Label ID | Sentiment |
|----------|-----------|
| 0 | Negative |
| 1 | Neutral |
| 2 | Positive |
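
As a convenience, this mapping can be attached to the model config so that downstream pipelines report label names instead of generic IDs. This is a minimal sketch; the hosted checkpoint may not ship with this metadata.

```python
from transformers import BertForSequenceClassification

id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}
label2id = {v: k for k, v in id2label.items()}

# Attach the mapping so pipelines print label names instead of
# generic IDs such as LABEL_0.
model = BertForSequenceClassification.from_pretrained(
    "AventIQ-AI/Sentiment-Analysis-for-Contract-Sentiment",
    num_labels=3,
    id2label=id2label,
    label2id=label2id,
)
```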
---
## πŸš€ Usage Example
```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load the fine-tuned model and tokenizer
model_name = "AventIQ-AI/Sentiment-Analysis-for-Contract-Sentiment"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=3)
model.eval()

# Map predicted label IDs back to sentiment names (see Label Mapping above)
id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}

# Inference
def predict_sentiment(user_text):
    # Ensure input is a list for batch processing
    if isinstance(user_text, str):
        user_text = [user_text]

    # Tokenize input text
    inputs = tokenizer(user_text, return_tensors="pt", padding=True, truncation=True)

    # Predict using the model
    with torch.no_grad():
        outputs = model(**inputs)
        preds = torch.argmax(outputs.logits, dim=1)

    # Print each prediction with its decoded sentiment label
    for text, pred in zip(user_text, preds.tolist()):
        print(f"Text: '{text}' => Sentiment: {id2label[pred]}")

# Example
predict_sentiment("The delivery was completed as scheduled.")
```
---
## πŸ§ͺ Quantization
- Applied **post-training dynamic quantization** using PyTorch to reduce model size and speed up inference.
- Quantized model supports CPU-based deployments.
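
The quantization script itself is not part of this repository, but a minimal sketch using stock PyTorch dynamic quantization looks like this:

```python
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "AventIQ-AI/Sentiment-Analysis-for-Contract-Sentiment", num_labels=3
)
model.eval()

# Replace Linear layers with int8 dynamically quantized versions;
# activations are quantized on the fly at inference time (CPU only).
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```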
---
## πŸ“ Repository Structure
```
.
β”œβ”€β”€ model/               # Quantized model files
β”œβ”€β”€ tokenizer/           # Tokenizer config and vocabulary
β”œβ”€β”€ model.safetensors    # Fine-tuned full-precision model weights
└── README.md            # Model documentation
```
---
## 🀝 Contributing
We welcome contributions! Please feel free to raise an issue or submit a pull request if you find a bug or have a suggestion.