# Contract Sentiment Classifier (BERT)
A fine-tuned BERT model for contract sentiment analysis, classifying legal or contractual text into positive, negative, or neutral sentiments.
## Model Details
- **Base Model**: `bert-base-uncased`
- **Task**: Sentiment classification (contractual text)
- **Labels**: `Negative (0)`, `Neutral (1)`, `Positive (2)`
- **Quantized version**: available for faster CPU inference
- **Framework**: PyTorch, Transformers (Hugging Face)
## Intended Uses
- Classifying product feedback and user reviews
- Sentiment analysis for e-commerce platforms
- Social media monitoring and customer opinion mining
---
## Limitations
- Designed for English text only
- Needs further tuning and evaluation on larger, more diverse contract datasets
- Not suitable for production use without robustness checks
---
## Training Details
- **Base Model**: `bert-base-uncased`
- **Dataset**: Custom labeled contract sentiment dataset
- **Epochs**: 3
- **Batch Size**: 5
- **Optimizer**: AdamW
- **Hardware**: Trained on NVIDIA GPU (CUDA-enabled)
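
The training script itself is not included in this repo; the following is a minimal sketch of a `Trainer` setup consistent with the settings above. The CSV path and column names are placeholders, since the actual dataset is not published.

```python
import pandas as pd
from datasets import Dataset
from transformers import (
    BertForSequenceClassification,
    BertTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical file and column names; the real dataset is not published.
df = pd.read_csv("contract_sentiment.csv")  # columns: "text", "label" in {0, 1, 2}
splits = Dataset.from_pandas(df).train_test_split(test_size=0.2)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized = splits.map(tokenize_function, batched=True)

# Settings from this card: 3 epochs, batch size 5. Trainer's default
# optimizer is AdamW, matching the optimizer listed above.
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=5,
    per_device_eval_batch_size=5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()
```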
---
## Evaluation Metrics
| Metric | Score |
|------------|-------|
| Accuracy | 0.98 |
| F1 | 0.99 |
| Precision | 0.99 |
| Recall | 0.97 |
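
For reference, a minimal sketch of how such metrics can be computed on a held-out split with scikit-learn. The two example rows are placeholders, not the actual evaluation data.

```python
import torch
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import BertForSequenceClassification, BertTokenizer

model_name = "AventIQ-AI/Sentiment-Analysis-for-Contract-Sentiment"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()

# Placeholder held-out examples; substitute real labeled data.
texts = ["Payment terms are clear and fair.", "The penalty clause is unacceptable."]
labels = [2, 0]  # gold label IDs (see the label mapping below)

inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    preds = torch.argmax(model(**inputs).logits, dim=1).tolist()

precision, recall, f1, _ = precision_recall_fscore_support(
    labels, preds, average="weighted", zero_division=0
)
print(f"Accuracy:  {accuracy_score(labels, preds):.2f}")
print(f"Precision: {precision:.2f} | Recall: {recall:.2f} | F1: {f1:.2f}")
```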
---
## Label Mapping
| Label ID | Sentiment |
|----------|-----------|
| 0 | Negative |
| 1 | Neutral |
| 2 | Positive |
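
The same mapping expressed in code. Passing it through `from_pretrained` is optional (and not something this repo necessarily does), but it bakes readable label names into the model config for downstream tools.

```python
from transformers import BertForSequenceClassification

id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}
label2id = {label: i for i, label in id2label.items()}

# Optional: attach the mapping to the config so predictions carry readable names.
model = BertForSequenceClassification.from_pretrained(
    "AventIQ-AI/Sentiment-Analysis-for-Contract-Sentiment",
    id2label=id2label,
    label2id=label2id,
)
```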
---
## Usage Example
```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Load the fine-tuned model and tokenizer
model_name = "AventIQ-AI/Sentiment-Analysis-for-Contract-Sentiment"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=3)
model.eval()

# Label mapping (see the table above)
id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}

def predict_sentiment(user_text):
    # Ensure input is a list for batch processing
    if isinstance(user_text, str):
        user_text = [user_text]
    # Tokenize input text
    inputs = tokenizer(user_text, return_tensors="pt", padding=True, truncation=True)
    # Predict using the model
    with torch.no_grad():
        outputs = model(**inputs)
    preds = torch.argmax(outputs.logits, dim=1)
    # Map predicted label IDs back to sentiment names
    for text, pred in zip(user_text, preds.tolist()):
        print(f"Text: '{text}' => Sentiment: {id2label[pred]}")

# Example
predict_sentiment("The delivery was completed as scheduled.")
```
---
## Quantization
- Applied **post-training dynamic quantization** with PyTorch to reduce model size and speed up inference (see the sketch below).
- The quantized model supports CPU-based deployments.
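
A sketch of how such a quantized copy can be produced with PyTorch's dynamic quantization. The exact export script used for this repo is an assumption, and the output path is a placeholder.

```python
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "AventIQ-AI/Sentiment-Analysis-for-Contract-Sentiment"
)
model.eval()

# Replace each nn.Linear with an int8 dynamically quantized version;
# weights are stored in int8, activations are quantized on the fly.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Dynamically quantized models are intended for CPU inference.
torch.save(quantized_model.state_dict(), "model/quantized_state_dict.pt")  # placeholder path
```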
---
## Repository Structure
```
.
├── model/              # Quantized model files
├── tokenizer/          # Tokenizer config and vocabulary
├── model.safetensors   # Fine-tuned full-precision model
└── README.md           # Model documentation
```
---
## Contributing
We welcome contributions! Please feel free to raise an issue or submit a pull request if you find a bug or have a suggestion.