### **BERT-Base-Uncased Quantized Model for Disaster SOS Message Classification**
This repository hosts a quantized version of the BERT model, fine-tuned for **Disaster SOS Message Classification**. The model efficiently classifies emergency messages related to disasters, helping prioritize urgent cases. It has been optimized for deployment in resource-constrained environments while maintaining high accuracy.
## **Model Details**
- **Model Architecture:** BERT Base Uncased
- **Task:** Disaster SOS Message Classification
- **Dataset:** Disaster Response Messages Dataset
- **Quantization:** Float16
- **Fine-tuning Framework:** Hugging Face Transformers
## **Usage**
### **Installation**
```sh
pip install transformers torch
```
### **Loading the Model**
```python
from transformers import BertForSequenceClassification, BertTokenizer
import torch

# Use a GPU when available; FP16 kernels are most widely supported there
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load quantized model
quantized_model_path = "/kaggle/working/bert_finetuned_fp16"
quantized_model = BertForSequenceClassification.from_pretrained(quantized_model_path)
quantized_model.eval()  # Set to evaluation mode
quantized_model.half()  # Convert model to FP16
quantized_model.to(device)

# Load tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Define a test SOS message
test_message = "There is a massive earthquake, and people need help immediately!"

# Tokenize input and move it to the same device as the model
inputs = tokenizer(test_message, return_tensors="pt", padding=True, truncation=True, max_length=128).to(device)

# Ensure input tensors are in correct dtype
inputs["input_ids"] = inputs["input_ids"].long()
inputs["attention_mask"] = inputs["attention_mask"].long()

# Make prediction
with torch.no_grad():
    outputs = quantized_model(**inputs)

# Get predicted categories (multi-label: one independent sigmoid per category)
probs = torch.sigmoid(outputs.logits.float()).cpu().numpy().flatten()
predictions = (probs > 0.5).astype(int)

# Category mapping (example; must match the label order used during fine-tuning)
category_names = ["Earthquake", "Flood", "Medical Emergency", "Infrastructure Damage", "General Help"]
predicted_labels = [name for name, flag in zip(category_names, predictions) if flag == 1]

print(f"Message: {test_message}")
print(f"Predicted Categories: {predicted_labels}")
print(f"Confidence Scores: {probs}")
```
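The decoding step in the example above applies a sigmoid to each logit and keeps every category whose probability exceeds 0.5, so a single message can receive several labels or none. The sketch below isolates that logic in plain Python so the threshold can be tried out without loading the model. The helper names `sigmoid` and `decode_labels` and the example logits are hypothetical and not part of this repository.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability, using a numerically stable form."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_labels(logits, names, threshold=0.5):
    """Return (name, probability) pairs whose probability exceeds the threshold."""
    if len(logits) != len(names):
        raise ValueError("expected exactly one category name per logit")
    scored = [(name, sigmoid(z)) for name, z in zip(names, logits)]
    return [(name, p) for name, p in scored if p > threshold]


# Hypothetical logits for the five example categories (not real model output)
example_names = ["Earthquake", "Flood", "Medical Emergency", "Infrastructure Damage", "General Help"]
example_logits = [2.0, -1.5, 0.3, -0.2, 1.1]

for name, p in decode_labels(example_logits, example_names):
    print(f"{name}: {p:.2f}")
```

Raising `threshold` trades recall for precision, which may matter when false alarms are costly; the value 0.5 is simply the one used in the example above.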
## **Performance Metrics**
- **Accuracy:** 0.85
- **F1 Score:** 0.83
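
This README does not state how the F1 score was averaged across categories. As an illustration only, the sketch below computes a micro-averaged F1 for multi-label predictions, which is one common choice; the function name `micro_f1` and the toy labels are assumptions, not the evaluation code used for this model.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over rows of 0/1 label indicators."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1  # correctly predicted label
            elif t == 0 and p == 1:
                fp += 1  # predicted label that is not present
            elif t == 1 and p == 0:
                fn += 1  # missed label
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: two messages, three categories (made up for illustration)
toy_true = [[1, 0, 1], [0, 1, 0]]
toy_pred = [[1, 0, 0], [0, 1, 1]]
print(f"micro F1: {micro_f1(toy_true, toy_pred):.3f}")
```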
## **Fine-Tuning Details**
### **Dataset**
The dataset is the **Disaster Response Messages Dataset**, which contains real-life messages from various disaster scenarios.
### **Training**
- Number of epochs: 3
- Batch size: 8
- Evaluation strategy: epoch
- Learning rate: 2e-5
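
To get a feel for what these settings imply, the sketch below estimates the number of optimizer steps and evaluation passes. It assumes a single device and no gradient accumulation, and the dataset size is a placeholder because the number of training examples is not stated here; `training_schedule` is a hypothetical helper.

```python
import math


def training_schedule(num_examples: int, batch_size: int = 8, epochs: int = 3) -> dict:
    """Estimate step counts for per-epoch evaluation on a single device."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
        "evaluations": epochs,  # evaluation strategy "epoch" runs once per epoch
    }


# Placeholder dataset size, chosen only for illustration
schedule = training_schedule(num_examples=20_000)
print(schedule)
```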
### **Quantization**
Post-training quantization was applied by converting the fine-tuned weights to Float16 (half precision) in PyTorch, reducing model size and improving inference speed.
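
Float16 stores each weight in 2 bytes instead of the 4 bytes used by Float32, so the size reduction can be estimated with simple arithmetic. The sketch below assumes roughly 110 million parameters for BERT Base; the exact count for this checkpoint, including its classification head, is not stated in this README.

```python
def weights_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Approximate storage for the weights alone, in megabytes (10^6 bytes)."""
    return num_params * bytes_per_param / 1e6


APPROX_PARAMS = 110_000_000  # assumption: approximate BERT Base parameter count

fp32_mb = weights_size_mb(APPROX_PARAMS, 4)  # Float32: 4 bytes per weight
fp16_mb = weights_size_mb(APPROX_PARAMS, 2)  # Float16: 2 bytes per weight

print(f"Float32: ~{fp32_mb:.0f} MB, Float16: ~{fp16_mb:.0f} MB")
print(f"Size ratio: {fp16_mb / fp32_mb:.2f}")
```

The estimate covers only the weights; file format overhead and tokenizer files add a small amount on top.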
## **Repository Structure**
```
.
├── model/               # Contains the quantized model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Fine-tuned model
└── README.md            # Model documentation
```
## **Limitations**
- The model may not generalize well to unseen disaster types outside the training data.
- Quantization may cause minor accuracy degradation.
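
One source of quantization-related degradation is that Float16 keeps only about three significant decimal digits, so every weight is rounded slightly. The sketch below illustrates that rounding with Python's standard `struct` module, which supports IEEE 754 half precision through the `e` format. The sample values are made up and the helper `to_fp16` is hypothetical; this does not measure the accuracy of this model.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Made-up weight values, for illustration only
sample_weights = [0.1, -0.02537, 1.5, 0.000123]

for w in sample_weights:
    rounded = to_fp16(w)
    rel_error = abs(rounded - w) / abs(w)
    print(f"{w!r} -> {rounded!r} (relative error {rel_error:.2e})")
```

For values in the normal Float16 range the relative rounding error stays below 2^-11 (about 0.05%), which is consistent with the degradation being minor rather than severe.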
## **Contributing**
Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.
---