# **BERT-Base-Uncased Quantized Model for Disaster SOS Message Classification**

This repository hosts a quantized version of the BERT model, fine-tuned for **Disaster SOS Message Classification**. The model efficiently classifies emergency messages related to disasters, helping prioritize urgent cases. It has been optimized for deployment in resource-constrained environments while maintaining high accuracy.  

## **Model Details**  

- **Model Architecture:** BERT Base Uncased  
- **Task:** Disaster SOS Message Classification  
- **Dataset:** Disaster Response Messages Dataset  
- **Quantization:** Float16  
- **Fine-tuning Framework:** Hugging Face Transformers  

## **Usage**  

### **Installation**  

```sh
pip install transformers torch
```  

### **Loading the Model**  

```python
from transformers import BertForSequenceClassification, BertTokenizer
import torch

# Load quantized model
quantized_model_path = "/kaggle/working/bert_finetuned_fp16"
quantized_model = BertForSequenceClassification.from_pretrained(quantized_model_path)
quantized_model.eval()  # Set to evaluation mode
quantized_model.half()  # Cast weights to FP16 (best run on GPU; some CPU ops lack half-precision kernels)

# Load tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Define a test SOS message
test_message = "There is a massive earthquake, and people need help immediately!"

# Tokenize input
inputs = tokenizer(test_message, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Ensure input tensors are in correct dtype
inputs["input_ids"] = inputs["input_ids"].long()
inputs["attention_mask"] = inputs["attention_mask"].long()

# Make prediction
with torch.no_grad():
    outputs = quantized_model(**inputs)

# Get predicted categories
probs = torch.sigmoid(outputs.logits).cpu().numpy().flatten()
predictions = (probs > 0.5).astype(int)

# Category mapping (Example)
category_names = ["Earthquake", "Flood", "Medical Emergency", "Infrastructure Damage", "General Help"]
predicted_labels = [category_names[i] for i in range(len(predictions)) if predictions[i] == 1]

print(f"Message: {test_message}")
print(f"Predicted Categories: {predicted_labels}")
print(f"Confidence Scores: {probs}")
```  

## **Performance Metrics**  

- **Accuracy:** 0.85  
- **F1 Score:** 0.83  
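For multi-label outputs like the ones produced above, the reported metrics follow the usual definitions; a minimal sketch of how they can be computed (the toy arrays below are illustrative, not drawn from the evaluation set):

```python
import numpy as np

def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted exactly."""
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

def f1_micro(y_true, y_pred):
    """Micro-averaged F1 over all label positions."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Toy example: 2 messages, 3 categories
y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0]])
print(subset_accuracy(y_true, y_pred))  # 0.5
print(f1_micro(y_true, y_pred))         # 0.8
```

In practice, `sklearn.metrics.f1_score(y_true, y_pred, average="micro")` computes the same quantity.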

## **Fine-Tuning Details**  

### **Dataset**  

The dataset is the **Disaster Response Messages Dataset**, which contains real-life messages from various disaster scenarios.  

### **Training**  

- Number of epochs: 3  
- Batch size: 8  
- Evaluation strategy: epoch  
- Learning rate: 2e-5  
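The hyperparameters above map directly onto the Hugging Face `Trainer` API; a sketch of the corresponding configuration (the `output_dir` name is a placeholder, not from this repository):

```python
from transformers import TrainingArguments

# Sketch of the reported fine-tuning settings
training_args = TrainingArguments(
    output_dir="bert_finetuned",     # placeholder output directory
    num_train_epochs=3,              # Number of epochs: 3
    per_device_train_batch_size=8,   # Batch size: 8
    evaluation_strategy="epoch",     # Evaluation strategy: epoch
    learning_rate=2e-5,              # Learning rate: 2e-5
)
```

These arguments would then be passed to a `Trainer` together with the model and the tokenized train/validation splits.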

### **Quantization**  

Post-training quantization to FP16 was applied by casting the fine-tuned weights to half precision (`model.half()`), roughly halving the model's storage footprint and improving inference speed on hardware with FP16 support.  
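The storage effect of the FP16 cast can be illustrated with a small NumPy sketch (illustrative only; the real weights come from the fine-tuned checkpoint):

```python
import numpy as np

# Stand-in weight matrix; real BERT weights are loaded from the checkpoint
weights_fp32 = np.random.randn(768, 768).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)  # post-training cast to half precision

print(weights_fp32.nbytes)  # 2359296 bytes
print(weights_fp16.nbytes)  # 1179648 bytes -- half the storage
# The cast introduces a small rounding error per weight
max_err = np.max(np.abs(weights_fp32 - weights_fp16.astype(np.float32)))
print(max_err)
```

The same halving applies to the full model when `model.half()` is saved with `save_pretrained`.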

## **Repository Structure**  

```
.
├── model/               # Contains the quantized model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Fine-tuned model weights
└── README.md            # Model documentation
```  

## **Limitations**  

- The model may not generalize well to unseen disaster types outside the training data.  
- Minor accuracy degradation due to quantization.  

## **Contributing**  

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.  

---