# T5-Base Fine-Tuned Model for Question Answering

This repository hosts a fine-tuned version of the **T5-Base** model for question answering, trained on the SQuAD dataset. Given a question and a context passage, the model generates the answer as text. The weights are stored in FP16 to reduce model size and improve inference efficiency.

## Model Details
- **Model Architecture**: T5-Base (`t5-qa-chatbot`)
- **Task**: Question Answering (QA-Chatbot)
- **Dataset**: SQuAD
- **Quantization**: FP16
- **Fine-tuning Framework**: Hugging Face Transformers

## 🚀 Usage

### Installation

```bash
pip install transformers torch
```

### Loading the Model

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "AventIQ-AI/t5-qa-chatbot"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name).to(device)
```

### Chatbot Inference

```python
def answer_question(question, context):
    input_text = f"question: {question} context: {context}"
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, padding="max_length", max_length=512)
    
    # Move input tensors to the same device as the model
    inputs = {key: value.to(device) for key, value in inputs.items()}  
    
    # Generate answer
    with torch.no_grad():
        output = model.generate(**inputs, max_length=150)

    # Decode and return answer
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Test Case
question = "What is overfitting in machine learning?"
context = "Overfitting occurs when a model learns the training data too well, capturing noise instead of actual patterns."
predicted_answer = answer_question(question, context)
print(f"Predicted Answer: {predicted_answer}")

```
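Because the tokenizer call above uses `truncation=True` with `max_length=512`, any context beyond that limit is silently dropped. One possible workaround, sketched here under assumptions, is to split a long context into overlapping word windows and build one prompt per window. The helper names `build_prompt` and `split_context` and the window sizes are hypothetical, and word counts only approximate token counts.

```python
def build_prompt(question, context):
    """Format one input in the `question: ... context: ...` layout used above."""
    return f"question: {question.strip()} context: {' '.join(context.split())}"


def split_context(context, window=200, overlap=50):
    """Split a context into overlapping windows of whitespace-separated words.

    `window` and `overlap` are illustrative defaults, not tuned values.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    words = context.split()
    if not words:
        return []
    step = window - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        # Stop once the current window already reaches the end of the text
        if start + window >= len(words):
            break
    return chunks


long_context = " ".join(f"w{i}" for i in range(10))
prompts = [build_prompt("What comes last?", chunk)
           for chunk in split_context(long_context, window=4, overlap=1)]
print(len(prompts), "prompts built")
```

Each prompt could then be passed to `answer_question`-style inference separately; how to pick the best answer among the windows is left open here.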

## ⚡ Quantization Details

Post-training quantization was applied using PyTorch's built-in quantization framework. The weights were converted to **Float16 (FP16)**, which halves storage relative to FP32 weights and can improve inference efficiency, at the cost of a small loss in numerical precision.
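To make the size and precision trade-off concrete, the following minimal sketch uses only Python's standard `struct` module to store a few numbers as FP32 and as FP16. It does not reproduce how this model was converted; the `weights` list is made-up example data and the helper names are hypothetical.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def packed_size(values, fmt):
    """Bytes needed to store `values` with struct format `fmt` ('f' = FP32, 'e' = FP16)."""
    return len(struct.pack(f"<{len(values)}{fmt}", *values))


weights = [0.1, -0.25, 0.333, 1.5, -0.0071]  # made-up example values

fp32_bytes = packed_size(weights, "f")
fp16_bytes = packed_size(weights, "e")
max_error = max(abs(w - to_fp16(w)) for w in weights)

print(f"FP32: {fp32_bytes} bytes, FP16: {fp16_bytes} bytes")
print(f"Largest rounding error after FP16 conversion: {max_error:.2e}")
```

Values that are exactly representable in half precision (such as `-0.25` and `1.5`) survive unchanged, while others pick up a small rounding error. FP16 also has a much narrower range than FP32, so very large values cannot be stored at all.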

## 📂 Repository Structure

```
.
├── model/               # Contains the quantized model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors/   # Quantized Model
├── README.md            # Model documentation
```

## ⚠️ Limitations

- The model may struggle with highly ambiguous sentences.
- Quantization may lead to slight degradation in accuracy compared to full-precision models.
- Performance may vary across different writing styles and sentence structures.
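One way to quantify limitations like these is to score predictions against reference answers with exact match and token-level F1, the metrics commonly reported for SQuAD. The sketch below is an independent reimplementation of that scoring style, not this repository's evaluation code, and the function names are assumptions.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction, reference):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = "capturing noise instead of patterns"
print(exact_match("Capturing noise instead of patterns.", reference))
print(round(f1_score("capturing noise", reference), 3))
```

Averaging these scores over a held-out set, once for the FP16 model and once for a full-precision baseline, would give a rough measure of any accuracy loss from quantization.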

## 🤝 Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.