# Stock Market QA Chatbot with Text-to-Text Transfer Transformer (T5)

## Overview

This repository hosts a quantized version of the T5 model fine-tuned for question answering about the stock market. The model was trained on the `stock_trading_QA` dataset from Hugging Face and quantized to Float16 (FP16) to speed up inference and reduce memory use while keeping answer quality close to that of the full-precision model.

## Model Details

- **Model Architecture:** t5-base
- **Task:** QA Chatbot for Stock Market
- **Dataset:** Hugging Face's `stock_trading_QA`
- **Quantization:** Float16 (FP16) for optimized inference
- **Fine-tuning Framework:** Hugging Face Transformers

## Usage

### Installation

`T5Tokenizer` depends on the `sentencepiece` package, so install it alongside Transformers and PyTorch.

```bash
pip install transformers torch sentencepiece
```

### Loading the Model

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the fine-tuned weights and tokenizer from the Hugging Face Hub.
model_name = "AventIQ-AI/t5-stockmarket-qa-chatbot"
model = T5ForConditionalGeneration.from_pretrained(model_name).to(device)
tokenizer = T5Tokenizer.from_pretrained(model_name)
```

### Question Answering Example

```python
question = "How can I start investing in stocks?"
# Prefix the question with "question: " before tokenizing it.
input_text = "question: " + question
input_ids = tokenizer.encode(input_text, return_tensors="pt").to(model.device)

# Gradients are not needed for inference.
with torch.no_grad():
    outputs = model.generate(input_ids, max_length=50)

# Convert the generated token IDs back into text.
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(f"Question: {question}")
print(f"Predicted Answer: {answer}")
```
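
For repeated queries, the steps above can be wrapped in a small helper. The sketch below is illustrative and not part of the repository: `build_prompt` and `answer_question` are hypothetical names, and it assumes the `model` and `tokenizer` objects loaded earlier and the same `"question: "` prefix.

```python
def build_prompt(question: str) -> str:
    """Collapse stray whitespace and add the "question: " prefix used above."""
    return "question: " + " ".join(question.split())


def answer_question(question, model, tokenizer, max_length=50):
    """Generate an answer with an already loaded T5 model and tokenizer.

    Hypothetical helper: it follows the same encode / generate / decode
    steps as the example above.
    """
    prompt = build_prompt(question)
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(input_ids, max_length=max_length)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


# The prompt builder can be checked without loading the model.
print(build_prompt("  How can I   start investing in stocks? "))
```

With the model loaded, `answer_question("How can I start investing in stocks?", model, tokenizer)` returns the decoded answer string.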

## Evaluation Metric: BLEU Score

For question-answering tasks, a high BLEU score indicates that the model's generated answers closely match the human-written reference answers.

### **Interpreting Our BLEU Score**

Our model achieved a **BLEU score of 0.7888**, which indicates:

- ✅ **Good answer-generation ability**
- ✅ **Moderate sentence fluency**

BLEU is computed by comparing the **1-gram, 2-gram, 3-gram, and 4-gram overlaps** between the model's output and the reference sentence, and it applies a **brevity penalty** when the model's output is shorter than the reference.
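
To make the calculation concrete, here is a minimal, unsmoothed sentence-level BLEU in pure Python. It is a sketch and not the evaluation script behind the reported score: whitespace tokenization, lowercasing, a single reference, and the function names `ngram_counts` and `sentence_bleu` are all assumptions made for this example.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed BLEU of one candidate against one reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Clipped overlap: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, one zero precision gives BLEU 0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: only candidates shorter than the reference are penalized.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the 1- to max_n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "diversify your portfolio across several sectors"
print(sentence_bleu(reference, reference))  # 1.0 for an exact match
print(sentence_bleu("diversify your portfolio across several", reference))
```

The second call drops the final word, so every n-gram precision stays at 1 and only the brevity penalty lowers the score.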

### **BLEU Score Ranges for Chatbot**

| BLEU Score | Interpretation |
| --- | --- |
| **0.8 - 1.0** | Near-perfect answers, closely matching the reference answers. |
| **0.7 - 0.8** | High-quality answers, minor variations in phrasing. |
| **0.6 - 0.7** | Good answers, but with some grammatical errors or missing words. |
| **0.5 - 0.6** | Decent answers, noticeable mistakes, lacks fluency. |
| **Below 0.5** | Needs improvement, frequent incorrect answers. |
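
The table can also be applied programmatically, for example when logging evaluation runs. The following is a small sketch under stated assumptions: `BLEU_BANDS` and `interpret_bleu` are hypothetical names, and a score that falls exactly on a boundary (such as 0.8) is assigned to the higher band, which the table itself does not specify.

```python
# (lower bound, label) pairs taken from the table above, highest band first.
BLEU_BANDS = [
    (0.8, "Near-perfect"),
    (0.7, "High-quality"),
    (0.6, "Good"),
    (0.5, "Decent"),
]


def interpret_bleu(score: float) -> str:
    """Return the label of the band that a BLEU score in [0, 1] falls into."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("BLEU score must be between 0 and 1")
    for lower_bound, label in BLEU_BANDS:
        # Assumption: boundary values belong to the higher band.
        if score >= lower_bound:
            return label
    return "Needs improvement"


print(interpret_bleu(0.7888))  # High-quality
```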

## Quantization Details

Post-training quantization was applied using PyTorch's built-in quantization framework. The model was quantized to Float16 (FP16), which stores each weight in 2 bytes instead of the 4 bytes used by Float32. This reduces model size and improves inference efficiency at the cost of some numerical precision.
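
The size and precision trade-off can be illustrated with the Python standard library alone. This sketch does not reproduce the conversion used for this model; it only shows how FP16 storage compares with FP32. The helper names are hypothetical, and the figure of about 220 million parameters for a t5-base-sized model is an approximation used for illustration.

```python
import struct

FP32, FP16 = "<f", "<e"  # struct codes for 32-bit and 16-bit IEEE 754 floats


def round_trip(value: float, fmt: str) -> float:
    """Store a float in the given format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def model_size_gb(num_parameters: int, bytes_per_parameter: int) -> float:
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# Bytes needed per stored value in each format.
print(struct.calcsize(FP32), struct.calcsize(FP16))

# The same weight keeps fewer significant digits in FP16.
print(round_trip(0.1, FP32), round_trip(0.1, FP16))

# Rough weight size in FP32 versus FP16 (approximate parameter count).
T5_BASE_PARAMETERS = 220_000_000
print(model_size_gb(T5_BASE_PARAMETERS, 4), model_size_gb(T5_BASE_PARAMETERS, 2))
```

Halving the bytes per weight halves the estimated weight size, while the round trip shows the rounding error that each stored value picks up.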

## Repository Structure

```
.
├── model/               # Contains the quantized model files
├── tokenizer_config/    # Tokenizer configuration and vocabulary files
├── model.safetensors    # Quantized model weights
└── README.md            # Model documentation
```

## Limitations

- The model may struggle with highly ambiguous questions.
- Quantization may lead to slight degradation in accuracy compared to full-precision models.
- Performance may vary across different writing styles and sentence structures.

## Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request if you have suggestions or improvements.