|
**🧠 Q&A Model - SQuAD**
|
|
|
An extractive question-answering model based on `roberta-base-squad2` and fine-tuned on the SQuAD v2.0 dataset to predict precise answer spans from context passages, including detecting questions that the context cannot answer.
|
|
|
--- |
|
|
|
✨ **Model Highlights**
|
|
|
- 🚀 Based on `roberta-base-squad2`
- 📚 Fine-tuned on SQuAD v2.0 (or your custom QA dataset)
- ⚡ Supports extractive question answering: finds precise answer spans within context passages
- 💾 Suitable for real-time inference with minimal latency on both CPU and GPU
- 🛠️ Easily integrated into web apps, enterprise tools, and virtual assistants
- 🔍 Handles unanswerable questions gracefully with no-answer detection when trained on SQuAD v2.0 (see the no-answer example in the Usage section below)
|
|
|
--- |
|
|
|
🧠 Intended Uses
|
|
|
- ✅ Customer support bots that extract answers from product manuals or FAQs
- ✅ Educational tools that answer student queries based on textbooks or syllabi
- ✅ Legal, financial, or technical document analysis
- ✅ Search engines with context-aware question answering
- ✅ Chatbots that require contextual comprehension for precise responses
|
|
|
--- |
|
|
|
🚫 Limitations
|
|
|
- ❌ Trained primarily on formal text; performance may degrade on informal or slang-heavy input
- ❌ Does not support multi-hop questions that require reasoning across multiple paragraphs
- ❌ May struggle with ambiguous questions or contexts with multiple plausible answers
- ❌ Not designed for very long documents (performance may drop for inputs longer than 512 tokens)
|
|
|
--- |
|
|
|
🏋️‍♂️ Training Details
|
|
|
| Field          | Value                          |
| -------------- | ------------------------------ |
| **Base Model** | `roberta-base-squad2`          |
| **Dataset**    | SQuAD v2.0                     |
| **Framework**  | PyTorch with Transformers      |
| **Epochs**     | 3                              |
| **Batch Size** | 16                             |
| **Optimizer**  | AdamW                          |
| **Loss**       | CrossEntropyLoss (token-level) |
| **Device**     | Trained on CUDA-enabled GPU    |
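
The training script itself is not part of this card. For readers who want to reproduce a comparable run, the sketch below wires up the Hugging Face `Trainer` with the hyperparameters from the table; the base checkpoint id, learning rate, weight decay, and sequence length are assumptions, and the preprocessing follows the standard SQuAD-style span labeling (unanswerable questions point at the CLS token).

```python
from datasets import load_dataset
from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                          Trainer, TrainingArguments, default_data_collator)

base_checkpoint = "deepset/roberta-base-squad2"  # assumed Hub id of the base model
tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(base_checkpoint)

squad = load_dataset("squad_v2")

def preprocess(examples):
    # Tokenize question/context pairs, truncating only the context and
    # splitting long contexts into overlapping features.
    enc = tokenizer(
        examples["question"], examples["context"],
        truncation="only_second", max_length=384, stride=128,
        return_overflowing_tokens=True, return_offsets_mapping=True,
        padding="max_length",
    )
    sample_map = enc.pop("overflow_to_sample_mapping")
    offset_mapping = enc.pop("offset_mapping")
    enc["start_positions"], enc["end_positions"] = [], []
    for i, offsets in enumerate(offset_mapping):
        cls_index = enc["input_ids"][i].index(tokenizer.cls_token_id)
        seq_ids = enc.sequence_ids(i)
        answers = examples["answers"][sample_map[i]]
        # Token span covered by the context in this feature.
        ctx_start = seq_ids.index(1)
        ctx_end = len(seq_ids) - 1 - seq_ids[::-1].index(1)
        if len(answers["answer_start"]) == 0:
            # Unanswerable question: label both positions with the CLS token.
            enc["start_positions"].append(cls_index)
            enc["end_positions"].append(cls_index)
            continue
        start_char = answers["answer_start"][0]
        end_char = start_char + len(answers["text"][0])
        if not (offsets[ctx_start][0] <= start_char and offsets[ctx_end][1] >= end_char):
            # The answer lies outside this feature after truncation.
            enc["start_positions"].append(cls_index)
            enc["end_positions"].append(cls_index)
            continue
        tok_start, tok_end = ctx_start, ctx_end
        while tok_start <= ctx_end and offsets[tok_start][0] <= start_char:
            tok_start += 1
        while tok_end >= ctx_start and offsets[tok_end][1] >= end_char:
            tok_end -= 1
        enc["start_positions"].append(tok_start - 1)
        enc["end_positions"].append(tok_end + 1)
    return enc

train_dataset = squad["train"].map(
    preprocess, batched=True, remove_columns=squad["train"].column_names
)

args = TrainingArguments(
    output_dir="qa_model",
    num_train_epochs=3,              # from the table above
    per_device_train_batch_size=16,  # from the table above
    learning_rate=3e-5,              # assumed, not stated in the card
    weight_decay=0.01,               # assumed, not stated in the card
)

# Trainer's default optimizer is AdamW, and the QA model computes a token-level
# cross-entropy loss over start/end positions, matching the table above.
trainer = Trainer(
    model=model, args=args,
    train_dataset=train_dataset,
    data_collator=default_data_collator,
)
trainer.train()
```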
|
|
|
--- |
|
|
|
📊 Evaluation Metrics
|
|
|
| Metric    | Score |
| --------- | ----- |
| Accuracy  | 0.80  |
| F1-Score  | 0.78  |
| Precision | 0.79  |
| Recall    | 0.78  |
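
The card does not state how these scores were computed. For reference, SQuAD v2.0-style extractive QA is usually reported as exact match and token-level F1, which the `evaluate` library can compute; the id and spans below are toy values that only illustrate the expected input format.

```python
import evaluate

squad_v2_metric = evaluate.load("squad_v2")

# Toy prediction/reference pair, just to show the expected input schema.
predictions = [{
    "id": "example-0",
    "prediction_text": "Amazonia",
    "no_answer_probability": 0.0,
}]
references = [{
    "id": "example-0",
    "answers": {"text": ["Amazonia"], "answer_start": [38]},
}]

print(squad_v2_metric.compute(predictions=predictions, references=references))
```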
|
|
|
--- |
|
|
|
🚀 Usage
|
```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline

model_name = "AventIQ-AI/QA-Squad-Model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
model.eval()

# Build a question-answering pipeline from the loaded model and tokenizer
qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer)

# Provide a context and a question
context = """
The Amazon rainforest, also known as Amazonia, is a moist broadleaf tropical rainforest in the Amazon biome
that covers most of the Amazon basin of South America. This region includes territory belonging to nine nations.
"""
question = "What is the Amazon rainforest also known as?"

# Run inference
result = qa_pipeline(question=question, context=context)

# Print the result
print(f"Question: {question}")
print(f"Answer: {result['answer']}")
print(f"Score: {result['score']:.4f}")
```
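
Continuing from the snippet above, the pipeline can also return an empty answer when the context does not contain one, since the model was trained with SQuAD v2.0-style unanswerable questions. The example question below is made up for illustration.

```python
# Continuing from the Usage snippet above: ask a question the context cannot
# answer. handle_impossible_answer lets the pipeline return an empty string
# instead of forcing a span.
unanswerable = "Who is the president of the Amazon rainforest?"
result = qa_pipeline(
    question=unanswerable,
    context=context,
    handle_impossible_answer=True,
)
if result["answer"] == "":
    print("No answer found in the context.")
else:
    print(f"Answer: {result['answer']} (score {result['score']:.4f})")
```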
|
--- |
|
|
|
🧩 Quantization

Post-training static quantization was applied using PyTorch to reduce model size and accelerate inference on edge devices.
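
The quantization script is not included in this card. As a rough sketch only, the snippet below applies PyTorch post-training dynamic quantization (a simpler variant than the static quantization described above) to the model's linear layers; the output path is a placeholder.

```python
import torch
from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained("AventIQ-AI/QA-Squad-Model")
model.eval()

# Replace the linear layers with int8 dynamically quantized versions;
# weights are stored in int8 and activations are quantized at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Placeholder output path; torch.save is a simple way to persist the
# quantized weights.
torch.save(quantized_model.state_dict(), "qa_model_quantized.pt")
```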
|
|
|
---
|
|
|
📁 Repository Structure
|
```
.
├── model/               # Quantized model files
├── tokenizer_config/    # Tokenizer and vocab files
├── model.safetensors    # Fine-tuned model in safetensors format
└── README.md            # Model card
```
|
--- |
|
🤝 Contributing
|
|
|
Open to improvements and feedback! Feel free to submit a pull request or open an issue if you find any bugs or want to enhance the model. |
|
|
|
|
|
|