---
library_name: transformers
tags:
- document-question-answering
- layoutlmv3
- ocr
- document-understanding
- paddleocr
- multilingual
- layout-aware
- lakshya-singh
license: apache-2.0
language:
- en
base_model:
- microsoft/layoutlmv3-base
datasets:
- nielsr/docvqa_1200_examples
---
# Document QA Model
This is a fine-tuned **document question-answering model** based on `layoutlmv3-base`. It combines OCR output (from PaddleOCR) with layout information to answer questions about structured content in document images.
---
## Model Details
### Model Description
- **Model Name:** `document-qa-model`
- **Base Model:** [`microsoft/layoutlmv3-base`](https://huggingface.co/microsoft/layoutlmv3-base)
- **Fine-tuned by:** Lakshya Singh (solo contributor)
- **Languages:** English, Spanish, French, German, Italian
- **License:** Apache-2.0 (inherited from base model)
- **Intended Use:** Extract answers to structured queries from scanned documents
- **Funding:** none; this project was completed independently
---
## Model Sources
- **Repository:** [GitHub](https://github.com/Lakshyasinghrawat12/DocumentQA-lakshya-rawat-document-qa-model)
- **Trained on:** Adapted version of [`nielsr/docvqa_1200_examples`](https://huggingface.co/datasets/nielsr/docvqa_1200_examples)
- **Model metrics:** see the Training Metrics section below
---
## Uses
### Direct Use
This model can be used for:
- Question answering on document images (scanned PDFs, invoices, utility bills)
- Information extraction tasks using OCR and layout-aware understanding
### Out-of-Scope Use
- Not suitable for conversational QA
- Not suitable for images without machine-readable text; OCR must be run first
---
## Training Details
### Dataset
The dataset consisted of:
- **Images** of utility bills and documents
- **OCR data** with bounding boxes (from PaddleOCR)
- **Queries** in English, Spanish, and Chinese
- **Answer spans** with match scores and positions
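For illustration, a single record in this format might look like the sketch below. The field names are assumptions based on the description above, not the dataset's exact schema:

```python
# Hypothetical shape of one training record (field names are illustrative).
example = {
    "image": "bills/0001.png",                        # document image path
    "words": ["Total", "Due", "$128.40"],             # PaddleOCR tokens
    "bboxes": [[82, 50, 180, 70],                     # one box per word
               [190, 50, 250, 70],
               [260, 50, 360, 70]],
    "question": "What is the total due?",             # EN / ES / ZH queries
    "answer": {"text": "$128.40", "word_start": 2,    # answer span with a
               "word_end": 2, "match_score": 1.0},    # match score
}
```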
### Training Procedure
- Preprocessing: PaddleOCR was used to extract tokens, positions, and layout structure (sketched after this list)
- Model: LayoutLMv3-base
- Epochs: 4
- Learning rate schedule: see the loss & learning-rate chart under Training Metrics below
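A minimal sketch of the OCR preprocessing step, assuming the classic PaddleOCR 2.x API and LayoutLMv3's 0-1000 normalized box convention; the actual training pipeline may differ in detail:

```python
from paddleocr import PaddleOCR
from PIL import Image

def ocr_to_layoutlmv3(image_path: str):
    """Run PaddleOCR and normalize boxes to LayoutLMv3's 0-1000 grid."""
    width, height = Image.open(image_path).size

    ocr = PaddleOCR(use_angle_cls=True, lang="en")
    result = ocr.ocr(image_path, cls=True)

    words, boxes = [], []
    for line in result[0]:                # result[0]: lines of the first page
        corners, (text, _confidence) = line
        xs = [p[0] for p in corners]
        ys = [p[1] for p in corners]
        # Collapse the 4-point polygon to an axis-aligned box, then rescale.
        boxes.append([
            int(1000 * min(xs) / width),
            int(1000 * min(ys) / height),
            int(1000 * max(xs) / width),
            int(1000 * max(ys) / height),
        ])
        words.append(text)
    return words, boxes
```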
### Training Metrics
- **F1 Score** (validation): see the repository linked above
- **Loss & Learning Rate Chart**: see the repository linked above
---
## Evaluation
### Metrics Used
- F1 score
- Match score of predicted spans
- Token overlap vs ground truth
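For reference, a SQuAD-style token-overlap F1 looks like the sketch below; whether evaluation used exactly this variant is an assumption:

```python
from collections import Counter

def token_f1(prediction: str, ground_truth: str) -> float:
    """F1 over whitespace tokens (SQuAD-style, illustrative variant)."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("$128.40 total", "$128.40"))  # 0.667: partial token overlap
```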
### Summary
The model performs well on document-style QA tasks, especially with:
- Clearly structured OCR results
- Document types similar to utility bills, invoices, and forms
---
## How to Use
- Full training and inference code is available on [GitHub](https://github.com/Lakshyasinghrawat12/DocumentQA-lakshya-rawat-document-qa-model).
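A minimal inference sketch with Hugging Face Transformers. The Hub repo ID below is hypothetical (substitute the published checkpoint), and it assumes an extractive-QA head (`LayoutLMv3ForQuestionAnswering`) fed with PaddleOCR words and 0-1000 normalized boxes:

```python
import torch
from PIL import Image
from transformers import LayoutLMv3Processor, LayoutLMv3ForQuestionAnswering

model_id = "Lakshyasinghrawat12/document-qa-model"   # hypothetical repo ID

# apply_ocr=False: the model was trained on PaddleOCR output, so we pass our
# own words and boxes instead of the processor's built-in Tesseract OCR.
processor = LayoutLMv3Processor.from_pretrained(model_id, apply_ocr=False)
model = LayoutLMv3ForQuestionAnswering.from_pretrained(model_id)

image = Image.open("utility_bill.png").convert("RGB")
words = ["Total", "Amount", "Due:", "$128.40"]               # from PaddleOCR
boxes = [[82, 50, 180, 70], [190, 50, 290, 70],              # normalized to
         [300, 50, 360, 70], [370, 50, 460, 70]]             # the 0-1000 grid
question = "What is the total amount due?"

encoding = processor(image, question, words, boxes=boxes, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)

# Extractive QA: take the highest-scoring start/end token positions.
start = outputs.start_logits.argmax(-1).item()
end = outputs.end_logits.argmax(-1).item()
answer = processor.tokenizer.decode(encoding["input_ids"][0][start : end + 1])
print(answer)
```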