datasets:
- prithivMLmods/Document-Type-Detection
license: apache-2.0
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: image-classification
library_name: transformers
tags:
- Document
- Classification
- finance
Document-Type-Detection
Document-Type-Detection is a multi-class image classification model based on google/siglip2-base-patch16-224, trained to classify the document type of scanned or photographed images. It is intended for automated document sorting, OCR pipelines, and digital archiving systems.
Classification Report:
                   precision    recall  f1-score   support

Advertisement-Doc     0.8940    0.8940    0.8940      2000
 Hand-Written-Doc     0.9168    0.9310    0.9238      2000
      Invoice-Doc     0.9026    0.8940    0.8983      2000
       Letter-Doc     0.8380    0.8820    0.8594      2000
 News-Article-Doc     0.9258    0.8800    0.9023      2000
       Resume-Doc     0.9425    0.9340    0.9382      2000

         accuracy                         0.9025     12000
        macro avg     0.9033    0.9025    0.9027     12000
     weighted avg     0.9033    0.9025    0.9027     12000
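As a quick consistency check on the report above, each F1 score is the harmonic mean of precision and recall, and the macro averages are unweighted means over the six classes. The sketch below recomputes them from the rounded values in the table, so results may differ in the last digit; the variable and function names are illustrative only.

```python
# Per-class (precision, recall), copied from the classification report above.
report = {
    "Advertisement-Doc": (0.8940, 0.8940),
    "Hand-Written-Doc": (0.9168, 0.9310),
    "Invoice-Doc": (0.9026, 0.8940),
    "Letter-Doc": (0.8380, 0.8820),
    "News-Article-Doc": (0.9258, 0.8800),
    "Resume-Doc": (0.9425, 0.9340),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_by_class = {label: f1_score(p, r) for label, (p, r) in report.items()}

# Macro averages: plain means over classes, ignoring support.
macro_precision = sum(p for p, _ in report.values()) / len(report)
macro_recall = sum(r for _, r in report.values()) / len(report)
macro_f1 = sum(f1_by_class.values()) / len(f1_by_class)

for label, f1 in f1_by_class.items():
    print(f"{label:<18} f1 = {f1:.4f}")
print(
    f"macro avg: precision = {macro_precision:.4f}, "
    f"recall = {macro_recall:.4f}, f1 = {macro_f1:.4f}"
)
```

Because every class has the same support (2000 images), the weighted averages coincide with the macro averages, and the overall accuracy equals the macro-averaged recall.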
Label Classes
The model classifies images into the following document types:
0: Advertisement-Doc
1: Hand-Written-Doc
2: Invoice-Doc
3: Letter-Doc
4: News-Article-Doc
5: Resume-Doc
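To illustrate how the index-to-label mapping above is applied, here is a minimal, dependency-free sketch that turns a list of six raw scores (logits) into probabilities and picks the most likely document type. The helper names `softmax` and `top_label` and the sample logits are assumptions made for illustration; they are not part of the model or the transformers library.

```python
import math

# Index-to-label mapping, as listed above.
ID2LABEL = {
    0: "Advertisement-Doc",
    1: "Hand-Written-Doc",
    2: "Invoice-Doc",
    3: "Letter-Doc",
    4: "News-Article-Doc",
    5: "Resume-Doc",
}


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits purely for illustration; index 2 has the largest score.
label, confidence = top_label([0.1, -1.2, 3.4, 0.8, -0.5, 0.0])
print(label, round(confidence, 3))
```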
Installation
pip install transformers torch pillow gradio
Example Inference Code
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/Document-Type-Detection"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# ID to label mapping
id2label = {
    "0": "Advertisement-Doc",
    "1": "Hand-Written-Doc",
    "2": "Invoice-Doc",
    "3": "Letter-Doc",
    "4": "News-Article-Doc",
    "5": "Resume-Doc"
}

def detect_doc_type(image):
    # Gradio passes the upload as a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Run inference without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map each class index to its label and rounded probability
    prediction = {id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))}
    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=detect_doc_type,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=6, label="Document Type"),
    title="Document-Type-Detection",
    description="Upload a document image to classify it as one of: Advertisement, Hand-Written, Invoice, Letter, News Article, or Resume."
)

if __name__ == "__main__":
    iface.launch()
Applications
- Automated Document Sorting
- Digital Libraries and Archives
- OCR Preprocessing
- Enterprise Document Management
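
As one way to picture the document-sorting use case listed above, the sketch below copies image files into one folder per predicted label. It is a minimal illustration, not part of this model card's code: `sort_documents` is a hypothetical helper, and `classify` is an assumed callable that takes a file path and returns a label string (in practice it might wrap the model call and return the highest-probability class).

```python
import shutil
from pathlib import Path


def sort_documents(image_paths, classify, output_dir):
    """Copy each image into output_dir/<predicted label>/.

    `classify` is a hypothetical callable: path -> label string.
    Returns a dict mapping each source file name to its new location.
    """
    output_dir = Path(output_dir)
    placed = {}
    for path in map(Path, image_paths):
        label = classify(path)
        target_dir = output_dir / label
        target_dir.mkdir(parents=True, exist_ok=True)
        # copy2 keeps file timestamps; the source file is left untouched.
        placed[path.name] = Path(shutil.copy2(path, target_dir / path.name))
    return placed
```

Copying rather than moving keeps the originals in place, which makes it easier to re-run the sort if predictions are later corrected.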