---
license: apache-2.0
language:
- en
- it
- de
- es
base_model:
- Ultralytics/YOLOv8
pipeline_tag: object-detection
library_name: ultralytics
library_version: 8.3.66
inference: false
tags:
- ultralytics
- yolov8
- yolo
- vision
- object-detection
- pytorch
---

## MaterialSpecVision 🔎

**release 1.0**

![](https://cdn-uploads.huggingface.co/production/uploads/67939ffe6cdbc4fd0da2cacd/nZjqvB7cyWbDvhsCWaAhD.png)

---

### WHAT IS MaterialSpecVision?

MaterialSpecVision is a specialized object detection model designed to **detect heat numbers in material certificates**. It removes the need to locate them manually in poorly scanned or low-quality PDF documents.

### WHY USE MaterialSpecVision?

Heat numbers are critical for traceability in material documentation, but they are often difficult to identify because of inconsistent formatting, low resolution, or cluttered layouts.

### WHO SHOULD USE MaterialSpecVision?

The model is aimed at engineers and quality control teams: it streamlines the task of parsing certificates, even under challenging conditions.

### TECHNICAL DETAILS

The model was **trained on a dataset of over 2,000 material certificates** in German, Italian, Spanish, and Chinese, covering a wide range of formats and languages. It is intended to identify and highlight heat numbers even in low-quality material certificates.

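```python
from ultralytics import YOLO

# Model training configuration
model = YOLO("yolov8n.pt")  # start from the pre-trained YOLOv8 nano weights
model.train(epochs=100, batch=16, imgsz=640)  # the dataset YAML (data=...) is not listed in this card
```

![](https://cdn-uploads.huggingface.co/production/uploads/67939ffe6cdbc4fd0da2cacd/ipDpD-PSNXm_XEi9Ent3C.png)
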
### USAGE EXAMPLES

* Single-page Material Certificate (JPG format)

```python
import matplotlib.pyplot as plt
from ultralytics import YOLO

model = YOLO("best.pt")  # the current pre-trained model file you download from HF
results = model("your_material_certificate_file.jpg")

for r in results:
    r.boxes = r.boxes[r.boxes.conf > 0.4]  # keep detections with confidence above 0.4
    im_bgr = r.plot(line_width=5, font_size=5, conf=True)  # annotated image as a BGR array
    plt.imshow(im_bgr[..., ::-1])  # convert BGR to RGB for matplotlib
    plt.axis('off')
    plt.show()
```

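An object detection model returns boxes rather than text, so reading a heat number still requires a separate step such as OCR on the detected region. As a minimal sketch, assuming detections are available as plain `(x1, y1, x2, y2)` pixel boxes with matching confidences, the hypothetical helpers below (`pad_and_clip`, `select_crops`) keep the confident detections and pad each box slightly so that characters touching the box edge are not cut off. The padding value and the sample numbers are illustrative assumptions, not values taken from this model.

```python
def pad_and_clip(box, img_w, img_h, pad=10):
    """Expand an (x1, y1, x2, y2) box by `pad` pixels and clip it to the image."""
    x1, y1, x2, y2 = box
    return (
        max(0, int(x1) - pad),
        max(0, int(y1) - pad),
        min(img_w, int(x2) + pad),
        min(img_h, int(y2) + pad),
    )


def select_crops(boxes, confs, img_w, img_h, min_conf=0.4, pad=10):
    """Return padded crop rectangles for detections above `min_conf`, best first."""
    kept = [(c, b) for b, c in zip(boxes, confs) if c > min_conf]
    kept.sort(key=lambda cb: cb[0], reverse=True)
    return [pad_and_clip(b, img_w, img_h, pad) for _, b in kept]


# Made-up detections on a 1000x800 image (hypothetical values)
boxes = [(5.2, 10.7, 120.0, 40.3), (900.0, 750.0, 998.0, 799.0), (300.0, 300.0, 400.0, 330.0)]
confs = [0.85, 0.55, 0.20]
crops = select_crops(boxes, confs, img_w=1000, img_h=800)
print(crops)
```

With Ultralytics results, `r.boxes.xyxy.tolist()` and `r.boxes.conf.tolist()` would supply `boxes` and `confs`, and `r.orig_shape` gives `(height, width)`; each rectangle can then be passed to PIL's `Image.crop` before OCR.
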
* Multi-page Material Certificate (PDF format)

For PDFs, you need to convert each page to a JPG image before processing. This requires installing **Poppler** (used by `pdf2image` for the conversion) and **Tesseract** (used by `pytesseract` to detect page orientation).

```python
import os

import matplotlib.pyplot as plt
import pytesseract
from pdf2image import convert_from_path
from ultralytics import YOLO

pytesseract.pytesseract.tesseract_cmd = r'\...\Tesseract-OCR\tesseract.exe'  # download and install Tesseract
poppler_path = r'\...\poppler-24.08.0\Library\bin'  # download and install Poppler

pdf_file_name = "your_material_certificate.pdf"
pdf_file_path = rf":\...\{pdf_file_name}"  # absolute path to your material certificate
images_to_process = []
model = YOLO("best.pt")

try:
    # Render every PDF page as a PIL image
    images = convert_from_path(pdf_file_path,
                               poppler_path=poppler_path,
                               dpi=300,
                               use_cropbox=True)
    for i, image in enumerate(images):
        # Ask Tesseract for the page orientation and undo any rotation
        orientation_data = pytesseract.image_to_osd(image)
        rotation_angle = int(orientation_data.split("Rotate: ")[1].split("\n")[0])
        full_path_img = f"{os.path.join(os.getcwd(), str(i) + pdf_file_name[:-4])}.jpg"
        if rotation_angle != 0:
            image = image.rotate(-rotation_angle, expand=True)
        image.save(full_path_img, "JPEG")
        images_to_process.append(full_path_img)  # queue the page only after it was saved
except Exception as e:
    print(f"PDF with file name {pdf_file_name} could not be processed: {e}")

for img_path in images_to_process:
    results = model(img_path)
    for r in results:
        r.boxes = r.boxes[r.boxes.conf > 0.1]  # keep detections with confidence above 0.1
        im_bgr = r.plot(line_width=5, font_size=5, conf=True)  # annotated image as a BGR array
        plt.imshow(im_bgr[..., ::-1])  # convert BGR to RGB for matplotlib
        plt.axis('off')
        plt.show()
```

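The PDF example reads the rotation angle by splitting the OSD text on `"Rotate: "`, which fails with an `IndexError` if that field is missing. A slightly more defensive variant is sketched below. `parse_rotation` is a hypothetical helper name, and the sample string only illustrates the usual shape of Tesseract's OSD text output; it is not taken from a real certificate.

```python
import re


def parse_rotation(osd_text):
    """Return the 'Rotate' angle in degrees from Tesseract OSD text, or 0 if absent."""
    match = re.search(r"Rotate:\s*(\d+)", osd_text)
    return int(match.group(1)) if match else 0


# Illustrative OSD text (assumed layout, hypothetical values)
sample = (
    "Page number: 0\n"
    "Orientation in degrees: 270\n"
    "Rotate: 90\n"
    "Orientation confidence: 2.81\n"
    "Script: Latin\n"
    "Script confidence: 1.11\n"
)
print(parse_rotation(sample))
```

Falling back to 0 means a page without a parsable angle is saved unrotated instead of aborting the whole PDF; whether that is the right trade-off depends on how often your scans are rotated.
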
### IF DETECTION ISN'T WORKING AS EXPECTED

If detection doesn't work as expected, I’d love to review your material certificates for further investigation and potential inclusion in the next training session. Feel free to reach out to me via my LinkedIn account: http://linkedin.com/in/sergey-zhmaev-7a325896

### CITATION

If you use this model, please cite it as follows:

```yaml
cff-version: 1.2.0
message: "If you use this model, please cite:"
authors:
  - family-names: Zhmaev
    given-names: Sergey
title: "Material Specifications AI Vision Model"
version: 1.0
doi: 10.57967/hf/4255
```

### LICENSE

**Apache 2.0**