PaddleOCR-VL Split Vision Encoder
This repository contains the vision tower and projector extracted from PaddleOCR-VL, uploaded as standalone artifacts separate from the full VLM.
Contents
- vision_tower_config.json
- vision_tower.safetensors
- projector_config.json
- projector.safetensors
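Before loading anything, it can help to confirm that a download is complete. The following is a minimal sketch that only checks for the four filenames listed above; the helper name `missing_artifacts` is hypothetical and not part of this repo.

```python
from pathlib import Path

# The four artifact files named in the Contents list.
EXPECTED_ARTIFACTS = (
    "vision_tower_config.json",
    "vision_tower.safetensors",
    "projector_config.json",
    "projector.safetensors",
)


def missing_artifacts(artifact_dir="."):
    """Return the expected artifact filenames not found in artifact_dir."""
    root = Path(artifact_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]
```

An empty list from `missing_artifacts(".")` means all four files are present in the current directory.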
Architecture
- Vision tower hidden size: 1152
- Projector output hidden size: 1024
- Target repo: acsfid/PaddleOCR-VL-VisionEncoder
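One way to check these sizes against the uploaded weights without loading them is to read the header of a `.safetensors` file, which is a length-prefixed JSON block listing each tensor's dtype and shape. The sketch below uses only the standard library; the function name `read_safetensors_shapes` is hypothetical, and it makes no assumption about the tensor names stored in these particular files.

```python
import json
import struct


def read_safetensors_shapes(path):
    """Return {tensor_name: shape} from a .safetensors header, without reading weights."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the JSON header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {
        name: tuple(info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }
```

Calling `read_safetensors_shapes("projector.safetensors")` and scanning the shapes for 1152 and 1024 should give a quick sanity check of the input and output widths listed above.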
Usage
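# Import the split classes from the Python source bundled in this repo.
from model.extracted_vision_encoder import PaddleOCRVLVisionTower, PaddleOCRVLProjector
# Directory holding the four artifact files listed under Contents.
artifact_dir = "."
vision_tower = PaddleOCRVLVisionTower.from_pretrained(artifact_dir)
projector = PaddleOCRVLProjector.from_pretrained(artifact_dir)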
The intended split flow is:
image_processor -> vision_tower -> projector -> decoder-ready image embeddings
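The flow above is a plain composition of three stages. As a rough sketch of the ordering only, the helper below treats each stage as an opaque callable; `run_split_flow` is a hypothetical name, and the real call signatures of the image processor, `PaddleOCRVLVisionTower`, and `PaddleOCRVLProjector` are not specified here and may take additional arguments.

```python
def run_split_flow(image, image_processor, vision_tower, projector):
    """Chain the three stages in the order given above.

    Each stage is assumed to be a single-argument callable; adapt the
    calls to the actual signatures of the classes in this repo.
    """
    pixel_inputs = image_processor(image)          # raw image -> model inputs
    vision_features = vision_tower(pixel_inputs)   # model inputs -> vision features
    return projector(vision_features)              # vision features -> decoder-ready embeddings
```

Keeping the stages as separate arguments makes it easy to swap in a different projector or to cache the vision tower output during later training work.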
Included Python Source
This repo also includes the Python source files needed to load and use the split artifacts:
- model/__init__.py
- model/configuration_paddleocr_vl.py
- model/image_processing_paddleocr_vl.py
- model/modeling_paddleocr_vl.py
- model/extracted_vision_encoder.py
- requirements.txt
After cloning or downloading this repo, you can import the split classes directly for inference or later training work.