PaddleOCR-VL Split Vision Encoder

This repository contains the vision tower and projector extracted from PaddleOCR-VL, uploaded separately from the full vision-language model (VLM).

Contents

  • vision_tower_config.json
  • vision_tower.safetensors
  • projector_config.json
  • projector.safetensors
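
Before loading anything, it can help to confirm that a local copy contains all four files. The following is a minimal sketch using only the standard library; the helper name `missing_artifacts` is hypothetical and is not part of this repo.

```python
from pathlib import Path

# File names taken from the Contents list above.
EXPECTED_ARTIFACTS = (
    "vision_tower_config.json",
    "vision_tower.safetensors",
    "projector_config.json",
    "projector.safetensors",
)


def missing_artifacts(artifact_dir="."):
    """Return the expected artifact file names that are absent from artifact_dir."""
    root = Path(artifact_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]
```

An empty list means every expected file was found in the given directory.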

Architecture

  • Vision tower hidden size: 1152
  • Projector output hidden size: 1024
  • Target repo: acsfid/PaddleOCR-VL-VisionEncoder
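
One way to check these sizes against the actual weights is to read the header of a `.safetensors` file, which lists every tensor's shape without loading the data. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header); the function name `read_safetensors_shapes` is hypothetical, and the tensor names inside these particular files are not documented here.

```python
import json
import struct


def read_safetensors_shapes(path):
    """Return {tensor_name: shape} from a .safetensors header without loading weights."""
    with open(path, "rb") as f:
        # The file starts with an unsigned 64-bit little-endian header length,
        # followed by that many bytes of UTF-8 JSON.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {
        name: tuple(info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }
```

Calling `read_safetensors_shapes("vision_tower.safetensors")` or `read_safetensors_shapes("projector.safetensors")` returns a plain dict. If the sizes listed above are reflected directly in the weight shapes, 1152 should appear among the vision tower's dimensions and 1024 on the projector's output side.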

Usage

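# Run from the repository root so the bundled `model` package is importable.
from model.extracted_vision_encoder import PaddleOCRVLVisionTower, PaddleOCRVLProjector

# Directory holding the config and safetensors files listed under Contents.
artifact_dir = "."
vision_tower = PaddleOCRVLVisionTower.from_pretrained(artifact_dir)
projector = PaddleOCRVLProjector.from_pretrained(artifact_dir)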

The intended split flow is:

image_processor -> vision_tower -> projector -> decoder-ready image embeddings
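
The flow above can be sketched as a plain composition of three callables. This is an illustration under assumptions rather than the repo's actual API: the helper `embed_image` is hypothetical, and the real `image_processor`, `vision_tower`, and `projector` objects may expect different call signatures (for example keyword tensors or extra grid information) than the single-argument calls used here.

```python
def embed_image(image, image_processor, vision_tower, projector):
    """Chain the three stages: preprocess, encode, then project.

    Each stage is treated as a one-argument callable (an assumption);
    adapt the calls to the actual signatures in model/.
    """
    pixel_inputs = image_processor(image)          # raw image -> model-ready inputs
    vision_features = vision_tower(pixel_inputs)   # features in the tower's hidden size (1152)
    image_embeddings = projector(vision_features)  # embeddings in the projector's output size (1024)
    return image_embeddings
```

Keeping the stages as separate arguments mirrors the split itself: any one of them can be swapped or fine-tuned without touching the others.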

Included Python Source

This repo also includes the Python source files needed to load and use the split artifacts:

  • model/__init__.py
  • model/configuration_paddleocr_vl.py
  • model/image_processing_paddleocr_vl.py
  • model/modeling_paddleocr_vl.py
  • model/extracted_vision_encoder.py
  • requirements.txt

After cloning or downloading this repo, you can import the split classes directly for inference or further training.
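
If your script does not run from the repository root, the `model` package has to be made importable first. One possible approach, shown as a sketch, is to add the clone's directory to `sys.path`; the helper `add_repo_to_path` is hypothetical and not part of this repo.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Put repo_dir at the front of sys.path so `import model...` resolves to it."""
    repo_path = str(Path(repo_dir).resolve())
    if repo_path not in sys.path:
        sys.path.insert(0, repo_path)
    return repo_path
```

After calling it with the location of your local copy, the import shown under Usage should resolve, provided the dependencies in requirements.txt are installed.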
