System Patterns: HOCR Processing Tool

1. High-Level Architecture

  • Modular Pipeline: The system appears to be structured as a pipeline, with a separate module for each stage of OCR processing. Key modules suggested by filenames include:
    • preprocessing.py: Handles initial image adjustments.
    • image_segmentation.py: Identifies regions of interest (text blocks).
    • ocr_processing.py: Manages the core OCR engine interaction.
    • language_detection.py: Determines the language of the text.
    • pdf_ocr.py: Specific handling for PDF inputs.
    • structured_ocr.py: Likely involved in formatting the output.
  • Configuration Driven: config.py suggests a centralized configuration management approach, allowing pipeline behavior to be customized.
  • Entry Point / Orchestration: app.py likely serves as the main entry point and orchestrator (possibly for a web UI or API), coordinating pipeline execution based on user input and configuration. process_file.py may be an alternative entry point or the core processing function that app.py calls.
  • UI Layer: The ui/ directory (ui/layout.py, ui/ui_components.py) indicates a dedicated user interface layer, possibly built with Streamlit or Flask (as suggested in projectbrief.md).
  • Utility Functions: The utils/ directory (utils/image_utils.py, utils/text_utils.py, etc.) points to a pattern of encapsulating reusable helper functions.
  • Error Handling: error_handler.py suggests a dedicated mechanism for managing and reporting errors during processing.
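To make the configuration-driven idea concrete, here is a minimal sketch of how centralized settings could be defined once and overridden per run. It is not taken from config.py: `OCRConfig`, its fields, and `load_config` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class OCRConfig:
    """Hypothetical settings; the real config.py may look quite different."""
    deskew: bool = True
    segment_regions: bool = True
    detect_language: bool = True
    default_language: str = "en"


def load_config(overrides: Optional[dict] = None) -> OCRConfig:
    """Start from the defaults and apply only overrides that name known fields."""
    base = OCRConfig()
    if not overrides:
        return base
    known = {
        key: value
        for key, value in overrides.items()
        if key in OCRConfig.__dataclass_fields__
    }
    # replace() returns a new frozen instance, so the defaults are never mutated.
    return replace(base, **known)


if __name__ == "__main__":
    print(load_config({"deskew": False, "not_a_setting": 1}))
```

Keeping the settings immutable means every stage reads the same values for the duration of a run, which is one common reason to centralize configuration in the first place.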

2. Key Design Patterns (Inferred)

  • Pipeline Pattern: The core processing flow seems to follow a pipeline pattern, where data (image/document) passes through sequential processing stages.
  • Configuration Management: Centralized configuration (config.py) allows for decoupling settings from code.
  • Separation of Concerns: Different functionalities (UI, core processing, utilities, configuration) appear to be separated into distinct modules/files.
  • Utility/Helper Modules: Common, reusable functions are grouped into utility modules.
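The pipeline pattern described above can be sketched as a list of stage functions applied in order. This is an assumption about the general shape only: the stage functions below are placeholders named after the modules, not the actual functions those modules export, and the dictionary standing in for a document is hypothetical.

```python
from typing import Callable, Dict, Iterable

# A stage takes the document state and returns an updated copy.
Stage = Callable[[Dict], Dict]


def preprocess(doc: Dict) -> Dict:
    # Placeholder for image adjustments (preprocessing.py in the real project).
    return {**doc, "preprocessed": True}


def segment(doc: Dict) -> Dict:
    # Placeholder for region detection (image_segmentation.py).
    return {**doc, "regions": ["region-0"]}


def recognize(doc: Dict) -> Dict:
    # Placeholder for the OCR engine call (ocr_processing.py).
    return {**doc, "text": "placeholder text"}


def run_pipeline(doc: Dict, stages: Iterable[Stage]) -> Dict:
    """Pass the document through each stage in sequence."""
    for stage in stages:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    result = run_pipeline({"source": "page.png"}, [preprocess, segment, recognize])
    print(result)
```

Because each stage has the same signature, stages can be reordered, skipped, or tested in isolation without touching the orchestrator.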

3. Component Relationships (Initial Diagram - Mermaid)

graph TD
    subgraph UI["User Interface / Entry Point"]
        A["app.py / UI Layer"] --> B(process_file.py);
    end

    subgraph CFG["Configuration"]
        C[config.py];
    end

    subgraph CORE["Core OCR Pipeline"]
        B --> D(preprocessing.py);
        D --> E(image_segmentation.py);
        E --> F(ocr_processing.py);
        F --> G(language_detection.py);
        G --> H(structured_ocr.py);
    end

    subgraph INPUT["Input Handling"]
        I[pdf_ocr.py] --> B;
        J[Image Input] --> B;
    end

    subgraph UTIL["Utilities"]
        K["utils/"];
        L[error_handler.py];
    end

    A --> C;
    B --> C;
    D --> K;
    E --> K;
    F --> K;
    G --> K;
    H --> K;
    I --> K;
    B --> L;

    style UI fill:#f9f,stroke:#333,stroke-width:2px
    style CFG fill:#ccf,stroke:#333,stroke-width:2px
    style CORE fill:#cfc,stroke:#333,stroke-width:2px
    style INPUT fill:#ffc,stroke:#333,stroke-width:2px
    style UTIL fill:#eee,stroke:#333,stroke-width:2px

4. Critical Implementation Paths

  • Image Input -> Preprocessing -> Segmentation -> OCR -> Language Detection -> Structured Output: The main flow for image files, matching the stage order in the diagram above.
  • PDF Input -> PDF Extraction -> Image Conversion (per page) -> [Main Flow] -> Aggregated Output: The likely path for PDF documents.
  • Configuration Loading -> Pipeline Execution: How settings influence the process.
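The PDF path above fans out over pages and then aggregates the results. A minimal sketch of that step follows, assuming the pages have already been converted to images elsewhere. `process_pdf` and the `process_page` callable are hypothetical names, and the strings used as pages in the usage example are stand-ins for real page images.

```python
from typing import Any, Callable, Dict, Iterable


def process_pdf(pages: Iterable[Any], process_page: Callable[[Any], str]) -> Dict:
    """Run the main flow on each page and aggregate text and per-page errors."""
    results = []
    errors = []
    for number, page in enumerate(pages, start=1):
        try:
            results.append({"page": number, "text": process_page(page)})
        except Exception as exc:
            # Record the failure and continue, so one bad page does not
            # discard the output of the others.
            errors.append({"page": number, "error": str(exc)})
    return {
        "pages": results,
        "errors": errors,
        "text": "\n\n".join(item["text"] for item in results),
    }


if __name__ == "__main__":
    # Stand-in "pages": the None entry simulates a page that fails to process.
    output = process_pdf(["first page", "second page", None], lambda p: p.upper())
    print(output["text"])
    print(output["errors"])
```

Whether a failed page should abort the whole document or merely be reported is a design decision the real error_handler.py may make differently; this sketch chooses to continue.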

(This document outlines the observed structure. It will be refined as the codebase is analyzed in more detail.)