# System Patterns: HOCR Processing Tool

## 1. High-Level Architecture

* **Modular Pipeline:** The system appears structured as a pipeline with distinct modules for different stages of OCR processing. Key modules suggested by filenames include:
    * `preprocessing.py`: Handles initial image adjustments.
    * `image_segmentation.py`: Identifies regions of interest (text blocks).
    * `ocr_processing.py`: Manages the core OCR engine interaction.
    * `language_detection.py`: Determines the language of the text.
    * `pdf_ocr.py`: Specific handling for PDF inputs.
    * `structured_ocr.py`: Likely involved in formatting the output.
* **Configuration Driven:** `config.py` suggests a centralized configuration management approach, allowing pipeline behavior to be customized.
* **Entry Point / Orchestration:** `app.py` likely serves as the main entry point or orchestrator, possibly for a web UI or API, coordinating the pipeline execution based on user input and configuration. `process_file.py` might be an alternative entry point or a core processing function called by `app.py`.
* **UI Layer:** The `ui/` directory (`ui/layout.py`, `ui/ui_components.py`) indicates a dedicated user interface layer, possibly built with Streamlit or Flask (as suggested in `projectbrief.md`).
* **Utility Functions:** The `utils/` directory (`utils/image_utils.py`, `utils/text_utils.py`, etc.) points to a pattern of encapsulating reusable helper functions.
* **Error Handling:** `error_handler.py` suggests a dedicated mechanism for managing and reporting errors during processing.

## 2. Key Design Patterns (Inferred)

* **Pipeline Pattern:** The core processing flow seems to follow a pipeline pattern, where data (image/document) passes through sequential processing stages.
* **Configuration Management:** Centralized configuration (`config.py`) allows for decoupling settings from code.
* **Separation of Concerns:** Different functionalities (UI, core processing, utilities, configuration) appear to be separated into distinct modules/files.
* **Utility/Helper Modules:** Common, reusable functions are grouped into utility modules.

## 3. Component Relationships (Initial Diagram - Mermaid)

```mermaid
graph TD
    subgraph UI["User Interface / Entry Point"]
        A["app.py / UI Layer"] --> B(process_file.py)
    end

    subgraph CFG["Configuration"]
        C[config.py]
    end

    subgraph CORE["Core OCR Pipeline"]
        B --> D(preprocessing.py)
        D --> E(image_segmentation.py)
        E --> F(ocr_processing.py)
        F --> G(language_detection.py)
        G --> H(structured_ocr.py)
    end

    subgraph INPUT["Input Handling"]
        I[pdf_ocr.py] --> B
        J[Image Input] --> B
    end

    subgraph UTIL["Utilities"]
        K["utils/"]
        L[error_handler.py]
    end

    A --> C
    B --> C
    D --> K
    E --> K
    F --> K
    G --> K
    H --> K
    I --> K
    B --> L

    style UI fill:#f9f,stroke:#333,stroke-width:2px
    style CFG fill:#ccf,stroke:#333,stroke-width:2px
    style CORE fill:#cfc,stroke:#333,stroke-width:2px
    style INPUT fill:#ffc,stroke:#333,stroke-width:2px
    style UTIL fill:#eee,stroke:#333,stroke-width:2px
```

## 4. Critical Implementation Paths

* **Image Input -> Preprocessing -> Segmentation -> OCR -> Structured Output:** The main flow for image files.
* **PDF Input -> PDF Extraction -> Image Conversion (per page) -> [Main Flow] -> Aggregated Output:** The likely path for PDF documents.
* **Configuration Loading -> Pipeline Execution:** How settings influence the process.

*(This document outlines the observed structure. It will be refined as the codebase is analyzed in more detail.)*
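
The inferred pipeline and configuration patterns can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `PipelineConfig`, the stage functions, `STAGES`, `run_pipeline`, and `run_pdf` are hypothetical names, the stages are placeholders that do no real image work, and none of this is taken from the actual codebase. It only shows how a configuration object could decide which sequential stages run, and how a PDF path could reuse the main flow per page.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class PipelineConfig:
    """Hypothetical stand-in for settings that config.py might centralize."""
    enabled_stages: List[str] = field(
        default_factory=lambda: [
            "preprocess", "segment", "ocr", "detect_language", "structure",
        ]
    )
    default_language: str = "en"


# A stage takes the working document state plus the config and returns new state.
Stage = Callable[[Dict[str, Any], PipelineConfig], Dict[str, Any]]


def preprocess(doc: Dict[str, Any], cfg: PipelineConfig) -> Dict[str, Any]:
    # Placeholder for image adjustments (deskewing, binarization, ...).
    return {**doc, "preprocessed": True}


def segment(doc: Dict[str, Any], cfg: PipelineConfig) -> Dict[str, Any]:
    # Placeholder: treat the whole input as one text region.
    return {**doc, "regions": [doc["source"]]}


def ocr(doc: Dict[str, Any], cfg: PipelineConfig) -> Dict[str, Any]:
    # Placeholder: a real stage would call an OCR engine for each region.
    return {**doc, "text": [f"<text of {region}>" for region in doc["regions"]]}


def detect_language(doc: Dict[str, Any], cfg: PipelineConfig) -> Dict[str, Any]:
    # Placeholder: always report the configured default language.
    return {**doc, "language": cfg.default_language}


def structure(doc: Dict[str, Any], cfg: PipelineConfig) -> Dict[str, Any]:
    # Collect the results of earlier stages into the final output record.
    output = {"language": doc.get("language"), "blocks": doc["text"]}
    return {**doc, "output": output}


STAGES: Dict[str, Stage] = {
    "preprocess": preprocess,
    "segment": segment,
    "ocr": ocr,
    "detect_language": detect_language,
    "structure": structure,
}


def run_pipeline(source: str, cfg: PipelineConfig) -> Dict[str, Any]:
    """Pass one image through the enabled stages in order."""
    doc: Dict[str, Any] = {"source": source, "trace": []}
    for name in cfg.enabled_stages:
        doc = STAGES[name](doc, cfg)
        doc = {**doc, "trace": doc["trace"] + [name]}
    return doc


def run_pdf(page_images: List[str], cfg: PipelineConfig) -> List[Dict[str, Any]]:
    """Reuse the main flow once per page and aggregate the structured outputs."""
    return [run_pipeline(page, cfg)["output"] for page in page_images]


if __name__ == "__main__":
    result = run_pipeline("scan.png", PipelineConfig())
    print(result["trace"])
    print(run_pdf(["page-1.png", "page-2.png"], PipelineConfig()))
```

Keeping the stage order in the configuration object rather than in the orchestrator is one way to get the decoupling described above: disabling language detection, for example, would only require passing a different `enabled_stages` list.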