milwright committed on
Commit 72f6723 · verified · 1 Parent(s): cfec41e

Delete .clinerules

.clinerules/activeContext.md DELETED
@@ -1,18 +0,0 @@
- # Active Context
-
- ## Current Development Focus
- - Improving preprocessing pipeline for better OCR results
- - Enhancing image segmentation for complex documents
- - Building structured OCR capabilities
- - Testing and validation with different document types
-
- ## Recent Changes
- - Modularized code structure
- - Added helpers in utils directory
- - Fixed metadata field ordering and tag classification issues
- - Updated gitignore to exclude test files and output directories
-
- ## Next Steps
- - Continue refining the OCR processing pipeline
- - Improve handling of handwritten documents
- - Enhance the UI components for better user experience
 
.clinerules/memory-bank.md DELETED
@@ -1,19 +0,0 @@
- # HOCR Project Memory Bank
-
- This memory bank is for the HOCR (OCR processing) project.
-
- ## Project Context
-
- This project appears to be focused on OCR (Optical Character Recognition) processing, with capabilities for image segmentation, preprocessing, and various text extraction techniques.
-
- ## System Information
-
- - Project directory: /Users/zacharymuhlbauer/Desktop/tools/hocr
- - Main Python files include app.py, preprocessing.py, ocr_processing.py, and various utility modules
- - Output directories for test results and processing stages
-
- ## Notes
-
- - The project handles various document types including handwritten documents, printed text, and mixed content
- - Contains preprocessing steps for image enhancement before OCR
- - Has testing directories for different document types and processing approaches
 
.clinerules/productContext.md DELETED
@@ -1,11 +0,0 @@
- # Product Context
-
- This is an OCR processing tool for various document types including handwritten, printed, and mixed content documents. The system handles preprocessing, image segmentation, OCR processing, and text extraction with various enhancements.
-
- ## Features
- - Document preprocessing (deskewing, thresholding, etc.)
- - Image segmentation to identify text regions
- - OCR processing with different strategies for different document types
- - Language detection
- - Letterhead handling
- - Structured data extraction
 
.clinerules/progress.md DELETED
@@ -1,20 +0,0 @@
- # Progress
-
- ## Completed
- - Basic OCR processing pipeline
- - Image preprocessing capabilities
- - Text segmentation algorithm
- - Initial UI components
- - Testing framework for various document types
-
- ## In Progress
- - Improving preprocessing for handwritten documents
- - Enhancing segmentation accuracy
- - Building structured output formatting
- - Refining language detection
-
- ## Planned
- - Additional output formats
- - Performance optimization
- - More comprehensive testing
- - Enhanced UI features
 
.clinerules/project-brief.md DELETED
@@ -1,13 +0,0 @@
- # Project Brief
-
- ## Overview
- This project focuses on Optical Character Recognition (OCR) processing for various document types. It handles different document formats and qualities, applying specialized preprocessing and recognition techniques.
-
- ## Goals
- - Improve OCR accuracy for challenging document types
- - Support multiple input formats (images, PDFs)
- - Provide structured output of recognized text
- - Enable interactive usage with UI components
-
- ## Current Status
- The project has multiple components in place including preprocessing, segmentation, OCR processing, and utility functions. Testing infrastructure is available for different document types and processing approaches.
 
.clinerules/systemPatterns.md DELETED
@@ -1,19 +0,0 @@
- # System Patterns
-
- ## Code Organization
- - Main processing components in root directory
- - Utility functions in utils/ directory with specific submodules
- - UI components in ui/ directory
- - Test cases and samples in testing/ directory
- - Input/output directories for document processing
-
- ## Naming Conventions
- - Snake case for file names and functions
- - Module names reflect their purpose (e.g., ocr_processing.py, image_segmentation.py)
- - Consistent test output naming with descriptive prefixes
-
- ## Processing Pipeline
- 1. Preprocessing step (enhancement, cleaning)
- 2. Segmentation (identifying text regions)
- 3. OCR processing with context-specific strategies
- 4. Post-processing and output formatting
 
.clinerules/techContext.md DELETED
@@ -1,16 +0,0 @@
- # Technical Context
-
- This project is a Python-based OCR solution with the following components:
-
- ## Tech Stack
- - Python
- - Image processing libraries (OpenCV, PIL)
- - OCR engines
- - UI components for interactive usage
- - File processing utilities
-
- ## Architecture
- - Modular design with separate components for preprocessing, OCR, and output formatting
- - Utility modules organized in utils/ directory
- - Testing framework for various document types
- - Configuration system for processing parameters