Delete .clinerules
- .clinerules/activeContext.md +0 -18
- .clinerules/memory-bank.md +0 -19
- .clinerules/productContext.md +0 -11
- .clinerules/progress.md +0 -20
- .clinerules/project-brief.md +0 -13
- .clinerules/systemPatterns.md +0 -19
- .clinerules/techContext.md +0 -16
.clinerules/activeContext.md
DELETED
@@ -1,18 +0,0 @@
-# Active Context
-
-## Current Development Focus
-- Improving preprocessing pipeline for better OCR results
-- Enhancing image segmentation for complex documents
-- Building structured OCR capabilities
-- Testing and validation with different document types
-
-## Recent Changes
-- Modularized code structure
-- Added helpers in utils directory
-- Fixed metadata field ordering and tag classification issues
-- Updated gitignore to exclude test files and output directories
-
-## Next Steps
-- Continue refining the OCR processing pipeline
-- Improve handling of handwritten documents
-- Enhance the UI components for better user experience
.clinerules/memory-bank.md
DELETED
@@ -1,19 +0,0 @@
-# HOCR Project Memory Bank
-
-This memory bank is for the HOCR (OCR processing) project.
-
-## Project Context
-
-This project appears to be focused on OCR (Optical Character Recognition) processing, with capabilities for image segmentation, preprocessing, and various text extraction techniques.
-
-## System Information
-
-- Project directory: /Users/zacharymuhlbauer/Desktop/tools/hocr
-- Main Python files include app.py, preprocessing.py, ocr_processing.py, and various utility modules
-- Output directories for test results and processing stages
-
-## Notes
-
-- The project handles various document types including handwritten documents, printed text, and mixed content
-- Contains preprocessing steps for image enhancement before OCR
-- Has testing directories for different document types and processing approaches
.clinerules/productContext.md
DELETED
@@ -1,11 +0,0 @@
-# Product Context
-
-This is an OCR processing tool for various document types including handwritten, printed, and mixed content documents. The system handles preprocessing, image segmentation, OCR processing, and text extraction with various enhancements.
-
-## Features
-- Document preprocessing (deskewing, thresholding, etc.)
-- Image segmentation to identify text regions
-- OCR processing with different strategies for different document types
-- Language detection
-- Letterhead handling
-- Structured data extraction
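Reviewer note: the deleted feature list names thresholding as a preprocessing step but does not say which method the project uses. The sketch below assumes Otsu's method, a common choice for binarizing scanned pages, and is written in plain Python so it runs without OpenCV or PIL. The function names `otsu_threshold` and `binarize` are hypothetical and do not come from the repository.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the 0-255 threshold that maximizes between-class variance.

    Assumption: Otsu's method; the deleted notes only say "thresholding".
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # intensity sum of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is background; nothing left to separate
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels: Sequence[int], threshold: int) -> List[int]:
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]
```

With a toy row of grayscale values such as `[10, 12, 11, 200, 205, 198]`, `otsu_threshold` picks a cut between the dark and bright clusters, and `binarize` then separates ink from background. A real pipeline would more likely call an image library for this step.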
.clinerules/progress.md
DELETED
@@ -1,20 +0,0 @@
-# Progress
-
-## Completed
-- Basic OCR processing pipeline
-- Image preprocessing capabilities
-- Text segmentation algorithm
-- Initial UI components
-- Testing framework for various document types
-
-## In Progress
-- Improving preprocessing for handwritten documents
-- Enhancing segmentation accuracy
-- Building structured output formatting
-- Refining language detection
-
-## Planned
-- Additional output formats
-- Performance optimization
-- More comprehensive testing
-- Enhanced UI features
.clinerules/project-brief.md
DELETED
@@ -1,13 +0,0 @@
-# Project Brief
-
-## Overview
-This project focuses on Optical Character Recognition (OCR) processing for various document types. It handles different document formats and qualities, applying specialized preprocessing and recognition techniques.
-
-## Goals
-- Improve OCR accuracy for challenging document types
-- Support multiple input formats (images, PDFs)
-- Provide structured output of recognized text
-- Enable interactive usage with UI components
-
-## Current Status
-The project has multiple components in place including preprocessing, segmentation, OCR processing, and utility functions. Testing infrastructure is available for different document types and processing approaches.
.clinerules/systemPatterns.md
DELETED
@@ -1,19 +0,0 @@
-# System Patterns
-
-## Code Organization
-- Main processing components in root directory
-- Utility functions in utils/ directory with specific submodules
-- UI components in ui/ directory
-- Test cases and samples in testing/ directory
-- Input/output directories for document processing
-
-## Naming Conventions
-- Snake case for file names and functions
-- Module names reflect their purpose (e.g., ocr_processing.py, image_segmentation.py)
-- Consistent test output naming with descriptive prefixes
-
-## Processing Pipeline
-1. Preprocessing step (enhancement, cleaning)
-2. Segmentation (identifying text regions)
-3. OCR processing with context-specific strategies
-4. Post-processing and output formatting
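Reviewer note: the four-stage processing pipeline listed in the deleted systemPatterns.md can be sketched as a chain of stage functions. This is a minimal illustration under stated assumptions, not the project's implementation: the `Document` class, the stage names, and the placeholder logic inside each stage are all hypothetical, and the OCR stage is a stub where a real engine call would go.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# All names below are hypothetical; none are taken from the deleted files.


@dataclass
class Document:
    """Carries data between stages; `pixels` stands in for real image data."""
    pixels: List[List[int]]
    regions: List[List[List[int]]] = field(default_factory=list)
    texts: List[str] = field(default_factory=list)
    output: str = ""


Stage = Callable[[Document], Document]


def preprocess(doc: Document) -> Document:
    # 1. Placeholder enhancement: binarize at a fixed threshold of 128.
    doc.pixels = [[255 if p >= 128 else 0 for p in row] for row in doc.pixels]
    return doc


def segment(doc: Document) -> Document:
    # 2. Placeholder segmentation: each row with a dark pixel is one region.
    doc.regions = [[row] for row in doc.pixels if any(p == 0 for p in row)]
    return doc


def recognize(doc: Document) -> Document:
    # 3. Stub for OCR: a real engine would be called on each region here.
    doc.texts = [
        f"<region {i}: {len(region[0])} px wide>"
        for i, region in enumerate(doc.regions)
    ]
    return doc


def postprocess(doc: Document) -> Document:
    # 4. Output formatting: join the per-region results into one string.
    doc.output = "\n".join(doc.texts)
    return doc


PIPELINE: List[Stage] = [preprocess, segment, recognize, postprocess]


def run_pipeline(doc: Document, stages: List[Stage] = PIPELINE) -> Document:
    """Apply each stage in order and return the final document."""
    for stage in stages:
        doc = stage(doc)
    return doc
```

Keeping the stages as a plain list makes it easy to swap in a different strategy per document type, which is what "context-specific strategies" in step 3 suggests, by passing a different `stages` list to `run_pipeline`.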
.clinerules/techContext.md
DELETED
@@ -1,16 +0,0 @@
-# Technical Context
-
-This project is a Python-based OCR solution with the following components:
-
-## Tech Stack
-- Python
-- Image processing libraries (OpenCV, PIL)
-- OCR engines
-- UI components for interactive usage
-- File processing utilities
-
-## Architecture
-- Modular design with separate components for preprocessing, OCR, and output formatting
-- Utility modules organized in utils/ directory
-- Testing framework for various document types
-- Configuration system for processing parameters