- 02 Apr, 2025 11 commits
-
-
myhloli authored
- Update the default configuration path in pytorchocr_utility.py
- Add required dependencies for paddleocr2pytorch in setup.py:
  - shapely
  - pyclipper
  - omegaconf
-
myhloli authored
- Remove unused UniMERNet and LayoutLMv3 model configurations
- Update OCR model path and dictionary path for PaddleOCR
- Modify README to update system requirements and installation instructions
- Update setup.py to include new package data
-
myhloli authored
- Remove unused imports for concurrent.futures, multiprocessing, and paddle
- Delete commented-out code
- Update numpy dependency to remove upper version limit
- Remove InferenceResult import that was commented out
-
myhloli authored
- Remove paddleocr, paddlepaddle, rapidocr-paddle, and rapidocr-onnxruntime from requirements.txt files
- Simplify pip install commands in Dockerfiles
- Remove installation of paddlepaddle-gpu in china and global Dockerfiles
- Update requirements.txt files across all Docker configurations
-
myhloli authored
- Commented out the code that copies the paddleocr model to user directory
- This change affects both download_models.py and download_models_hf.py scripts
-
myhloli authored
- Update download_models.py and download_models_hf.py scripts
- Change OCR model path from paddleocr to paddleocr_torch
-
myhloli authored
- Add newline at the beginning of arabic_dict.txt
- Change mode of multiple dictionary files
-
myhloli authored
- Remove OCR utils, modified PaddleOCR, and StructEqTable model
- Delete related import statements and model definitions
- Update dependencies in setup.py to remove paddlepaddle and related OCR packages
-
myhloli authored
- Comment out print statements in base_ocr_v20.py and pytorch_paddle.py
- Update table model initialization to use lang parameter instead of ocr_engine
- Remove unused RapidOCR initialization in rapid_table.py
-
myhloli authored
-
myhloli authored
- Comment out OCR model initialization and execution for low-contrast spans
- Add batch OCR processing for collected image spans
- Adjust contrast threshold for OCR processing
- Remove unnecessary OCR processing for high-contrast spans
- Implement more efficient OCR workflow by processing multiple spans at once
-
- 01 Apr, 2025 5 commits
-
-
myhloli authored
-
myhloli authored
- Enhance the logging of execution times by adding more detailed function identification
- Implement class name and module name inclusion for better traceability
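The commit above does not show its code. As a rough illustration of the idea only, a timing decorator can identify the timed callable by module and class using `__module__` and `__qualname__`. The names `log_execution_time` and `qualified_name` are hypothetical and are not taken from the repository.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)


def qualified_name(func):
    """Return 'module.Class.method' (or 'module.function') for a callable."""
    return f"{func.__module__}.{func.__qualname__}"


def log_execution_time(func):
    """Hypothetical decorator: log how long each call to func takes."""

    @functools.wraps(func)  # keeps __module__ and __qualname__ of the original
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            # __qualname__ includes the class name for methods.
            logger.info("%s took %.4f s", qualified_name(func), elapsed)

    return wrapper
```

Because `__qualname__` already carries the enclosing class, a decorated method is logged as `module.Class.method` without inspecting `self`.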
-
myhloli authored
- Remove unused OCR dictionaries for Arabic, Belarusian, Bulgarian and Armenian languages
- Update model configurations in arch_config.yaml:
  - Comment out 'out_channels' for various language models
  - Rename Arabic, Korean, Japanese, Tamil and Devanagari model configurations to use 'v3' instead of 'v4'
- Delete ar_dict.txt, be_dict.txt and bg_dict.txt files
- Update arabic_dict.txt to remove blank line at the start
-
myhloli authored
- Remove unused imports and code
- Simplify model architecture by removing unnecessary components
- Update initialization and forward pass logic
- Rename variables for consistency
-
myhloli authored
- Added warnings module to import list
- Implemented a warning catcher to ignore FutureWarning from the transformers module
- This change prevents unnecessary warning messages during model inference
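For readers unfamiliar with the pattern, a warning catcher of this kind can be sketched with the standard `warnings` module. This is a minimal illustration, not the commit's actual code: the helper name `call_ignoring_future_warnings` is hypothetical, and the sketch suppresses every `FutureWarning` raised during the call rather than only those from transformers.

```python
import warnings


def call_ignoring_future_warnings(fn, *args, **kwargs):
    """Hypothetical helper: run fn while FutureWarning is suppressed.

    The filter is scoped to the `with` block, so the caller's warning
    configuration is restored afterwards.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        return fn(*args, **kwargs)
```

To narrow the filter to one library, `warnings.filterwarnings` accepts a `module` regular expression that is matched against the name of the module issuing the warning.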
-
- 31 Mar, 2025 3 commits
-
-
myhloli authored
- Replace direct OCR model access with AtomModelSingleton for better model management
- Round OCR scores to 2 decimal places for consistency
- Improve error handling and logging in batch analysis
- Simplify OCR result processing in pdf_parse_union_core_v2.py
-
myhloli authored
- Add support for multiple languages in OCR processing
- Create separate lists for each language to improve processing efficiency
- Update OCR model initialization to use PytorchPaddleOCR instead of ModifiedPaddleOCR
- Modify get_ocr_result_list function to include language information
- Improve logging for OCR detection and recognition
-
myhloli authored
- Split OCR process into detection and recognition stages
- Update batch analysis and document analysis pipelines
- Modify OCR result formatting and handling
- Remove unused imports and optimize code structure
-
- 27 Mar, 2025 4 commits
-
-
Xiaomeng Zhao authored
-
myhloli authored
-
myhloli authored
- Add base model structure for OCR in pytorch
- Implement data augmentation and transformation modules
- Create utilities for dictionary handling and state dict conversion
- Include post-processing modules for OCR
- Add weight initialization and loading functions
-
Xiaomeng Zhao authored
feat: remove old inference code
-
- 26 Mar, 2025 3 commits
-
-
icecraft authored
-
Xiaomeng Zhao authored
feat: batch inference with ocr and lang flag
-
icecraft authored
-
- 24 Mar, 2025 10 commits
-
-
Xiaomeng Zhao authored
refactor(pdf_parse): adjust line calculation for block height
-
myhloli authored
- Remove unnecessary addition of 1 when calculating lines for block height
- This change affects the logic for both potential double-column and triple-column structures
-
Xiaomeng Zhao authored
refactor(pdf_parse): adjust line calculation for block height
-
myhloli authored
- Remove unnecessary addition of 1 when calculating lines for block height
- This change affects the logic for both potential double-column and triple-column structures
-
Xiaomeng Zhao authored
fix(pre_proc): improve character overlap handling in OCR processing
-
myhloli authored
- Add condition to check for identical or space characters when resolving overlaps
- Skip non-conflicting character pairs to prevent unnecessary removals
-
Xiaomeng Zhao authored
fix: support auto method and auto lang
-
icecraft authored
-
Xiaomeng Zhao authored
fix(magic_pdf): improve image resizing and padding in UnimerSwinn model
-
myhloli authored
- Comment out margin cropping to prevent errors with broken files
- Refactor image resizing to preserve aspect ratio
- Update padding calculation and application using OpenCV
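The arithmetic behind an aspect-preserving resize with padding can be sketched as follows. This is an assumption-laden illustration, not the model's actual code: the function name `fit_with_padding` is hypothetical, and centring the image inside the target canvas is an assumption the commit message does not confirm.

```python
def fit_with_padding(width, height, target_width, target_height):
    """Hypothetical helper: size and padding for an aspect-preserving fit.

    Returns ((new_width, new_height), (top, bottom, left, right)).
    """
    # One scale factor for both axes keeps the aspect ratio unchanged.
    scale = min(target_width / width, target_height / height)
    new_width = max(1, int(width * scale))
    new_height = max(1, int(height * scale))

    # Remaining space is split between the two sides (assumed centring);
    # any odd pixel goes to the bottom or right edge.
    pad_w = target_width - new_width
    pad_h = target_height - new_height
    top, left = pad_h // 2, pad_w // 2
    return (new_width, new_height), (top, pad_h - top, left, pad_w - left)
```

The padding tuple follows the argument order of OpenCV's `cv2.copyMakeBorder(src, top, bottom, left, right, borderType)`, so the values could be passed on after a `cv2.resize` to the computed size.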
-
- 22 Mar, 2025 2 commits
-
-
Xiaomeng Zhao authored
refactor(ocr): improve ONNX model initialization and resource handling
-
myhloli authored
- Replace deprecated importlib.resources.path with importlib.resources.files
- Simplify code structure and improve readability
- Remove unnecessary comments and empty lines
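As a short sketch of the migration named above (Python 3.9+), `importlib.resources.files` returns a traversable object, and `as_file` yields a real filesystem path when a library needs one. The helper names and the package and file names passed to them are hypothetical, not taken from the repository.

```python
from importlib.resources import as_file, files


def read_package_resource(package, resource_name):
    """Hypothetical helper: return the bytes of a file shipped in a package."""
    return files(package).joinpath(resource_name).read_bytes()


def resource_path(package, resource_name):
    """Hypothetical helper: context manager yielding a filesystem path.

    Useful when a consumer, such as a model loader, needs a path rather
    than bytes. The path is only guaranteed to exist inside the `with` block.
    """
    return as_file(files(package).joinpath(resource_name))
```

A caller would write `with resource_path("some_pkg", "model.onnx") as p:` and load the model from `p` inside the block.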
-
- 21 Mar, 2025 2 commits
-
-
Xiaomeng Zhao authored
feat(pre_proc): add function to remove x-overlapping characters in spans
-
myhloli authored
-