- 09 Dec, 2024 1 commit
-
-
icecraft authored
-
- 07 Dec, 2024 1 commit
-
-
sawmice authored
-
- 06 Dec, 2024 9 commits
-
-
myhloli authored
- Remove concurrency limit logic from app.py - Update model initialization process in various modules - Remove unused VRAM check for concurrency limit - Refactor OCR model initialization in pdf_extract_kit.py - Update txt_spans_extract_v2 function to use lang parameter instead of ocr_model
-
myhloli authored
- Remove usage of AtomModelSingleton for OCR model creation - Add ocr_model_init function to initialize OCR model - Update OCR model initialization in pdf_extract_kit.py and pdf_parse_union_core_v2.py - Modify txt_spans_extract_v2 function to accept ocr_model as a parameter - Update parse_page_core function to use ocr_model instead of lang for OCR processing
-
myhloli authored
- Add threading support for OCR model initialization - Modify AtomModelSingleton to handle thread-specific instances - Update PDFExtractKit and PDFParseUnionCoreV2 to use new thread-safe OCR initialization
-
myhloli authored
- Remove threading.Lock import and usage - Delete unused model initialization comments and code - Simplify OCR model initialization in both pdf_extract_kit.py and pdf_parse_union_core_v2.py
-
myhloli authored
- Remove usage of AtomModelSingleton for OCR model initialization - Add import of ocr_model_init from model_init module - Update OCR model initialization process to use ocr_model_init function - Remove lock for OCR processing as it's no longer needed
-
myhloli authored
- Remove usage of ModelSingleton class - Initialize model directly using custom_model_init function - Add self._lock attribute to PDFExtractKit class for thread safety - Replace local lock with self._lock for OCR processing
-
myhloli authored
-
赵小蒙 authored
- Remove unnecessary threading.Lock in AtomModelSingleton - Add threading.Lock to CustomPEKModel for OCR processing - Simplify model initialization logic in AtomModelSingleton
-
myhloli authored
- Add condition to return existing model if already initialized - Improve efficiency by avoiding redundant model creation
-
- 05 Dec, 2024 1 commit
-
-
myhloli authored
- Introduce a lock to synchronize access to OCR model initialization - This change improves thread safety when multiple threads access the OCR model concurrently - The lock ensures that the OCR model is initialized only once, even in multi-threaded scenarios
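A minimal sketch of the initialize-once-under-a-lock pattern this commit message describes. It is not the project's code: `load_ocr_model`, `get_ocr_model`, `init_calls`, and the module-level cache are hypothetical names used only for illustration.

```python
import threading

_ocr_model = None                    # hypothetical module-level cache
_ocr_model_lock = threading.Lock()   # guards the first initialization
init_calls = []                      # records how often the loader ran


def load_ocr_model():
    """Stand-in for an expensive OCR model constructor (hypothetical)."""
    init_calls.append(1)
    return object()


def get_ocr_model():
    """Return the shared model, creating it at most once across threads."""
    global _ocr_model
    if _ocr_model is None:               # fast path once initialized
        with _ocr_model_lock:
            if _ocr_model is None:       # re-check while holding the lock
                _ocr_model = load_ocr_model()
    return _ocr_model
```

The second `None` check inside the lock is what prevents two threads that both passed the first check from each building a model.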
-
- 03 Dec, 2024 5 commits
-
-
myhloli authored
- Update VRAM checking logic in app.py and model_utils.py - Add None and type checks for VRAM values - Adjust concurrency limit calculation in app.py - Modify clean_vram function to handle cases with no VRAM information
-
xu rui authored
-
icecraft authored
-
icecraft authored
-
myhloli authored
- Add get_concurrency_limit function to calculate concurrency limit based on VRAM - Update clean_vram function and rename to get_vram for better clarity - Apply concurrency limit to the to_markdown function in the Gradio app
-
- 29 Nov, 2024 1 commit
-
-
myhloli authored
-
- 28 Nov, 2024 1 commit
-
-
myhloli authored
-
- 27 Nov, 2024 2 commits
-
-
myhloli authored
- Remove unused function `calculate_angle_degrees` - Refactor `calculate_is_angle` to use directly in OCR processing - Eliminate unnecessary loop index `idx` in OCR processing loops
-
myhloli authored
- Remove unused imports from commons.py - Delete unused functions related to AWS and S3 operations - Update import statements in other modules to reflect changes in commons.py - Remove redundant code and improve code readability
-
- 26 Nov, 2024 2 commits
-
-
myhloli authored
- Decrease the maximum image size threshold from 9000 to 4500 pixels - This change aims to improve performance and reduce memory usage - Affects the custom model document analysis process
-
myhloli authored
- Add confidence score threshold to filter out low confidence OCR results - Improve OCR accuracy by ignoring less certain detections
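As an illustration of confidence-based filtering, here is a short sketch. The `(box, text, score)` result shape, the function name, and the 0.5 default are assumptions; the commit message does not state the actual threshold or data layout.

```python
def filter_by_confidence(ocr_results, min_score=0.5):
    """Keep only detections whose score reaches min_score.

    ocr_results: iterable of (box, text, score) tuples (assumed shape).
    min_score:   hypothetical threshold, not the project's value.
    """
    return [
        (box, text, score)
        for box, text, score in ocr_results
        if score >= min_score
    ]
```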
-
- 24 Nov, 2024 2 commits
- 22 Nov, 2024 2 commits
-
-
myhloli authored
- Move page total time logging to doc_analyze_by_custom_model.py - Remove page total time logging from pdf_extract_kit.py - Add page_start timing variable to custom model analysis - Update logger output format for page total time
-
myhloli authored
- Add a null check for OCR result in the predict method - Return None values if OCR result is None to prevent further processing
-
- 21 Nov, 2024 2 commits
-
-
myhloli authored
- Implement new text extraction method (txt_spans_extract_v2) to enhance accuracy - Add character filling in spans for better text reconstruction - Introduce empty span handling using OCR for missed text - Optimize span filtering and overlap removal
-
myhloli authored
- Update OCR utils to handle different box formats and improve angle calculation - Modify PDF extraction kit to support OCR option and optimize processing flow - Enhance PPOCR model to sort and filter detection boxes, improving text splitting accuracy
-
- 19 Nov, 2024 1 commit
-
-
icecraft authored
-
- 18 Nov, 2024 2 commits
- 15 Nov, 2024 1 commit
-
-
myhloli authored
-
- 08 Nov, 2024 2 commits
-
-
myhloli authored
- Integrate RapidOCR with RapidTable model for table recognition - Improve memory management for devices with <= 8GB VRAM - Update table recognition process to use RapidOCR for RapidTable - Add rapidocr-paddle dependency in setup.py
-
myhloli authored
- Add RapidTable model support for table recognition - Update table model configuration and initialization - Modify table recognition process to use RapidTable when specified - Add RapidTable dependency to setup.py
-
- 07 Nov, 2024 1 commit
-
-
myhloli authored
- Implement xycut algorithm to sort blocks when layoutreader fails - Add recursive_xy_cut function to perform the xycut algorithm - Update pdf_parse_union_core_v2.py to use xycut when layoutreader fails - Modify draw_bbox.py to handle cases where layoutreader fails to sort blocks
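For readers unfamiliar with XY-cut ordering, the sketch below shows one simple recursive variant on `(x0, y0, x1, y1)` boxes. It is not the project's `recursive_xy_cut`; the function names, the box format, and the choice to try vertical cuts before horizontal ones are assumptions made for illustration.

```python
def _split_by_gaps(boxes, lo, hi):
    """Group boxes into runs separated by empty gaps along one axis.

    lo and hi are the tuple indices of the interval start and end
    on that axis: (0, 2) for x, (1, 3) for y.
    """
    groups, current, reach = [], [], None
    for box in sorted(boxes, key=lambda b: b[lo]):
        if current and box[lo] >= reach:   # gap found: close the run
            groups.append(current)
            current, reach = [], None
        current.append(box)
        reach = box[hi] if reach is None else max(reach, box[hi])
    if current:
        groups.append(current)
    return groups


def xy_cut_order(boxes):
    """Return boxes in an approximate reading order via recursive cuts."""
    if len(boxes) <= 1:
        return list(boxes)
    # Try to cut into columns first, then into rows.
    for lo, hi in ((0, 2), (1, 3)):
        groups = _split_by_gaps(boxes, lo, hi)
        if len(groups) > 1:
            ordered = []
            for group in groups:
                ordered.extend(xy_cut_order(group))
            return ordered
    # No clean cut exists: fall back to top-to-bottom, left-to-right.
    return sorted(boxes, key=lambda b: (b[1], b[0]))
```

Each recursive call works on a strictly smaller group, so the recursion always terminates; overlapping boxes that admit no cut fall through to the plain coordinate sort.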
-
- 06 Nov, 2024 1 commit
-
-
myhloli authored
- Remove unused code for copying detection and recognition models - Simplify OCR model initialization using atom_model_manager - Delete unnecessary comments and empty lines
-
- 05 Nov, 2024 1 commit
-
-
myhloli authored
- Replace np.array with np.asarray for better performance - Add image color conversion from RGB to BGR using OpenCV
-
- 04 Nov, 2024 2 commits
-
-
myhloli authored
- Import 're' module for regular expression operations - Implement HTML minification for 'output_format=html' - Add 'minify_html' method to remove unnecessary whitespace and format HTML
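A regex-based minifier along these lines might look like the sketch below. It is an assumption about the approach, not the project's `minify_html` method, and the function name is hypothetical.

```python
import re


def minify_html_sketch(html: str) -> str:
    """Strip layout whitespace from an HTML fragment (illustrative only)."""
    html = re.sub(r">\s+<", "><", html)   # drop whitespace between tags
    html = re.sub(r"\s+", " ", html)      # collapse remaining runs to one space
    return html.strip()
```

This simple form also collapses whitespace inside `<pre>` or `<textarea>` content, so it only suits output such as table markup where whitespace carries no meaning.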
-
myhloli authored
- Comment out an unused code block in the ppTableModel.py file - Improve code readability and maintainability by removing unnecessary code
-