- 28 Apr, 2025 1 commit
-
-
myhloli authored
- Add support for \(\) and \[\] delimiters in addition to $$ and $$ - Make LaTeX delimiter configuration more flexible and user-defined - Update configuration file to include LaTeX delimiter settings - Modify OCR content generation to use configurable delimiters
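
A minimal sketch of what user-defined delimiters could look like. The config layout (`display`/`inline` keys with `left`/`right` strings), the default values, and the `wrap_formula` helper are all assumptions made for illustration, not the project's actual schema or API.

```python
# Hypothetical delimiter configuration; key names and defaults are assumptions.
DEFAULT_DELIMITERS = {
    "display": {"left": "$$", "right": "$$"},
    "inline": {"left": "$", "right": "$"},
}


def wrap_formula(latex, kind="inline", delimiters=None):
    """Wrap a LaTeX string in the delimiter pair configured for `kind`."""
    config = delimiters or DEFAULT_DELIMITERS
    pair = config[kind]
    return f"{pair['left']}{latex}{pair['right']}"


# A user-defined override that switches to \( \) and \[ \].
custom = {
    "display": {"left": "\\[", "right": "\\]"},
    "inline": {"left": "\\(", "right": "\\)"},
}
```

Keeping the pair in configuration means the content generator only calls one helper and never hard-codes a delimiter.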
-
- 27 Apr, 2025 4 commits
-
-
myhloli authored
- Add \textunderscore to the list of LaTeX patterns - This allows the model to properly render underscore characters
-
myhloli authored
-
myhloli authored
- Improve \left and \right command handling in LaTeX formulas - Enhance environment type matching for array, matrix, and other structures - Refactor code for better readability and maintainability
-
myhloli authored
- Refactor LaTeX left/right pair fixing logic for better balance - Add environment detection and correction for common math environments - Implement more robust whitespace handling and command substitution - Optimize regex patterns for improved performance and readability
-
- 25 Apr, 2025 2 commits
-
-
myhloli authored
- Add functions to fix LaTeX left and right commands - Implement brace matching and repair in LaTeX formulas - Remove unnecessary whitespace and repair LaTeX code - Replace specific LaTeX commands with appropriate alternatives - Add logging for debugging purposes
-
myhloli authored
- Add functions to fix LaTeX left and right commands - Implement brace matching and repair in LaTeX formulas - Remove unnecessary whitespace and repair LaTeX code - Replace specific LaTeX commands with appropriate alternatives - Add logging for debugging purposes
-
- 24 Apr, 2025 1 commit
-
-
myhloli authored
- Preserve "\ " sequences during whitespace removal - Add temporary substitution to prevent incorrect processing of "\ " sequences - Restore "\ " sequences after removing unnecessary whitespace
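
The protect/clean/restore pattern described in this commit can be sketched as below. The placeholder token and the function name are hypothetical, and the cleanup step here only collapses runs of whitespace, which is a simplification of whatever rules the real code applies.

```python
import re

# Temporary token assumed never to appear in a real formula (an assumption).
PLACEHOLDER = "\x00ESCAPED_SPACE\x00"


def clean_whitespace_keep_escaped(latex):
    """Collapse whitespace while leaving "\\ " (backslash-space) untouched."""
    # 1. Protect every backslash-space sequence with the placeholder.
    protected = latex.replace("\\ ", PLACEHOLDER)
    # 2. Collapse runs of whitespace and trim the ends.
    cleaned = re.sub(r"\s+", " ", protected).strip()
    # 3. Put the protected sequences back.
    return cleaned.replace(PLACEHOLDER, "\\ ")
```

One limitation of this sketch: a line break `\\` followed by a space would also match the backslash-space pattern, so a real implementation would need to handle that case separately.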
-
- 23 Apr, 2025 3 commits
-
-
myhloli authored
-
myhloli authored
- Replace get_device() function call with direct 'device' variable usage - Simplify device configuration in OCR model initialization
-
myhloli authored
- Add new Chinese OCR model (ch_PP-OCRv4_rec_server_doc_infer) for server-side use - Update language support in app.py to include new Chinese model - Modify models_config.yml to add new model configuration
-
- 22 Apr, 2025 3 commits
-
-
myhloli authored
-
myhloli authored
- Automatically change to ch_lite model when using CPU for Chinese OCR - This modification improves performance on CPU devices
-
myhloli authored
- Remove OCR engine instantiation inside the loop - Pass language directly to the table model instead of OCR engine - Simplify code structure and improve readability
-
- 21 Apr, 2025 2 commits
- 17 Apr, 2025 2 commits
- 16 Apr, 2025 3 commits
-
-
myhloli authored
- Temporarily disable Chinese font check for Windows systems - This change allows bypassing the font check when the required fonts are not present
-
myhloli authored
-
myhloli authored
- Modify `ocr_detect_all_bboxes.py` to return footnote blocks - Update `pdf_parse_union_core_v2.py` to handle footnote blocks in line sorting and layout splitting - This change improves the accuracy of layout analysis by considering footnote blocks separately
-
- 15 Apr, 2025 2 commits
- 14 Apr, 2025 5 commits
-
-
Doge2077 authored
-
myhloli authored
-
Doge2077 authored
-
myhloli authored
- Update the range used to generate images_with_extra_info to match the number of images - This fixes a potential IndexError when the number of images differs from the dataset length
-
myhloli authored
- Change footnote detection threshold from 50% of page height to 30% - Improve accuracy of footnote identification in PDF processing
-
- 12 Apr, 2025 2 commits
- 11 Apr, 2025 2 commits
-
-
myhloli authored
- Remove unnecessary line breaks and adjust indentation - Update function call to use named arguments for better readability - Modify _do_parse function call to use MakeMode.MM_MD instead of
-
myhloli authored
- Update batch processing logic for improved efficiency - Refactor image analysis and inference methods - Optimize dataset handling and image retrieval - Improve error handling and logging in batch processes
-
- 10 Apr, 2025 1 commit
-
-
icecraft authored
-
- 09 Apr, 2025 7 commits
-
-
myhloli authored
- Comment out the line that updates det_count in batch_analyze.py - Add a new OCR model configuration for Chinese (ch_lite) in models_config.yml - Update the Chinese OCR model configuration to use a different recognition model
-
myhloli authored
- Change `bits` to `self._data_bits` for language detection - This fixes the TypeError when opening PDF files
-
myhloli authored
- Simplify aspect ratio calculation using direct coordinate subtraction - Remove unnecessary list append operation - Improve code readability and performance in table rotation detection
-
myhloli authored
- Implement table orientation detection to identify if a table is in portrait mode - Add rotation logic to turn portrait tables 90 degrees clockwise before OCR - Update OCR processing to work with potentially rotated images - Improve text box analysis to determine if a table is rotated
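
One way such a check could work, as a rough sketch: treat a table as portrait when most of its detected text boxes are clearly taller than wide, then rotate the image clockwise. The heuristic, both thresholds, and the function names are assumptions, not the logic used in the commit. Boxes are `(x0, y0, x1, y1)` tuples and the image is a plain list of pixel rows.

```python
def is_portrait_table(text_boxes, ratio=1.5, majority=0.5):
    """Return True when most text boxes are much taller than they are wide."""
    if not text_boxes:
        return False
    vertical = 0
    for x0, y0, x1, y1 in text_boxes:
        width = x1 - x0
        height = y1 - y0
        # Aspect ratio from direct coordinate subtraction.
        if width > 0 and height / width >= ratio:
            vertical += 1
    return vertical / len(text_boxes) > majority


def rotate_90_clockwise(image):
    """Rotate a row-major 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]
```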
-
myhloli authored
- Update predict_rec.py to check for NaN values in recognition results - Replace NaN scores with 0.0 to ensure stability and consistency
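
The NaN guard can be illustrated with a small sketch. It assumes recognition results are `(text, score)` pairs; the function name is hypothetical and this is not the code in `predict_rec.py`.

```python
import math


def sanitize_scores(results):
    """Replace NaN confidence scores with 0.0, leaving other results as-is."""
    return [
        (text, 0.0 if math.isnan(score) else score)
        for text, score in results
    ]
```

A NaN score compares false against every threshold, so normalizing it to 0.0 makes later filtering behave predictably.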
-
myhloli authored
- Add functions to calculate IoU, check if tables are inside each other, and merge tables - Implement table merging for high IoU tables - Add filtering to remove nested tables that don't overlap but cover a large area - Update table_res_list and layout_res to reflect these changes
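
The helpers named in this commit could look roughly like the sketch below. Boxes are assumed to be `(x0, y0, x1, y1)` tuples; the function names, the greedy merge strategy, and the `0.7` threshold are illustrative assumptions rather than the project's values.

```python
def box_area(box):
    """Area of a box; degenerate or inverted boxes count as zero."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def iou(a, b):
    """Intersection over union of two boxes."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def is_inside(inner, outer):
    """True when `inner` lies entirely within `outer`."""
    return (inner[0] >= outer[0] and inner[1] >= outer[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def merge_boxes(a, b):
    """Smallest box that covers both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))


def merge_high_iou(boxes, threshold=0.7):
    """Greedily merge each box into the first kept box it overlaps strongly."""
    merged = []
    for box in boxes:
        for i, kept in enumerate(merged):
            if iou(box, kept) >= threshold:
                merged[i] = merge_boxes(box, kept)
                break
        else:
            merged.append(box)
    return merged
```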
-
icecraft authored
-