- 17 Jan, 2025 4 commits
-
myhloli authored
- Added instructions for checking the reasonability of heading levels
- Included guidelines for making fine adjustments based on context and logic
- Emphasized the importance of aligning the final result with the document's actual structure
-
myhloli authored
- Commented out the original batch ratio calculation
- Set a fixed batch ratio of 2 for GPUs with less than 8 GB of memory
- Increased the batch ratio to 4 for GPUs with 8 GB or more of memory
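The fixed batch-ratio rule above can be sketched as a small function. The name `choose_batch_ratio` and the idea that memory is passed in as a number of gigabytes are assumptions for illustration; only the 8 GB cutoff and the ratios 2 and 4 come from the commit.

```python
def choose_batch_ratio(gpu_memory_gb: float) -> int:
    """Return a fixed batch ratio based on available GPU memory.

    Hypothetical helper mirroring the commit's rule:
    2 for GPUs under 8 GB, 4 for GPUs with 8 GB or more.
    """
    # Fixed thresholds stand in for the earlier computed ratio.
    if gpu_memory_gb >= 8:
        return 4
    return 2


print(choose_batch_ratio(6))   # 2
print(choose_batch_ratio(16))  # 4
```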
-
myhloli authored
- Update the version check in download_models.py and download_models_hf.py
- Change the threshold from '1.1.0' to '1.1.1' for model configuration updates
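A version threshold like this is usually compared numerically rather than as a raw string. The sketch below assumes that a configuration older than the threshold is the one that gets refreshed; `parse_version` and `needs_config_update` are hypothetical names, not the scripts' actual code.

```python
def parse_version(version: str) -> tuple:
    # Convert "1.1.1" -> (1, 1, 1) so each component compares as an integer.
    return tuple(int(part) for part in version.split("."))


def needs_config_update(current: str, threshold: str = "1.1.1") -> bool:
    # Assumption: configs whose version is below the threshold are updated.
    return parse_version(current) < parse_version(threshold)


print(needs_config_update("1.1.0"))  # True
print(needs_config_update("1.1.1"))  # False
```

Comparing integer tuples avoids string-ordering surprises such as "1.1.10" sorting before "1.1.9".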
-
myhloli authored
- Import get_device function from magic_pdf.libs.config_reader
- Update RapidTableModel initialization to include device parameter for Unitable model
-
- 16 Jan, 2025 9 commits
-
myhloli authored
- Update WeChat group link in both README.md and README_zh-CN.md
-
myhloli authored
- Modify the batch analyze process to handle the rapid table model's output
- Add logic_points variable to capture additional output from rapid table prediction
-
myhloli authored
- Update rapid-table version from ==0.3.0 to >=1.0.3,<2.0.0 in multiple requirements files
- This change affects Ascend NPU, China, and Global Docker configurations
-
myhloli authored
- Update RapidTable dependency to version 1.0.3
- Add support for sub-models in RapidTable
- Update magic-pdf configuration to include table sub-model
- Modify table model initialization to support sub-models
- Update table prediction logic to handle new output format
-
Xiaomeng Zhao authored
-
myhloli authored
- Update OpenDataLab badge to new design
-
myhloli authored
- Update OpenDataLab badge to new design
-
Xiaomeng Zhao authored
fix(magic_pdf): correct end page index and improve error handling
-
myhloli authored
- Adjust end_page_id calculation to prevent IndexError when accessing pages
- Enhance error handling in LLM post-processing by specifically catching JSONDecodeError
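Both fixes can be illustrated with a short sketch, assuming 0-based page indices and that `None` or a negative value means "process to the last page". `resolve_end_page_id` and `parse_llm_json` are hypothetical helpers; the project's real logic may differ.

```python
import json


def resolve_end_page_id(end_page_id, page_count: int) -> int:
    """Clamp a requested end page to the last valid 0-based index.

    Assumption: None or a negative value means "to the last page",
    and anything past the end is capped at page_count - 1.
    """
    last_index = page_count - 1
    if end_page_id is None or end_page_id < 0 or end_page_id > last_index:
        return last_index
    return end_page_id


def parse_llm_json(raw: str):
    """Parse an LLM response as JSON, returning None on malformed output."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Only decoding failures are caught; other errors still propagate.
        return None
```

Catching `json.JSONDecodeError` instead of a bare `Exception` keeps unrelated bugs visible while tolerating malformed model output.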
-
- 15 Jan, 2025 14 commits
-
Xiaomeng Zhao authored
refactor(magic_pdf): improve title block merging logic
-
myhloli authored
- Rename and update merge_title_blocks function
- Implement merge_two_bbox helper function
- Refactor merging logic to preserve original block structure
- Update function calls and integrate with existing pipeline
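A helper named `merge_two_bbox` most plausibly returns the union box of its two inputs. The sketch below assumes boxes in `[x0, y0, x1, y1]` form with `(x0, y0)` at the top-left; the function name comes from the commit, but the signature and body are guesses.

```python
def merge_two_bbox(bbox1, bbox2):
    """Return the smallest box covering both inputs.

    Sketch assuming boxes are [x0, y0, x1, y1] with (x0, y0) the top-left corner.
    """
    x0 = min(bbox1[0], bbox2[0])
    y0 = min(bbox1[1], bbox2[1])
    x1 = max(bbox1[2], bbox2[2])
    y1 = max(bbox1[3], bbox2[3])
    return [x0, y0, x1, y1]


print(merge_two_bbox([0, 0, 10, 10], [5, 5, 20, 15]))  # [0, 0, 20, 15]
```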
-
Xiaomeng Zhao authored
feat(model): improve batch analysis logic and support npu
-
myhloli authored
- Add support for NPU (Neural Processing Unit) when available
- Implement batch analysis for GPU and NPU devices
- Optimize memory usage and improve performance
- Update logging and error handling
-
Xiaomeng Zhao authored
build(docker): update doclayout-yolo dependency
-
myhloli authored
- Remove doclayout_yolo==0.0.2b1 and doclayout-yolo==0.0.2
- Add doclayout-yolo==0.0.2b1 to all requirements files
-
Xiaomeng Zhao authored
update logo
-
myhloli authored
-
Xiaomeng Zhao authored
fix(language): remove invalid UTF-16 surrogate pairs from input text
-
myhloli authored
- Add `remove_invalid_surrogates` function to filter out invalid UTF-16 surrogate pairs
- Integrate the new function into the `detect_lang` workflow
- Include a test case with UTF-16 surrogates to verify the fix
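One simple way to implement such a filter in Python is to drop every code point in the surrogate range U+D800 to U+DFFF, since strict UTF-8 encoding rejects them. This is a sketch under that assumption; the actual `remove_invalid_surrogates` in magic_pdf may take a different approach.

```python
def remove_invalid_surrogates(text: str) -> str:
    """Drop surrogate code points (U+D800 to U+DFFF) from a str.

    A surrogate left in a Python str cannot be encoded with strict UTF-8,
    so this sketch removes all of them rather than trying to re-pair them.
    """
    return "".join(ch for ch in text if not 0xD800 <= ord(ch) <= 0xDFFF)


sample = "abc\ud800def"
cleaned = remove_invalid_surrogates(sample)
print(cleaned)           # abcdef
cleaned.encode("utf-8")  # no longer raises UnicodeEncodeError
```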
-
Xiaomeng Zhao authored
docs(magic_pdf): update llm_aided.py prompt for title list optimization
-
myhloli authored
- Clarify the expected format for the optimized title list JSON output
- Emphasize the need to return only the title levels in the specified format
-
Xiaomeng Zhao authored
refactor(pre_proc): adjust IOU threshold for character overlap detection
-
myhloli authored
- Modified the IOU threshold in ocr_span_list_modify.py from 0.9 to 0.35
- This change aims to improve the detection of overlapping characters in OCR-processed PDFs
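To see what lowering the threshold changes, here is a standard intersection-over-union computation with the new 0.35 cutoff. The helper names and the use of a strict `>` comparison are assumptions; only the threshold values come from the commit.

```python
def calculate_iou(bbox1, bbox2) -> float:
    """Intersection-over-union of two [x0, y0, x1, y1] boxes."""
    ix0 = max(bbox1[0], bbox2[0])
    iy0 = max(bbox1[1], bbox2[1])
    ix1 = min(bbox1[2], bbox2[2])
    iy1 = min(bbox1[3], bbox2[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area1 = (bbox1[2] - bbox1[0]) * (bbox1[3] - bbox1[1])
    area2 = (bbox2[2] - bbox2[0]) * (bbox2[3] - bbox2[1])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0


OVERLAP_IOU_THRESHOLD = 0.35  # value from the commit; previously 0.9


def chars_overlap(bbox1, bbox2, threshold=OVERLAP_IOU_THRESHOLD) -> bool:
    # Hypothetical check: treat two character boxes as overlapping above the threshold.
    return calculate_iou(bbox1, bbox2) > threshold
```

For example, `[0, 0, 10, 10]` and `[3, 0, 13, 10]` have an IoU of about 0.54: flagged under 0.35, but missed under the old 0.9 cutoff.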
-
- 14 Jan, 2025 11 commits
-
Xiaomeng Zhao authored
feat(post_proc): enhance title block processing with average line height
-
myhloli authored
- Add average line height calculation for title blocks
- Include page number in title dictionary
- Improve title optimization prompt for better hierarchy
- Implement retry mechanism for JSON decoding errors
- Add error logging for title count mismatch
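The line-height and retry changes can be sketched together. This assumes each line is an `[x0, y0, x1, y1]` box and that the LLM is reached through a zero-argument callable; `average_line_height`, `request_title_levels`, and `ask_llm` are hypothetical names, not the project's API.

```python
import json
import logging
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


def average_line_height(line_bboxes: List[List[float]]) -> float:
    """Mean height (y1 - y0) of a title block's lines; 0.0 if there are none."""
    if not line_bboxes:
        return 0.0
    return sum(b[3] - b[1] for b in line_bboxes) / len(line_bboxes)


def request_title_levels(
    ask_llm: Callable[[], str], expected_count: int, max_retries: int = 3
) -> Optional[dict]:
    """Retry a hypothetical LLM call until it returns JSON with the expected title count."""
    for attempt in range(1, max_retries + 1):
        raw = ask_llm()
        try:
            levels = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: ask again
        if len(levels) == expected_count:
            return levels
        logger.error("attempt %d: got %d titles, expected %d",
                     attempt, len(levels), expected_count)
    return None  # all attempts failed
```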
-
Xiaomeng Zhao authored
feat(layout): improve title block handling and layout detection
-
myhloli authored
-
myhloli authored
- Merge title blocks that are close to each other horizontally
- Adjust line insertion logic for title blocks
- Increase image size and decrease confidence threshold for layout detection
- Update DocLayoutYOLO model weights
- Refactor drawing of bounding boxes for different block types
-
Xiaomeng Zhao authored
build(deps): add upper version limit for PyMuPDF
-
myhloli authored
- Set PyMuPDF version to <= 1.24.14 in all requirements files
- Prevent potential compatibility issues with future versions
-
Xiaomeng Zhao authored
Update pdf_parse_union_core_v2.py
-
Xiaomeng Zhao authored
-
Xiaomeng Zhao authored
docs/replace log
-
xu rui authored
-
- 13 Jan, 2025 2 commits
-
Xiaomeng Zhao authored
-
icecraft authored
-