- 16 Jan, 2025 6 commits
-
-
myhloli authored
- Update RapidTable dependency to version 1.0.3
- Add support for sub-models in RapidTable
- Update magic-pdf configuration to include table sub-model
- Modify table model initialization to support sub-models
- Update table prediction logic to handle new output format
-
Xiaomeng Zhao authored
-
myhloli authored
- Update OpenDataLab badge to new design
-
myhloli authored
- Update OpenDataLab badge to new design
-
Xiaomeng Zhao authored
fix(magic_pdf): correct end page index and improve error handling
-
myhloli authored
- Adjust end_page_id calculation to prevent IndexError when accessing pages
- Enhance error handling in LLM post-processing by specifically catching JSONDecodeError
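As an illustration of the kind of bounds check the first item describes, here is a minimal sketch. The helper name `clamp_end_page_id` and its treatment of `None` and negative values are assumptions made for this example, not the code from the commit.

```python
def clamp_end_page_id(end_page_id, page_count):
    """Return a valid zero-based index of the last page to process.

    Hypothetical helper: the name and the handling of None/negative
    values are assumptions for illustration only.
    """
    if page_count <= 0:
        raise ValueError("document has no pages")
    last = page_count - 1
    # None or a negative value is treated as "process through the last page".
    if end_page_id is None or end_page_id < 0:
        return last
    # A value past the end is clamped so later page lookups cannot raise IndexError.
    return min(end_page_id, last)
```

With a five-page document, a requested end page of 10 is clamped to index 4, while an in-range value such as 2 is returned unchanged.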
-
- 15 Jan, 2025 14 commits
-
-
Xiaomeng Zhao authored
refactor(magic_pdf): improve title block merging logic
-
myhloli authored
- Rename and update merge_title_blocks function
- Implement merge_two_bbox helper function
- Refactor merging logic to preserve original block structure
- Update function calls and integrate with existing pipeline
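A helper like `merge_two_bbox` plausibly returns the smallest box enclosing both inputs. The sketch below assumes boxes are `(x0, y0, x1, y1)` tuples; both that layout and the signature are assumptions, since the log does not show the actual implementation.

```python
def merge_two_bbox(a, b):
    """Return the smallest box that encloses boxes a and b.

    Assumes each box is (x0, y0, x1, y1) with x0 <= x1 and y0 <= y1.
    The signature is a guess for illustration; only the name appears
    in the commit message.
    """
    return (
        min(a[0], b[0]),  # leftmost edge
        min(a[1], b[1]),  # topmost edge
        max(a[2], b[2]),  # rightmost edge
        max(a[3], b[3]),  # bottommost edge
    )
```

For example, merging `(0, 0, 10, 5)` with `(8, 1, 20, 6)` gives `(0, 0, 20, 6)`.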
-
Xiaomeng Zhao authored
feat(model): improve batch analysis logic and support npu
-
myhloli authored
- Add support for NPU (Neural Processing Unit) when available
- Implement batch analysis for GPU and NPU devices
- Optimize memory usage and improve performance
- Update logging and error handling
-
Xiaomeng Zhao authored
build(docker): update doclayout-yolo dependency
-
myhloli authored
- Remove doclayout_yolo==0.0.2b1 and doclayout-yolo==0.0.2
- Add doclayout-yolo==0.0.2b1 to all requirements files
-
Xiaomeng Zhao authored
update logo
-
myhloli authored
-
Xiaomeng Zhao authored
fix(language): remove invalid UTF-16 surrogate pairs from input text
-
myhloli authored
- Add `remove_invalid_surrogates` function to filter out invalid UTF-16 surrogate pairs
- Integrate the new function into the `detect_lang` workflow
- Include a test case with UTF-16 surrogates to verify the fix
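One way such a filter could work in Python is sketched below. Only the function name comes from the commit message; the body is an assumption. In a Python `str`, any code point in the range U+D800 to U+DFFF is an unpaired surrogate, because a valid pair is already stored as a single code point.

```python
def remove_invalid_surrogates(text):
    """Drop lone UTF-16 surrogate code points (U+D800..U+DFFF) from text.

    Illustrative body only; the real implementation may differ.
    """
    return "".join(ch for ch in text if not 0xD800 <= ord(ch) <= 0xDFFF)
```

After filtering, the string can be encoded as UTF-8 without a `UnicodeEncodeError`, which is the kind of failure a downstream language detector could otherwise hit.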
-
Xiaomeng Zhao authored
docs(magic_pdf): update llm_aided.py prompt for title list optimization
-
myhloli authored
- Clarify the expected format for the optimized title list JSON output
- Emphasize the need to return only the title levels in the specified format
-
Xiaomeng Zhao authored
refactor(pre_proc): adjust IOU threshold for character overlap detection
-
myhloli authored
- Modified the IOU threshold in ocr_span_list_modify.py from 0.9 to 0.35
- This change aims to improve the detection of overlapping characters in OCR processed PDFs
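To show what lowering the threshold means in practice, here is a minimal intersection-over-union sketch. The function name `iou` and the `(x0, y0, x1, y1)` box layout are assumptions for this example; they are not taken from ocr_span_list_modify.py.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes.

    Hypothetical helper for illustration; returns 0.0 for disjoint
    or degenerate boxes.
    """
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two character boxes of width 10 whose left edges are 4 units apart.
overlap = iou((0, 0, 10, 10), (4, 0, 14, 10))  # 60 / 140, about 0.43
flagged_at_old_threshold = overlap > 0.9   # False
flagged_at_new_threshold = overlap > 0.35  # True
```

Under these assumptions, a pair with an IoU of about 0.43 counts as overlapping only at the 0.35 threshold, so the lower value flags partially overlapping boxes that 0.9 would have missed.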
-
- 14 Jan, 2025 11 commits
-
-
Xiaomeng Zhao authored
feat(post_proc): enhance title block processing with average line height
-
myhloli authored
- Add average line height calculation for title blocks
- Include page number in title dictionary
- Improve title optimization prompt for better hierarchy
- Implement retry mechanism for JSON decoding errors
- Add error logging for title count mismatch
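A retry mechanism for JSON decoding errors could look roughly like the sketch below. The name `parse_json_with_retry`, the `fetch_text` callable (a stand-in for whatever produces the model's reply), and the default of three attempts are all assumptions for illustration.

```python
import json


def parse_json_with_retry(fetch_text, max_attempts=3):
    """Call fetch_text() until its result parses as JSON.

    Hypothetical helper: fetch_text stands in for the call that returns
    the raw reply text. Re-raises the last JSONDecodeError if every
    attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return json.loads(fetch_text())
        except json.JSONDecodeError as exc:
            # Only malformed JSON triggers a retry; other errors propagate.
            last_error = exc
    raise last_error
```

Catching `json.JSONDecodeError` specifically, rather than a bare `Exception`, keeps unrelated failures visible instead of silently retrying them.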
-
Xiaomeng Zhao authored
feat(layout): improve title block handling and layout detection
-
myhloli authored
-
myhloli authored
- Merge title blocks that are close to each other horizontally
- Adjust line insertion logic for title blocks
- Increase image size and decrease confidence threshold for layout detection
- Update DocLayoutYOLO model weights
- Refactor drawing of bounding boxes for different block types
-
Xiaomeng Zhao authored
build(deps): add upper version limit for PyMuPDF
-
myhloli authored
- Set PyMuPDF version to <= 1.24.14 in all requirements files
- Prevent potential compatibility issues with future versions
-
Xiaomeng Zhao authored
Update pdf_parse_union_core_v2.py
-
Xiaomeng Zhao authored
-
Xiaomeng Zhao authored
docs/replace log
-
xu rui authored
-
- 13 Jan, 2025 5 commits
-
-
Xiaomeng Zhao authored
-
icecraft authored
-
Xiaomeng Zhao authored
-
Hui Kang authored
-
Xiaomeng Zhao authored
-
- 12 Jan, 2025 1 commit
-
-
Hui authored
-
- 11 Jan, 2025 3 commits
-
-
Xiaomeng Zhao authored
docs(faq): add troubleshooting guide for old GPUs encountering CUDA errors
-
myhloli authored
Added a new section in both English and Chinese FAQs addressing the issue where old GPUs like M40 encounter a RuntimeError due to unsupported BF16 precision. The guide includes steps to manually disable BF16 precision by modifying the relevant code in "pdf_parse_union_core_v2.py".
-
Xiaomeng Zhao authored
fix: update resource URLs to jsdelivr
-