- 13 Dec, 2024 1 commit
-
-
myhloli authored
  - Move ligature replacement function to pdf_parse_union_core_v2.py
  - Optimize ligature replacement using a more efficient approach
  - Modify text extraction flags to preserve ligatures in PDF content
  - Remove unnecessary function from ocr_mkcontent.py
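The commit message does not say which "more efficient approach" was used. As a minimal sketch only, one common single-pass technique is a `str.translate` table; the mapping and the function name `replace_ligatures` below are hypothetical and are not taken from the repository.

```python
# Hypothetical illustration: single-pass ligature expansion with str.translate.
# The mapping covers only the common Latin ligatures U+FB00..U+FB04.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

# Build the translation table once so each call is a single pass over the text.
_LIGATURE_TABLE = str.maketrans(LIGATURES)


def replace_ligatures(text: str) -> str:
    """Expand ligature code points into their plain-letter sequences."""
    return text.translate(_LIGATURE_TABLE)


if __name__ == "__main__":
    print(replace_ligatures("e\ufb03cient \ufb01le"))
```

Building the table once avoids the repeated full-string scans that a chain of `str.replace` calls would perform.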
-
- 12 Dec, 2024 1 commit
-
-
myhloli authored
  - Add initial setup for layout detection
  - Implement conditional cropping for tall images
  - Skip cropping for wide images to improve performance
  - Reuse Image object across layout detection steps
-
- 11 Dec, 2024 32 commits
-
-
Xiaomeng Zhao authored
-
xu rui authored
-
Xiaomeng Zhao authored
Docs/refactor en docs
-
Xiaomeng Zhao authored
master->dev
-
myhloli authored
-
Xiaomeng Zhao authored
Release 0.10.6
-
Xiaomeng Zhao authored
Dev->release
-
Xiaomeng Zhao authored
build(docker): add torch and torchvision dependencies
-
myhloli authored
  - Add torch>=2.2.2,<=2.3.1 to requirements-docker.txt
  - Add torchvision>=0.17.2,<=0.18.1 to requirements-docker.txt
-
Xiaomeng Zhao authored
Release 0.10.6
-
Xiaomeng Zhao authored
refactor(draw_bbox): remove redundant '_line_sort' suffix from output filename
-
Xiaomeng Zhao authored
refactor(draw_bbox): remove redundant '_line_sort' suffix from output filename
-
myhloli authored
  - Updated the filename generation logic in the draw_bbox function
  - Removed the unnecessary '_line_sort' suffix from the output PDF filename
-
myhloli authored
  - Remove unused import of ocr_model_init from magic_pdf.model.sub_modules.model_init
  - Keep existing functionality and structure intact
-
Xiaomeng Zhao authored
fix: dup classify pdf type & improve layout detection for DocLayout_YOLO model
-
Xiaomeng Zhao authored
feat(layout): improve layout detection for DocLayout_YOLO model
-
myhloli authored
  - Implement image cropping and pasting technique to enhance layout detection
  - Adjust detected polygons to original image coordinates
  - Add comments for better code readability
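The coordinate adjustment mentioned above can be sketched as a simple offset shift. This is an illustration under stated assumptions, not the repository's code: the flat `[x0, y0, x1, y1, ...]` polygon layout, the function name `to_original_coords`, and the `crop_origin` / `paste_origin` parameters are all hypothetical.

```python
from typing import List, Tuple


def to_original_coords(
    poly: List[float],
    crop_origin: Tuple[float, float],
    paste_origin: Tuple[float, float] = (0.0, 0.0),
) -> List[float]:
    """Map a polygon detected on a pasted crop back to the original image.

    poly         -- flat [x0, y0, x1, y1, ...] in canvas coordinates (assumed layout)
    crop_origin  -- top-left corner of the crop within the original image
    paste_origin -- top-left corner where the crop was pasted on the canvas
    """
    # A point at canvas position p sits at (p - paste_origin) inside the crop,
    # and therefore at (p - paste_origin + crop_origin) in the original image.
    dx = crop_origin[0] - paste_origin[0]
    dy = crop_origin[1] - paste_origin[1]
    # Even indices are x values, odd indices are y values.
    return [v + (dx if i % 2 == 0 else dy) for i, v in enumerate(poly)]


if __name__ == "__main__":
    box = [10, 20, 30, 20, 30, 40, 10, 40]
    print(to_original_coords(box, crop_origin=(100, 200), paste_origin=(5, 5)))
```

The sketch assumes the crop is pasted without scaling; if the crop were resized before detection, the coordinates would also need to be divided by the scale factor before shifting.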
-
xu rui authored
-
icecraft authored
-
xu rui authored
-
xu rui authored
-
xu rui authored
-
xu rui authored
-
xu rui authored
-
xu rui authored
-
xu rui authored
-
xu rui authored
-
Xiaomeng Zhao authored
fix: dup classify pdf type
-
icecraft authored
-
Xiaomeng Zhao authored
build(deps): update torch and torchvision version requirements
-
Xiaomeng Zhao authored
build(deps): update torch and torchvision version requirements
-
myhloli authored
  - Specify torch==2.3.1 and torchvision==0.18.1 for Windows CUDA installation
  - Add torch and torchvision version constraints in setup.py:
    - torch>=2.2.2,<=2.3.1
    - torchvision>=0.17.2,<=0.18.1
  - Update installation instructions in both English and Chinese README files
-
- 10 Dec, 2024 6 commits
-
-
Xiaomeng Zhao authored
fix(detect_invalid_chars): fix the stack error caused by multiple memory releases in PyMuPDF
-
myhloli authored
  - Uncomment pdfminer.six in requirements.txt
  - Specify version 20231228 for pdfminer.six
-
myhloli authored
  - Change import paths from paddleocr.ppocr to ppocr for utility functions
  - Update import paths for logging and utility modules in ppocr_273_mod.py
  - Modify import paths for tablemaster_paddle.py to use ppstructure instead of paddleocr.ppstructure
-
myhloli authored
  - Replace MuPDF with pdfminer for detecting invalid characters in PDFs
  - Uncomment and update the detect_invalid_chars function to use pdfminer
  - Update the check_invalid_chars function in pdf_meta_scan.py to use the new implementation
-
myhloli authored
  - Change import path for TableSystem from 'ppstructure.table.predict_table' to 'paddleocr.ppstructure.table.predict_table'
  - Change import path for init_args from 'ppstructure.utility' to 'paddleocr.ppstructure.utility'
-
myhloli authored
  - Modify import paths for paddleocr utilities in ocr_utils.py and ppocr_273_mod.py
  - Change from `ppocr.utils.utility` to `paddleocr.ppocr.utils.utility`
  - Update related import statements in two files to reflect the new path
-