- 21 Nov, 2024 5 commits
-
-
myhloli authored
- Update OCR utils to handle different box formats and improve angle calculation
- Modify PDF extraction kit to support an OCR option and optimize the processing flow
- Enhance PPOCR model to sort and filter detection boxes, improving text splitting accuracy
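The box sorting and filtering described above can be illustrated with a minimal sketch. It assumes detection boxes are four-point quadrilaterals ordered top-left, top-right, bottom-right, bottom-left, and uses a same-row tolerance in the style of PaddleOCR's box sorting. `rect`, `filter_boxes`, `sort_boxes`, `row_tol`, and `min_height` are hypothetical names, not the PPOCR model's actual code.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Quad = List[Point]  # corners: top-left, top-right, bottom-right, bottom-left (assumed order)


def rect(x: float, y: float, w: float, h: float) -> Quad:
    """Build an axis-aligned quad (hypothetical helper for the example)."""
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]


def filter_boxes(boxes: List[Quad], min_height: float = 3.0) -> List[Quad]:
    """Drop boxes shorter than min_height, which are likely detection noise."""
    kept = []
    for box in boxes:
        ys = [p[1] for p in box]
        if max(ys) - min(ys) >= min_height:
            kept.append(box)
    return kept


def sort_boxes(boxes: List[Quad], row_tol: float = 10.0) -> List[Quad]:
    """Sort boxes top-to-bottom, then left-to-right for boxes in the same row."""
    ordered = sorted(boxes, key=lambda b: (b[0][1], b[0][0]))
    for i in range(len(ordered) - 1):
        # Walk backwards, swapping neighbours whose top edges are within
        # row_tol of each other but whose x order is reversed.
        for j in range(i, -1, -1):
            a, b = ordered[j], ordered[j + 1]
            if abs(b[0][1] - a[0][1]) < row_tol and b[0][0] < a[0][0]:
                ordered[j], ordered[j + 1] = b, a
            else:
                break
    return ordered
```

With this ordering, two boxes whose top edges differ by only a few pixels are read left-to-right even if the right-hand box sits slightly higher on the page.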
-
myhloli authored
- Improve logic to skip dropped spans in overlap detection
- Enhance efficiency by avoiding unnecessary comparisons
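A minimal sketch of skipping dropped spans before a pairwise overlap check, assuming each span carries an axis-aligned `(x0, y0, x1, y1)` bbox and a boolean drop flag. `Span`, `dropped`, `overlap_area`, and `find_overlaps` are hypothetical names, not the project's actual code. Filtering once up front keeps dropped spans out of the quadratic comparison loop entirely.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Span:
    bbox: BBox
    dropped: bool = False  # hypothetical flag marking spans already discarded


def overlap_area(a: BBox, b: BBox) -> float:
    """Area of the intersection of two axis-aligned boxes (0.0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def find_overlaps(spans: List[Span]) -> List[Tuple[int, int]]:
    """Return index pairs of overlapping spans, ignoring dropped spans."""
    # Filter dropped spans once so they never enter the pairwise loop.
    live = [(i, s) for i, s in enumerate(spans) if not s.dropped]
    pairs = []
    for (i, a), (j, b) in combinations(live, 2):
        if overlap_area(a.bbox, b.bbox) > 0:
            pairs.append((i, j))
    return pairs
```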
-
myhloli authored
- Fix a bug where hyphens in the middle of a line were discarded
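One way to avoid discarding mid-line hyphens is to remove a hyphen only when it ends a line and the text continues on the next line. The sketch below is hypothetical (`join_lines` is not a MinerU function) and deliberately ignores the harder case of a genuine compound word broken at a line end, which would lose its hyphen here.

```python
from typing import List


def join_lines(lines: List[str]) -> str:
    """Join lines into a paragraph, de-hyphenating only at line ends."""
    out = ""
    for line in lines:
        text = line.strip()
        if out.endswith("-"):
            # Trailing hyphen: treat it as a word split across lines and drop it.
            out = out[:-1] + text
        elif out:
            out += " " + text
        else:
            out = text
    # Hyphens inside a line (e.g. "state-of-the-art") are never touched.
    return out
```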
-
myhloli authored
- Add a threshold parameter to the merge_spans_to_line function
- Make the threshold for the y-axis overlap check configurable
- Improve the flexibility and accuracy of the line merging algorithm
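A minimal sketch of threshold-based line merging, assuming spans are `(x0, y0, x1, y1)` boxes with y growing downward, and that two spans share a line when their vertical overlap divided by the smaller span height exceeds `threshold`. The function names and the 0.6 default are assumptions for illustration, not the actual signature or behavior of `merge_spans_to_line`.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


def y_overlap_exceeds(a: BBox, b: BBox, threshold: float) -> bool:
    """True if the vertical overlap, relative to the shorter box, exceeds threshold."""
    overlap = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    min_height = min(a[3] - a[1], b[3] - b[1])
    return min_height > 0 and overlap / min_height > threshold


def merge_spans_into_lines(spans: List[BBox], threshold: float = 0.6) -> List[List[BBox]]:
    """Group spans into lines by y-axis overlap; threshold is configurable."""
    if not spans:
        return []
    spans = sorted(spans, key=lambda s: s[1])  # top-to-bottom by top edge
    lines = [[spans[0]]]
    for span in spans[1:]:
        # Compare against the last span added to the current line.
        if y_overlap_exceeds(span, lines[-1][-1], threshold):
            lines[-1].append(span)
        else:
            lines.append([span])
    # Order spans left-to-right within each line.
    return [sorted(line, key=lambda s: s[0]) for line in lines]
```

Raising `threshold` makes merging stricter: two spans overlapping by 80% of the shorter height share a line at 0.6 but are split at 0.9.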
-
myhloli authored
- Check whether the language string is empty and, if so, set it to None
- This prevents potential errors when an empty language string is passed
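The check amounts to treating an empty string as "no language given". A minimal sketch, with `normalize_lang` as a hypothetical helper name rather than the project's actual function:

```python
from typing import Optional


def normalize_lang(lang: Optional[str]) -> Optional[str]:
    """Map an empty language string to None so downstream code sees 'unspecified'."""
    if lang == "":
        return None
    return lang
```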
-
- 18 Nov, 2024 12 commits
-
-
myhloli authored
- Introduce a variable threshold for the right margin based on block width
- Use 0.26 * block_weight for wider blocks (block_weight_radio >= 0.5)
- Use 0.36 * block_weight for narrower blocks
- This change aims to improve paragraph splitting accuracy across different block widths
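The threshold rule can be written out as below. It reads `block_weight` as the block's width and `block_weight_radio` as that width divided by the page width, consistent with the "block width ratio relative to page width" change in this log; that reading, the function names, and the paragraph-end use in `line_ends_short` are assumptions, not the project's code.

```python
def right_margin_threshold(block_width: float, page_width: float) -> float:
    """Right-margin threshold scaled by block width (factors from the commit message)."""
    width_ratio = block_width / page_width
    # Wider blocks (>= half the page) use a smaller fraction of their width.
    factor = 0.26 if width_ratio >= 0.5 else 0.36
    return factor * block_width


def line_ends_short(line_x1: float, block_x1: float,
                    block_width: float, page_width: float) -> bool:
    """Hypothetical use: a line whose right edge stops well before the block's
    right edge is treated as a likely paragraph end."""
    gap = block_x1 - line_x1
    return gap > right_margin_threshold(block_width, page_width)
```

For a 400-unit block on a 600-unit page the threshold is about 104; for a 200-unit block it is about 72, so narrow columns tolerate a proportionally larger gap before a line counts as short.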
-
myhloli authored
- Add albumentations package with version <=1.4.20 for old_linux
- This version is compatible with Linux systems from 2019 and earlier
- Version 1.4.21 and above introduced simsimd, which is not supported on older Linux systems
-
myhloli authored
- Add page size information to blocks
- Calculate block width ratio relative to page width
- Adjust threshold for determining right-side indentation
- Implement additional checks for merging blocks across pages
- Improve logic for identifying list structures
-
Xiaomeng Zhao authored
feat(ocr): improve handling of angled text boxes
-
myhloli authored
- Add calculate_is_angle function to detect angled text boxes
- Update update_det_boxes and merge_det_boxes functions to handle angled text boxes
- Modify angle detection logic in various parts of the code
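A simple way to flag angled boxes is to measure the slope of the top edge. This is a swapped-in technique for illustration, not necessarily how `calculate_is_angle` works; it assumes quads ordered top-left, top-right, bottom-right, bottom-left, and `is_angled` and `max_deg` are hypothetical names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def is_angled(quad: List[Point], max_deg: float = 5.0) -> bool:
    """Return True if the top edge tilts more than max_deg degrees from horizontal."""
    (x1, y1), (x2, y2) = quad[0], quad[1]  # assumed top-left, top-right
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    return abs(angle) > max_deg
```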
-
Xiaomeng Zhao authored
refactor(tests): extract common test utilities into test_commons.py
-
myhloli authored
-
Xiaomeng Zhao authored
test(unitest): Restore unit test cases
-
myhloli authored
-
Xiaomeng Zhao authored
update ci
-
quyuan authored
-
quyuan authored
-
- 17 Nov, 2024 4 commits
- 15 Nov, 2024 16 commits
-
-
Xiaomeng Zhao authored
docs: update feature description for table conversion
-
myhloli authored
- Changed the description of the table conversion feature in both the English and Chinese README files
- Specified that tables are automatically converted to HTML format, replacing the earlier "LaTeX or HTML" wording
-
Xiaomeng Zhao authored
docs: improve GPU support list formatting in README_zh-CN.md
-
myhloli authored
- Adjust spacing and line breaks in the GPU support list for better readability
-
Xiaomeng Zhao authored
docs(README): update GPU hardware recommendations and table recognition options
-
myhloli authored
- Simplify GPU recommendations to 8GB or more, removing separate minimum and recommended configurations
- Update the list of supported GPUs for 8GB and above
- Update table recognition options in config settings
-
Xiaomeng Zhao authored
-
linfeng authored
-
houlinfeng authored
-
Xiaomeng Zhao authored
docs(README): update project references and translations
-
myhloli authored
-
Xiaomeng Zhao authored
docs: update docs for 0.9.3
-
myhloli authored
-
Xiaomeng Zhao authored
refactor(model): rename and restructure model modules
-
myhloli authored
- Rename ppTableModel to TableMasterPaddleModel in test_tablemaster.py
-
myhloli authored
-
- 14 Nov, 2024 2 commits
-
-
Xiaomeng Zhao authored
fix(parse_pipeline): Resolve post-processing exceptions caused by some PDFs that are corrupted or in a non-standard format, by forcing a re-print.
-
myhloli authored
fix(parse_pipeline): Resolve post-processing exceptions caused by some PDFs that are corrupted or in a non-standard format, by forcing a re-print.
-
- 13 Nov, 2024 1 commit
-
-
icecraft authored
Co-authored-by: xu rui <xurui1@pjlab.org.cn>
-