- 11 Dec, 2024 12 commits
-
-
xu rui authored
-
xu rui authored
-
xu rui authored
-
xu rui authored
-
xu rui authored
-
xu rui authored
-
xu rui authored
-
xu rui authored
-
Xiaomeng Zhao authored
fix: dup classify pdf type
-
icecraft authored
-
Xiaomeng Zhao authored
build(deps): update torch and torchvision version requirements
-
myhloli authored
- Specify torch==2.3.1 and torchvision==0.18.1 for Windows CUDA installation
- Add torch and torchvision version constraints in setup.py:
  - torch>=2.2.2,<=2.3.1
  - torchvision>=0.17.2,<=0.18.1
- Update installation instructions in both English and Chinese README files
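The version bounds in this commit can be checked against an installed version with a few lines of Python. This is a minimal sketch, not code from the repository: the `SUPPORTED` table and the helper names are hypothetical, and the parser only handles plain numeric release versions (an optional local suffix such as `+cu118` is dropped; pre-release tags are not supported).

```python
def parse_version(text):
    """Turn '2.3.1' or '2.3.1+cu118' into a tuple of ints (local suffix dropped)."""
    core = text.split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


# Inclusive bounds copied from the commit message; the table name is hypothetical.
SUPPORTED = {
    "torch": ("2.2.2", "2.3.1"),
    "torchvision": ("0.17.2", "0.18.1"),
}


def is_supported(package, version):
    """Return True if `version` lies inside the inclusive range for `package`."""
    low, high = SUPPORTED[package]
    return parse_version(low) <= parse_version(version) <= parse_version(high)
```

For instance, `is_supported("torch", "2.3.1+cu118")` accepts the Windows CUDA build named in the commit, while `is_supported("torch", "2.4.0")` rejects a release above the upper bound.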
-
- 10 Dec, 2024 9 commits
-
-
Xiaomeng Zhao authored
fix(detect_invalid_chars): fix the stack error caused by multiple memory releases in PyMuPDF
-
myhloli authored
- Uncomment pdfminer.six in requirements.txt
- Specify version 20231228 for pdfminer.six
-
myhloli authored
- Change import paths from paddleocr.ppocr to ppocr for utility functions
- Update import paths for logging and utility modules in ppocr_273_mod.py
- Modify import paths for tablemaster_paddle.py to use ppstructure instead of paddleocr.ppstructure
-
myhloli authored
- Replace MuPDF with pdfminer for detecting invalid characters in PDFs
- Uncomment and update the detect_invalid_chars function to use pdfminer
- Update the check_invalid_chars function in pdf_meta_scan.py to use the new implementation
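The commit does not show how the pdfminer-based check works. One plausible approach, sketched here as an assumption, relies on pdfminer writing `(cid:N)` placeholders for glyphs it cannot map to Unicode, so a high share of such placeholders suggests a garbled text layer. The function names and the 5% threshold below are hypothetical; the input is text already extracted, for example with `pdfminer.high_level.extract_text`.

```python
import re

# pdfminer emits "(cid:123)" for glyphs it cannot map to Unicode.
CID_PATTERN = re.compile(r"\(cid:\d+\)")


def invalid_char_ratio(text):
    """Fraction of characters in `text` that are '(cid:N)' placeholders."""
    stripped = text.replace("\n", "")
    if not stripped:
        return 0.0
    matches = CID_PATTERN.findall(stripped)
    cid_count = len(matches)
    # Count each placeholder as a single character of the original text.
    placeholder_chars = sum(len(match) for match in matches)
    total_chars = len(stripped) - placeholder_chars + cid_count
    return cid_count / total_chars


def looks_garbled(text, threshold=0.05):
    """Hypothetical decision rule: too many placeholders means invalid text."""
    return invalid_char_ratio(text) > threshold
```

A page that fails this check would typically be routed to OCR instead of direct text extraction.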
-
myhloli authored
- Change import path for TableSystem from 'ppstructure.table.predict_table' to 'paddleocr.ppstructure.table.predict_table'
- Change import path for init_args from 'ppstructure.utility' to 'paddleocr.ppstructure.utility'
-
myhloli authored
- Modify import paths for paddleocr utilities in ocr_utils.py and ppocr_273_mod.py
- Change from `ppocr.utils.utility` to `paddleocr.ppocr.utils.utility`
- Update related import statements in two files to reflect the new path
-
myhloli authored
- Remove commented-out call to clean_memory() function
- This change simplifies the code by eliminating an unused code snippet
-
myhloli authored
- Import paddle module and disable its signal handler to prevent interference with other components
- This change addresses potential conflicts between PaddlePaddle and other libraries or system signals
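A minimal sketch of what this change might look like, assuming it relies on PaddlePaddle's `paddle.disable_signal_handler()` call. The wrapper function name is hypothetical, and the guards are added so the snippet also runs where paddle is missing or too old to provide the call.

```python
def disable_paddle_signal_handler():
    """Turn off PaddlePaddle's signal handlers if paddle is available.

    Returns True when the handlers were disabled, False otherwise.
    """
    try:
        import paddle
    except ImportError:
        # paddle is not installed, so there is nothing to disable.
        return False
    disable = getattr(paddle, "disable_signal_handler", None)
    if disable is None:
        # Older paddle releases may not expose this function.
        return False
    disable()
    return True
```

Calling it once at import time, before other libraries install their own handlers, keeps signals such as SIGINT under the control of the host application.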
-
myhloli authored
- Remove the call to clean_memory() function from pdf_parse_union_core_v2.py
- This change may affect memory usage and needs to be tested to ensure proper functionality
-
- 09 Dec, 2024 10 commits
-
-
Xiaomeng Zhao authored
docs(windows): update CUDA installation guide
-
myhloli authored
- Remove specific version requirements for torch and torchvision
- Simplify installation command in both English and Chinese guides
- Delete important note about version compatibility
-
Xiaomeng Zhao authored
refactor(magic_pdf): optimize environment setup and dependencies
-
myhloli authored
- Add environment variables to disable albumentations and yolo updates
- Import torchtext and disable deprecation warnings
- Update unimernet to 0.2.2
- Specify ultralytics version as >=8.3.48
- Remove upper version limit for torch
-
Xiaomeng Zhao authored
build(deps): update dependency versions
-
myhloli authored
- Update ultralytics to >=8.3.47
-
Xiaomeng Zhao authored
fix: unicode decode error
-
icecraft authored
-
Xiaomeng Zhao authored
fix: add parse_pdf_type and version
-
icecraft authored
-
- 07 Dec, 2024 4 commits
-
-
Xiaomeng Zhao authored
-
sawmice authored
-
Xiaomeng Zhao authored
fix(dict2md): add space for inline equations in CJK contexts
-
myhloli authored
- In Chinese, Japanese, and Korean (CJK) languages, no space is needed for line breaks within paragraphs.
- However, if an inline equation is at the end of a line, a space should be added to separate it from the following text.
- This change improves the formatting of documents containing both CJK text and inline equations.
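The joining rule described in this commit can be sketched as follows. This is an illustration under assumptions, not the repository's dict2md code: the span layout (`type` and `content` keys), the `render_paragraph` name, and the character ranges used to detect CJK text are all hypothetical.

```python
import re

# Rough CJK detection: kana, CJK ideographs (incl. Extension A), Hangul syllables.
CJK_RE = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")


def render_paragraph(lines):
    """Join lines of spans into one paragraph string.

    `lines` is a list of lines; each line is a list of span dicts with
    'type' ('text' or 'inline_equation') and 'content'.
    """
    parts = []
    for index, spans in enumerate(lines):
        for span in spans:
            if span["type"] == "inline_equation":
                parts.append(f"${span['content']}$")
            else:
                parts.append(span["content"])
        if index == len(lines) - 1:
            continue  # nothing follows the last line
        line_is_cjk = any(
            span["type"] == "text" and CJK_RE.search(span["content"])
            for span in spans
        )
        ends_with_equation = bool(spans) and spans[-1]["type"] == "inline_equation"
        # CJK lines are joined without a space, unless an inline equation
        # closes the line; non-CJK lines always get a separating space.
        if not line_is_cjk or ends_with_equation:
            parts.append(" ")
    return "".join(parts)
```

With this rule, two plain CJK lines are concatenated directly, while a line ending in an inline equation is followed by a single space before the next line's text.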
-
- 06 Dec, 2024 5 commits
-
-
Xiaomeng Zhao authored
Refactor/add user api
-
Xiaomeng Zhao authored
refactor(magic-pdf): optimize model initialization and concurrency control
-
myhloli authored
- Remove concurrency limit logic from app.py
- Update model initialization process in various modules
- Remove unused VRAM check for concurrency limit
- Refactor OCR model initialization in pdf_extract_kit.py
- Update txt_spans_extract_v2 function to use lang parameter instead of ocr_model
-
Xiaomeng Zhao authored
refactor(ocr): replace AtomModelSingleton with ocr_model_init for OCR model instantiation
-
myhloli authored
- Remove usage of AtomModelSingleton for OCR model creation
- Add ocr_model_init function to initialize OCR model
- Update OCR model initialization in pdf_extract_kit.py and pdf_parse_union_core_v2.py
- Modify txt_spans_extract_v2 function to accept ocr_model as a parameter
- Update parse_page_core function to use ocr_model instead of lang for OCR processing
-