1. 05 Jun, 2025 1 commit
  2. 04 Jun, 2025 2 commits
  3. 29 May, 2025 1 commit
  4. 28 May, 2025 1 commit
  5. 24 May, 2025 2 commits
  6. 22 Apr, 2025 1 commit
  7. 11 Apr, 2025 1 commit
    • refactor(model): optimize batch processing and inference · d2fc9dab
      myhloli authored
      - Update batch processing logic for improved efficiency
      - Refactor image analysis and inference methods
      - Optimize dataset handling and image retrieval
      - Improve error handling and logging in batch processes
  8. 09 Apr, 2025 2 commits
    • refactor(ocr): comment out det_count update and update OCR models · f8323ae0
      myhloli authored
      - Comment out the line that updates det_count in batch_analyze.py
      - Add a new OCR model configuration for Chinese (ch_lite) in models_config.yml
      - Update the Chinese OCR model configuration to use a different recognition model
    • feat(model): improve table recognition by merging and filtering tables · df7ae404
      myhloli authored
      - Add functions to calculate IoU, check if tables are inside each other, and merge tables
      - Implement table merging for high IoU tables
      - Add filtering to remove nested tables that don't overlap but cover a large area
      - Update table_res_list and layout_res to reflect these changes
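The IoU-based merging described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function names, the `(x0, y0, x1, y1)` box format, and the `0.7` threshold are all assumptions, and the nested-table filtering step is not covered.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixel coordinates.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def calculate_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def is_inside(inner: Box, outer: Box) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (inner[0] >= outer[0] and inner[1] >= outer[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def merge_high_iou_tables(boxes: List[Box],
                          iou_threshold: float = 0.7) -> List[Box]:
    """Greedy single pass: fold each box into the first kept box it
    overlaps strongly with, otherwise keep it as a new table."""
    merged: List[Box] = []
    for box in boxes:
        for i, kept in enumerate(merged):
            if calculate_iou(box, kept) >= iou_threshold:
                # Replace the kept box with the bounding box of both.
                merged[i] = (min(box[0], kept[0]), min(box[1], kept[1]),
                             max(box[2], kept[2]), max(box[3], kept[3]))
                break
        else:
            merged.append(box)
    return merged
```

A single greedy pass keeps the sketch short; a merged box that grows enough to overlap a later one would need a repeated pass to catch.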
  9. 08 Apr, 2025 1 commit
  10. 03 Apr, 2025 3 commits
    • refactor(magic_pdf): optimize table recognition and layout detection · 1fd72f5f
      myhloli authored
      - Update table recognition logic to process each table individually
      - Refactor layout detection to use tqdm for progress tracking
      - Optimize OCR recognition by using a single tqdm wrapper
      - Improve MFR prediction with a more accurate progress bar
      - Simplify MFD prediction by removing unnecessary total calculation
    • refactor(magic_pdf): optimize code and improve logging · 553f250f
      myhloli authored
      - Remove unused imports and comments
      - Increase MIN_BATCH_INFERENCE_SIZE from 100 to 200
      - Comment out VRAM cleaning and logging in batch_analyze.py
      - Simplify code in doc_analyze_by_custom_model.py
      - Add tqdm progress bar in pdf_parse_union_core_v2.py
      - Enable tqdm in OCR processing
    • feat(model): add tqdm progress bar to model prediction loops · 8e1c2339
      myhloli authored
      - Add tqdm progress bar to batch prediction loops in multiple model modules
      - Improve logging and error handling in batch analysis script
      - Update table model initialization to use default sub-model if none specified
      - Add tqdm dependency to requirements.txt
  11. 31 Mar, 2025 3 commits
    • refactor(model): integrate AtomModelSingleton for OCR and improve OCR result handling · 59d6b195
      myhloli authored
      - Replace direct OCR model access with AtomModelSingleton for better model management
      - Round OCR scores to 2 decimal places for consistency
      - Improve error handling and logging in batch analysis
      - Simplify OCR result processing in pdf_parse_union_core_v2.py
    • feat(ocr): implement language-specific OCR processing · d7d85a28
      myhloli authored
      - Add support for multiple languages in OCR processing
      - Create separate lists for each language to improve processing efficiency
      - Update OCR model initialization to use PytorchPaddleOCR instead of ModifiedPaddleOCR
      - Modify get_ocr_result_list function to include language information
      - Improve logging for OCR detection and recognition
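The per-language lists mentioned in this commit can be illustrated with a small grouping helper, so that each language's OCR model would only need to run once per batch. This is a sketch under assumptions, not the project's code: `group_by_language`, the `lang` key, and the `"ch"` default are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_language(items: List[dict],
                      default_lang: str = "ch") -> Dict[str, List[dict]]:
    """Bucket OCR work items by their (hypothetical) 'lang' field,
    preserving the original order within each bucket."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for item in items:
        groups[item.get("lang", default_lang)].append(item)
    return dict(groups)
```

Each resulting list could then be passed to the recognizer for that language in one batch.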
    • feat(ocr): implement separate detection and recognition processes · a330651d
      myhloli authored
      - Split OCR process into detection and recognition stages
      - Update batch analysis and document analysis pipelines
      - Modify OCR result formatting and handling
      - Remove unused imports and optimize code structure
  12. 26 Mar, 2025 1 commit
  13. 07 Mar, 2025 1 commit
    • refactor(magic_pdf): replace PIL with NumPy for image processing · 1b34f7e4
      myhloli authored
      - Remove PIL usage across multiple files
      - Convert image processing functions to use NumPy arrays
      - Update crop_img function to work with NumPy arrays
      - Modify image loading and resizing to use NumPy and OpenCV
      - Clean up unused imports and comments related to PIL
  14. 21 Jan, 2025 1 commit
  15. 16 Jan, 2025 1 commit
  16. 15 Jan, 2025 1 commit
  17. 14 Jan, 2025 1 commit
  18. 26 Dec, 2024 1 commit
    • refactor(device): optimize memory cleaning and device selection · 50f48417
      myhloli authored
      - Update clean_memory function to support both CUDA and NPU devices
      - Implement get_device function to centralize device selection logic
      - Modify model initialization and memory cleaning to use the selected device
      - Update RapidTableModel to support both RapidOCR and PaddleOCR engines
  19. 18 Dec, 2024 1 commit
  20. 13 Dec, 2024 2 commits
  21. 12 Dec, 2024 2 commits