Commits · b8aab26933e1e2bfc0f931ae8e202de0c4eaa336 · wangsen / MinerU

06 Dec, 2024 22 commits
- Merge pull request #1211 from myhloli/dev · b8aab269
  Xiaomeng Zhao authored Dec 06, 2024
```
refactor(magic_pdf): remove unused threading lock and model initialization code
```
- refactor(magic_pdf): remove unused threading lock and model initialization code · a1744b77
  myhloli authored Dec 06, 2024
```
- Remove threading.Lock import and usage
- Delete unused model initialization comments and code
- Simplify OCR model initialization in both pdf_extract_kit.py and pdf_parse_union_core_v2.py
```
- Merge pull request #1209 from dt-yy/dev · ebfd6fd9
  Xiaomeng Zhao authored Dec 06, 2024
```
feat: update test case
```
- feat: update test case · 1d6000e5
  dt-yy authored Dec 06, 2024
- Merge pull request #1208 from myhloli/dev · 92c10d1e
  Xiaomeng Zhao authored Dec 06, 2024
```
fix(multi-threading): Enable multi-threading support for PaddleOCR.
```
- refactor(magic_pdf): replace AtomModelSingleton with ocr_model_init for OCR model instantiation · 30220233
  myhloli authored Dec 06, 2024
```
- Remove usage of AtomModelSingleton for OCR model initialization
- Use ocr_model_init function for creating OCR model instance
- Update import statement to include ocr_model_init
- Comment out old OCR model initialization code
```
- refactor(model): replace AtomModelSingleton with ocr_model_init for OCR model initialization · 488660dd
  myhloli authored Dec 06, 2024
```
- Remove usage of AtomModelSingleton for OCR model initialization
- Add import of ocr_model_init from model_init module
- Update OCR model initialization process to use ocr_model_init function
- Remove lock for OCR processing as it's no longer needed
```
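The move from a shared `AtomModelSingleton` to a per-instance `ocr_model_init` call can be sketched as follows. This is a minimal illustration under stated assumptions, not MinerU's code: `FakeOCR`, the `ocr_model_init(lang)` signature, and `Pipeline` are hypothetical stand-ins. It shows why the commit could drop the OCR lock: when every pipeline owns its own model, no OCR object is shared between threads.

```python
import threading


class FakeOCR:
    """Hypothetical stand-in for an OCR model that keeps mutable per-call state."""

    def __init__(self, lang: str):
        self.lang = lang
        self.last_input = None

    def ocr(self, image: str) -> str:
        self.last_input = image  # mutable state: would need a lock if shared
        return f"{self.lang}:{image}"


def ocr_model_init(lang: str = "ch") -> FakeOCR:
    """Hypothetical factory: builds a fresh model on every call, no global cache."""
    return FakeOCR(lang)


class Pipeline:
    def __init__(self, lang: str = "ch"):
        # Each pipeline owns its OCR model, so run() needs no lock.
        self.ocr_model = ocr_model_init(lang)

    def run(self, image: str) -> str:
        return self.ocr_model.ocr(image)


results = {}


def worker(i: int) -> None:
    # One pipeline per thread: no OCR object is shared between threads.
    results[i] = Pipeline().run(f"page{i}")


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The trade-off of this design is memory: each instance loads its own model, in exchange for removing lock contention.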
- refactor(model): replace ModelSingleton with direct model initialization and improve threading · 6f636b6e
  myhloli authored Dec 06, 2024
```
- Remove usage of ModelSingleton class
- Initialize model directly using custom_model_init function
- Add self._lock attribute to PDFExtractKit class for thread safety
- Replace local lock with self._lock for OCR processing
```
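Why replacing a local lock with `self._lock` matters: a `threading.Lock()` created inside a method is a new object on every call, so concurrent callers never contend for the same lock and nothing is serialized. Storing one lock on the instance makes every call share it. The class below is a hypothetical sketch, not the actual `PDFExtractKit` code; the counter stands in for non-thread-safe OCR state.

```python
import threading


class ExtractorSketch:
    """Hypothetical stand-in for a model wrapper whose OCR step is not thread-safe."""

    def __init__(self):
        self._lock = threading.Lock()  # one lock shared by all calls on this instance
        self.calls = 0

    def ocr_broken(self):
        # Shown for contrast only: a fresh lock per call never blocks anyone.
        lock = threading.Lock()
        with lock:
            self._unsafe_step()

    def ocr(self):
        with self._lock:  # every thread contends for the same lock
            self._unsafe_step()

    def _unsafe_step(self):
        # Read-modify-write that can interleave badly across threads.
        current = self.calls
        self.calls = current + 1


kit = ExtractorSketch()


def hammer(n: int) -> None:
    for _ in range(n):
        kit.ocr()


threads = [threading.Thread(target=hammer, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(kit.calls)  # 8000: with the shared lock no increment is lost
```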
- Merge pull request #1207 from myhloli/dev · 272014c4
  Xiaomeng Zhao authored Dec 06, 2024
```
fix(model): simplify model initialization logic
```
- fix(model): simplify model initialization logic · a9723c61
  myhloli authored Dec 06, 2024
- Merge pull request #1201 from dt-yy/dev · dab07986
  Xiaomeng Zhao authored Dec 06, 2024
```
fix: update notify
```
- update runner env · cf09313b
  dt-yy authored Dec 06, 2024
- update runner env · fc6ea7a3
  dt-yy authored Dec 06, 2024
- update runner env · 8327d9d3
  dt-yy authored Dec 06, 2024
- update runner env · e6748482
  dt-yy authored Dec 06, 2024
- update notify · c77bec7c
  dt-yy authored Dec 06, 2024
- update yml · 78e84f67
  dt-yy authored Dec 06, 2024
- update yml · eb021e53
  dt-yy authored Dec 06, 2024
- fix: update notify · 494859c5
  dt-yy authored Dec 06, 2024
- refactor(magic_pdf): optimize model initialization and threading · 878f3de0
  赵小蒙 authored Dec 06, 2024
```
- Remove unnecessary threading.Lock in AtomModelSingleton
- Add threading.Lock to CustomPEKModel for OCR processing
- Simplify model initialization logic in AtomModelSingleton
```
- Merge pull request #1198 from myhloli/dev · 7ca7e599
  Xiaomeng Zhao authored Dec 06, 2024
```
perf(model): optimize model initialization
```
- perf(model): optimize model initialization · ce592f8b
  myhloli authored Dec 06, 2024
```
- Add condition to return existing model if already initialized
- Improve efficiency by avoiding redundant model creation
```
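"Return the existing model if already initialized" amounts to memoizing model construction by its configuration. The sketch below is hypothetical: `ModelCache`, the `(name, lang)` key, and the dict returned by `_init_model` are placeholders for whatever the real model-init code keys on and builds.

```python
from typing import Dict, Tuple


class ModelCache:
    """Hypothetical sketch: reuse a model instead of building it again."""

    def __init__(self):
        self._models: Dict[Tuple[str, str], object] = {}
        self.init_count = 0

    def _init_model(self, name: str, lang: str) -> object:
        self.init_count += 1  # stands in for an expensive model load
        return {"name": name, "lang": lang}

    def get_model(self, name: str, lang: str = "ch") -> object:
        key = (name, lang)
        if key not in self._models:  # build only on the first request for this config
            self._models[key] = self._init_model(name, lang)
        return self._models[key]


cache = ModelCache()
a = cache.get_model("ocr", "ch")
b = cache.get_model("ocr", "ch")
c = cache.get_model("ocr", "en")
print(a is b, a is c, cache.init_count)  # True False 2
```

On its own, this check-then-create is not thread-safe: two threads can both see the key missing and both build a model.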
05 Dec, 2024 2 commits
- Merge pull request #1193 from myhloli/dev · 92ad41ce
  Xiaomeng Zhao authored Dec 05, 2024
```
perf(model): add threading lock for OCR model initialization
```
- perf(model): add threading lock for OCR model initialization · 04478095
  myhloli authored Dec 05, 2024
  - Introduce a lock to synchronize access to OCR model initialization
  - This change improves thread safety when multiple threads access the OCR model concurrently
  - The lock ensures that the OCR model is initialized only once, even in multi-threaded scenarios
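A lock that guarantees one-time initialization of a shared OCR model can be sketched with double-checked locking. This is a swapped-in, common pattern chosen for illustration; the commit only says "a lock", and `OCRModelHolder` and `slow_factory` are hypothetical names.

```python
import threading
import time


class OCRModelHolder:
    """Hypothetical sketch: initialize a shared OCR model exactly once across threads."""

    def __init__(self, factory):
        self._factory = factory
        self._model = None
        self._lock = threading.Lock()

    def get(self):
        if self._model is None:  # fast path: skip the lock once initialized
            with self._lock:
                if self._model is None:  # re-check: another thread may have won the race
                    self._model = self._factory()
        return self._model


init_calls = []


def slow_factory():
    init_calls.append(1)
    time.sleep(0.05)  # widen the race window to mimic a slow model load
    return object()


holder = OCRModelHolder(slow_factory)
models = []
threads = [threading.Thread(target=lambda: models.append(holder.get())) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(init_calls), len({id(m) for m in models}))  # 1 1
```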

03 Dec, 2024 3 commits
- Merge pull request #1177 from myhloli/dev · 41b9cbcd
  Xiaomeng Zhao authored Dec 03, 2024
```
feat(gradio_app): implement dynamic concurrency limit based on VRAM
```
- fix(vram): improve VRAM checking logic · 104273cc
  myhloli authored Dec 03, 2024
  - Update VRAM checking logic in app.py and model_utils.py
  - Add None and type checks for VRAM values
  - Adjust concurrency limit calculation in app.py
  - Modify clean_vram function to handle cases with no VRAM information
- feat(gradio_app): implement dynamic concurrency limit based on VRAM · b1fe9d4f
  myhloli authored Dec 03, 2024
  - Add get_concurrency_limit function to calculate concurrency limit based on VRAM
  - Update clean_vram function and rename to get_vram for better clarity
  - Apply concurrency limit to the to_markdown function in the Gradio app
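A VRAM-based concurrency limit with the None and type checks described in these two commits might look like the sketch below. The function name mirrors the commit message, but the signature, the GB unit, the `per_task_gb=8.0` budget, and the fallback to 1 are all assumptions, not values taken from MinerU.

```python
from typing import Optional, Union


def get_concurrency_limit(vram_gb: Optional[Union[int, float, str]],
                          per_task_gb: float = 8.0) -> int:
    """Hypothetical: derive how many requests may run at once from total VRAM (GB).

    Falls back to 1 when VRAM is unknown (None), non-numeric, or non-positive,
    e.g. on CPU-only hosts.
    """
    if vram_gb is None:
        return 1
    try:
        vram = float(vram_gb)
    except (TypeError, ValueError):
        return 1
    if vram <= 0:
        return 1
    return max(1, int(vram // per_task_gb))


print(get_concurrency_limit(24))     # 3
print(get_concurrency_limit(None))   # 1
print(get_concurrency_limit("n/a"))  # 1
```

In Gradio 4.x, a value like this can be passed as the `concurrency_limit` argument of an event listener such as `Button.click`; whether the app wires it exactly that way is not stated here.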

02 Dec, 2024 8 commits
- Merge pull request #1170 from opendatalab/master · fdf47155
  Xiaomeng Zhao authored Dec 02, 2024
```
master->dev
```
- Update version.py with new version · b9f3435c
  myhloli authored Dec 02, 2024
- Merge pull request #1165 from opendatalab/release-0.10.5 · c175001d
  Xiaomeng Zhao authored Dec 02, 2024
```
Release 0.10.5
```
- Merge pull request #1167 from opendatalab/dev · a35785b9
  Xiaomeng Zhao authored Dec 02, 2024
```
Dev -> 0.10.5
```
- Merge pull request #1166 from myhloli/dev · a7296f78
  Xiaomeng Zhao authored Dec 02, 2024
```
fix(pre_proc): prevent errors when imageWriter is None
```
- Merge pull request #1164 from myhloli/dev · ed822634
  Xiaomeng Zhao authored Dec 02, 2024
```
refactor(para): adjust line height multiplier for block splitting, fix(pre_proc): prevent errors when imageWriter is None
```
- fix: reduce maximum image size · b0529b6f
  myhloli authored Dec 02, 2024
```
- Decrease the maximum width and height from 9000 to 4500 pixels
- This change aims to prevent excessive resource usage when rendering PDFs
```
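One way to enforce a 4500-pixel cap is to scale an oversized render down proportionally so its longest side fits. This proportional rescaling is an assumed approach for illustration; the commit only states the new limit, and the real code may handle oversized pages differently. `fit_within` is a hypothetical helper.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 4500) -> Tuple[int, int]:
    """Hypothetical: shrink (width, height) so neither side exceeds max_side.

    Preserves aspect ratio; sizes already within the limit are returned unchanged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, int(width * scale)), max(1, int(height * scale))


print(fit_within(9000, 3000))  # (4500, 1500)
print(fit_within(1200, 800))   # (1200, 800)
```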
- fix(pre_proc): prevent errors when imageWriter is None · 7f8dc353
  myhloli authored Dec 02, 2024
```
- Updated cut_image.py to check for NoneType imageWriter
- Prevents AttributeError when imageWriter is not provided
```
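The None guard for `imageWriter` follows a simple pattern: skip persistence when no writer is supplied instead of calling a method on `None`. The sketch is hypothetical: `cut_image_sketch`, the `ImageWriter` protocol, and returning `None` when nothing is written are assumptions, not the actual `cut_image.py` signature.

```python
from typing import Dict, Optional, Protocol


class ImageWriter(Protocol):
    """Hypothetical writer interface: persists bytes under a relative path."""

    def write(self, path: str, data: bytes) -> None: ...


def cut_image_sketch(image_bytes: bytes, rel_path: str,
                     image_writer: Optional[ImageWriter]) -> Optional[str]:
    """Hypothetical: save a cropped image only when a writer is provided.

    Without the None guard, image_writer.write(...) would raise
    AttributeError: 'NoneType' object has no attribute 'write'.
    """
    if image_writer is None:
        return None  # nothing persisted; caller treats the image path as absent
    image_writer.write(rel_path, image_bytes)
    return rel_path


class MemoryWriter:
    """In-memory writer used only for this example."""

    def __init__(self):
        self.files: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data


w = MemoryWriter()
print(cut_image_sketch(b"png", "images/p1.jpg", w))     # images/p1.jpg
print(cut_image_sketch(b"png", "images/p1.jpg", None))  # None
```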
30 Nov, 2024 4 commits
- Merge pull request #1156 from myhloli/dev · 384e0379
  Xiaomeng Zhao authored Dec 01, 2024
```
refactor(para): adjust line height multiplier for block splitting
```
- refactor(para): adjust line height multiplier for block splitting · 41545a13
  myhloli authored Dec 01, 2024
```
- Decrease the line height multiplier from 0.8 to 0.7 for both left and right sides
- This modification aims to improve the accuracy of paragraph splitting
```
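The sketch below shows how a line-height multiplier can act as a tolerance when deciding whether a line starts indented (left side) or ends short (right side), the kind of signal paragraph splitting relies on. The function name, the `(x0, y0, x1, y1)` bbox layout, and the exact conditions are assumptions for illustration. Lowering the multiplier from 0.8 to 0.7 makes both checks fire on smaller gaps.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout


def edge_flags(line: BBox, block: BBox, multiplier: float = 0.7) -> Tuple[bool, bool]:
    """Hypothetical: report (starts_indented, ends_short) for a line inside a block.

    The tolerance scales with the line's own height, so larger text needs larger gaps.
    """
    line_height = line[3] - line[1]
    tolerance = line_height * multiplier
    starts_indented = line[0] > block[0] + tolerance  # gap on the left side
    ends_short = line[2] < block[2] - tolerance       # gap on the right side
    return starts_indented, ends_short


block = (0.0, 0.0, 100.0, 50.0)
line = (7.5, 0.0, 92.5, 10.0)  # 7.5-unit gaps on both sides, line height 10
print(edge_flags(line, block, 0.8))  # (False, False): 7.5 < 8.0
print(edge_flags(line, block, 0.7))  # (True, True): 7.5 > 7.0
```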
- Merge pull request #1154 from LollipopsAndWine/dev · b17084fe
  Xiaomeng Zhao authored Nov 30, 2024
- fix: correct a filename error · f11f3d60
  houlinfeng authored Nov 30, 2024
29 Nov, 2024 1 commit
- Update version.py with new version · f8828be7
  myhloli authored Nov 29, 2024