Commits · 113448903aac099e42522dc4e2e2d80d0f0dc45c · wangsen / MinerU

09 Dec, 2024 3 commits
- fix: unicode decode error · 11344890
  icecraft authored Dec 09, 2024
  
  11344890
- Merge pull request #1228 from icecraft/fix/pipe_result · c5a4150e
  Xiaomeng Zhao authored Dec 09, 2024
```
fix: add parse_pdf_type and version
```
  c5a4150e
- fix: add parse_pdf_type and version · 57f9f9dc
  icecraft authored Dec 09, 2024
  
  57f9f9dc
07 Dec, 2024 4 commits
- Merge pull request #1224 from icecraft/fix/new_api · 8f266869
  Xiaomeng Zhao authored Dec 07, 2024
  
  8f266869
- fix: 1. ocr txt mode error 2. lose pdf_parse_type field · 87af738a
  sawmice authored Dec 07, 2024
  
  87af738a
- Merge pull request #1222 from myhloli/dev · f58a7a7d
  Xiaomeng Zhao authored Dec 07, 2024
```
fix(dict2md): add space for inline equations in CJK contexts
```
  f58a7a7d
- fix(dict2md): add space for inline equations in CJK contexts · 74ee428b
  myhloli authored Dec 07, 2024
```
- In Chinese, Japanese, and Korean (CJK) languages, no space is needed for line breaks within paragraphs.
- However, if an inline equation is at the end of a line, a space should be added to separate it from the following text.
- This change improves the formatting of documents containing both CJK text and inline equations.
```
  74ee428b
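
The rule in this commit message can be sketched as follows. This is a minimal illustration, not the code from the commit: the `join_lines` helper, the `(kind, text)` span layout, and the rough Unicode-range CJK test are all assumptions made for the example, as is the choice to join non-CJK lines with a space.

```python
import re

# Hypothetical helper, not MinerU's implementation: rough CJK detection
# by Unicode range (kana, CJK ideographs, Hangul syllables).
_CJK = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")


def join_lines(lines):
    """Join the lines of one paragraph into a single string.

    `lines` is a list of lines; each line is a list of (kind, text) spans,
    where kind is "text" or "inline_equation" (an assumed layout).
    """
    out = []
    for i, spans in enumerate(lines):
        is_last_line = i == len(lines) - 1
        for j, (kind, text) in enumerate(spans):
            ends_line = j == len(spans) - 1 and not is_last_line
            if kind == "inline_equation":
                piece = f"${text}$"
                # An equation at the end of a line gets a trailing space so it
                # does not fuse with the text that starts the next line.
                if ends_line:
                    piece += " "
            else:
                piece = text
                # CJK text needs no space at a line break; for other text a
                # space is assumed here.
                if ends_line and not _CJK.search(text):
                    piece += " "
            out.append(piece)
    return "".join(out)
```

With this sketch, two plain CJK lines are joined without a space, while a line ending in an inline equation is separated from the next line by one space.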
06 Dec, 2024 31 commits
- Merge pull request #1178 from icecraft/refactor/add_user_api · fa113b57
  Xiaomeng Zhao authored Dec 06, 2024
```
Refactor/add user api
```
  fa113b57
- Merge pull request #1218 from myhloli/dev · 1c10dc55
  Xiaomeng Zhao authored Dec 06, 2024
```
refactor(magic-pdf): optimize model initialization and concurrency control
```
  1c10dc55
- refactor(magic-pdf): optimize model initialization and concurrency control · 012a46e0
  myhloli authored Dec 06, 2024
```
- Remove concurrency limit logic from app.py
- Update model initialization process in various modules
- Remove unused VRAM check for concurrency limit
- Refactor OCR model initialization in pdf_extract_kit.py
- Update txt_spans_extract_v2 function to use lang parameter instead of ocr_model
```
  012a46e0
- Merge pull request #1215 from myhloli/dev · ef5cffcb
  Xiaomeng Zhao authored Dec 06, 2024
```
refactor(ocr): replace AtomModelSingleton with ocr_model_init for OCR model instantiation
```
  ef5cffcb
- refactor(ocr): replace AtomModelSingleton with ocr_model_init for OCR model instantiation · 47a83d28
  myhloli authored Dec 06, 2024
```
- Remove usage of AtomModelSingleton for OCR model creation
- Add ocr_model_init function to initialize OCR model
- Update OCR model initialization in pdf_extract_kit.py and pdf_parse_union_core_v2.py
- Modify txt_spans_extract_v2 function to accept ocr_model as a parameter
- Update parse_page_core function to use ocr_model instead of lang for OCR processing
```
  47a83d28
- Merge pull request #1214 from myhloli/dev · 0acfce29
  Xiaomeng Zhao authored Dec 06, 2024
```
refactor(model): implement thread-safe OCR model initialization
```
  0acfce29
- refactor(model): implement thread-safe OCR model initialization · f2a92d57
  myhloli authored Dec 06, 2024
```
- Add threading support for OCR model initialization
- Modify AtomModelSingleton to handle thread-specific instances
- Update PDFExtractKit and PDFParseUnionCoreV2 to use new thread-safe OCR initialization
```
  f2a92d57
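
One common way to give each thread its own model instance is `threading.local`. The sketch below assumes that approach; the log does not show how `AtomModelSingleton` was actually changed, and `ThreadLocalModelCache` and its `factory` argument are hypothetical names used only for illustration.

```python
import threading


class ThreadLocalModelCache:
    """Hypothetical per-thread model registry (not MinerU's class)."""

    def __init__(self, factory):
        # factory(name, **kwargs) builds a model; it is supplied by the caller.
        self._factory = factory
        self._local = threading.local()

    def get(self, name, **kwargs):
        # Each thread sees its own `models` dict on the thread-local object.
        cache = getattr(self._local, "models", None)
        if cache is None:
            cache = self._local.models = {}
        key = (name, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = self._factory(name, **kwargs)
        return cache[key]
```

Repeated calls from one thread reuse the same instance, while a second thread gets a separate instance, so no lock is needed around model use at the cost of extra memory per thread.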
- Merge pull request #1212 from myhloli/dev · ec5a09db
  Xiaomeng Zhao authored Dec 06, 2024
```
build(deps): specify minimum version for ultralytics
```
  ec5a09db
- build(deps): specify minimum version for ultralytics · 1f1335c2
  myhloli authored Dec 06, 2024
```
- Update `ultralytics` dependency to version >= 8.3.43
- This change ensures compatibility with yolov8 for formula detection
```
  1f1335c2
- Merge pull request #1211 from myhloli/dev · b8aab269
  Xiaomeng Zhao authored Dec 06, 2024
```
refactor(magic_pdf): remove unused threading lock and model initialization code
```
  b8aab269
- refactor(magic_pdf): remove unused threading lock and model initialization code · a1744b77
  myhloli authored Dec 06, 2024
```
- Remove threading.Lock import and usage
- Delete unused model initialization comments and code
- Simplify OCR model initialization in both pdf_extract_kit.py and pdf_parse_union_core_v2.py
```
  a1744b77
- Merge pull request #1209 from dt-yy/dev · ebfd6fd9
  Xiaomeng Zhao authored Dec 06, 2024
```
feat: update test case
```
  ebfd6fd9
- feat: update test case · 1d6000e5
  dt-yy authored Dec 06, 2024
  
  1d6000e5
- Merge pull request #1208 from myhloli/dev · 92c10d1e
  Xiaomeng Zhao authored Dec 06, 2024
```
fix(multi-threading ):Enable multi-threading support for PaddleOCR.
```
  92c10d1e
- refactor(magic_pdf): replace AtomModelSingleton with ocr_model_init for OCR model instantiation · 30220233
  myhloli authored Dec 06, 2024
```
- Remove usage of AtomModelSingleton for OCR model initialization
- Use ocr_model_init function for creating OCR model instance
- Update import statement to include ocr_model_init
- Comment out old OCR model initialization code
```
  30220233
- refactor(model): replace AtomModelSingleton with ocr_model_init for OCR model initialization · 488660dd
  myhloli authored Dec 06, 2024
```
- Remove usage of AtomModelSingleton for OCR model initialization
- Add import of ocr_model_init from model_init module
- Update OCR model initialization process to use ocr_model_init function
- Remove lock for OCR processing as it's no longer needed
```
  488660dd
- refactor(model): replace ModelSingleton with direct model initialization and improve threading · 6f636b6e
  myhloli authored Dec 06, 2024
```
- Remove usage of ModelSingleton class
- Initialize model directly using custom_model_init function
- Add self._lock attribute to PDFExtractKit class for thread safety
- Replace local lock with self._lock for OCR processing
```
  6f636b6e
- Merge pull request #1207 from myhloli/dev · 272014c4
  Xiaomeng Zhao authored Dec 06, 2024
```
fix(model): simplify model initialization logic
```
  272014c4
- fix(model): simplify model initialization logic · a9723c61
  myhloli authored Dec 06, 2024
  
  a9723c61
- Merge pull request #1201 from dt-yy/dev · dab07986
  Xiaomeng Zhao authored Dec 06, 2024
```
fix: update notify
```
  dab07986
- update runner env · cf09313b
  dt-yy authored Dec 06, 2024
  
  cf09313b
- update runner env · fc6ea7a3
  dt-yy authored Dec 06, 2024
  
  fc6ea7a3
- update runner env · 8327d9d3
  dt-yy authored Dec 06, 2024
  
  8327d9d3
- update runner env · e6748482
  dt-yy authored Dec 06, 2024
  
  e6748482
- update notify · c77bec7c
  dt-yy authored Dec 06, 2024
  
  c77bec7c
- update yml · 78e84f67
  dt-yy authored Dec 06, 2024
  
  78e84f67
- update yml · eb021e53
  dt-yy authored Dec 06, 2024
  
  eb021e53
- fix: update notify · 494859c5
  dt-yy authored Dec 06, 2024
  
  494859c5
- refactor(magic_pdf): optimize model initialization and threading · 878f3de0
  赵小蒙 authored Dec 06, 2024
```
- Remove unnecessary threading.Lock in AtomModelSingleton
- Add threading.Lock to CustomPEKModel for OCR processing
- Simplify model initialization logic in AtomModelSingleton
```
  878f3de0
- Merge pull request #1198 from myhloli/dev · 7ca7e599
  Xiaomeng Zhao authored Dec 06, 2024
```
perf(model): optimize model initialization
```
  7ca7e599
- perf(model): optimize model initialization · ce592f8b
  myhloli authored Dec 06, 2024
```
- Add condition to return existing model if already initialized
- Improve efficiency by avoiding redundant model creation
```
  ce592f8b
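
The "return the existing model if already initialized" idea can be sketched as a small cache keyed by model name. This is an assumed, single-threaded illustration; `ModelRegistry`, `get_model`, and the `factory` callable are hypothetical names and do not come from the commit.

```python
class ModelRegistry:
    """Hypothetical cache that builds each named model at most once."""

    def __init__(self):
        self._models = {}

    def get_model(self, name, factory):
        # Return the cached instance when one exists, avoiding a rebuild.
        if name in self._models:
            return self._models[name]
        model = factory()
        self._models[name] = model
        return model
```

A second call with the same name returns the cached object and never invokes `factory` again.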
05 Dec, 2024 2 commits

- Merge pull request #1193 from myhloli/dev · 92ad41ce
  Xiaomeng Zhao authored Dec 05, 2024
```
perf(model): add threading lock for OCR model initialization
```
  92ad41ce
- perf(model): add threading lock for OCR model initialization · 04478095
  myhloli authored Dec 05, 2024
  - Introduce a lock to synchronize access to OCR model initialization
  - This change improves thread safety when multiple threads access the OCR model concurrently
  - The lock ensures that the OCR model is initialized only once, even in multi-threaded scenarios

  04478095
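
A lock that guarantees one-time initialization under concurrent access is often written with double-checked locking. The sketch below assumes that pattern; `get_ocr_model`, the module-level variables, and the `factory` argument are hypothetical and are not taken from the MinerU source.

```python
import threading

_ocr_model = None
_ocr_lock = threading.Lock()


def get_ocr_model(factory):
    """Return the shared OCR model, building it once via factory()."""
    global _ocr_model
    # Fast path: skip the lock once the model exists.
    if _ocr_model is None:
        with _ocr_lock:
            # Re-check inside the lock: another thread may have finished
            # initialization while this one was waiting.
            if _ocr_model is None:
                _ocr_model = factory()
    return _ocr_model
```

Even if several threads call `get_ocr_model` at the same moment, only the first one to take the lock runs `factory`; the others wait and then reuse the result.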