- 11 Jan, 2025 1 commit
myhloli authored
Added a new section to both the English and Chinese FAQs. It addresses the issue where older GPUs such as the M40 hit a RuntimeError because they do not support BF16 precision. The guide includes steps to manually disable BF16 precision by modifying the relevant code in "pdf_parse_union_core_v2.py".
- 22 Nov, 2024 1 commit
Xiaomeng Zhao authored
- 08 Nov, 2024 2 commits
- 05 Nov, 2024 1 commit
myhloli authored
- Add information about AVX/AVX2 instruction set issues on Linux servers
- Provide guidance for users encountering the "Illegal instruction (core dumped)" error
- Suggest contacting the system administrator or changing servers as potential solutions
- Include relevant issue references for context
- 28 Oct, 2024 1 commit
myhloli authored
- Update image path in README.md and README_zh-CN.md
- Update chemical formula recognition link in README.md and README_zh-CN.md
- 14 Oct, 2024 1 commit
icecraft authored
* feat: manage docs with Sphinx
* fix: readthedocs configuration
* feat: support multiple languages
* fix: add .readthedocs.yaml
* fix: requirements.txt path
---------
Co-authored-by: icecraft <xurui1@pjlab.org.cn>
- 10 Sep, 2024 1 commit
drunkpig authored
* release: release 0.7.1 version (#526)
* Update README_zh-CN.md (#404) (#409) correct FAQ url
Co-authored-by: sfk <18810651050@163.com>
* add dockerfile (#189)
Co-authored-by: drunkpig <60862764+drunkpig@users.noreply.github.com>
* Update cla.yml
* Update cla.yml
* feat<table model>: add tablemaster with paddleocr to detect and recognize table (#493)
* Update cla.yml
* Update bug_report.yml
* Update README_zh-CN.md (#404) correct FAQ url
* Update README_zh-CN.md (#404) (#409) (#410) correct FAQ url
Co-authored-by: sfk <18810651050@163.com>
* Update FAQ_zh_cn.md add new issue
* Update FAQ_en_us.md
* Update README_Windows_CUDA_Acceleration_zh_CN.md
* Update README_zh-CN.md
* @Thepathakarpit has signed the CLA in opendatalab/MinerU#418
* Update cla.yml
* feat: add tablemaster_paddle (#463)
* Update README_zh-CN.md (#404) (#409) correct FAQ url
Co-authored-by: sfk <18810651050@163.com>
* add dockerfile (#189)
Co-authored-by: drunkpig <60862764+drunkpig@users.noreply.github.com>
* Update cla.yml
* Update cla.yml
---------
Co-authored-by: drunkpig <60862764+drunkpig@users.noreply.github.com>
Co-authored-by: sfk <18810651050@163.com>
Co-authored-by: Aoyang Fang <222010547@link.cuhk.edu.cn>
Co-authored-by: Xiaomeng Zhao <moe@myhloli.com>
* <fix>(para_split_v2): index out of range issue of span_text first char (#396)
Co-authored-by: liukaiwen <liukaiwen@pjlab.org.cn>
* @Matthijz98 has signed the CLA in opendatalab/MinerU#467
* Create download_models.py
* Create requirements-docker.txt
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
* @strongerfly has signed the CLA in opendatalab/MinerU#487
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
---------
Co-authored-by: Xiaomeng Zhao <moe@myhloli.com>
Co-authored-by: sfk <18810651050@163.com>
Co-authored-by: drunkpig <60862764+drunkpig@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aoyang Fang <222010547@link.cuhk.edu.cn>
Co-authored-by: liukaiwen <liukaiwen@pjlab.org.cn>
* feat<table model>: add tablemaster with paddleocr to detect and recognize table (#508)
* Update cla.yml
* Update bug_report.yml
* Update README_zh-CN.md (#404) correct FAQ url
* Update README_zh-CN.md (#404) (#409) (#410) correct FAQ url
Co-authored-by: sfk <18810651050@163.com>
* Update FAQ_zh_cn.md add new issue
* Update FAQ_en_us.md
* Update README_Windows_CUDA_Acceleration_zh_CN.md
* Update README_zh-CN.md
* @Thepathakarpit has signed the CLA in opendatalab/MinerU#418
* Update cla.yml
* feat: add tablemaster_paddle (#463)
* Update README_zh-CN.md (#404) (#409) correct FAQ url
Co-authored-by: sfk <18810651050@163.com>
* add dockerfile (#189)
Co-authored-by: drunkpig <60862764+drunkpig@users.noreply.github.com>
* Update cla.yml
* Update cla.yml
---------
Co-authored-by: drunkpig <60862764+drunkpig@users.noreply.github.com>
Co-authored-by: sfk <18810651050@163.com>
Co-authored-by: Aoyang Fang <222010547@link.cuhk.edu.cn>
Co-authored-by: Xiaomeng Zhao <moe@myhloli.com>
* <fix>(para_split_v2): index out of range issue of span_text first char (#396)
Co-authored-by: liukaiwen <liukaiwen@pjlab.org.cn>
* @Matthijz98 has signed the CLA in opendatalab/MinerU#467
* Create download_models.py
* Create requirements-docker.txt
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
* @strongerfly has signed the CLA in opendatalab/MinerU#487
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
* Update cla.yml
* Delete .github/workflows/gpu-ci.yml
* Update Huggingface and ModelScope links to organization account
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
---------
Co-authored-by: Xiaomeng Zhao <moe@myhloli.com>
Co-authored-by: sfk <18810651050@163.com>
Co-authored-by: drunkpig <60862764+drunkpig@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aoyang Fang <222010547@link.cuhk.edu.cn>
Co-authored-by: liukaiwen <liukaiwen@pjlab.org.cn>
Co-authored-by: yyy <102640628+dt-yy@users.noreply.github.com>
Co-authored-by: wangbinDL <wangbin_research@163.com>
* feat<table model>: add tablemaster with paddleocr to detect and recognize table (#511)
* Update cla.yml
* Update bug_report.yml
* Update README_zh-CN.md (#404) correct FAQ url
* Update README_zh-CN.md (#404) (#409) (#410) correct FAQ url
Co-authored-by: sfk <18810651050@163.com>
* Update FAQ_zh_cn.md add new issue
* Update FAQ_en_us.md
* Update README_Windows_CUDA_Acceleration_zh_CN.md
* Update README_zh-CN.md
* @Thepathakarpit has signed the CLA in opendatalab/MinerU#418
* Update cla.yml
* feat: add tablemaster_paddle (#463)
* Update README_zh-CN.md (#404) (#409) correct FAQ url
Co-authored-by: sfk <18810651050@163.com>
* add dockerfile (#189)
Co-authored-by: drunkpig <60862764+drunkpig@users.noreply.github.com>
* Update cla.yml
* Update cla.yml
---------
Co-authored-by: drunkpig <60862764+drunkpig@users.noreply.github.com>
Co-authored-by: sfk <18810651050@163.com>
Co-authored-by: Aoyang Fang <222010547@link.cuhk.edu.cn>
Co-authored-by: Xiaomeng Zhao <moe@myhloli.com>
* <fix>(para_split_v2): index out of range issue of span_text first char (#396)
Co-authored-by: liukaiwen <liukaiwen@pjlab.org.cn>
* @Matthijz98 has signed the CLA in opendatalab/MinerU#467
* Create download_models.py
* Create requirements-docker.txt
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
* @strongerfly has signed the CLA in opendatalab/MinerU#487
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
* Update cla.yml
* Delete .github/workflows/gpu-ci.yml
* Update Huggingface and ModelScope links to organization account
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
* feat<table model>: add tablemaster with paddleocr to detect and recognize table
---------
Co-authored-by: Xiaomeng Zhao <moe@myhloli.com>
Co-authored-by: sfk <18810651050@163.com>
Co-authored-by: drunkpig <60862764+drunkpig@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aoyang Fang <222010547@link.cuhk.edu.cn>
Co-authored-by: liukaiwen <liukaiwen@pjlab.org.cn>
Co-authored-by: yyy <102640628+dt-yy@users.noreply.github.com>
Co-authored-by: wangbinDL <wangbin_research@163.com>
---------
Co-authored-by: drunkpig <60862764+drunkpig@users.noreply.github.com>
Co-authored-by: sfk <18810651050@163.com>
Co-authored-by: Aoyang Fang <222010547@link.cuhk.edu.cn>
Co-authored-by: Xiaomeng Zhao <moe@myhloli.com>
Co-authored-by: Kaiwen Liu <lkw_buaa@163.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: liukaiwen <liukaiwen@pjlab.org.cn>
Co-authored-by: wangbinDL <wangbin_research@163.com>
* Hotfix readme 0.7.1 (#528)
* Update README.md
* Update README_zh-CN.md
* Update README_zh-CN.md
* Update README.md
* Update README_zh-CN.md
* Update README_zh-CN.md add HF, ModelScope, and Colab URLs
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README_zh-CN.md
* Rename README.md to README_zh-CN.md
* Create readme.md
* Rename readme.md to README.md
* Rename README.md to README_zh-CN.md
* Update README_zh-CN.md
* Create README.md
* Update README.md
* Update README.md
* Update README.md
* Update README_zh-CN.md
* Create download_models_hf.py
* Update README.md
* Update README_zh-CN.md
* Update README_zh-CN.md
* Update README.md
* Update README_zh-CN.md
* Update FAQ_zh_cn.md
* Update FAQ_en_us.md
* Update FAQ_zh_cn.md
* fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384 (#573)
* fix: resolve inaccuracy of drawing layout box caused by paragraphs combination
* fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384
* fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384
* fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384
* fix: resolve inaccuracy of drawing layout box caused by paragraphs combination #384
* Update README_zh-CN.md
* Update README.md
* Update README.md
* Update README.md
* Update README_zh-CN.md
* add rag data api
* Update README_zh-CN.md update rag api image
* Update README.md docs: remove RAG related release notes
* Update README_zh-CN.md docs: remove RAG related release notes
* Update README_zh-CN.md update changelog
---------
Co-authored-by: yyy <102640628+dt-yy@users.noreply.github.com>
Co-authored-by: sfk <18810651050@163.com>
Co-authored-by: Aoyang Fang <222010547@link.cuhk.edu.cn>
Co-authored-by: Xiaomeng Zhao <moe@myhloli.com>
Co-authored-by: Kaiwen Liu <lkw_buaa@163.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: liukaiwen <liukaiwen@pjlab.org.cn>
Co-authored-by: wangbinDL <wangbin_research@163.com>
- 06 Sep, 2024 2 commits
Xiaomeng Zhao authored
Xiaomeng Zhao authored
- 20 Aug, 2024 1 commit
Xiaomeng Zhao authored
* fix(ocr_mkcontent): revise table caption output
  - Ensure that table captions are properly included in the output.
  - Remove the redundant `table_caption` variable.
* Update cla.yml
* Update bug_report.yml
* feat(cli): add debug option for detailed error handling
Enable users to invoke the CLI command with a new debug flag to get detailed debugging information.
* fix(pdf-extract-kit): adjust crop_paste parameters for better accuracy
The crop_paste_x and crop_paste_y values in pdf_extract_kit.py have been modified to improve the accuracy and consistency of OCR processing. The new values are set to 25 to ensure more precise image cropping and pasting, which leads to better OCR recognition results.
* Update README_zh-CN.md (#404) correct FAQ url
* Update README_zh-CN.md (#404) (#409) (#410) correct FAQ url
Co-authored-by: sfk <18810651050@163.com>
* Update FAQ_zh_cn.md add new issue
* Update FAQ_en_us.md
* Update README_Windows_CUDA_Acceleration_zh_CN.md
* Update README_zh-CN.md
* @Thepathakarpit has signed the CLA in opendatalab/MinerU#418
* fix(pdf-extract-kit): increase crop_paste margin for OCR processing
Double the crop_paste margin from 25 to 50 to improve OCR accuracy and the handling of border cases. Providing more context around the detected text areas should improve the overall quality of the OCR'ed text.
* fix(common): deep copy model list before drawing model bbox
Use a deep copy of the original model list in `drow_model_bbox` to avoid modifying the source data. This keeps the original models intact while the model bounding-box visualization is generated.
---------
Co-authored-by: sfk <18810651050@163.com>
Co-authored-by: drunkpig <60862764+drunkpig@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
- 13 Aug, 2024 1 commit
Xiaomeng Zhao authored
add new issue
- 10 Aug, 2024 1 commit
myhloli authored
Add FAQ entries in both English and Chinese to address the issue where the libGL.so.1 library is missing on Ubuntu 22.04 when running under WSL2. The FAQ now includes instructions on how to install the missing library, resolving the corresponding ImportError.
Closes https://github.com/opendatalab/MinerU/issues/388
- 09 Aug, 2024 1 commit
Xiaomeng Zhao authored
- 06 Aug, 2024 1 commit
myhloli authored
- Note the fix in version 0.6.2b1 for the network error during the first run of an offline deployment, and clarify the model download requirement.
- Update the dependency installation guide for users on macOS with Intel CPUs.
- Note that version 0.6.2b1 resolves the compatibility issues with paddlepaddle 2.6.1 on certain Linux systems.
This change aims to make the FAQ more informative and easier to navigate for users experiencing similar issues, providing direct solutions and links where applicable.
- 02 Aug, 2024 1 commit
myhloli authored
Update the FAQ to clarify the dependency installation issue when using magic-pdf. Ensure users are directed to install the specific version of magic-pdf that resolves the dependency error, rather than listing all individual dependencies. This simplifies the troubleshooting process and provides a direct solution for users encountering the "Required dependency not installed" error.
- 01 Aug, 2024 1 commit
icecraft authored
* feat: refactor cli command
* feat: add docs to describe the output files of cli
* feat: resolve review comments
* feat: update docs about middle.json
---------
Co-authored-by: shenguanlin <shenguanlin@pjlab.org.cn>
- 29 Jul, 2024 1 commit
Xiaomeng Zhao authored
- 24 Jul, 2024 2 commits
Xiaomeng Zhao authored
myhloli authored
- 17 Jul, 2024 5 commits