- 20 Aug, 2024 1 commit
-
-
Xiaomeng Zhao authored
Optimize the language detection logic to enhance content formatting. This change addresses issues with long word segmentation. Language detection now uses a threshold to determine the language of a text based on the proportion of English characters. Formatting rules for content have been updated to consider a list of languages (initially including Chinese, Japanese, and Korean) where no space is added between content segments for inline equations and text spans, improving the handling of Asian languages. The impact of these changes includes improved accuracy in language detection, better segmentation of long words, and more appropriate spacing in content formatting for multiple languages.
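A minimal sketch of the two ideas described in this commit message: threshold-based detection from the share of English characters, and joining content segments without a space for Chinese, Japanese, and Korean. The function names, the language codes, and the 0.5 threshold are illustrative assumptions; the commit message does not state the actual identifiers or values.

```python
import string

# Assumed values for illustration only; the commit does not state them.
ENGLISH_RATIO_THRESHOLD = 0.5
NO_SPACE_LANGS = {"zh", "ja", "ko"}


def english_ratio(text: str) -> float:
    """Return the share of non-whitespace characters that are ASCII letters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    letters = sum(1 for c in chars if c in string.ascii_letters)
    return letters / len(chars)


def looks_english(text: str, threshold: float = ENGLISH_RATIO_THRESHOLD) -> bool:
    """Treat the text as English when its English-character share meets the threshold."""
    return english_ratio(text) >= threshold


def join_segments(segments: list[str], lang: str) -> str:
    """Join text spans and inline equations, omitting the space for listed languages."""
    separator = "" if lang in NO_SPACE_LANGS else " "
    return separator.join(segments)
```

With these assumptions, `join_segments(["The", "$x$", "term"], "en")` keeps the spaces, while the same call with `"zh"` concatenates the segments directly.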
-
- 05 Aug, 2024 1 commit
-
-
liukaiwen authored
-
- 04 Aug, 2024 1 commit
-
-
myhloli authored
Ensure proper formatting of inline equations by adding spaces outside the equation delimiters, which prevents Markdown from interpreting the equation content as part of a link. This addresses the issue where inline OCR equations appear without the correct Markdown formatting.
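The spacing rule in this commit message can be sketched as follows. The helper name `format_inline_equation` and the use of `$` as the delimiter are assumptions for illustration, not the project's actual code.

```python
def format_inline_equation(latex: str) -> str:
    """Wrap LaTeX in `$` delimiters with one space outside each delimiter.

    The spaces sit outside the delimiters, so the equation body is untouched
    and the surrounding Markdown cannot absorb it into adjacent syntax.
    """
    return f" ${latex.strip()}$ "
```

For example, `"see" + format_inline_equation("x^2") + "here"` gives `see $x^2$ here`.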
-
- 02 Aug, 2024 1 commit
-
-
Kaiwen Liu authored
* # add table recognition using struct-eqtable
  ## Changelog 31/07/2024
  - Support table recognition. Table images will be converted into HTML.
  ### how to use the new feature:
  set the attribute 'table-mode' to 'true' in magic-pdf.json
  ### caution:
  it takes 200s to 500s to convert a single table image on a CPU
* # add table recognition using struct-eqtable
  ## Changelog 31/07/2024
  - Support table recognition. Table images will be converted into LaTeX.
  ### how to use the new feature:
  set the attribute 'table-mode' to 'true' in magic-pdf.json
  ### caution:
  it takes 200s to 500s to convert a single table image on a CPU
* # feat(model inference): add table recognition and conversion to LaTeX
  # What's Changed
  ### New Features
  - Add table content recognition, using the weights of [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy) to convert table images to LaTeX.
  ### Instruction
  - pip install pypandoc struct-eqtable==0.1.0
  - Download the [StructEqTable weights](https://huggingface.co/wanderkid/PDF-Extract-Kit/tree/main/models/TabRec) and put them under the models/ directory.
  - Edit the 'table-mode' value to turn on table recognition, which is turned off by default.
  - If you have not downloaded any models before, refer to [how to download models](docs/how_to_download_models_zh_cn.md).
* add table recognition and conversion to LaTeX
* add table recognition and conversion to LaTeX
* add table recognition and conversion to LaTeX
* add table recognition and conversion to LaTeX
---------
Co-authored-by: liukaiwen <liukaiwen@pjlab.org.cn>
-
- 01 Aug, 2024 3 commits
-
-
liukaiwen authored
-
liukaiwen authored
-
liukaiwen authored
# What's Changed
### New Features
- Add table content recognition, using the weights of [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy) to convert table images to LaTeX.
### Instruction
- pip install pypandoc struct-eqtable==0.1.0
- Download the [StructEqTable weights](https://huggingface.co/wanderkid/PDF-Extract-Kit/tree/main/models/TabRec) and put them under the models/ directory.
- Edit the 'table-mode' value to turn on table recognition, which is turned off by default.
- If you have not downloaded any models before, refer to [how to download models](docs/how_to_download_models_zh_cn.md).
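As a rough illustration of the 'table-mode' setting mentioned above, the following sketch flips that key in a JSON config file. The helper name `enable_table_mode` is hypothetical, the location of magic-pdf.json is left to the caller, and writing a JSON boolean (rather than the string 'true') is an assumption; the commit messages only name the key and the file.

```python
import json
from pathlib import Path


def enable_table_mode(config_path: str) -> dict:
    """Set 'table-mode' to true in a JSON config file and return the new config.

    Assumption: the value is stored as a JSON boolean. All other keys in the
    file are preserved as they were.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["table-mode"] = True
    path.write_text(
        json.dumps(config, indent=4, ensure_ascii=False),
        encoding="utf-8",
    )
    return config
```

Table recognition is off by default and, per the commit notes, can take 200s to 500s per table image on a CPU, so enabling it is a deliberate opt-in.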
-
- 31 Jul, 2024 2 commits
-
-
liukaiwen authored
## Changelog 31/07/2024
- Support table recognition. Table images will be converted into LaTeX.
### how to use the new feature:
set the attribute 'table-mode' to 'true' in magic-pdf.json
### caution:
it takes 200s to 500s to convert a single table image on a CPU
-
liukaiwen authored
## Changelog 31/07/2024
- Support table recognition. Table images will be converted into HTML.
### how to use the new feature:
set the attribute 'table-mode' to 'true' in magic-pdf.json
### caution:
it takes 200s to 500s to convert a single table image on a CPU
-
- 30 Jul, 2024 1 commit
-
-
myhloli authored
-
- 13 Jul, 2024 1 commit
-
-
myhloli authored
-
- 19 Jun, 2024 1 commit
-
-
赵小蒙 authored
-
- 30 Apr, 2024 1 commit
-
-
赵小蒙 authored
-
- 29 Apr, 2024 4 commits
- 25 Apr, 2024 2 commits
- 23 Apr, 2024 1 commit
-
-
赵小蒙 authored
-
- 22 Apr, 2024 3 commits
- 16 Apr, 2024 1 commit
-
-
赵小蒙 authored
-
- 15 Apr, 2024 2 commits
- 11 Apr, 2024 1 commit
-
-
赵小蒙 authored
2. Implement UNIPipe
-
- 10 Apr, 2024 1 commit
-
-
赵小蒙 authored
-
- 08 Apr, 2024 3 commits
- 07 Apr, 2024 1 commit
-
-
赵小蒙 authored
-
- 26 Mar, 2024 1 commit
-
-
赵小蒙 authored
-
- 25 Mar, 2024 1 commit
-
-
赵小蒙 authored
-
- 24 Mar, 2024 2 commits
- 22 Mar, 2024 4 commits
-
-
赵小蒙 authored
-
赵小蒙 authored
-
kernel.h@qq.com authored
-
赵小蒙 authored
-