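# MinerU (Domestic DCU Environment): Complete Deployment and Usage Guide
(Adapted for Hygon DCU; the torch build dependencies and sglang are removed, and the pipeline backend is recommended.)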

### Base Image
- **Image name**
  `image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10-fixpy`
- **Image source (SourceFind registry)**
  https://sourcefind.cn/#/image/dcu/pytorch?activeName=overview

### 1. Start the DCU Container (Recommended Command)

```bash
docker run -id \
    --name mineru-dcu \
    --shm-size=256G \
    --ipc=host \
    --network=host \
    --privileged \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    --device=/dev/kfd \
    --device=/dev/mkfd \
    --device=/dev/dri \
    -v /opt/hyhal:/opt/hyhal \
    -v /data:/data \
    -v $(pwd):/workspace \
    image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10-fixpy \
    /bin/bash
```
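
Before launching the container, it can help to confirm that the device nodes and the host directory passed to `docker run` exist on the host. The following is a minimal sketch under that assumption; the helper name `missing_paths` is hypothetical, and the paths are simply the ones used in the command above.

```bash
# Print every path from the argument list that does not exist on this host.
missing_paths() {
    for p in "$@"; do
        [ -e "$p" ] || printf '%s\n' "$p"
    done
}

# Paths taken from the --device and -v options of the docker run command.
missing="$(missing_paths /dev/kfd /dev/mkfd /dev/dri /opt/hyhal)"
if [ -n "$missing" ]; then
    printf 'Missing on host:\n%s\n' "$missing" >&2
else
    echo "All DCU device nodes and /opt/hyhal are present."
fi
```

If anything is reported as missing, the DCU driver stack on the host is probably the place to look before debugging the container itself.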

### 2. Enter the Container and Install MinerU (Trimmed Dependencies)

```bash
# Clone the repository
git clone https://github.com/opendatalab/MinerU.git
cd MinerU

# [Important] Remove the torch-related dependencies that are not needed
# (the image already ships with a DCU build of PyTorch preinstalled).
# Edit setup.py or pyproject.toml and delete or comment out:
#   - the torch and torchvision entries in the pipeline_old_linux, pipeline, and vlm extras
#   - sglang (remove the dependency entirely)

# Installing from the Aliyun mirror is recommended for faster downloads
pip install -e ".[core]" -i https://mirrors.aliyun.com/pypi/simple/
```
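
The manual dependency edit described in the comments above, which has to happen before `pip install`, could be scripted roughly as follows. This is only a sketch: it assumes that `pyproject.toml` lists one quoted requirement per line (for example `"torch>=...",`), which may not match the file layout of the MinerU version you cloned. The function name `strip_torch_deps` is hypothetical.

```bash
# Delete lines that declare torch, torchvision, or sglang requirements.
# Assumption: one quoted requirement per line. A backup is kept as <file>.bak.
strip_torch_deps() {
    sed -E -i.bak \
        '/^[[:space:]]*"(torch|torchvision|sglang)([^A-Za-z0-9_-].*)?$/d' "$1"
}

# Only touch the file when run from inside the cloned repository.
if [ -f pyproject.toml ]; then
    strip_torch_deps pyproject.toml
fi
```

It is worth reviewing the result with `diff pyproject.toml.bak pyproject.toml` before installing. After installation, `python -c "import torch; print(torch.__version__)"` should still report the build that came with the image.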

### 3. Quick Single-File Test from the Command Line

```bash
# Download models from ModelScope by default (the fastest source within mainland China)
export MINERU_MODEL_SOURCE=modelscope

# Convert a PDF (Chinese and English are detected automatically; the output is Markdown)
mineru -p demo/pdfs/demo1.pdf -o ./output_demo --source modelscope
```
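
To check that the conversion produced something, you can list the Markdown files under the output directory. The sketch below makes no assumption about the exact subdirectory layout MinerU uses and searches recursively; the helper name `list_markdown` is hypothetical.

```bash
# Print every Markdown file found under the given directory, sorted.
list_markdown() {
    find "$1" -type f -name '*.md' | sort
}

out_dir="./output_demo"   # same value as the -o option in the command above
if [ -d "$out_dir" ]; then
    list_markdown "$out_dir"
else
    echo "No output directory at $out_dir yet." >&2
fi
```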

### 4. Start the API Service (Recommended)

```bash
export MINERU_MODEL_SOURCE=modelscope

mineru-api --host 0.0.0.0 --port 8000 --backend pipeline
```
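
The service may take a while to come up, so a script that starts it and then sends requests may want to wait until the port answers. A minimal polling sketch follows; the helper name `wait_for_url` is hypothetical, and the `/docs` path mentioned in the usage comment is an assumption rather than something this guide confirms.

```bash
# Poll a URL once per second until it answers with any HTTP response,
# or give up after the timeout (in seconds, default 60).
wait_for_url() {
    url="$1"
    timeout="${2:-60}"
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if curl -s -o /dev/null --max-time 2 "$url"; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Example (assumed path):
#   wait_for_url http://localhost:8000/docs 120 && echo "API is up"
```

Because `curl` is called without `-f`, any HTTP response counts as "up", including a 404; only connection failures and timeouts keep the loop waiting.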

### 5. Calling the API (curl Example)

```bash
curl -X 'POST' \
  'http://localhost:8000/file_parse' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'files=@./demo/pdfs/demo1.pdf;type=application/pdf' \
  -F 'parse_method=auto' \
  -F 'lang_list=ch' \
  -F 'start_page_id=0' \
  -F 'end_page_id=99999' \
  -F 'backend=pipeline' \
  -F 'return_md=true' \
  -F 'return_middle_json=false' \
  -F 'return_model_output=false' \
  -F 'return_images=false' \
  -F 'return_content_list=false' \
  -F 'table_enable=true' \
  -F 'formula_enable=true' \
  -F 'output_dir=./output'
```
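
For repeated calls, the request above can be wrapped in a small function that saves the JSON response to a file. This is a sketch, not part of MinerU: the function name `parse_pdf` is hypothetical, it sends only a subset of the form fields shown above, and it assumes the server applies its own defaults for the fields left out.

```bash
# Send one PDF to the /file_parse endpoint and save the JSON response.
# Usage: parse_pdf <pdf> <response.json> [base_url]
parse_pdf() {
    pdf="$1"
    out="$2"
    base="${3:-http://localhost:8000}"
    # -f makes curl exit non-zero on HTTP errors; -sS hides the progress
    # bar but still prints error messages.
    curl -sS -f -X POST "$base/file_parse" \
        -F "files=@${pdf};type=application/pdf" \
        -F 'backend=pipeline' \
        -F 'lang_list=ch' \
        -F 'return_md=true' \
        -o "$out"
}

# Example:
#   parse_pdf ./demo/pdfs/demo1.pdf ./response.json
```

The structure of the returned JSON is not described in this guide, so inspect `response.json` (for instance with `python3 -m json.tool response.json`) before writing code that depends on specific keys.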