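# MinerU on Domestic DCU Hardware: Complete Deployment and Usage Guide
(Adapted for Hygon DCU. The torch build dependencies and sglang have been removed, and the pipeline backend is recommended.)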

### Base image
- **Image name**
  `image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10-fixpy`
- **Image source (SourceFind page)**
  https://sourcefind.cn/#/image/dcu/pytorch?activeName=overview

### 1. Start the DCU container (recommended command)

```bash
docker run -id \
    --name mineru-dcu \
    --shm-size=256G \
    --ipc=host \
    --network=host \
    --privileged \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    --device=/dev/kfd \
    --device=/dev/mkfd \
    --device=/dev/dri \
    -v /opt/hyhal:/opt/hyhal \
    -v /data:/data \
    -v "$(pwd)":/workspace \
    image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10-fixpy \
    /bin/bash
```
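
Before running the command above, it can help to confirm that the host exposes the device nodes and driver directory that the container mounts. The sketch below is an illustration rather than part of the original setup: `missing_paths` is a hypothetical helper name, and the paths in the usage comment are simply copied from the `docker run` flags.

```bash
# Print every given path that does not exist on the host.
# `missing_paths` is a hypothetical helper, not part of MinerU or the DCU toolkit.
missing_paths() {
    local p
    for p in "$@"; do
        [ -e "$p" ] || echo "$p"
    done
}

# Usage (paths taken from the docker run flags):
#   missing_paths /dev/kfd /dev/mkfd /dev/dri /opt/hyhal
```

No output means every path is present. Any path that is printed would need to be sorted out on the host (driver installation, mount point) before the container is started.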

### 2. Enter the container and install MinerU (with trimmed dependencies)

```bash
# Clone the repository
git clone https://github.com/opendatalab/MinerU.git
cd MinerU

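# [Important] Remove the unneeded torch-related dependencies from the three entries
# below (the image already ships a DCU build of PyTorch).
# Edit setup.py or pyproject.toml and delete or comment out the torch-related
# dependencies in:
#   - pipeline_old_linux
#   - pipeline
#   - vlm
# Also remove sglang (drop that dependency entirely).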

# Using the Aliyun PyPI mirror is recommended to speed up installation
pip install -e ".[core]" -i https://mirrors.aliyun.com/pypi/simple/
```
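
After editing the dependency file, a quick text search can show whether any torch or sglang entries were missed. This is a rough sketch under two assumptions: the dependencies mention `torch` or `sglang` by name, and disabled lines begin with `#`. `remaining_refs` is a hypothetical helper name.

```bash
# Print numbered lines of a file that still mention torch or sglang
# outside of a '#' comment. `remaining_refs` is a hypothetical helper.
remaining_refs() {
    grep -nE '^[^#]*(torch|sglang)' "$1" || true
}

# Usage, from the MinerU checkout:
#   remaining_refs pyproject.toml
```

Some hits may be legitimate, so the output is best treated as a list to review rather than a pass/fail result.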

### 3. Quick single-file test from the command line

```bash
# Download models from ModelScope by default (fastest from within mainland China)
export MINERU_MODEL_SOURCE=modelscope

# Convert a PDF (Chinese and English are detected automatically; output is Markdown)
mineru -p demo/pdfs/demo1.pdf -o ./output_demo --source modelscope
```
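
Once the conversion finishes, the generated Markdown can be located with a recursive search. The directory layout under the output folder is not described here, so the sketch below searches the whole tree instead of assuming a fixed path. `list_markdown` is a hypothetical helper name.

```bash
# List every Markdown file under a directory, sorted by path.
# `list_markdown` is a hypothetical helper, not a MinerU command.
list_markdown() {
    find "$1" -type f -name '*.md' | sort
}

# Usage, after the conversion above:
#   list_markdown ./output_demo
```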

### 4. Start the API service (recommended)

```bash
export MINERU_MODEL_SOURCE=modelscope

mineru-api --host 0.0.0.0 --port 8000 --backend pipeline
```
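
When the service is started from a script, the first request may arrive before the server is listening. A minimal sketch of a readiness wait follows, assuming bash (it relies on bash's `/dev/tcp` redirection) and the host and port from the command above. `wait_for_port` is a hypothetical helper name, and its defaults of 30 attempts and a 2-second delay are arbitrary.

```bash
# Wait until a TCP port accepts connections, or give up after N attempts.
# Arguments: <host> <port> [attempts, default 30] [delay in seconds, default 2]
# `wait_for_port` is a hypothetical helper, not part of MinerU.
wait_for_port() {
    local host=$1 port=$2 attempts=${3:-30} delay=${4:-2} i=0
    while [ "$i" -lt "$attempts" ]; do
        # Opening /dev/tcp/<host>/<port> succeeds only if something is listening.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage:
#   wait_for_port localhost 8000 && echo "mineru-api is accepting connections"
```

An open port only shows that the server accepts connections. Whether models are already loaded at that point is not something this check can tell.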

### 5. Example API call (curl)

```bash
curl -X 'POST' \
  'http://localhost:8000/file_parse' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'files=@./demo/pdfs/demo1.pdf;type=application/pdf' \
  -F 'parse_method=auto' \
  -F 'lang_list=ch' \
  -F 'start_page_id=0' \
  -F 'end_page_id=99999' \
  -F 'backend=pipeline' \
  -F 'return_md=true' \
  -F 'return_middle_json=false' \
  -F 'return_model_output=false' \
  -F 'return_images=false' \
  -F 'return_content_list=false' \
  -F 'table_enable=true' \
  -F 'formula_enable=true' \
  -F 'output_dir=./output'
```
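
For scripted use, the same request can be wrapped so that the response is saved to a file and a non-200 status is reported as a failure. This is a sketch under stated assumptions: `parse_pdf` is a hypothetical helper name, only a subset of the form fields above is sent (the rest are assumed to fall back to server defaults), and a successful parse is assumed to return HTTP 200.

```bash
# Post one PDF to /file_parse and save the JSON response to a file.
# Arguments: <pdf path> <output json path> [base URL, default http://localhost:8000]
# `parse_pdf` is a hypothetical helper; treating only HTTP 200 as success is an assumption.
parse_pdf() {
    local pdf=$1 out=$2 base=${3:-http://localhost:8000} status
    status=$(curl -s -o "$out" -w '%{http_code}' \
        -X POST "$base/file_parse" \
        -F "files=@$pdf;type=application/pdf" \
        -F 'backend=pipeline' \
        -F 'lang_list=ch' \
        -F 'return_md=true') || return 1
    if [ "$status" != "200" ]; then
        echo "file_parse returned HTTP $status" >&2
        return 1
    fi
}

# Usage:
#   parse_pdf ./demo/pdfs/demo1.pdf ./response.json
```

The structure of the JSON response is not documented here, so it is worth inspecting the saved file before writing code that extracts fields from it.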