Commit 3a42ebbf authored by Xiaomeng Zhao, committed by GitHub

Merge pull request #838 from opendatalab/release-0.9.0

Release 0.9.0
parents 765c6d77 14024793
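## Project Overview
This project provides a multi-GPU parallel processing solution based on LitServe. LitServe is a simple, flexible serving engine for AI models built on FastAPI. It augments FastAPI with batching, streaming, and GPU autoscaling, so there is no need to rebuild a FastAPI server for each model.
## Environment Setup
Use the following commands to set up the required environment: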
```bash
pip install -U litserve python-multipart filetype
pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com
pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118
```
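Before starting the server it can help to confirm that the packages named in the commands above are actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the distribution names are copied from the `pip install` lines.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip commands above.
REQUIRED = ["litserve", "python-multipart", "filetype", "magic-pdf", "paddlepaddle-gpu"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any entry reported as missing points back at the corresponding `pip install` line.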
## Quick Start
### 1. Start the Server
The following example shows how to start the server with custom settings:
```python
server = ls.LitServer(
    MinerUAPI(output_dir='/tmp'),  # output directory; change as needed
    accelerator='cuda',            # enable GPU acceleration
    devices='auto',                # "auto" uses all available GPUs
    workers_per_device=1,          # start one service instance per GPU
    timeout=False                  # False disables the request timeout
)
server.run(port=8000)  # serve on port 8000
```
Command to start the server:
```bash
python server.py
```
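With `devices='auto'` and `workers_per_device=1`, each worker receives its own device string, and the `setup` method in `server.py` pins the worker to that GPU by writing the index into `CUDA_VISIBLE_DEVICES`. The sketch below isolates that string handling. It assumes device strings of the form `cuda:N`, and the helper names `visible_device_index` and `pin_worker` are hypothetical.

```python
import os


def visible_device_index(device):
    """Return the GPU index encoded in a 'cuda:N' string, or None for non-CUDA devices."""
    if not device.startswith("cuda"):
        return None
    # Same parsing as server.py: take whatever follows the last ':'.
    return device.split(":")[-1]


def pin_worker(device, env=os.environ):
    """Restrict this process to the single GPU named by `device` (no-op on CPU)."""
    index = visible_device_index(device)
    if index is not None:
        env["CUDA_VISIBLE_DEVICES"] = index
    return index
```

The variable has to be set before the process initializes CUDA, which is what the `RuntimeError` check in `server.py` guards against.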
### 2. Start the Client
The following code shows how to use the client; adjust the configuration as needed:
```python
files = ['demo/small_ocr.pdf']  # replace with your file paths; jpg/jpeg, png, and pdf are supported
n_jobs = np.clip(len(files), 1, 8)  # number of concurrent threads, capped at 8 here; adjust to your setup
results = Parallel(n_jobs, prefer='threads', verbose=10)(
    delayed(do_parse)(p) for p in files
)
print(results)
```
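The client fans requests out over threads because each call spends most of its time waiting on the server. For readers without `joblib`, roughly the same pattern can be sketched with the standard library's `ThreadPoolExecutor`. This swaps in a different library from the one `client.py` uses, and `fake_parse` is a hypothetical stand-in for `do_parse`.

```python
from concurrent.futures import ThreadPoolExecutor


def clamp_jobs(n_files, max_jobs=8):
    """Use one thread per file, but at least 1 and at most `max_jobs`."""
    return max(1, min(n_files, max_jobs))


def parse_all(files, parse, max_jobs=8):
    """Run `parse` on every path concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=clamp_jobs(len(files), max_jobs)) as pool:
        return list(pool.map(parse, files))


def fake_parse(path):
    # Hypothetical stand-in for do_parse: no network call is made here.
    return {"file_path": path, "output_dir": "example"}


if __name__ == "__main__":
    print(parse_all(["a.pdf", "b.png"], fake_parse))
```

Threads are sufficient here because the heavy work happens in the server's GPU workers, not in the client process.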
Command to start the client:
```bash
python client.py
```
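Each request is a JSON object with two fields: `file`, the document bytes encoded as base64, and `kwargs`, options passed through to the parser. On the server side, `decode_request` fills in `debug_able=False` and `parse_method='auto'` when they are absent. The following sketch mirrors that round trip with the standard library only. `build_payload` and `decode_payload` are hypothetical names, and the real server additionally converts the decoded bytes to PDF.

```python
import base64
import json


def build_payload(file_bytes, **kwargs):
    """Client side: wrap raw bytes and parser options into the JSON body."""
    return json.dumps({
        "file": base64.b64encode(file_bytes).decode("utf-8"),
        "kwargs": kwargs,
    })


def decode_payload(body):
    """Server side: recover the bytes and apply the same defaults as decode_request."""
    request = json.loads(body)
    opts = request.get("kwargs", {})
    opts.setdefault("debug_able", False)
    opts.setdefault("parse_method", "auto")
    return base64.b64decode(request["file"]), opts


if __name__ == "__main__":
    data, opts = decode_payload(build_payload(b"example bytes", debug_able=True))
    print(len(data), opts)
```

Base64 inflates the payload by roughly a third, which is the price of carrying binary data inside JSON.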
That's it: your files are now processed in parallel across multiple GPUs! 🍻🍻🍻
"""Client for the MinerU LitServe endpoint (run with `python client.py`)."""
import base64
import requests
import numpy as np
from loguru import logger
from joblib import Parallel, delayed


def to_b64(file_path):
    """Read a file and return its contents as a base64-encoded string."""
    try:
        with open(file_path, 'rb') as f:
            return base64.b64encode(f.read()).decode('utf-8')
    except Exception as e:
        raise Exception(f'File: {file_path} - Info: {e}')


def do_parse(file_path, url='http://127.0.0.1:8000/predict', **kwargs):
    """POST one file to the server; return the JSON response, or None on failure."""
    try:
        response = requests.post(url, json={
            'file': to_b64(file_path),
            'kwargs': kwargs
        })

        if response.status_code == 200:
            output = response.json()
            output['file_path'] = file_path
            return output
        else:
            raise Exception(response.text)
    except Exception as e:
        # Failures are logged rather than raised, so the result for this file is None.
        logger.error(f'File: {file_path} - Info: {e}')


if __name__ == '__main__':
    files = ['small_ocr.pdf']
    n_jobs = np.clip(len(files), 1, 8)  # one thread per file, between 1 and 8
    results = Parallel(n_jobs, prefer='threads', verbose=10)(
        delayed(do_parse)(p) for p in files
    )
    print(results)
"""LitServe server exposing MinerU parsing (run with `python server.py`)."""
import os
import fitz
import torch
import base64
import litserve as ls
from uuid import uuid4
from fastapi import HTTPException
from filetype import guess_extension
from magic_pdf.tools.common import do_parse
from magic_pdf.model.doc_analyze_by_custom_model import ModelSingleton


class MinerUAPI(ls.LitAPI):
    def __init__(self, output_dir='/tmp'):
        self.output_dir = output_dir

    def setup(self, device):
        # Pin this worker to its assigned GPU before any CUDA context exists.
        if device.startswith('cuda'):
            os.environ['CUDA_VISIBLE_DEVICES'] = device.split(':')[-1]
            if torch.cuda.device_count() > 1:
                raise RuntimeError("Remove any CUDA actions before setting 'CUDA_VISIBLE_DEVICES'.")

        # Initialize the models once per worker, at startup rather than on the first request.
        model_manager = ModelSingleton()
        model_manager.get_model(True, False)
        model_manager.get_model(False, False)
        print(f'Model initialization complete on {device}!')

    def decode_request(self, request):
        # Convert the uploaded file to PDF bytes and apply default parser options.
        file = request['file']
        file = self.to_pdf(file)
        opts = request.get('kwargs', {})
        opts.setdefault('debug_able', False)
        opts.setdefault('parse_method', 'auto')
        return file, opts

    def predict(self, inputs):
        try:
            pdf_name = str(uuid4())  # unique name for this job's output
            do_parse(self.output_dir, pdf_name, inputs[0], [], **inputs[1])
            return pdf_name
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))
        finally:
            self.clean_memory()

    def encode_response(self, response):
        return {'output_dir': response}

    def clean_memory(self):
        import gc
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()
        gc.collect()

    def to_pdf(self, file_base64):
        # Decode the base64 payload; non-PDF inputs (e.g. images) are converted to PDF.
        try:
            file_bytes = base64.b64decode(file_base64)
            file_ext = guess_extension(file_bytes)
            with fitz.open(stream=file_bytes, filetype=file_ext) as f:
                if f.is_pdf:
                    return f.tobytes()
                return f.convert_to_pdf()
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))


if __name__ == '__main__':
    server = ls.LitServer(
        MinerUAPI(output_dir='/tmp'),
        accelerator='cuda',
        devices='auto',
        workers_per_device=1,
        timeout=False
    )
    server.run(port=8000)
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*
node_modules
dist
dist-ssr
*.local
# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
# MinerU web
## Table of Contents
- [Local Frontend Development](#local-frontend-development)
- [Technology Stack](#technology-stack)
## Local Frontend Development
### Prerequisites
- Node.js 18.x
- pnpm
### Installation Steps
1. Install Node.js 18
- Visit the [Node.js official website](https://nodejs.org/) to download and install Node.js version 18.x
2. Install pnpm
```bash
npm install -g pnpm
```
3. Clone the repository
```
git clone https://github.com/opendatalab/MinerU
cd MinerU/projects/web
```
4. Install dependencies
```
pnpm install
```
5. Run the development server
```
pnpm run dev
```
6. ⚠️ Note: This command is for local development only; do not use it for deployment!
Open your browser and visit http://localhost:5173 (or another address output in the console)
7. Ensure that the backend service in ./projects/web_demo is running
8. If you encounter an error when executing `pnpm install`, you can switch to an alternative package manager.
```
npm install -g yarn
yarn
yarn dev
```
## Building the Project
```
pnpm run build
```
## Technology Stack
- React
- Tailwind CSS
- TypeScript
- zustand
- ahooks
# MinerU web
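## Table of Contents
- [Local Frontend Development](#local-frontend-development)
- [Technology Stack](#technology-stack)
## Local Frontend Development
### Prerequisites
- Node.js 18.x
- pnpm
### Installation Steps
1. Install Node.js 18
- Visit the [Node.js official website](https://nodejs.org/) to download and install Node.js version 18.x
2. Install pnpm
```bash
npm install -g pnpm
```
3. Clone the repository
```
git clone https://github.com/opendatalab/MinerU
cd MinerU/projects/web
```
4. Install dependencies
```
pnpm install
```
5. Run the development server
```
pnpm run dev
```
6. ⚠️ Note: This command is for local development only; do not use it for deployment!
Open your browser and visit http://localhost:5173 (or another address printed in the console)
7. Make sure the backend service in ./projects/web_demo is running
8. If `pnpm install` fails with an error, you can switch to another package manager
```
npm install -g yarn
yarn
yarn dev
```
## Building the Project
To build the production version, run the following command:
```
pnpm run build
```
## Technology Stack
- React
- Tailwind CSS
- TypeScript
- zustand
- ahooks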
import js from '@eslint/js'
import globals from 'globals'
import reactHooks from 'eslint-plugin-react-hooks'
import reactRefresh from 'eslint-plugin-react-refresh'
import tseslint from 'typescript-eslint'
export default tseslint.config(
  { ignores: ['dist'] },
  {
    extends: [js.configs.recommended, ...tseslint.configs.recommended],
    files: ['**/*.{ts,tsx}'],
    languageOptions: {
      ecmaVersion: 2020,
      globals: globals.browser,
    },
    plugins: {
      'react-hooks': reactHooks,
      'react-refresh': reactRefresh,
    },
    rules: {
      ...reactHooks.configs.recommended.rules,
      'react-refresh/only-export-components': [
        'warn',
        { allowConstantExport: true },
      ],
    },
  },
)
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <link rel="icon" type="image/svg+xml" href="/logo.svg" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>MinerU</title>
  </head>
  <body>
    <div id="root"></div>
    <script type="module" src="/src/main.tsx"></script>
  </body>
</html>
{
  "name": "my-react-app",
  "private": true,
  "version": "0.0.0",
  "type": "module",
  "scripts": {
    "dev": "vite --host ",
    "build": "tsc --noEmit && vite build",
    "lint": "eslint .",
    "preview": "vite preview"
  },
  "dependencies": {
    "@ant-design/icons": "^5.4.0",
    "@codemirror/view": "^6.33.0",
    "@tanstack/react-query": "^5.56.2",
    "@types/lodash": "^4.17.7",
    "@types/qs": "^6.9.15",
    "@types/react-copy-to-clipboard": "^5.0.7",
    "@types/react-syntax-highlighter": "^15.5.13",
    "@uiw/codemirror-extensions-langs": "^4.23.0",
    "@uiw/react-codemirror": "^4.23.0",
    "ahooks": "^3.8.1",
    "antd": "^5.20.3",
    "axios": "^1.7.5",
    "canvas": "^2.11.2",
    "classnames": "^2.5.1",
    "js-cookie": "^3.0.5",
    "lodash": "^4.17.21",
    "path2d": "^0.2.1",
    "qs": "^6.13.0",
    "react": "^18.3.1",
    "react-copy-to-clipboard": "^5.1.0",
    "react-dom": "^18.3.1",
    "react-intl": "^6.6.8",
    "react-markdown": "^9.0.1",
    "react-query": "^3.39.3",
    "react-router-dom": "^6.26.1",
    "react-syntax-highlighter": "^15.5.0",
    "rehype-katex": "^7.0.1",
    "rehype-raw": "^7.0.0",
    "remark-gfm": "^4.0.0",
    "remark-math": "^6.0.0",
    "zustand": "^4.5.5"
  },
  "devDependencies": {
    "@eslint/js": "^9.9.0",
    "@types/js-cookie": "^3.0.6",
    "@types/node": "^22.5.1",
    "@types/react": "^18.3.3",
    "@types/react-dom": "^18.3.0",
    "@vitejs/plugin-react": "^4.3.1",
    "autoprefixer": "^10.4.20",
    "eslint": "^9.9.0",
    "eslint-plugin-react-hooks": "^5.1.0-rc.0",
    "eslint-plugin-react-refresh": "^0.4.9",
    "globals": "^15.9.0",
    "less": "^4.2.0",
    "postcss": "^8.4.41",
    "sass-embedded": "^1.77.8",
    "tailwindcss": "^3.4.10",
    "ts-prune": "^0.10.3",
    "typescript": "^5.5.3",
    "typescript-eslint": "^8.0.1",
    "vite": "^5.4.1"
  }
}
export default {
  plugins: {
    tailwindcss: {},
    autoprefixer: {},
  },
}
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
<g clip-path="url(#clip0_1924_30132)">
<path opacity="0.4" d="M2.1688 9.68828C2.14583 9.59322 2.16383 9.49288 2.21842 9.41173L6.04222 3.7273C6.09089 3.65495 6.1649 3.60346 6.24966 3.58298L16.6007 1.08163C16.7879 1.03639 16.9763 1.15147 17.0216 1.33868L19.658 12.2487C20.7217 16.6504 18.0157 21.0808 13.6141 22.1445C9.21248 23.2082 4.782 20.5022 3.71834 16.1006L2.1688 9.68828Z" fill="url(#paint0_linear_1924_30132)"/>
<path d="M5.19531 7.24425C5.19531 7.14645 5.23638 7.05315 5.3085 6.98709L10.3605 2.35987C10.4248 2.30098 10.5089 2.26831 10.5961 2.26831H21.2451C21.4377 2.26831 21.5938 2.42444 21.5938 2.61703V13.8411C21.5938 18.3694 17.9229 22.0404 13.3946 22.0404C8.86624 22.0404 5.19531 18.3694 5.19531 13.8411V7.24425Z" fill="url(#paint1_linear_1924_30132)"/>
<path d="M9.87906 7.30192L5.28333 7.01537L10.4111 2.32143L10.626 6.5632C10.6475 6.98306 10.2987 7.32808 9.87906 7.30192Z" fill="#5D76FF"/>
<path fill-rule="evenodd" clip-rule="evenodd" d="M15.7002 11.8158V13.0479C15.7002 14.0038 14.9253 14.7787 13.9694 14.7787C13.0135 14.7787 12.2386 14.0038 12.2386 13.0479V9.96771H10.2145V13.3413C10.2145 15.4151 11.8956 17.0962 13.9694 17.0962C16.0432 17.0962 17.7243 15.4151 17.7243 13.3413V9.96771L17.7243 11.8158H15.7002Z" fill="#0028FD"/>
<path d="M17.7243 10.9944H18.5457V11.8158L17.7243 11.8158L17.7243 10.9944Z" fill="#0028FD"/>
<path d="M15.7002 10.4957H17.0203V11.8158L15.7002 11.8158L15.7002 10.4957Z" fill="#0028FD"/>
<path d="M17.0203 9.7917H17.7243V10.4957L17.0203 10.4957L17.0203 9.7917Z" fill="#0028FD"/>
<path d="M18.135 8.61828H18.5751V9.05831H18.135V8.61828Z" fill="#0028FD"/>
<path fill-rule="evenodd" clip-rule="evenodd" d="M15.4627 11.7367V12.9688C15.4627 13.9246 14.6878 14.6995 13.7319 14.6995C12.776 14.6995 12.0011 13.9246 12.0011 12.9688V9.88854H9.97697V13.2621C9.97697 15.3359 11.6581 17.017 13.7319 17.017C15.8057 17.017 17.4868 15.3359 17.4868 13.2621V9.88854L17.4868 11.7367H15.4627Z" fill="white"/>
<path d="M17.4868 10.9153H18.3082V11.7367L17.4868 11.7367L17.4868 10.9153Z" fill="white"/>
<path d="M15.4627 10.4166H16.7828V11.7367L15.4627 11.7367L15.4627 10.4166Z" fill="white"/>
<path d="M16.7828 9.71253H17.4868V10.4166L16.7828 10.4166L16.7828 9.71253Z" fill="white"/>
<path d="M17.8975 8.53912H18.3376V8.97915H17.8975V8.53912Z" fill="white"/>
</g>
<defs>
<linearGradient id="paint0_linear_1924_30132" x1="0.16149" y1="7.29712" x2="19.8967" y2="12.4412" gradientUnits="userSpaceOnUse">
<stop stop-color="#1543FE"/>
<stop offset="1" stop-color="#8C46FF"/>
</linearGradient>
<linearGradient id="paint1_linear_1924_30132" x1="3.80582" y1="4.44849" x2="21.7806" y2="14.0843" gradientUnits="userSpaceOnUse">
<stop stop-color="#1543FE"/>
<stop offset="1" stop-color="#8C46FF"/>
</linearGradient>
<clipPath id="clip0_1924_30132">
<rect width="24" height="24" fill="white"/>
</clipPath>
</defs>
</svg>