Commit ece7f8d5 authored by Kaiwen Liu, committed by GitHub

Merge pull request #6 from opendatalab/dev

Dev
parents 98362a6e 702b6ac9
magic-pdf[full]>=0.8.0
gradio
gradio-pdf
\ No newline at end of file
## Installation
<details open="open">
<summary><h2 style="display: inline-block">Table of Contents</h2></summary>
<ol>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#installation">Installation</a></li>
<li><a href="#example">Example</a></li>
<li><a href="#development">Development</a></li>
</ol>
</details>
MinerU
## Introduction
`MinerU` provides a data `API` that lets users import data into a `RAG` system. This project uses `Tongyi Qianwen` to show how to build a lightweight `RAG` system.
<p align="center">
<img src="rag_data_api.png" width="300px" style="vertical-align:middle;">
</p>
## Installation
```bash
git clone https://github.com/opendatalab/MinerU.git
cd MinerU
conda create -n MinerU python=3.10
conda activate MinerU
pip install .[full] --extra-index-url https://wheels.myhloli.com
```
Environment requirements
```text
NVIDIA A100 80GB,
Centos 7 3.10.0-957.el7.x86_64
Client: Docker Engine - Community
Version: 24.0.5
API version: 1.43
Go version: go1.20.6
Git commit: ced0996
Built: Fri Jul 21 20:39:02 2023
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 24.0.5
API version: 1.43 (minimum version 1.12)
Go version: go1.20.6
Git commit: a61e2b4
Built: Fri Jul 21 20:38:05 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.25
GitCommit: d8f198a4ed8892c764191ef7b3b06d8a2eeb5c7f
runc:
Version: 1.1.10
GitCommit: v1.1.10-0-g18a0cb0
docker-init:
Version: 0.19.0
GitCommit: de40ad0
```
Refer to the [documentation](../../README_zh-CN.md) to install MinerU.
Third-party software
```bash
# install
pip install modelscope==1.14.0
pip install llama-index-vector-stores-elasticsearch==0.2.0
pip install llama-index-embeddings-dashscope==0.2.0
pip install llama-index-core==0.10.68
......@@ -26,39 +70,12 @@ pip install accelerate==0.33.0
pip uninstall transformer-engine
```
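The pinned versions above can be checked after installation. The following is a minimal sketch, not part of the project: the helper names `installed_version` and `find_mismatches` are hypothetical, and the pin table only covers the packages visible in this diff (some lines are elided).

```python
from importlib import metadata

# Pins copied from the install commands visible above; the diff elides
# some lines, so this table may be incomplete.
PINNED = {
    "modelscope": "1.14.0",
    "llama-index-vector-stores-elasticsearch": "0.2.0",
    "llama-index-embeddings-dashscope": "0.2.0",
    "llama-index-core": "0.10.68",
    "accelerate": "0.33.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin to a
    (wanted, found) tuple; `found` is None when the package is missing."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(PINNED)` in the activated environment returns an empty dictionary when every listed package matches its pin.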
## Environment configuration
```
export DASHSCOPE_API_KEY={some_key}
export ES_USER={some_es_user}
export ES_PASSWORD={some_es_password}
export ES_URL=http://{es_url}:9200
```
To obtain a DASHSCOPE_API_KEY, see the [documentation](https://help.aliyun.com/zh/dashscope/opening-service)
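A script that depends on these four variables might validate them up front. This is a sketch under assumptions: the variable names come from the exports above, while `load_settings` and its error messages are hypothetical and do not describe how `data_ingestion.py` or `query.py` actually read their configuration.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = ("DASHSCOPE_API_KEY", "ES_USER", "ES_PASSWORD", "ES_URL")


def load_settings(env=None):
    """Collect the required variables, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    # ES_URL is expected to be a full URL such as http://127.0.0.1:9200.
    if not settings["ES_URL"].startswith(("http://", "https://")):
        raise RuntimeError("ES_URL should start with http:// or https://")
    return settings
```

Failing early with the list of missing names tends to be easier to debug than a connection error raised later by the Elasticsearch or DashScope client.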
## Usage
### Import data
```bash
python data_ingestion.py -p some.pdf  # load data from a single PDF
# or
python data_ingestion.py -p /opt/data/some_pdf_directory/  # load data from every PDF in the directory {some_pdf_directory}
```
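The `-p` option accepts either a single PDF or a directory of PDFs. One way such an argument could be expanded into a list of files is sketched below. This is an illustration only: `collect_pdfs`, `parse_args`, and the long option `--path` are hypothetical and are not taken from `data_ingestion.py`.

```python
import argparse
from pathlib import Path


def collect_pdfs(path):
    """Return the PDF files named by `path`: the file itself, or every
    *.pdf directly inside it when `path` is a directory (non-recursive)."""
    target = Path(path)
    if target.is_dir():
        return sorted(
            p for p in target.iterdir()
            if p.is_file() and p.suffix.lower() == ".pdf"
        )
    if target.is_file() and target.suffix.lower() == ".pdf":
        return [target]
    raise FileNotFoundError(f"no PDF file or directory at {target}")


def parse_args(argv=None):
    """Parse a command line shaped like the examples above (hypothetical)."""
    parser = argparse.ArgumentParser(description="ingestion CLI sketch")
    parser.add_argument("-p", "--path", required=True,
                        help="a PDF file or a directory containing PDFs")
    return parser.parse_args(argv)
```

For example, `collect_pdfs(parse_args(["-p", "some.pdf"]).path)` yields a one-element list when `some.pdf` exists, and raises `FileNotFoundError` otherwise.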
### Query
```bash
python query.py --question '{the_question_you_want_to_ask}'
```
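In a RAG setup, the question is usually combined with retrieved passages before it is sent to the model. The sketch below shows that general idea in plain Python. It is not the logic of `query.py`: the function `build_prompt`, the prompt wording, and the `max_passages` parameter are all assumptions made for illustration.

```python
def build_prompt(question, passages, max_passages=3):
    """Combine a question with retrieved passages into one prompt string.

    With no passages (for example, before any data has been imported),
    the question is passed through unchanged.
    """
    selected = passages[:max_passages]
    if not selected:
        return question
    # Number the passages so the model can tell them apart.
    context = "\n".join(
        f"[{i}] {text}" for i, text in enumerate(selected, start=1)
    )
    return (
        "Answer the question using the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
```

Capping the number of passages keeps the prompt within the model's context budget; the right cap depends on passage length and the model used.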
## Example
````bash
# Start the Elasticsearch service
cd projects/llama_index_rag
docker compose up -d
# or
......@@ -67,17 +84,41 @@ docker-compose up -d
# Configure environment variables
export ES_USER=elastic
export ES_PASSWORD=llama_index
export ES_URL=http://127.0.0.1:9200
export DASHSCOPE_API_KEY={some_key}
# To obtain a DASHSCOPE_API_KEY, see the [documentation](https://help.aliyun.com/zh/dashscope/opening-service)
# Query before importing any data; this returns Tongyi Qianwen's default answer
python query.py -q 'how about the rights of men'
## outputs
question: how about the rights of men
answer: The topic of men's rights often refers to discussions around legal, social, and political issues that affect men specifically or differently from women. Movements related to men's rights advocate for addressing areas where men face discrimination or unique challenges, such as:
Child Custody: Ensuring that men have equal opportunities for custody of their children following divorce or separation.
Domestic Violence: Recognizing that men can also be victims of domestic abuse and ensuring they have access to support services.
Mental Health and Suicide Rates: Addressing the higher rates of suicide among men and providing mental health resources.
Military Conscription: In some countries, only men are required to register for military service, which is seen as a gender-based obligation.
Workplace Safety: Historically, more men than women have been employed in high-risk occupations, leading to higher workplace injury and death rates.
Parental Leave: Advocating for paternity leave policies that allow men to take time off work for family care.
Men's rights activism often intersects with broader discussions on gender equality and aims to promote fairness and equity across genders. It's important to note that while advocating for these issues, it should be done in a way that does not detract from or oppose the goals of gender equality and the rights of other groups. The focus should be on creating a fair society where everyone has equal opportunities and protections under the law.
# Import data
python data_ingestion.py example/data/declaration_of_the_rights_of_man_1789.pdf
python data_ingestion.py -p example/data/
# or
python data_ingestion.py -p example/data/declaration_of_the_rights_of_man_1789.pdf
# Query after importing data. The Tongyi Qianwen model answers using the RAG system's retrieval results together with the context.
# Ask the question
python query.py -q 'how about the rights of men'
## outputs
......
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*
node_modules
dist
dist-ssr
*.local
# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
# MinerU web
## Table of Contents
- [Local Frontend Development](#local-frontend-development)
- [Technology Stack](#technology-stack)
## Local Frontend Development
### Prerequisites
- Node.js 18.x
- pnpm
### Installation Steps
1. Install Node.js 18
- Visit the [Node.js official website](https://nodejs.org/) to download and install Node.js version 18.x
2. Install pnpm
```bash
npm install -g pnpm
```
3. Clone the repository
```
git clone https://github.com/opendatalab/MinerU
cd MinerU/projects/web
```
4. Install dependencies
```
pnpm install
```
5. Run the development server
```
pnpm run dev
```
6. ⚠️ Note: This command is for local development only; do not use it for deployment!
Open your browser and visit http://localhost:5173 (or another address output in the console)
7. Ensure that the backend service in ./projects/web_demo is running
8. If you encounter an error when executing `pnpm install`, you can switch to an alternative package manager.
```
npm install -g yarn
yarn
yarn dev
```
## Building the Project
```
pnpm run build
```
## Technology Stack
- React
- Tailwind CSS
- TypeScript
- zustand
- ahooks
# MinerU web
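## Table of Contents
- [Local Frontend Development](#local-frontend-development)
- [Technology Stack](#technology-stack)
## Local Frontend Development
### Prerequisites
- Node.js 18.x
- pnpm
### Installation Steps
1. Install Node.js 18
- Visit the [Node.js official website](https://nodejs.org/) to download and install Node.js 18.x
2. Install pnpm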
```bash
npm install -g pnpm
```
3. Clone the repository
```
git clone https://github.com/opendatalab/MinerU
cd MinerU/projects/web
```
4. Install dependencies
```
pnpm install
```
5. Run the development server
```
pnpm run dev
```
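6. ⚠️ Note: This command is for local development only; do not use it for deployment!
Open your browser and visit http://localhost:5173 (or another address output in the console)
Build the project
To build the production version, run the following command: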
```
pnpm run build
```
7. Ensure that the backend service in ./projects/web_demo is running
8. If `pnpm install` fails with an error, you can switch to another package manager
```
npm install -g yarn
yarn
yarn dev
```
## Technology Stack
- React
- Tailwind CSS
- TypeScript
- zustand
- ahooks
import js from '@eslint/js'
import globals from 'globals'
import reactHooks from 'eslint-plugin-react-hooks'
import reactRefresh from 'eslint-plugin-react-refresh'
import tseslint from 'typescript-eslint'
export default tseslint.config(
{ ignores: ['dist'] },
{
extends: [js.configs.recommended, ...tseslint.configs.recommended],
files: ['**/*.{ts,tsx}'],
languageOptions: {
ecmaVersion: 2020,
globals: globals.browser,
},
plugins: {
'react-hooks': reactHooks,
'react-refresh': reactRefresh,
},
rules: {
...reactHooks.configs.recommended.rules,
'react-refresh/only-export-components': [
'warn',
{ allowConstantExport: true },
],
},
},
)
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<link rel="icon" type="image/svg+xml" href="/logo.svg" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>MinerU</title>
</head>
<body>
<div id="root"></div>
<script type="module" src="/src/main.tsx"></script>
</body>
</html>
This source diff could not be displayed because it is too large. You can view the blob instead.
{
"name": "my-react-app",
"private": true,
"version": "0.0.0",
"type": "module",
"scripts": {
"dev": "vite --host ",
"build": "tsc --noEmit && vite build",
"lint": "eslint .",
"preview": "vite preview"
},
"dependencies": {
"@ant-design/icons": "^5.4.0",
"@codemirror/view": "^6.33.0",
"@tanstack/react-query": "^5.56.2",
"@types/lodash": "^4.17.7",
"@types/qs": "^6.9.15",
"@types/react-copy-to-clipboard": "^5.0.7",
"@types/react-syntax-highlighter": "^15.5.13",
"@uiw/codemirror-extensions-langs": "^4.23.0",
"@uiw/react-codemirror": "^4.23.0",
"ahooks": "^3.8.1",
"antd": "^5.20.3",
"axios": "^1.7.5",
"canvas": "^2.11.2",
"classnames": "^2.5.1",
"js-cookie": "^3.0.5",
"lodash": "^4.17.21",
"path2d": "^0.2.1",
"qs": "^6.13.0",
"react": "^18.3.1",
"react-copy-to-clipboard": "^5.1.0",
"react-dom": "^18.3.1",
"react-intl": "^6.6.8",
"react-markdown": "^9.0.1",
"react-query": "^3.39.3",
"react-router-dom": "^6.26.1",
"react-syntax-highlighter": "^15.5.0",
"rehype-katex": "^7.0.1",
"rehype-raw": "^7.0.0",
"remark-gfm": "^4.0.0",
"remark-math": "^6.0.0",
"zustand": "^4.5.5"
},
"devDependencies": {
"@eslint/js": "^9.9.0",
"@types/js-cookie": "^3.0.6",
"@types/node": "^22.5.1",
"@types/react": "^18.3.3",
"@types/react-dom": "^18.3.0",
"@vitejs/plugin-react": "^4.3.1",
"autoprefixer": "^10.4.20",
"eslint": "^9.9.0",
"eslint-plugin-react-hooks": "^5.1.0-rc.0",
"eslint-plugin-react-refresh": "^0.4.9",
"globals": "^15.9.0",
"less": "^4.2.0",
"postcss": "^8.4.41",
"sass-embedded": "^1.77.8",
"tailwindcss": "^3.4.10",
"ts-prune": "^0.10.3",
"typescript": "^5.5.3",
"typescript-eslint": "^8.0.1",
"vite": "^5.4.1"
}
}
\ No newline at end of file
export default {
plugins: {
tailwindcss: {},
autoprefixer: {},
},
}
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
<g clip-path="url(#clip0_1924_30132)">
<path opacity="0.4" d="M2.1688 9.68828C2.14583 9.59322 2.16383 9.49288 2.21842 9.41173L6.04222 3.7273C6.09089 3.65495 6.1649 3.60346 6.24966 3.58298L16.6007 1.08163C16.7879 1.03639 16.9763 1.15147 17.0216 1.33868L19.658 12.2487C20.7217 16.6504 18.0157 21.0808 13.6141 22.1445C9.21248 23.2082 4.782 20.5022 3.71834 16.1006L2.1688 9.68828Z" fill="url(#paint0_linear_1924_30132)"/>
<path d="M5.19531 7.24425C5.19531 7.14645 5.23638 7.05315 5.3085 6.98709L10.3605 2.35987C10.4248 2.30098 10.5089 2.26831 10.5961 2.26831H21.2451C21.4377 2.26831 21.5938 2.42444 21.5938 2.61703V13.8411C21.5938 18.3694 17.9229 22.0404 13.3946 22.0404C8.86624 22.0404 5.19531 18.3694 5.19531 13.8411V7.24425Z" fill="url(#paint1_linear_1924_30132)"/>
<path d="M9.87906 7.30192L5.28333 7.01537L10.4111 2.32143L10.626 6.5632C10.6475 6.98306 10.2987 7.32808 9.87906 7.30192Z" fill="#5D76FF"/>
<path fill-rule="evenodd" clip-rule="evenodd" d="M15.7002 11.8158V13.0479C15.7002 14.0038 14.9253 14.7787 13.9694 14.7787C13.0135 14.7787 12.2386 14.0038 12.2386 13.0479V9.96771H10.2145V13.3413C10.2145 15.4151 11.8956 17.0962 13.9694 17.0962C16.0432 17.0962 17.7243 15.4151 17.7243 13.3413V9.96771L17.7243 11.8158H15.7002Z" fill="#0028FD"/>
<path d="M17.7243 10.9944H18.5457V11.8158L17.7243 11.8158L17.7243 10.9944Z" fill="#0028FD"/>
<path d="M15.7002 10.4957H17.0203V11.8158L15.7002 11.8158L15.7002 10.4957Z" fill="#0028FD"/>
<path d="M17.0203 9.7917H17.7243V10.4957L17.0203 10.4957L17.0203 9.7917Z" fill="#0028FD"/>
<path d="M18.135 8.61828H18.5751V9.05831H18.135V8.61828Z" fill="#0028FD"/>
<path fill-rule="evenodd" clip-rule="evenodd" d="M15.4627 11.7367V12.9688C15.4627 13.9246 14.6878 14.6995 13.7319 14.6995C12.776 14.6995 12.0011 13.9246 12.0011 12.9688V9.88854H9.97697V13.2621C9.97697 15.3359 11.6581 17.017 13.7319 17.017C15.8057 17.017 17.4868 15.3359 17.4868 13.2621V9.88854L17.4868 11.7367H15.4627Z" fill="white"/>
<path d="M17.4868 10.9153H18.3082V11.7367L17.4868 11.7367L17.4868 10.9153Z" fill="white"/>
<path d="M15.4627 10.4166H16.7828V11.7367L15.4627 11.7367L15.4627 10.4166Z" fill="white"/>
<path d="M16.7828 9.71253H17.4868V10.4166L16.7828 10.4166L16.7828 9.71253Z" fill="white"/>
<path d="M17.8975 8.53912H18.3376V8.97915H17.8975V8.53912Z" fill="white"/>
</g>
<defs>
<linearGradient id="paint0_linear_1924_30132" x1="0.16149" y1="7.29712" x2="19.8967" y2="12.4412" gradientUnits="userSpaceOnUse">
<stop stop-color="#1543FE"/>
<stop offset="1" stop-color="#8C46FF"/>
</linearGradient>
<linearGradient id="paint1_linear_1924_30132" x1="3.80582" y1="4.44849" x2="21.7806" y2="14.0843" gradientUnits="userSpaceOnUse">
<stop stop-color="#1543FE"/>
<stop offset="1" stop-color="#8C46FF"/>
</linearGradient>
<clipPath id="clip0_1924_30132">
<rect width="24" height="24" fill="white"/>
</clipPath>
</defs>
</svg>