# Text Generation Inference
## Introduction

Text Generation Inference (TGI) is a framework written in Rust and Python for deploying and serving LLM inference. TGI provides high-performance inference for many large models, such as Llama, Falcon, BLOOM, Baichuan, and Qwen.

## Supported Models

- [Deepseek V2](https://huggingface.co/deepseek-ai/DeepSeek-V2)
- [Idefics 2](https://huggingface.co/HuggingFaceM4/idefics2-8b) (Multimodal)
- [Llava Next (1.6)](https://huggingface.co/llava-hf/llava-v1.6-vicuna-13b-hf) (Multimodal)
- [Llama](https://huggingface.co/collections/meta-llama/llama-31-669fc079a0c406a149a5738f)
- [Phi 3](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
- [Granite](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct)
- [Gemma](https://huggingface.co/google/gemma-7b)
- [PaliGemma](https://huggingface.co/google/paligemma-3b-pt-224)
- [Gemma2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315)
- [Cohere](https://huggingface.co/CohereForAI/c4ai-command-r-plus)
- [Dbrx](https://huggingface.co/databricks/dbrx-instruct)
- [Mamba](https://huggingface.co/state-spaces/mamba-2.8b-slimpj)
- [Mistral](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)
- [Mixtral](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1)
- [Gpt Bigcode](https://huggingface.co/bigcode/gpt_bigcode-santacoder)
- [Phi](https://huggingface.co/microsoft/phi-1_5)
- [PhiMoe](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct)
- [Baichuan](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat)
- [Falcon](https://huggingface.co/tiiuae/falcon-7b-instruct)
- [StarCoder 2](https://huggingface.co/bigcode/starcoder2-15b-instruct-v0.1)
- [Qwen 2](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f)
- [Qwen 2 VL](https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d)
- [Opt](https://huggingface.co/facebook/opt-6.7b)
- [T5](https://huggingface.co/google/flan-t5-xxl)
- [Galactica](https://huggingface.co/facebook/galactica-120b)
- [SantaCoder](https://huggingface.co/bigcode/santacoder)
- [Bloom](https://huggingface.co/bigscience/bloom-560m)
- [Mpt](https://huggingface.co/mosaicml/mpt-7b-instruct)
- [Gpt2](https://huggingface.co/openai-community/gpt2)
- [Gpt Neox](https://huggingface.co/EleutherAI/gpt-neox-20b)
- [Gptj](https://huggingface.co/EleutherAI/gpt-j-6b)
- [Idefics](https://huggingface.co/HuggingFaceM4/idefics-9b) (Multimodal)
- [Mllama](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) (Multimodal)

## Requirements

+ Python 3.10
+ DTK 25.04
+ torch 2.4.1

### Installing from Source

#### Preparing the Build Environment

There are two ways to prepare the environment.

##### Option 1:

**TODO**

##### Option 2:

Start from the SourceFind PyTorch 2.4.1 base image:

```shell
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.4.1-ubuntu22.04-dtk25.04-py3.10-fixpy
```

1. Install Rust

```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup update
rustup override set 1.85.0
```

2. Install Protoc

```shell
PROTOC_ZIP=protoc-21.12-linux-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
```

3. Install the TGI service

```bash
git clone http://developer.hpccube.com/codes/OpenDAS/text-generation-inference.git  # add -b v3.0.0 to check out that branch
cd text-generation-inference

# Build and install the exllamav2 kernels
cd server/exllamav2_kernels
python setup.py install
cd ../..  # back to the project root

source $HOME/.cargo/env
BUILD_EXTENSIONS=True make install  # install the text-generation service
```
4. Install the benchmark tool

```bash
cd text-generation-inference
make install-benchmark
```

Note: if installation is slow, you can switch pip to a mirror index to speed it up:

```bash
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
```

Likewise, if `cargo install` is slow, you can add a mirror source in `~/.cargo/config.toml`.

## Checking the Installed Version

```bash
text-generation-launcher -V  # the version number tracks the upstream release
```

## Before Use

```bash
export PYTORCH_TUNABLEOP_ENABLED=0
export ATTENTION=paged
```

## Usage

```bash
# Start the TGI service
HIP_VISIBLE_DEVICES=2 text-generation-launcher --dtype=float16 --model-id /path/to/model --trust-remote-code --port 3001

# Query the service
curl 127.0.0.1:3001/generate \
    -X POST \
    -d '{"inputs":"What is deep learning?","parameters":{"max_new_tokens":100,"temperature":0.7}}' \
    -H 'Content-Type: application/json'
```

The same request can be made from Python:

```python
import requests

headers = {
    "Content-Type": "application/json",
}
data = {
    "inputs": "What is Deep Learning?",
    "parameters": {
        "max_new_tokens": 20,
    },
}
response = requests.post("http://127.0.0.1:3001/generate", headers=headers, json=data)
print(response.json())
# {'generated_text': ' Deep Learning is a subset of machine learning where neural networks are trained deep within a hierarchy of layers instead'}
```

## Known Issues

- None

## References

- [README_ORIGIN](README_ORIGIN.md)
- [https://github.com/huggingface/text-generation-inference](https://github.com/huggingface/text-generation-inference)
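The `/generate` call in the Usage section returns the whole completion in one response. Upstream TGI also exposes a `/generate_stream` endpoint that emits server-sent events, one `data: {...}` JSON line per generated token. The sketch below shows one way to consume it; the server address, port, and parameters are reused from the example above and assume a server you have already started, so treat it as a starting point rather than a tested client.

```python
import json


def parse_sse_line(line: str):
    """Parse one server-sent-event line; return its JSON payload, or None for non-data lines."""
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())


def stream_generate(prompt: str, url: str = "http://127.0.0.1:3001/generate_stream"):
    """Yield token texts from a running TGI server (not started here)."""
    import requests  # deferred so the parser above works without the dependency installed

    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 20}}
    with requests.post(url, json=payload, stream=True) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines():
            event = parse_sse_line(raw.decode("utf-8")) if raw else None
            if event and "token" in event:
                yield event["token"]["text"]


# Example (requires a running server):
# for text in stream_generate("What is Deep Learning?"):
#     print(text, end="", flush=True)
```

Streaming is mainly useful for interactive front ends, where printing tokens as they arrive hides most of the generation latency.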