Unverified Commit e58ad6dd authored by Merve Noyan, committed by GitHub

Added CLI docs (#799)

parent 7dbaef3f
@@ -15,4 +15,6 @@
title: Preparing Model for Serving
- local: basic_tutorials/gated_model_access - local: basic_tutorials/gated_model_access
title: Serving Private & Gated Models
- local: basic_tutorials/using_cli
title: Using TGI CLI
title: Tutorials
# Using TGI CLI
You can use the TGI command-line interface (CLI) to download weights, serve and quantize models, or get information on serving parameters. To install the CLI, please refer to [the installation section](./installation#install-cli).
`text-generation-server` lets you download model weights with the `download-weights` command, like below 👇
```bash
text-generation-server download-weights MODEL_HUB_ID
```
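For example, to fetch the weights of the Falcon 7B Instruct model used elsewhere in these docs, you could run the sketch below. This assumes the CLI is installed; the `HUGGINGFACE_HUB_CACHE` variable is optional and only shown to make the download location explicit.

```bash
# Optional: choose where the weights are cached (defaults to the Hugging Face cache directory)
HUGGINGFACE_HUB_CACHE=$PWD/data text-generation-server download-weights tiiuae/falcon-7b-instruct
```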
You can also use it to quantize models like below 👇
```bash
text-generation-server quantize MODEL_HUB_ID OUTPUT_DIR
```
You can use `text-generation-launcher` to serve models.
```bash
text-generation-launcher --model-id MODEL_HUB_ID --port 8080
```
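To give an idea of what querying a served model looks like, here is a minimal sketch of the request and response shapes of TGI's `/generate` endpoint. The actual HTTP call is only shown as a comment (it assumes a server is running on port 8080), and `sample_response` is an illustrative stand-in for a real server reply.

```python
import json

# Request payload for TGI's /generate endpoint
payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {"max_new_tokens": 20},
}
body = json.dumps(payload)

# With a server running on port 8080, the request could be sent with, e.g.:
#   curl 127.0.0.1:8080/generate -X POST -d "$body" -H 'Content-Type: application/json'
# The endpoint responds with JSON; here we parse a sample response of that shape:
sample_response = '{"generated_text": " Deep Learning is a subset of machine learning."}'
generated = json.loads(sample_response)["generated_text"]
print(generated)
```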
There are many options and parameters you can pass to `text-generation-launcher`. The CLI documentation is kept minimal, relying instead on self-generated documentation, which you can view by running
```bash
text-generation-launcher --help
```
You can also find it hosted in this [Swagger UI](https://huggingface.github.io/text-generation-inference/).
The same documentation is available for `text-generation-server`:
```bash
text-generation-server --help
```
@@ -4,8 +4,20 @@ This section explains how to install the CLI tool as well as installing TGI from
## Install CLI
You can use the TGI command-line interface (CLI) to download weights, serve and quantize models, or get information on serving parameters.
To install the CLI, you need to first clone the TGI repository and then run `make`.
```bash
git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
make install
```
If you would like to serve models with custom kernels, run
```bash
BUILD_EXTENSIONS=True make install
```
## Local Installation from Source
@@ -44,7 +56,8 @@ brew install protobuf
Then run the following to install Text Generation Inference:
```bash
git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
BUILD_EXTENSIONS=True make install
```
<Tip warning={true}>
@@ -64,9 +77,3 @@ make run-falcon-7b-instruct
```
This will serve the Falcon 7B Instruct model from port 8080, which we can query.
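Once the server is up, it can be queried over HTTP. The request below follows the `/generate` example from the TGI quickstart and assumes the server is listening on port 8080:

```bash
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```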
To see all options to serve your models, check in the [codebase](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or the CLI:
```bash
text-generation-launcher --help
```
@@ -4,7 +4,7 @@ The easiest way of getting started is using the official Docker container. Insta
Let's say you want to deploy the [Falcon-7B Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) model with TGI. Here is an example of how to do that:
```bash
model=tiiuae/falcon-7b-instruct
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
@@ -31,4 +31,4 @@ To see all possible flags and options, you can use the `--help` flag. It's possi
docker run ghcr.io/huggingface/text-generation-inference:1.0.0 --help
```
</Tip>