You can use the TGI command-line interface (CLI) to download weights, serve and quantize models, or get information on serving parameters. To install the CLI, please refer to [the installation section](./installation#install-cli).
`text-generation-server` lets you download model weights with the `download-weights` command, like below 👇
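```bash
# the model ID can be any model hosted on the Hugging Face Hub,
# e.g. the Falcon-7B Instruct model used later in this guide
text-generation-server download-weights tiiuae/falcon-7b-instruct
```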
There are many options and parameters you can pass to `text-generation-launcher`. The documentation for the CLI is kept minimal and relies on self-generated documentation, which you can view by running
```bash
text-generation-launcher --help
```
You can also find it hosted in this [Swagger UI](https://huggingface.github.io/text-generation-inference/).
The same documentation can be found for `text-generation-server`:
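```bash
text-generation-server --help
```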
To see all the options available for serving your models, check the [codebase](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or run `text-generation-launcher --help`.
Let's say you want to deploy the [Falcon-7B Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) model with TGI. Here is an example of how to do that:
```bash
model=tiiuae/falcon-7b-instruct
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

# run the TGI Docker image (pin a specific version tag if you prefer)
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:latest --model-id $model
```

This will serve the Falcon 7B Instruct model on port 8080, which we can query.
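Once the container is running, you can, for instance, send a request to the `/generate` endpoint (the prompt and generation parameters below are just illustrative):

```bash
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```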