You can use the TGI command-line interface (CLI) to download weights, serve and quantize models, or get information on serving parameters. To install the CLI, please refer to [the installation section](./installation#install-cli).
`text-generation-server` lets you download model weights with the `download-weights` command, like below 👇
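```bash
# the model ID can be any model hosted on the Hugging Face Hub,
# e.g. the Falcon-7B Instruct model used later in this guide
text-generation-server download-weights tiiuae/falcon-7b-instruct
```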
There are many options and parameters you can pass to `text-generation-launcher`. The documentation for the CLI is kept minimal and relies on self-generated documentation, which you can view by running
```bash
text-generation-launcher --help
```
You can also find it hosted in this [Swagger UI](https://huggingface.github.io/text-generation-inference/).
The same documentation can be found for `text-generation-server`:
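```bash
text-generation-server --help
```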
To see all the options available for serving your models, check the [codebase](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or run `text-generation-launcher --help`.
Let's say you want to deploy the [Falcon-7B Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) model with TGI. Here is an example of how to do that:
```bash
model=tiiuae/falcon-7b-instruct
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

# run the TGI Docker image (pin a specific version tag if you prefer)
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:latest --model-id $model
```

This will serve the Falcon 7B Instruct model on port 8080, which we can query.
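Once the container is running, you can, for instance, send a request to the `/generate` endpoint (the prompt and generation parameters below are just illustrative):

```bash
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```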