You can simply install the `huggingface-hub` package with pip.
```bash
pip install huggingface-hub
```
Once you start the TGI server, instantiate `InferenceClient()` with the URL to the endpoint serving the model. You can then call `text_generation()` to hit the endpoint through Python.
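If you haven't launched the server yet, the official Docker image is a typical way to start one. Below is a minimal launch sketch: the model ID, volume path, and version tag are illustrative, so adjust them to your setup, and note that `--gpus all` requires the NVIDIA Container Toolkit.

```bash
# Sketch of a typical TGI launch (adjust the model, tag, and volume to your setup)
model=tiiuae/falcon-7b-instruct
volume=$PWD/data # share a volume with the container to avoid re-downloading weights
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:1.0.0 --model-id $model
```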
This will serve the Falcon 7B Instruct model on port 8080, which we can query.
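For example, here is a minimal sketch of querying that endpoint through Python; the URL assumes the port mapping shown above.

```python
from huggingface_hub import InferenceClient

# Point the client at the local TGI endpoint (port 8080 from the mapping above)
client = InferenceClient(model="http://127.0.0.1:8080")

# max_new_tokens caps the length of the generated continuation
output = client.text_generation(prompt="What is Deep Learning?", max_new_tokens=20)
print(output)
```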
To see all the options to serve your models, check the [codebase](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or the CLI.
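For instance, assuming the `text-generation-launcher` binary is available locally (it is also the entrypoint of the Docker image), you can print every flag with:

```bash
text-generation-launcher --help
```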
Once TGI is running, you can use the `generate` endpoint by sending requests. To learn more about how to query the endpoints, check the [Consuming TGI](./basic_tutorials/consuming_tgi) section.
```bash
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
To see all possible flags and options, you can use the `--help` flag. It's possible to configure the number of shards, quantization, generation parameters, and more.
```bash
docker run ghcr.io/huggingface/text-generation-inference:1.0.0 --help
```