Unverified commit 5e5e9d4b authored by lewtun, committed by GitHub

feat: Add note about NVIDIA drivers (#64)


Co-authored-by: OlivierDehaene <olivier@huggingface.co>
parent 603e20b5
@@ -83,6 +83,7 @@ volume=$PWD/data # share a volume with the Docker container to avoid downloading
 docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --num-shard $num_shard
 ```
+**Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.
 
 You can then query the model using either the `/generate` or `/generate_stream` routes:

@@ -119,8 +120,6 @@ for response in client.generate_stream("What is Deep Learning?", max_new_tokens=
 print(text)
 ```
-**Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
 
 ### API documentation
 
 You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route.
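
For context on the note this commit adds: once the NVIDIA Container Toolkit is installed, a quick sanity check is to run `nvidia-smi` inside a CUDA base image before launching the server. This is a sketch, not part of the commit; the `nvidia/cuda` image tag is an assumption and should match the driver you have installed.

```bash
# Sanity check (not part of this commit): confirm Docker can see the GPU
# through the NVIDIA Container Toolkit before starting text-generation-inference.
# The image tag is an assumption; use any CUDA >= 11.8 base image that is
# compatible with your installed NVIDIA driver.
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```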
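The diff context also mentions the `/generate` and `/generate_stream` routes. As an illustrative sketch of what a query could look like against the `docker run` command above (the request schema here follows the `text-generation-inference` documentation, but treat the exact fields as an assumption for this example):

```bash
# Illustrative queries (not part of this commit) against a server running
# locally on port 8080, as started by the docker run command in the diff.

# One-shot generation: returns the full completion as a single JSON response.
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'

# Streaming generation: returns server-sent events, one token at a time.
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```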