```bash
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
token=<your cli READ token>

docker run --gpus all --shm-size 1g -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.1.0 --model-id $model
```
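Once the container is up, you can send it generation requests over HTTP. A minimal sketch, assuming the server is reachable on `127.0.0.1:8080` as mapped above and using TGI's `/generate` endpoint:

```shell
# Query the running container (assumes the port mapping 8080:80 above).
# The prompt and max_new_tokens values here are illustrative.
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```

The response is a JSON object whose `generated_text` field contains the model's completion.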
To see all possible deploy flags and options, you can use the `--help` flag. It's possible to configure the number of shards, quantization, generation parameters, and more.
```bash
docker run ghcr.io/huggingface/text-generation-inference:1.1.0 --help