You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route.
The Swagger UI is also available at: [https://huggingface.github.io/text-generation-inference](https://huggingface.github.io/text-generation-inference).
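For example, once a server is running you can query the `/generate` route directly (the host and port below assume a local deployment listening on port 8080; adjust them to your setup):

```shell
# Send a generation request to a locally running server.
# `inputs` is the prompt; `parameters.max_new_tokens` caps the response length.
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```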
### Distributed Tracing
`text-generation-inference` is instrumented with distributed tracing using OpenTelemetry. You can use this feature
by setting the address to an OTLP collector with the `--otlp-endpoint` argument.
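As an illustration, a launch command pointing the server at a local collector might look like the following (the model id and the collector address `0.0.0.0:4317` are placeholders, not recommendations):

```shell
# Sketch: export traces to an OTLP collector listening on port 4317.
text-generation-launcher --model-id <your-model-id> \
    --otlp-endpoint 0.0.0.0:4317
```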
### A note on Shared Memory (shm)
[`NCCL`](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/index.html) is a communication framework used by `PyTorch` to do distributed training/inference. `text-generation-inference` makes use of `NCCL` to enable Tensor Parallelism, which can dramatically speed up inference for large language models. To share data between devices, `NCCL` may fall back to host shared memory when peer-to-peer communication over NVLink or PCI is not possible.
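When running the server in Docker, you can raise the container's shared-memory allotment with Docker's `--shm-size` flag (the `1g` value below is an illustrative starting point, not a tuned recommendation):

```shell
# Sketch: give the container 1G of shared memory for the NCCL host fallback.
docker run --gpus all --shm-size 1g -p 8080:80 \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id <your-model-id>
```

If raising shared memory is not an option in your environment, setting the environment variable `NCCL_SHM_DISABLE=1` is a possible workaround, at the cost of slower inference.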
.expect("ID not found in entries. This is a bug.");
// Create and enter a span to link this function back to the entry
let _generation_span = info_span!(
    parent: entry.temp_span.as_ref().expect("batch_span is None. This is a bug."),
    "send_generation",
    generation = ?generation
)
.entered();