And register the LLM as usual, adding the media configuration:
If `enable_image` or `enable_video` is not called, requests containing the corresponding modality are rejected.
Register the LLM as usual, adding the media configuration:
```python
register_llm(
...
...
@@ -47,11 +49,15 @@ register_llm(
> [!WARNING]
> **Requires GPU node**: The frontend must run on a node with GPU access. During media processing, decoded tensors are written to GPU memory via NIXL, which requires `libcuda.so.1` to be available. Running the frontend on a CPU-only node fails with an error such as: `Failed to initialize required backends: [UCX: No UCX plugin found]`.
> [!WARNING]
> **Video decoding**: Video decoding must be enabled via the `dynamo-llm/media-ffmpeg` Rust feature. The following ffmpeg dynamic libraries must be available on the system: `libavcodec`, `libavdevice`, `libavfilter`, `libavformat`, `libswresample`, `libswscale`. These are available in Dynamo containers built with `container/build.sh --enable-media-ffmpeg ...`.
## Image decoding options
- **max_image_width** (uint32, > 0): If the image width exceeds this value, abort the decoding.
- **max_image_height** (uint32, > 0): If the image height exceeds this value, abort the decoding.
- **max_alloc** (uint64, > 0): Maximum allowed total allocation (RAM) of the decoder, in bytes.
### Limits (not overridable at runtime via `media_io_kwargs`)
- **limits.max_image_width** (uint32, > 0): If the image width exceeds this value, abort the decoding.
- **limits.max_image_height** (uint32, > 0): If the image height exceeds this value, abort the decoding.
- **limits.max_alloc** (uint64, > 0): Maximum allowed total allocation (RAM) of the decoder, in bytes.
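As a rough illustration of how the three image limits above could interact, the following Python sketch checks an image's dimensions and estimated decoded size against them. `ImageLimits` and `check_image` are hypothetical names that are not part of the library, and the byte estimate assumes one byte per channel with no padding; the actual decoder may account for memory differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageLimits:
    """Hypothetical mirror of the `limits.*` image options (not the real API)."""
    max_image_width: int
    max_image_height: int
    max_alloc: int  # bytes


def check_image(width: int, height: int, channels: int, limits: ImageLimits) -> int:
    """Return the estimated decoded size in bytes, or raise if a limit is exceeded."""
    if width > limits.max_image_width:
        raise ValueError(f"image width {width} exceeds {limits.max_image_width}")
    if height > limits.max_image_height:
        raise ValueError(f"image height {height} exceeds {limits.max_image_height}")
    # Assumption: 8-bit samples, tightly packed (width * height * channels bytes).
    needed = width * height * channels
    if needed > limits.max_alloc:
        raise ValueError(f"decoding needs {needed} bytes, limit is {limits.max_alloc}")
    return needed
```

Under these assumptions, a 1920x1080 RGB image needs 1920 * 1080 * 3 = 6,220,800 bytes, so a `max_alloc` below that value would abort decoding even when both dimension limits are satisfied.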
## Video decoding options
### Sampling
...
...
@@ -63,9 +69,30 @@ There are two ways to configure video sampling: either with a fixed number of fr
### Others
- **strict** (bool): If strict mode is enabled, any failure to decode a requested frame aborts the whole video decoding with an error. When strict mode is disabled, some requested frames may fail to decode, and the resulting set of decoded frames might contain fewer frames than expected.
- **max_alloc** (usize, > 0): If the total number of bytes in the decoded frames would exceed this value, abort the decoding.
### Limits (not overridable at runtime via `media_io_kwargs`)
- **limits.max_alloc** (usize, > 0): If the total number of bytes in the decoded frames would exceed this value, abort the decoding.
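The difference between strict and non-strict decoding, together with the allocation limit, can be sketched in Python as follows. This is only an illustration of the semantics described above, not the actual implementation: `decode_frames`, `decode_one`, and `frame_bytes` are hypothetical names, and the sketch assumes every decoded frame has the same size.

```python
from typing import Callable, List, Sequence


def decode_frames(
    indices: Sequence[int],
    decode_one: Callable[[int], bytes],  # hypothetical per-frame decoder
    frame_bytes: int,                    # assumed uniform decoded frame size
    max_alloc: int,
    strict: bool,
) -> List[bytes]:
    """Decode the requested frames, honoring `strict` and `max_alloc`."""
    # Abort up front if the decoded frames would exceed the allocation limit.
    total = len(indices) * frame_bytes
    if total > max_alloc:
        raise ValueError(f"decoded frames need {total} bytes, limit is {max_alloc}")

    frames: List[bytes] = []
    for index in indices:
        try:
            frames.append(decode_one(index))
        except Exception:
            if strict:
                # Strict mode: one failed frame aborts the whole decoding.
                raise
            # Non-strict mode: skip the frame, so fewer frames may be returned.
    return frames
```

With `strict=False`, a caller should not assume that the number of returned frames equals the number of requested frames.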
## Runtime media decoding options (`media_io_kwargs`)
Decoder parameters can also be set at runtime through an extension to the OpenAI chat completions API. Limits defined in the MDC, such as the maximum image size and the maximum RAM allocation, cannot be overridden at runtime.
This can be used, for example, to give a request a video sampling strategy that differs from the default registered in the MDC:
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": ...,
    "messages": ...,
    "media_io_kwargs": {
      "video": {
        "fps": 1.0,
        "max_frames": 16
      }
    }
  }'
```
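The same request can be built from Python. The sketch below uses only the standard library and mirrors the `media_io_kwargs` layout of the curl example; `build_request` is a hypothetical helper, and the model name and messages are left to the caller because they depend on the deployment.

```python
import json
import urllib.request


def build_request(
    model: str,
    messages: list,
    fps: float,
    max_frames: int,
    url: str = "http://localhost:8000/v1/chat/completions",
) -> urllib.request.Request:
    """Build a chat completions request that overrides video sampling."""
    payload = {
        "model": model,
        "messages": messages,
        # Per-request decoder options; limits from the MDC still apply.
        "media_io_kwargs": {"video": {"fps": fps, "max_frames": max_frames}},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request, assuming a frontend is listening at the given URL.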
## TODOs
...
...
@@ -80,7 +107,7 @@ There are two ways to configure video sampling: either with a fixed number of fr
- [x] Image SW decoding
- [ ] Video HW decoding (NVDEC)
- [ ] JPEG HW decoding (nvJPEG)
- [] Sparse video sampling (seek-forward)
- [x] Sparse video sampling (seek-forward)
- [ ] Memory slab pre-allocation/registration
### Memory management
...
...
@@ -89,4 +116,4 @@ There are two ways to configure video sampling: either with a fixed number of fr
### Misc
- [ ] Observability on performance, memory usage and input distributions