feat: Connect Library (#1478)

Unverified Commit e0a51940 authored Jul 23, 2025 by J Wyman Committed by GitHub Jul 23, 2025
14 changed files
--- a/docs/API/nixl_connect/README.md
+++ b/docs/API/nixl_connect/README.md
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Dynamo NIXL Connect
+
+Dynamo NIXL Connect provides a set of Python classes for using the NIXL-based RDMA subsystem.
+The primary goal of this library is to simplify the integration of NIXL-based RDMA into inference applications.
+The `dynamo.nixl_connect` library can be imported by any Dynamo container-hosted application.
+
+```python
+import dynamo.nixl_connect
+```
+
+All operations using the NIXL Connect library begin with the [`Connector`](connector.md) class and the type of operation required.
+There are four types of supported operations:
+
+ 1. **Register local readable memory**:
+
+    Register local memory buffer(s) with the RDMA subsystem to enable a remote worker to read from.
+
+ 2. **Register local writable memory**:
+
+    Register local memory buffer(s) with the RDMA subsystem to enable a remote worker to write to.
+
+ 3. **Read from registered, remote memory**:
+
+    Read remote memory buffer(s), registered by a remote worker to be readable, into local memory buffer(s).
+
+ 4. **Write to registered, remote memory**:
+
+    Write local memory buffer(s) to remote memory buffer(s) registered by a remote worker to be writable.
+
+By connecting correctly paired operations, high-throughput GPU Direct RDMA data transfers can be completed.
+Given the list above, the correct pairings are 1 & 3 or 2 & 4:
+one side is a "(read|write)-able operation" and the other is its correctly paired "(read|write) operation".
+Specifically, a read operation must be paired with a readable operation, and a write operation must be paired with a writable operation.
+
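The pairing rule above can be sketched as a small lookup. This is an illustration only: `OpKind`, `VALID_PAIRS`, and `is_valid_pairing` are hypothetical names that are not part of `dynamo.nixl_connect`, and the enum values simply mirror the numbering of the list above.

```python
from enum import Enum


class OpKind(Enum):
    """Hypothetical labels for the four operation types listed above."""

    READABLE = 1  # 1. register local readable memory
    WRITABLE = 2  # 2. register local writable memory
    READ = 3      # 3. read from registered, remote memory
    WRITE = 4     # 4. write to registered, remote memory


# Each registering ("-able") operation maps to the only operation that may pair with it.
VALID_PAIRS = {
    OpKind.READABLE: OpKind.READ,
    OpKind.WRITABLE: OpKind.WRITE,
}


def is_valid_pairing(registered: OpKind, requested: OpKind) -> bool:
    """Return True when `requested` is the correct partner for `registered`."""
    return VALID_PAIRS.get(registered) is requested


print(is_valid_pairing(OpKind.READABLE, OpKind.READ))  # pairing 1 & 3
print(is_valid_pairing(OpKind.WRITABLE, OpKind.READ))  # mismatched pairing
```

The library is documented to raise an error on an incorrect pairing; the boolean check here only models which combinations are considered correct.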
+```mermaid
+sequenceDiagram
+    participant LocalWorker
+    participant RemoteWorker
+    participant NIXL
+
+    LocalWorker ->> NIXL: Register memory (Descriptor)
+    RemoteWorker ->> NIXL: Register memory (Descriptor)
+    LocalWorker ->> LocalWorker: Create Readable/WritableOperation
+    LocalWorker ->> RemoteWorker: Send RDMA metadata (via HTTP/TCP+NATS)
+    RemoteWorker ->> NIXL: Begin Read/WriteOperation with metadata
+    NIXL -->> RemoteWorker: Data transfer (RDMA)
+    RemoteWorker -->> LocalWorker: Notify completion (unblock awaiter)
+```
+
+## Examples
+
+### Generic Example
+
+In the diagram below, Local creates a [`WritableOperation`](writable_operation.md) intended to receive data from Remote.
+Local then sends metadata about the requested RDMA operation to Remote.
+Remote then uses the metadata to create a [`WriteOperation`](write_operation.md) which will perform the GPU Direct RDMA memory transfer from Remote's GPU memory to Local's GPU memory.
+
+```mermaid
+---
+title: Write Operation Between Two Workers
+---
+flowchart LR
+  c1[Remote] --"3: .begin_write()"--- WriteOperation
+  WriteOperation e1@=="4: GPU Direct RDMA"==> WritableOperation
+  WritableOperation --"1: .create_writable()"--- c2[Local]
+  c2 e2@--"2: RDMA Metadata via HTTP"--> c1
+  e1@{ animate: true; }
+  e2@{ animate: true; }
+```
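The four numbered steps in the flowchart can be simulated in a single process with `asyncio`. This is a rough sketch under stated assumptions: `FakeWritable` and `remote_worker` are hypothetical stand-ins, the queue stands in for the HTTP sideband, and a plain buffer copy stands in for the GPU Direct RDMA transfer. None of it uses the real library.

```python
import asyncio


class FakeWritable:
    """Stand-in for a writable operation: owns a buffer and a completion event."""

    def __init__(self, buffer: bytearray) -> None:
        self.buffer = buffer
        self._done = asyncio.Event()

    def metadata(self) -> dict:
        # The real metadata is an opaque RdmaMetadata object; a dict is used here.
        return {"target": self}

    async def wait_for_completion(self) -> None:
        await self._done.wait()


async def remote_worker(inbox: asyncio.Queue, payload: bytes) -> None:
    # Step 3: receive the metadata and begin the "write".
    metadata = await inbox.get()
    target = metadata["target"]
    # Step 4: a buffer copy stands in for the RDMA transfer.
    target.buffer[: len(payload)] = payload
    target._done.set()  # notify completion, unblocking the awaiter


async def main() -> bytes:
    sideband: asyncio.Queue = asyncio.Queue()  # stands in for HTTP
    local_buffer = bytearray(8)

    writable = FakeWritable(local_buffer)     # step 1: expose local memory
    remote = asyncio.create_task(remote_worker(sideband, b"kv-cache"))
    await sideband.put(writable.metadata())   # step 2: send the metadata
    await writable.wait_for_completion()      # block until the write has landed
    await remote
    return bytes(local_buffer)


result = asyncio.run(main())
print(result)
```

The point of the sketch is the ordering: Local never touches the payload itself, it only exposes memory, ships metadata, and waits.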
+
+### Multimodal Example
+
+In the case of the [Dynamo Multimodal Disaggregated Example](../../examples/multimodal/README.md):
+
+ 1. The HTTP frontend accepts a text prompt and a URL to an image.
+
+ 2. The prompt and URL are then enqueued with the Processor before being dispatched to the first available Decode Worker.
+
+ 3. Decode Worker then requests a Prefill Worker to provide key-value data for the LLM powering the Decode Worker.
+
+ 4. Prefill Worker then requests that the image be processed and provided as embeddings by the Encode Worker.
+
+ 5. Encode Worker acquires the image, processes it, performs inference on the image using a specialized vision model, and finally provides the embeddings to Prefill Worker.
+
+ 6. Prefill Worker receives the embeddings from Encode Worker and generates a key-value cache (KV$) update for Decode Worker's LLM and writes the update directly to the GPU memory reserved for the data.
+
+ 7. Finally, Decode Worker performs the requested inference.
+
+```mermaid
+---
+title: Multimodal Disaggregated Workflow
+---
+flowchart LR
+  p0[HTTP Frontend] i0@--"text prompt"-->p1[Processor]
+  p0 i1@--"url"-->p1
+  p1 i2@--"prompt"-->dw[Decode Worker]
+  p1 i3@--"url"-->dw
+  dw i4@--"prompt"-->pw[Prefill Worker]
+  dw i5@--"url"-->pw
+  pw i6@--"url"-->ew[Encode Worker]
+  ew o0@=="image embeddings"==>pw
+  pw o1@=="kv_cache updates"==>dw
+  dw o2@--"inference results"-->p0
+
+  i0@{ animate: true; }
+  i1@{ animate: true; }
+  i2@{ animate: true; }
+  i3@{ animate: true; }
+  i4@{ animate: true; }
+  i5@{ animate: true; }
+  i6@{ animate: true; }
+  o0@{ animate: true; }
+  o1@{ animate: true; }
+  o2@{ animate: true; }
+```
+
+> [!Note]
+> In this example, it is the data transfer between the Prefill Worker and the Encode Worker that utilizes the Dynamo NIXL Connect library.
+> The KV Cache transfer between Decode Worker and Prefill Worker utilizes the NIXL-based RDMA subsystem directly without using the Dynamo NIXL Connect library.
+
+#### Code Examples
+
+See [prefill_worker](https://github.com/ai-dynamo/dynamo/tree/main/examples/multimodal/components/prefill_worker.py#L199) or [decode_worker](https://github.com/ai-dynamo/dynamo/tree/main/examples/multimodal/components/decode_worker.py#L239) from our Multimodal example,
+for how they coordinate directly with the Encode Worker by creating a [`WritableOperation`](writable_operation.md),
+sending the operation's metadata via Dynamo's round-robin dispatcher, and awaiting the operation for completion before making use of the transferred data.
+
+See [encode_worker](https://github.com/ai-dynamo/dynamo/tree/main/examples/multimodal/components/encode_worker.py#L190) from our Multimodal example,
+for how the resulting embeddings are registered with the RDMA subsystem by creating a [`Descriptor`](descriptor.md),
+a [`WriteOperation`](write_operation.md) is created using the metadata provided by the requesting worker,
+and the worker awaits completion of the data transfer before yielding a response.
+
+
+## Python Classes
+
+  - [Connector](connector.md)
+  - [Descriptor](descriptor.md)
+  - [Device](device.md)
+  - [ReadOperation](read_operation.md)
+  - [ReadableOperation](readable_operation.md)
+  - [SerializedRequest](serialized_request.md)
+  - [WritableOperation](writable_operation.md)
+  - [WriteOperation](write_operation.md)
+
+
+## References
+
+  - [NVIDIA Dynamo](https://developer.nvidia.com/dynamo) @ [GitHub](https://github.com/ai-dynamo/dynamo)
+    - [NVIDIA Dynamo NIXL Connect](https://github.com/ai-dynamo/dynamo/tree/main/docs/runtime/nixl_connect)
+  - [NVIDIA Inference Transfer Library (NIXL)](https://developer.nvidia.com/blog/introducing-nvidia-dynamo-a-low-latency-distributed-inference-framework-for-scaling-reasoning-ai-models/#nvidia_inference_transfer_library_nixl_low-latency_hardware-agnostic_communication%C2%A0) @ [GitHub](https://github.com/ai-dynamo/nixl)
+  - [Dynamo Multimodal Example](https://github.com/ai-dynamo/dynamo/tree/main/examples/multimodal)
+  - [NVIDIA GPU Direct](https://developer.nvidia.com/gpudirect)
--- a/docs/API/nixl_connect/connector.md
+++ b/docs/API/nixl_connect/connector.md
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# dynamo.nixl_connect.Connector
+
+Core class for managing the connection between workers in a distributed environment.
+Use this class to create readable and writable operations, or read and write data to remote workers.
+
+This class is responsible for interfacing with the NIXL-based RDMA subsystem and providing a "Pythonic" interface
+with which to utilize GPU Direct RDMA accelerated data transfers between models hosted by different workers in a Dynamo pipeline.
+The connector provides two methods of moving data between workers:
+
+  - Preparing local memory to be written to by a remote worker.
+
+  - Preparing local memory to be read by a remote worker.
+
+In both cases, local memory is registered with the NIXL-based RDMA subsystem via the [`Descriptor`](descriptor.md) class and provided to the connector.
+The connector then configures the RDMA subsystem to expose the memory for the requested operation and returns an operation control object.
+The operation control object, either a [`ReadableOperation`](readable_operation.md) or a [`WritableOperation`](writable_operation.md),
+provides RDMA metadata ([RdmaMetadata](rdma_metadata.md)) via its `.metadata()` method, functionality to query the operation's current state, as well as the ability to cancel the operation prior to its completion.
+
+The RDMA metadata must be provided to the remote worker expected to complete the operation.
+The metadata contains required information (identifiers, keys, etc.) which enables the remote worker to interact with the provided memory.
+
+> [!Warning]
+> RDMA metadata contains a worker's address as well as security keys to access specific registered memory descriptors.
+> This data provides direct memory access between workers, and should be considered sensitive and therefore handled accordingly.
+
+
+## Example Usage
+
+```python
+    @async_on_start
+    async def async_init(self):
+      runtime = dynamo_context["runtime"]
+
+      self.connector = dynamo.nixl_connect.Connector(runtime=runtime)
+      await self.connector.initialize()
+```
+
+> [!Tip]
+> See [`ReadOperation`](read_operation.md#example-usage), [`ReadableOperation`](readable_operation.md#example-usage),
+> [`WritableOperation`](writable_operation.md#example-usage), and [`WriteOperation`](write_operation.md#example-usage)
+> for additional examples.
+
+
+## Methods
+
+### `begin_read`
+
+```python
+async def begin_read(
+    self,
+    remote_metadata: RdmaMetadata,
+    local_descriptors: Descriptor | list[Descriptor],
+) -> ReadOperation:
+```
+
+Creates a [`ReadOperation`](read_operation.md) for transferring data from a remote worker.
+
+To create the operation, the RDMA metadata ([RdmaMetadata](rdma_metadata.md)) from a remote worker's [`ReadableOperation`](readable_operation.md)
+along with a matching set of local memory descriptors which reference memory intended to receive data from the remote worker
+must be provided.
+The RDMA metadata must be transferred from the remote to the local worker via a secondary channel, most likely HTTP or TCP+NATS.
+
+Once created, data transfer will begin immediately.
+
+Disposal of the object will instruct the RDMA subsystem to cancel the operation,
+therefore the operation should be awaited until completed unless cancellation is intended.
+
+Use [`.wait_for_completion()`](read_operation.md#wait_for_completion) to block the caller until the operation has completed or encountered an error.
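A minimal, duck-typed sketch of the call order described above: `begin_read` is awaited with the metadata first and the local descriptors second, matching the signature, and the returned operation is then awaited to completion. `pull_from_remote` and the two stub classes are hypothetical helpers, included so the snippet runs without Dynamo or RDMA hardware. Only `begin_read` and `wait_for_completion` come from the documentation.

```python
import asyncio
from typing import Any


async def pull_from_remote(connector: Any, remote_metadata: Any, local_descriptors: Any) -> None:
    """Start a read using the documented argument order, then await its completion."""
    read_op = await connector.begin_read(remote_metadata, local_descriptors)
    await read_op.wait_for_completion()


# --- Stand-ins so the sketch runs without Dynamo or RDMA hardware ---
class StubReadOperation:
    def __init__(self) -> None:
        self.completed = False

    async def wait_for_completion(self) -> None:
        self.completed = True


class StubConnector:
    def __init__(self) -> None:
        self.calls: list = []
        self.last_op = None

    async def begin_read(self, remote_metadata: Any, local_descriptors: Any) -> StubReadOperation:
        self.calls.append((remote_metadata, local_descriptors))
        self.last_op = StubReadOperation()
        return self.last_op


stub = StubConnector()
asyncio.run(pull_from_remote(stub, "metadata-from-remote", ["local-descriptor"]))
print(stub.calls, stub.last_op.completed)
```

Passing the connector in as a parameter keeps the helper testable; in a worker it would presumably be the `Connector` created during start-up.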
+
+### `begin_write`
+
+```python
+async def begin_write(
+    self,
+    local_descriptors: Descriptor | list[Descriptor],
+    remote_metadata: RdmaMetadata,
+) -> WriteOperation:
+```
+
+Creates a [`WriteOperation`](write_operation.md) for transferring data to a remote worker.
+
+To create the operation, the RDMA metadata ([RdmaMetadata](rdma_metadata.md)) from a remote worker's [`WritableOperation`](writable_operation.md)
+along with a matching set of local memory descriptors which reference memory to be transferred to the remote worker
+must be provided.
+The RDMA metadata must be transferred from the remote to the local worker via a secondary channel, most likely HTTP or TCP+NATS.
+
+Once created, data transfer will begin immediately.
+
+Disposal of the object will instruct the RDMA subsystem to cancel the operation,
+therefore the operation should be awaited until completed unless cancellation is intended.
+
+Use [`.wait_for_completion()`](write_operation.md#wait_for_completion) to block the caller until the operation has completed or encountered an error.
+
+### `create_readable`
+
+```python
+def create_readable(
+    self,
+    local_descriptors: Descriptor | list[Descriptor],
+) -> ReadableOperation:
+```
+
+Creates a [`ReadableOperation`](readable_operation.md) for transferring data to a remote worker.
+
+To create the operation, a set of local memory descriptors must be provided that reference memory intended to be transferred to a remote worker.
+Once created, the memory referenced by the provided descriptors becomes immediately readable by a remote worker with the necessary metadata.
+The metadata required to access the memory referenced by the provided descriptors is accessible via the operation's `.metadata()` method.
+Once acquired, the metadata needs to be provided to a remote worker via a secondary channel, most likely HTTP or TCP+NATS.
+
+Disposal of the object will instruct the RDMA subsystem to cancel the operation,
+therefore the operation should be awaited until completed unless cancellation is intended.
+
+Use [`.wait_for_completion()`](readable_operation.md#wait_for_completion) to block the caller until the operation has completed or encountered an error.
+
+### `create_writable`
+
+```python
+def create_writable(
+    self,
+    local_descriptors: Descriptor | list[Descriptor],
+) -> WritableOperation:
+```
+
+Creates a [`WritableOperation`](writable_operation.md) for transferring data from a remote worker.
+
+To create the operation, a set of local memory descriptors must be provided which reference memory intended to receive data from a remote worker.
+Once created, the memory referenced by the provided descriptors becomes immediately writable by a remote worker with the necessary metadata.
+The metadata required to access the memory referenced by the provided descriptors is accessible via the operation's `.metadata()` method.
+Once acquired, the metadata needs to be provided to a remote worker via a secondary channel, most likely HTTP or TCP+NATS.
+
+Disposal of the object will instruct the RDMA subsystem to cancel the operation,
+therefore the operation should be awaited until completed unless cancellation is intended.
+
+Use [`.wait_for_completion()`](writable_operation.md#wait_for_completion) to block the caller until the operation has completed or encountered an error.
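The receiving side can be sketched the same way: create the writable operation, hand its metadata to the remote worker, and wait. `receive_into`, `send_metadata`, and the stubs are hypothetical; only `create_writable`, `.metadata()`, and `wait_for_completion` are taken from the text above. How the metadata is actually shipped depends on the application's secondary channel.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def receive_into(
    connector: Any,
    local_descriptors: Any,
    send_metadata: Callable[[Any], Awaitable[None]],
) -> None:
    """Expose local memory, hand the metadata to the remote side, then wait."""
    writable_op = connector.create_writable(local_descriptors)  # synchronous per the signature
    await send_metadata(writable_op.metadata())  # secondary channel (hypothetical callable)
    await writable_op.wait_for_completion()      # returns once the remote write is done


# --- Stand-ins so the sketch runs without Dynamo or RDMA hardware ---
events: list = []


class StubWritableOperation:
    def metadata(self) -> str:
        events.append("metadata")
        return "opaque-metadata"

    async def wait_for_completion(self) -> None:
        events.append("completed")


class StubConnector:
    def create_writable(self, local_descriptors: Any) -> StubWritableOperation:
        events.append("create_writable")
        return StubWritableOperation()


async def fake_send(metadata: Any) -> None:
    events.append(f"sent:{metadata}")


asyncio.run(receive_into(StubConnector(), ["local-descriptor"], fake_send))
print(events)
```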
+
+
+## Properties
+
+### `is_cuda_available`
+
+```python
+@cached_property
+def is_cuda_available(self) -> bool:
+```
+
+Gets `True` when CUDA is available for the selected array module (most likely CuPy); otherwise `False`.
+
+### `name`
+
+```python
+@property
+def name(self) -> str | None:
+```
+
+Gets the Dynamo component name used by the connector.
+
+### `namespace`
+
+```python
+@property
+def namespace(self) -> str:
+```
+
+Gets the Dynamo namespace used by the connector.
+
+### `runtime`
+
+```python
+@property
+def runtime(self) -> dynamo.runtime.DistributedRuntime:
+```
+
+Gets the Dynamo distributed runtime instance associated with the connector.
+
+## Related Classes
+
+  - [Descriptor](descriptor.md)
+  - [Device](device.md)
+  - [OperationStatus](operation_status.md)
+  - [RdmaMetadata](rdma_metadata.md)
+  - [ReadOperation](read_operation.md)
+  - [ReadableOperation](readable_operation.md)
+  - [WritableOperation](writable_operation.md)
+  - [WriteOperation](write_operation.md)
--- a/docs/API/nixl_connect/descriptor.md
+++ b/docs/API/nixl_connect/descriptor.md
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# dynamo.nixl_connect.Descriptor
+
+Memory descriptor that ensures memory is registered with the NIXL-based RDMA subsystem.
+Memory must be registered with the RDMA subsystem to enable interaction with the memory.
+
+Descriptor objects are administrative and do not copy, move, or otherwise modify the registered memory.
+
+There are four ways to create a descriptor:
+
+ 1. From a `torch.Tensor` object. Device information will be derived from the provided object.
+
+ 2. From a `tuple` containing either a NumPy or CuPy `ndarray` and information describing where the memory resides (Host/CPU vs GPU).
+
+ 3. From a Python `bytes` object. Memory is assumed to reside in CPU addressable host memory.
+
+ 4. From a `tuple` comprised of the address of the memory, its size in bytes, and device information.
+    An optional reference to a Python object can be provided to avoid garbage collection issues.
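To make the fourth form concrete, the sketch below derives an address and size for a host buffer using only the standard library. It stops short of constructing a `Descriptor`: the exact tuple layout and the representation of the device information (a plain `"cpu"` string is assumed here) are not confirmed by this document, so treat `raw_description` as a hypothetical shape.

```python
import ctypes

# A host buffer whose memory should be described; bytearray is mutable, so ctypes can view it.
buffer = bytearray(b"embeddings".ljust(1024, b"\0"))

# View the same memory through ctypes to obtain its address without copying.
c_view = (ctypes.c_char * len(buffer)).from_buffer(buffer)
address = ctypes.addressof(c_view)
size = len(buffer)
device = "cpu"  # assumption: device information expressed as a string

# Hypothetical (address, size, device) description of the allocation.
raw_description = (address, size, device)

# `buffer` must stay referenced for as long as the address is in use,
# which is why an owning Python object can be supplied alongside the tuple.
print(size, device, ctypes.string_at(address, 10))
```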
+
+
+## Methods
+
+### `register_memory`
+
+```python
+def register_memory(self, connector: Connector) -> None:
+```
+
+Instructs the descriptor to register its memory buffer with the NIXL-based RDMA subsystem.
+
+Calling this method more than once on the same descriptor has no effect.
+
+When the descriptor is assigned to an RDMA operation, it will be automatically registered if it was not explicitly registered.
+
+
+## Properties
+
+### `device`
+
+```python
+@property
+def device(self) -> Device:
+```
+
+Gets a reference to the [`Device`](device.md) that contains the buffer the descriptor represents.
+
+### `size`
+
+```python
+@property
+def size(self) -> int:
+```
+
+Gets the size of the memory allocation the descriptor represents.
+
+## Related Classes
+
+  - [Connector](connector.md)
+  - [Device](device.md)
+  - [OperationStatus](operation_status.md)
+  - [RdmaMetadata](rdma_metadata.md)
+  - [ReadOperation](read_operation.md)
+  - [ReadableOperation](readable_operation.md)
+  - [WritableOperation](writable_operation.md)
+  - [WriteOperation](write_operation.md)
--- a/docs/API/nixl_connect/device.md
+++ b/docs/API/nixl_connect/device.md
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# dynamo.nixl_connect.Device
+
+The `Device` class describes the device in which a given allocation resides,
+usually host (`"cpu"`) or GPU (`"cuda"`) memory.
+
+When a system contains multiple GPU devices, specific GPU devices can be identified by including their ordinal index number.
+For example, to reference the second GPU in a system, use `"cuda:1"`.
+
+By default, when `"cuda"` is provided, it is assumed to be `"cuda:0"` or the first GPU enumerated by the system.
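The naming convention above can be illustrated with a small parser. This is not the library's implementation: `parse_device` is a hypothetical helper, the local `DeviceKind` is a stand-in for the documented enum, and its numeric values are assumptions.

```python
from enum import IntEnum
from typing import Tuple


class DeviceKind(IntEnum):
    """Local stand-in for the documented enum; the numeric values are assumptions."""

    HOST = 0
    CUDA = 1


def parse_device(spec: str) -> Tuple[DeviceKind, int]:
    """Split a device string such as "cpu", "cuda", or "cuda:1" into (kind, ordinal)."""
    name, _, ordinal = spec.strip().lower().partition(":")
    if name == "cpu":
        if ordinal not in ("", "0"):
            raise ValueError(f"host memory has no ordinal other than 0: {spec!r}")
        return DeviceKind.HOST, 0
    if name == "cuda":
        index = int(ordinal) if ordinal else 0  # bare "cuda" means the first GPU
        return DeviceKind.CUDA, index
    raise ValueError(f"unsupported device: {spec!r}")


print(parse_device("cuda"))
print(parse_device("cuda:1"))
print(parse_device("cpu"))
```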
+
+
+## Properties
+
+### `id`
+
+```python
+@property
+def id(self) -> int:
+```
+
+Gets the identity, or ordinal, of the device.
+
+When the device is the [`HOST`](device_kind.md#host), this value is always `0`.
+
+When the device is a [`GPU`](device_kind.md#cuda), this value identifies a specific GPU.
+
+### `kind`
+
+```python
+@property
+def kind(self) -> DeviceKind:
+```
+
+Gets the [`DeviceKind`](device_kind.md) of the device the instance references.
+
+
+## Related Classes
+
+  - [Connector](connector.md)
+  - [Descriptor](descriptor.md)
+  - [OperationStatus](operation_status.md)
+  - [ReadOperation](read_operation.md)
+  - [ReadableOperation](readable_operation.md)
+  - [RdmaMetadata](rdma_metadata.md)
+  - [WritableOperation](writable_operation.md)
+  - [WriteOperation](write_operation.md)
--- a/docs/API/nixl_connect/device_kind.md
+++ b/docs/API/nixl_connect/device_kind.md
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# dynamo.nixl_connect.DeviceKind(IntEnum)
+
+Represents the kind of device a [`Device`](device.md) object represents.
+
+
+## Values
+
+### `CUDA`
+
+CUDA addressable device (GPU) memory.
+
+### `HOST`
+
+System (CPU) memory.
+
+
+## Related Classes
+
+  - [Connector](connector.md)
+  - [Descriptor](descriptor.md)
+  - [Device](device.md)
+  - [OperationStatus](operation_status.md)
+  - [RdmaMetadata](rdma_metadata.md)
+  - [ReadOperation](read_operation.md)
+  - [ReadableOperation](readable_operation.md)
+  - [WritableOperation](writable_operation.md)
+  - [WriteOperation](write_operation.md)
--- a/docs/API/nixl_connect/operation_status.md
+++ b/docs/API/nixl_connect/operation_status.md
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# dynamo.nixl_connect.OperationStatus(IntEnum)
+
+Represents the current state or status of an operation.
+
+
+## Values
+
+### `CANCELLED`
+
+The operation has been cancelled by the user or system.
+
+### `COMPLETE`
+
+The operation has been completed successfully.
+
+### `ERRORED`
+
+The operation has encountered an error and cannot be completed.
+
+### `IN_PROGRESS`
+
+The operation has been initialized and is in-progress (not completed, errored, or cancelled).
+
+### `INITIALIZED`
+
+The operation has been initialized and is ready to be processed.
+
+### `UNINITIALIZED`
+
+The operation has not been initialized yet and is not in a valid state.
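One way to read the six values is as three non-terminal states followed by three terminal ones. The sketch below encodes that reading; the enum is a local stand-in whose numeric values are assumptions, and `is_terminal` is a hypothetical helper rather than a library function.

```python
from enum import IntEnum


class OperationStatus(IntEnum):
    """Local stand-in mirroring the documented names; the numeric values are assumptions."""

    UNINITIALIZED = 0
    INITIALIZED = 1
    IN_PROGRESS = 2
    COMPLETE = 3
    CANCELLED = 4
    ERRORED = 5


# States after which the operation will make no further progress.
TERMINAL_STATES = frozenset(
    {OperationStatus.COMPLETE, OperationStatus.CANCELLED, OperationStatus.ERRORED}
)


def is_terminal(status: OperationStatus) -> bool:
    """Return True once an operation has completed, been cancelled, or errored."""
    return status in TERMINAL_STATES


print([s.name for s in OperationStatus if not is_terminal(s)])
```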
+
+
+## Related Classes
+
+  - [Connector](connector.md)
+  - [Descriptor](descriptor.md)
+  - [Device](device.md)
+  - [RdmaMetadata](rdma_metadata.md)
+  - [ReadOperation](read_operation.md)
+  - [ReadableOperation](readable_operation.md)
+  - [WritableOperation](writable_operation.md)
+  - [WriteOperation](write_operation.md)
--- a/docs/API/nixl_connect/rdma_metadata.md
+++ b/docs/API/nixl_connect/rdma_metadata.md
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# dynamo.nixl_connect.RdmaMetadata
+
+A Pydantic type intended to provide JSON-serialized RDMA metadata about a [`ReadableOperation`](readable_operation.md) or [`WritableOperation`](writable_operation.md) object.
+RDMA metadata contains detailed information about a worker process and how to access memory descriptors registered with it.
+This data is required to perform data transfers using the NIXL-based RDMA subsystem.
+
+> [!Warning]
+> RDMA metadata contains a worker's address as well as security keys to access specific registered memory descriptors.
+> This data provides direct memory access between workers, and should be considered sensitive and therefore handled accordingly.
+
+Use the respective class's `.metadata()` method to generate an `RdmaMetadata` object for an operation.
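Because the metadata travels over a secondary channel as JSON, the round trip looks roughly like the sketch below. `FakeMetadata` and its fields are hypothetical placeholders built on a standard-library dataclass; the real `RdmaMetadata` fields are not described in this document. If `RdmaMetadata` is a Pydantic v2 model, its `model_dump_json()` and `model_validate_json()` methods would likely play the roles of `to_wire` and `from_wire`, but that is an assumption.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FakeMetadata:
    """Hypothetical stand-in; the field names are illustrative only."""

    worker_address: str
    descriptor_keys: list
    operation_kind: str


def to_wire(metadata: FakeMetadata) -> str:
    """Serialize metadata to a JSON string suitable for a sideband message."""
    return json.dumps(asdict(metadata))


def from_wire(payload: str) -> FakeMetadata:
    """Rebuild the metadata object on the receiving worker."""
    return FakeMetadata(**json.loads(payload))


original = FakeMetadata("worker-0", ["key-a", "key-b"], "writable")
restored = from_wire(to_wire(original))
print(restored == original)
```

Given the warning above, a payload like this should be handled as a secret and kept out of logs.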
+
+> [!Tip]
+> Classes using `RdmaMetadata` objects must be paired correctly.
+> [`ReadableOperation`](readable_operation.md) with [`ReadOperation`](read_operation.md), and
+> [`WritableOperation`](writable_operation.md) with [`WriteOperation`](write_operation.md).
+> Incorrect pairing will result in an error being raised.
+
+
+## Related Classes
+
+  - [Connector](connector.md)
+  - [Descriptor](descriptor.md)
+  - [Device](device.md)
+  - [OperationStatus](operation_status.md)
+  - [ReadOperation](read_operation.md)
+  - [ReadableOperation](readable_operation.md)
+  - [WritableOperation](writable_operation.md)
+  - [WriteOperation](write_operation.md)
--- a/docs/API/nixl_connect/read_operation.md
+++ b/docs/API/nixl_connect/read_operation.md
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# dynamo.nixl_connect.ReadOperation
+
+An operation which transfers data from a remote worker to the local worker.
+
+To create the operation, RDMA metadata ([RdmaMetadata](rdma_metadata.md)) from a remote worker's [`ReadableOperation`](readable_operation.md)
+along with a matching set of local [`Descriptor`](descriptor.md) objects which reference memory intended to receive data from the remote worker must be provided.
+The RDMA metadata must be transferred from the remote to the local worker via a secondary channel, most likely HTTP or TCP+NATS.
+
+Once created, data transfer will begin immediately.
+Disposal of the object will instruct the RDMA subsystem to cancel the operation,
+therefore the operation should be awaited until completed unless cancellation is intended.
+
+
+## Example Usage
+
+```python
+    async def read_from_remote(
+      self,
+      remote_metadata: dynamo.nixl_connect.RdmaMetadata,
+      local_tensor: torch.Tensor
+    ) -> None:
+      descriptor = dynamo.nixl_connect.Descriptor(local_tensor)
+
+      with await self.connector.begin_read(remote_metadata, descriptor) as read_op:
+        # Wait for the operation to finish reading data from the remote worker into local_tensor.
+        await read_op.wait_for_completion()
+```
+
+
+## Methods
+
+### `cancel`
+
+```python
+def cancel(self) -> None:
+```
+
+Instructs the RDMA subsystem to cancel the operation.
+Completed operations cannot be cancelled.
+
+### `wait_for_completion`
+
+```python
+async def wait_for_completion(self) -> None:
+```
+
+Blocks the caller until the memory from the remote worker has been transferred to the provided buffers.
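Since `wait_for_completion` blocks until the transfer finishes, a caller may want to bound the wait and fall back to `cancel()`. The sketch below combines the two documented methods with `asyncio.wait_for`. `wait_or_cancel` and `StalledOperation` are hypothetical, and whether the library applies any timeout of its own is not stated here.

```python
import asyncio
from typing import Any


async def wait_or_cancel(operation: Any, timeout_s: float) -> bool:
    """Await completion for at most `timeout_s` seconds; cancel the operation on timeout."""
    try:
        await asyncio.wait_for(operation.wait_for_completion(), timeout=timeout_s)
        return True
    except asyncio.TimeoutError:
        operation.cancel()
        return False


# --- Stand-in that never completes, so the timeout path is exercised ---
class StalledOperation:
    def __init__(self) -> None:
        self.cancelled = False

    async def wait_for_completion(self) -> None:
        await asyncio.sleep(3600)

    def cancel(self) -> None:
        self.cancelled = True


stalled = StalledOperation()
finished = asyncio.run(wait_or_cancel(stalled, timeout_s=0.01))
print(finished, stalled.cancelled)
```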
+
+
+## Properties
+
+### `status`
+
+```python
+@property
+def status(self) -> OperationStatus:
+```
+
+Returns the [`OperationStatus`](operation_status.md) describing the current state of the operation.
+
+
+## Related Classes
+
+  - [Connector](connector.md)
+  - [Descriptor](descriptor.md)
+  - [Device](device.md)
+  - [OperationStatus](operation_status.md)
+  - [RdmaMetadata](rdma_metadata.md)
+  - [ReadableOperation](readable_operation.md)
+  - [WritableOperation](writable_operation.md)
+  - [WriteOperation](write_operation.md)
--- a/docs/API/nixl_connect/readable_operation.md
+++ b/docs/API/nixl_connect/readable_operation.md
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# dynamo.nixl_connect.ReadableOperation
+
+An operation which enables a remote worker to read data from the local worker.
+
+To create the operation, a set of local [`Descriptor`](descriptor.md) objects must be provided that reference memory intended to be transferred to a remote worker.
+Once created, the memory referenced by the provided descriptors becomes immediately readable by a remote worker with the necessary metadata.
+The RDMA metadata ([RdmaMetadata](rdma_metadata.md)) required to access the memory referenced by the provided descriptors is accessible via the operation's `.metadata()` method.
+Once acquired, the metadata needs to be provided to a remote worker via a secondary channel, most likely HTTP or TCP+NATS.
+
+Disposal of the object will instruct the RDMA subsystem to cancel the operation,
+therefore the operation should be awaited until completed unless cancellation is intended.
+
+
+## Example Usage
+
+```python
+    async def send_data(
+      self,
+      local_tensor: torch.Tensor
+    ) -> None:
+      descriptor = dynamo.nixl_connect.Descriptor(local_tensor)
+
+      with self.connector.create_readable(descriptor) as read_op:
+        op_metadata = read_op.metadata()
+
+        # Send the metadata to the remote worker via sideband communication.
+        await self.notify_remote_data(op_metadata)
+        # Wait for the remote worker to complete its read operation of local_tensor.
+        # AKA send data to remote worker.
+        await read_op.wait_for_completion()
+```
+
+
+## Methods
+
+### `metadata`
+
+```python
+def metadata(self) -> RdmaMetadata:
+```
+
+Generates and returns the RDMA metadata ([RdmaMetadata](rdma_metadata.md)) required for a remote worker to read from the operation.
+Once acquired, the metadata needs to be provided to a remote worker via a secondary channel, most likely HTTP or TCP+NATS.
+
+### `wait_for_completion`
+
+```python
+async def wait_for_completion(self) -> None:
+```
+
+Blocks the caller until the operation has received a completion signal from a remote worker.
+
+
+## Properties
+
+### `status`
+
+```python
+@property
+def status(self) -> OperationStatus:
+```
+
+Returns an [`OperationStatus`](operation_status.md) value describing the current state (status) of the operation.
+
+
+## Related Classes
+
+  - [Connector](connector.md)
+  - [Descriptor](descriptor.md)
+  - [Device](device.md)
+  - [OperationStatus](operation_status.md)
+  - [RdmaMetadata](rdma_metadata.md)
+  - [ReadOperation](read_operation.md)
+  - [WritableOperation](writable_operation.md)
+  - [WriteOperation](write_operation.md)
--- a/docs/API/nixl_connect/writable_operation.md
+++ b/docs/API/nixl_connect/writable_operation.md
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# dynamo.nixl_connect.WritableOperation
+
+An operation which enables a remote worker to write data to the local worker.
+
+To create the operation, a set of local [`Descriptor`](descriptor.md) objects must be provided which reference memory intended to receive data from a remote worker.
+Once created, the memory referenced by the provided descriptors becomes immediately writable by a remote worker with the necessary metadata.
+The RDMA metadata ([RdmaMetadata](rdma_metadata.md)) required to access the memory referenced by the provided descriptors is accessible via the operation's `.metadata()` method.
+Once acquired, the metadata needs to be provided to a remote worker via a secondary channel, most likely HTTP or TCP+NATS.
+
+Disposing of the object instructs the RDMA subsystem to cancel the operation,
+so the operation should be awaited until it completes unless cancellation is intended.
+Cancellation is handled asynchronously.
+
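The disposal rule above can be illustrated with a small context-manager model. This is a sketch under stated assumptions: `ToyOperation` and its state strings are hypothetical and are not the library's `OperationStatus` values, and the real cancellation is asynchronous whereas this model flips a flag immediately.

```python
class ToyOperation:
    """Hypothetical stand-in: cancels itself on disposal unless already complete."""

    def __init__(self) -> None:
        self.state = "in_progress"  # illustrative label, not an OperationStatus value

    def complete(self) -> None:
        # Simulates the completion signal arriving from the remote worker.
        if self.state == "in_progress":
            self.state = "complete"

    def __enter__(self) -> "ToyOperation":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Leaving the `with` block disposes of the operation. If it has not
        # completed by then, disposal amounts to a cancellation request.
        if self.state == "in_progress":
            self.state = "cancelled"
        return False  # never swallow exceptions raised inside the block


with ToyOperation() as abandoned:
    pass  # the block exits without waiting for completion

with ToyOperation() as finished:
    finished.complete()  # completion arrives before the block exits

STATES = (abandoned.state, finished.state)
```

In the toy model, the operation that leaves its `with` block early ends up cancelled, while the one that completed first is unaffected.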
+
+## Example Usage
+
+```python
+    async def recv_data(
+      self,
+      local_tensor: torch.Tensor
+    ) -> None:
+      descriptor = dynamo.nixl_connect.Descriptor(local_tensor)
+
+      with self.connector.create_writable(descriptor) as write_op:
+        op_metadata = write_op.metadata()
+
+        # Send the metadata to the remote worker via sideband communication.
+        await self.request_remote_data(op_metadata)
+        # Wait for the remote worker to complete its write operation to local_tensor.
+        # AKA receive data from remote worker.
+        await write_op.wait_for_completion()
+```
+
+
+## Methods
+
+### `metadata`
+
+```python
+def metadata(self) -> RdmaMetadata:
+```
+
+Generates and returns the RDMA metadata ([RdmaMetadata](rdma_metadata.md)) required for a remote worker to write to the operation.
+Once acquired, the metadata needs to be provided to a remote worker via a secondary channel, most likely HTTP or TCP+NATS.
+
+### `wait_for_completion`
+
+```python
+async def wait_for_completion(self) -> None:
+```
+
+Blocks the caller until the operation has received a completion signal from a remote worker.
+
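Because `wait_for_completion` is a coroutine, awaiting it suspends only the calling task, so several operations can be awaited concurrently. The sketch below shows this with `asyncio.gather`. `ToyOperation` and `signal_complete` are hypothetical stand-ins for an operation and its remote completion signal, not part of the library.

```python
import asyncio


class ToyOperation:
    """Hypothetical stand-in that only models an awaitable completion signal."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._done = asyncio.Event()

    def signal_complete(self) -> None:
        # Simulates the completion signal sent by a remote worker.
        self._done.set()

    async def wait_for_completion(self) -> None:
        await self._done.wait()


async def wait_and_record(op: ToyOperation, finished: list) -> None:
    await op.wait_for_completion()
    finished.append(op.name)


async def demo() -> list:
    slow, fast = ToyOperation("slow"), ToyOperation("fast")
    loop = asyncio.get_running_loop()
    loop.call_later(0.05, slow.signal_complete)  # completes second
    loop.call_later(0.01, fast.signal_complete)  # completes first

    finished: list = []
    # Both waits are in flight at once; neither holds up the other.
    await asyncio.gather(
        wait_and_record(slow, finished),
        wait_and_record(fast, finished),
    )
    return finished


COMPLETION_ORDER = asyncio.run(demo())
```

The recorded order follows the completion signals rather than the order in which the waits were started.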
+
+## Properties
+
+### `status`
+
+```python
+@property
+def status(self) -> OperationStatus:
+```
+
+Returns an [`OperationStatus`](operation_status.md) value describing the current state (status) of the operation.
+
+
+## Related Classes
+
+  - [Connector](connector.md)
+  - [Descriptor](descriptor.md)
+  - [Device](device.md)
+  - [OperationStatus](operation_status.md)
+  - [RdmaMetadata](rdma_metadata.md)
+  - [ReadOperation](read_operation.md)
+  - [ReadableOperation](readable_operation.md)
+  - [WriteOperation](write_operation.md)
--- a/docs/API/nixl_connect/write_operation.md
+++ b/docs/API/nixl_connect/write_operation.md
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# dynamo.nixl_connect.WriteOperation
+
+An operation which transfers data from the local worker to a remote worker.
+
+To create the operation, provide the RDMA metadata ([RdmaMetadata](rdma_metadata.md)) from a remote worker's [`WritableOperation`](writable_operation.md)
+along with a matching set of local [`Descriptor`](descriptor.md) objects that reference the memory to be transferred to the remote worker.
+The RDMA metadata must be transferred from the remote to the local worker via a secondary channel, most likely HTTP or TCP+NATS.
+
+Once created, data transfer will begin immediately.
+Disposing of the object instructs the RDMA subsystem to cancel the operation,
+so the operation should be awaited until it completes unless cancellation is intended.
+Cancellation is handled asynchronously.
+
+
+## Example Usage
+
+```python
+    async def write_to_remote(
+      self,
+      remote_metadata: dynamo.nixl_connect.RdmaMetadata,
+      local_tensor: torch.Tensor
+    ) -> None:
+      descriptor = dynamo.nixl_connect.Descriptor(local_tensor)
+
+      with self.connector.begin_write(descriptor, remote_metadata) as write_op:
+        # Wait for the operation to complete writing local_tensor to the remote worker.
+        await write_op.wait_for_completion()
+```
+
+
+## Methods
+
+### `cancel`
+
+```python
+def cancel(self) -> None:
+```
+
+Instructs the RDMA subsystem to cancel the operation.
+Completed operations cannot be cancelled.
+
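One possible use of `cancel` is to give up on a transfer that exceeds a deadline. The sketch below pairs `asyncio.wait_for` with an explicit `cancel()` call. `ToyWrite`, `finish`, and `write_or_cancel` are hypothetical names used for illustration, and the model assumes, as stated above, that cancelling a completed operation has no effect.

```python
import asyncio


class ToyWrite:
    """Hypothetical stand-in for a write operation (not the library class)."""

    def __init__(self) -> None:
        self._done = asyncio.Event()
        self.cancelled = False

    def finish(self) -> None:
        # Simulates all buffers having been transferred.
        self._done.set()

    def cancel(self) -> None:
        # Completed operations cannot be cancelled, so this is then a no-op.
        if not self._done.is_set():
            self.cancelled = True

    async def wait_for_completion(self) -> None:
        await self._done.wait()


async def write_or_cancel(op: ToyWrite, timeout_s: float) -> bool:
    """Return True if the write completed in time, otherwise cancel it."""
    try:
        await asyncio.wait_for(op.wait_for_completion(), timeout=timeout_s)
        return True
    except asyncio.TimeoutError:
        # wait_for only stops the local wait; the operation itself still
        # has to be told to stop.
        op.cancel()
        return False


async def demo() -> tuple:
    stalled = ToyWrite()
    stalled_ok = await write_or_cancel(stalled, 0.05)

    finished = ToyWrite()
    finished.finish()
    finished_ok = await write_or_cancel(finished, 0.05)
    finished.cancel()  # no effect: the operation already completed

    return (stalled_ok, stalled.cancelled, finished_ok, finished.cancelled)


OUTCOME = asyncio.run(demo())
```

Whether a timeout is appropriate depends on the application; the sketch only shows that abandoning the wait and cancelling the operation are two separate steps.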
+### `wait_for_completion`
+
+```python
+async def wait_for_completion(self) -> None:
+```
+
+Blocks the caller until all provided buffers have been transferred to the remote worker.
+
+
+## Properties
+
+### `status`
+
+```python
+@property
+def status(self) -> OperationStatus:
+```
+
+Returns an [`OperationStatus`](operation_status.md) value describing the current state (status) of the operation.
+
+
+## Related Classes
+
+  - [Connector](connector.md)
+  - [Descriptor](descriptor.md)
+  - [Device](device.md)
+  - [OperationStatus](operation_status.md)
+  - [RdmaMetadata](rdma_metadata.md)
+  - [ReadOperation](read_operation.md)
+  - [ReadableOperation](readable_operation.md)
+  - [WritableOperation](writable_operation.md)
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -128,6 +128,7 @@ The examples below assume you build the latest image yourself from source. If us
   :caption: API

   Python API <API/python_bindings.md>
+   NIXL Connect API <API/nixl_connect/README.md>

 .. toctree::
   :hidden:

--- a/examples/multimodal/components/prefill_worker.py
+++ b/examples/multimodal/components/prefill_worker.py
@@ -120,7 +120,7 @@ class VllmPrefillWorker:
            device=EMBEDDINGS_DEVICE,
        )
        descriptor = connect.Descriptor(embeddings)
-        # Register the descriptor w/ NIXL (this is optional, if not done here the connect subsytem will take care of this automatically).
+        # Register the descriptor w/ NIXL (this is optional, if not done here the connect subsystem will take care of this automatically).
        descriptor.register_memory(self._connector)
        self._embeddings_descriptor = (embeddings, descriptor)

@@ -196,7 +196,7 @@ class VllmPrefillWorker:
        )

        # Extract the pre-allocated, reusable image embeddings tensor and its descriptor.
-        # Doing this avoids unnessesary memory de/registration with NIXL.
+        # Doing this avoids unnecessary memory de/registration with NIXL.
        embeddings, descriptor = self._embeddings_descriptor

        # Create a new writable operation from the descriptor.

--- a/lib/bindings/python/src/dynamo/nixl_connect/__init__.py
+++ b/lib/bindings/python/src/dynamo/nixl_connect/__init__.py