Unverified commit fa4a7f1e authored by J Wyman, committed by GitHub

docs: Update nixl_connect README (#2320)

parent 5375af2c
......@@ -17,10 +17,23 @@ limitations under the License.
# Dynamo NIXL Connect
Dynamo NIXL Connect specializes in moving data between models/workers in a Dynamo graph, particularly in use cases where registration and memory regions need to be dynamic.
Dynamo NIXL Connect provides utilities for such use cases, using the NIXL-based I/O subsystem via a set of Python classes.
This relaxed registration model carries some performance overhead, but simplifies the integration process.
For larger data transfers, such as those between models in a multi-model graph, the overhead should be marginal.
The `dynamo.nixl_connect` library can be imported by any Dynamo container hosted application.
> [!Note]
> Dynamo NIXL Connect will pick the best data transfer method available to it.
> The available methods depend on the hardware and software configuration of the machines and network running the graph.
> GPU Direct RDMA operations require that both ends of the operation have:
> - NIC and GPU capable of performing RDMA operations
> - Device drivers that support GPU-NIC direct interactions (aka "zero copy") and RDMA operations
> - Network that supports InfiniBand or RoCE
>
> If any of the above is not satisfied, GPU Direct RDMA will not be available to the graph's workers, and less optimal methods will be used to ensure basic functionality.
> For additional information, see the [GPUDirect RDMA](https://docs.nvidia.com/cuda/pdf/GPUDirect_RDMA.pdf) documentation.
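
As a rough mental model of "pick the best available method", selection can be pictured as walking a preference list and taking the first transport that both ends of the operation support. This is a toy sketch assuming hypothetical transport names (`gpu_direct_rdma`, `rdma_host_staged`, `tcp`); it is not how NIXL actually negotiates its backends.

```python
# Hypothetical transports, ordered from most to least preferred.
PREFERENCE = ("gpu_direct_rdma", "rdma_host_staged", "tcp")


def select_transport(local: set[str], remote: set[str]) -> str:
    """Return the most preferred transport supported by both workers."""
    common = local & remote  # a transport is only usable if both ends support it
    for transport in PREFERENCE:
        if transport in common:
            return transport
    raise RuntimeError("no common transport between workers")
```

Requiring support on both sides mirrors the note's point that GPU Direct RDMA needs capable hardware, drivers, and network at each end of the operation.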
```python
import dynamo.nixl_connect
```
......@@ -30,11 +43,11 @@ There are four types of supported operations:
1. **Register local readable memory**:
Register local memory buffer(s) with the NIXL subsystem to enable a remote worker to read from.
2. **Register local writable memory**:
Register local memory buffer(s) with the NIXL subsystem to enable a remote worker to write to.
3. **Read from registered, remote memory**:
......@@ -44,7 +57,7 @@ There are four types of supported operations:
Write local memory buffer(s) to remote memory buffer(s) registered as writable by a remote worker.
By connecting correctly paired operations, high-throughput data transfers can be completed, using GPU Direct RDMA when available.
Given the list above, the correct pairings of operations are 1 & 3 and 2 & 4, where one side is a "(read|write)-able operation" and the other is its correctly paired "(read|write) operation".
Specifically, a read operation must be paired with a readable operation, and a write operation must be paired with a writable operation.
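
The pairing rule can be stated as a small lookup table. This sketch is illustrative only: `Op` and `is_valid_pair` are hypothetical names that mirror the numbered list above and are not part of `dynamo.nixl_connect`.

```python
from enum import Enum


class Op(Enum):
    """The four operation types from the list above (hypothetical names)."""
    REGISTER_READABLE = 1  # local memory exposed for a remote read
    REGISTER_WRITABLE = 2  # local memory exposed for a remote write
    READ = 3               # read from remote, readable memory
    WRITE = 4              # write to remote, writable memory


# Each active operation has exactly one valid passive counterpart.
VALID_PAIRS = {
    Op.READ: Op.REGISTER_READABLE,
    Op.WRITE: Op.REGISTER_WRITABLE,
}


def is_valid_pair(a: Op, b: Op) -> bool:
    """Return True when a and b form a correct pairing, in either order."""
    return VALID_PAIRS.get(a) == b or VALID_PAIRS.get(b) == a
```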
......@@ -58,9 +71,9 @@ sequenceDiagram
LocalWorker ->> NIXL: Register memory (Descriptor)
RemoteWorker ->> NIXL: Register memory (Descriptor)
LocalWorker ->> LocalWorker: Create Readable/WritableOperation
LocalWorker ->> RemoteWorker: Send NIXL metadata (via HTTP/TCP+NATS)
RemoteWorker ->> NIXL: Begin Read/WriteOperation with metadata
NIXL -->> RemoteWorker: Data transfer
RemoteWorker -->> LocalWorker: Notify completion (unblock awaiter)
```
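
The sequence above can be traced with a toy, single-process model. `ToyWritableOperation`, `toy_begin_write`, and `REGISTRY` are hypothetical and do not reflect the `dynamo.nixl_connect` API: a JSON string stands in for the NIXL metadata, a dict stands in for the NIXL agent's registrations, and a byte copy stands in for the data transfer.

```python
import asyncio
import json

# Hypothetical stand-in for NIXL: maps operation ids to registered targets.
REGISTRY: dict[str, "ToyWritableOperation"] = {}


class ToyWritableOperation:
    """Local side: registers a buffer and waits for a remote write."""

    def __init__(self, op_id: str, size: int) -> None:
        self.op_id = op_id
        self.buffer = bytearray(size)
        self._done = asyncio.Event()
        REGISTRY[op_id] = self  # "Register memory (Descriptor)"

    def metadata(self) -> str:
        # Serialized metadata that would be sent over a side channel.
        return json.dumps({"op_id": self.op_id, "size": len(self.buffer)})

    def deliver(self, data: bytes) -> None:
        self.buffer[:] = data  # stands in for the NIXL data transfer
        self._done.set()       # "Notify completion (unblock awaiter)"

    async def wait_for_completion(self) -> None:
        await self._done.wait()


async def toy_begin_write(metadata: str, data: bytes) -> None:
    """Remote side: uses the metadata to locate the target and write to it."""
    meta = json.loads(metadata)
    if len(data) != meta["size"]:
        raise ValueError("local and remote buffer sizes must match")
    REGISTRY[meta["op_id"]].deliver(data)


async def main() -> bytes:
    local_op = ToyWritableOperation("op-1", size=4)
    md = local_op.metadata()  # would travel via HTTP or TCP+NATS in practice
    await asyncio.gather(
        local_op.wait_for_completion(),
        toy_begin_write(md, b"\x01\x02\x03\x04"),
    )
    return bytes(local_op.buffer)


result = asyncio.run(main())
```

The key property the toy preserves is that the local worker only awaits; the remote worker drives the transfer using nothing but the metadata it was handed.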
......@@ -69,12 +82,12 @@ sequenceDiagram
### Generic Example
In the diagram below, Local creates a [`WritableOperation`](writable_operation.md) intended to receive data from Remote.
Local then sends metadata about the requested operation to Remote.
Remote then uses the metadata to create a [`WriteOperation`](write_operation.md) which performs the memory transfer from Remote's GPU memory to Local's GPU memory, using GPU Direct RDMA when available.
```mermaid
---
title: Write Operation Between Two Workers (RDMA available)
---
flowchart LR
c1[Remote] --"3: .begin_write()"--- WriteOperation
......@@ -85,6 +98,9 @@ flowchart LR
e2@{ animate: true; }
```
> [!Note]
> When RDMA isn't available, the NIXL data transfer will still complete using non-accelerated methods.
### Multimodal Example
In the case of the [Dynamo Multimodal Disaggregated Example](../../examples/multimodal/README.md):
......@@ -133,7 +149,7 @@ flowchart LR
> [!Note]
> In this example, it is the data transfer between the Prefill Worker and the Encode Worker that utilizes the Dynamo NIXL Connect library.
> The KV Cache transfer between Decode Worker and Prefill Worker utilizes a different connector that also uses the NIXL-based I/O subsystem underneath.
#### Code Examples
......@@ -142,7 +158,7 @@ for how they coordinate directly with the Encode Worker by creating a [`Writable
sending the operation's metadata via Dynamo's round-robin dispatcher, and awaiting the operation's completion before making use of the transferred data.
See [encode_worker](https://github.com/ai-dynamo/dynamo/tree/main/examples/multimodal/components/encode_worker.py#L190) from our Multimodal example,
for how the resulting embeddings are registered with the NIXL subsystem by creating a [`Descriptor`](descriptor.md),
a [`WriteOperation`](write_operation.md) is created using the metadata provided by the requesting worker,
and the worker awaits completion of the data transfer before yielding a response.
......
......@@ -20,24 +20,24 @@ limitations under the License.
Core class for managing the connection between workers in a distributed environment.
Use this class to create readable and writable operations, or read and write data to remote workers.
This class provides a "Pythonic" interface to the NIXL library for data transfers between models hosted by different workers in a Dynamo graph, accelerated by GPU Direct RDMA when available.
The connector provides two methods of moving data between workers:
- Preparing local memory to be written to by a remote worker.
- Preparing local memory to be read by a remote worker.
In both cases, local memory is registered with the NIXL-based I/O subsystem via the [`Descriptor`](#descriptor) class and provided to the connector.
When RDMA is available, the connector configures the RDMA subsystem to expose the memory for the requested operation; otherwise, it selects the best available alternative to RDMA.
In either case, the connector returns an operation control object.
The operation control object, either a [`ReadableOperation`](readable_operation.md) or a [`WritableOperation`](writable_operation.md),
provides NIXL metadata ([RdmaMetadata](rdma_metadata.md)) via its `.metadata()` method, functionality to query the operation's current state, as well as the ability to cancel the operation prior to its completion.
The NIXL metadata must be provided to the remote worker expected to complete the operation.
The metadata contains required information (identifiers, keys, etc.) which enables the remote worker to interact with the provided memory.
> [!Warning]
> NIXL metadata contains a worker's address as well as security keys to access specific registered memory descriptors.
> This data provides direct memory access between workers, and should be considered sensitive and therefore handled accordingly.
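
One practical consequence of this warning is keeping metadata out of plain-text logs. A minimal sketch, assuming the metadata has been converted to a plain dict and that `agent_address` and `access_key` are hypothetical names for its sensitive fields (the real layout may differ):

```python
from typing import Any

# Hypothetical field names; the real metadata schema may differ.
SENSITIVE_KEYS = {"agent_address", "access_key"}


def redact(metadata: dict[str, Any]) -> dict[str, Any]:
    """Return a log-safe copy of the metadata; the input is left unchanged."""
    return {k: ("<redacted>" if k in SENSITIVE_KEYS else v) for k, v in metadata.items()}
```

Log `redact(md)` rather than `md` itself, and pass the unmodified metadata only over the channel that delivers it to the remote worker.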
......@@ -79,7 +79,7 @@ The serialized request must be transferred from the remote to the local worker v
Once created, data transfer will begin immediately.
Disposal of the object will instruct the NIXL subsystem to cancel the operation;
therefore, the operation should be awaited until completed unless cancellation is intended.
Use [`.wait_for_completion()`](read_operation.md#wait_for_completion) to block the caller until the operation has completed or encountered an error.
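
Because disposing of an operation cancels it, a caller that wants a bounded wait should await completion explicitly and cancel deliberately if it gives up. The sketch below uses a hypothetical `ToyOperation` that only borrows the documented method names `cancel()` and `wait_for_completion()`; the timeout policy in `await_or_cancel` is an assumption, not library behavior.

```python
import asyncio


class ToyOperation:
    """Toy stand-in for an operation; not the library's class."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self.cancelled = False
        self.completed = False

    def cancel(self) -> None:
        if not self.completed:  # completed operations cannot be cancelled
            self.cancelled = True

    async def wait_for_completion(self) -> None:
        await asyncio.sleep(self.delay)  # stands in for the data transfer
        self.completed = True


async def await_or_cancel(op: ToyOperation, timeout: float) -> bool:
    """Await the operation; cancel it explicitly if it overruns the timeout."""
    try:
        await asyncio.wait_for(op.wait_for_completion(), timeout)
        return True
    except asyncio.TimeoutError:
        op.cancel()
        return False
```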
......@@ -103,7 +103,7 @@ The serialized request must be transferred from the remote to the local worker v
Once created, data transfer will begin immediately.
Disposal of the object will instruct the NIXL subsystem to cancel the operation;
therefore, the operation should be awaited until completed unless cancellation is intended.
Use [`.wait_for_completion()`](write_operation.md#wait_for_completion) to block the caller until the operation has completed or encountered an error.
......@@ -124,7 +124,7 @@ Once created, the memory referenced by the provided descriptors becomes immediat
The metadata required to access the memory referenced by the provided descriptors is accessible via the operation's `.metadata()` method.
Once acquired, the metadata needs to be provided to a remote worker via a secondary channel, most likely HTTP or TCP+NATS.
Disposal of the object will instruct the NIXL subsystem to cancel the operation;
therefore, the operation should be awaited until completed unless cancellation is intended.
Use [`.wait_for_completion()`](readable_operation.md#wait_for_completion) to block the caller until the operation has completed or encountered an error.
......@@ -145,7 +145,7 @@ Once created, the memory referenced by the provided descriptors becomes immediat
The metadata required to access the memory referenced by the provided descriptors is accessible via the operation's `.metadata()` method.
Once acquired, the metadata needs to be provided to a remote worker via a secondary channel, most likely HTTP or TCP+NATS.
Disposal of the object will instruct the NIXL subsystem to cancel the operation;
therefore, the operation should be awaited until completed unless cancellation is intended.
Use [`.wait_for_completion()`](writable_operation.md#wait_for_completion) to block the caller until the operation has completed or encountered an error.
......
......@@ -16,8 +16,8 @@ limitations under the License.
-->
# dynamo.nixl_connect.Descriptor
Memory descriptor that ensures memory is registered with the NIXL-based I/O subsystem.
Memory must be registered with the NIXL subsystem to enable interaction with the memory.
Descriptor objects are administrative and do not copy, move, or otherwise modify the registered memory.
......@@ -41,11 +41,11 @@ There are four ways to create a descriptor:
def register_memory(self, connector: Connector) -> None:
```
Instructs the descriptor to register its memory buffer with the NIXL-based I/O subsystem.
Calling this method more than once on the same descriptor has no effect.
When the descriptor is assigned to a NIXL operation, it will be automatically registered if it was not explicitly registered.
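
The two registration rules above (repeated calls have no effect; operations auto-register descriptors that were not explicitly registered) can be modeled with a flag on the descriptor. `ToyConnector`, `ToyDescriptor`, and `assign_to_operation` are hypothetical stand-ins, not the library's classes.

```python
from typing import Optional


class ToyConnector:
    """Hypothetical stand-in that counts how often memory is registered."""

    def __init__(self) -> None:
        self.registrations = 0

    def register(self, buffer: bytearray) -> None:
        self.registrations += 1


class ToyDescriptor:
    """Registers its buffer at most once; never copies or modifies it."""

    def __init__(self, buffer: bytearray) -> None:
        self.buffer = buffer  # referenced only
        self._connector: Optional[ToyConnector] = None

    def register_memory(self, connector: ToyConnector) -> None:
        if self._connector is not None:
            return  # repeated calls have no effect
        connector.register(self.buffer)
        self._connector = connector


def assign_to_operation(descriptor: ToyDescriptor, connector: ToyConnector) -> None:
    """Operations register any descriptor that was not explicitly registered."""
    descriptor.register_memory(connector)
```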
## Properties
......
......@@ -17,12 +17,12 @@ limitations under the License.
# dynamo.nixl_connect.RdmaMetadata
A Pydantic type intended to provide JSON-serialized NIXL metadata about a [`ReadableOperation`](readable_operation.md) or [`WritableOperation`](writable_operation.md) object.
NIXL metadata contains detailed information about a worker process and how to access memory regions registered with the corresponding agent.
This data is required to perform data transfers using the NIXL-based I/O subsystem.
> [!Warning]
> NIXL metadata contains information to connect corresponding backends across agents, as well as identification keys to access specific registered memory regions.
> This data provides direct memory access between workers, and should be considered sensitive and therefore handled accordingly.
Use the respective class's `.metadata()` method to generate an `RdmaMetadata` object for an operation.
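
The round trip this type exists for (serialize on one worker, deserialize on the other) is sketched below with a stdlib dataclass instead of Pydantic so it runs without extra dependencies. `ToyMetadata` and its fields are hypothetical and do not mirror the real `RdmaMetadata` schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToyMetadata:
    """Hypothetical fields; the real RdmaMetadata model may differ."""
    operation_kind: str                             # e.g. "read" or "write"
    agent_name: str                                 # identifies the owning worker
    sizes: list[int] = field(default_factory=list)  # byte size of each region

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "ToyMetadata":
        return cls(**json.loads(payload))
```

With a Pydantic v2 model, the equivalent calls would be `model_dump_json()` on the sending side and `model_validate_json()` on the receiving side.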
......
......@@ -19,12 +19,12 @@ limitations under the License.
An operation which transfers data from a remote worker to the local worker.
To create the operation, NIXL metadata ([RdmaMetadata](rdma_metadata.md)) from a remote worker's [`ReadableOperation`](readable_operation.md)
along with a matching set of local [`Descriptor`](descriptor.md) objects which reference memory intended to receive data from the remote worker must be provided.
The NIXL metadata must be transferred from the remote to the local worker via a secondary channel, most likely HTTP or TCP+NATS.
Once created, data transfer will begin immediately.
Disposal of the object will instruct the NIXL subsystem to cancel the operation;
therefore, the operation should be awaited until completed unless cancellation is intended.
......@@ -52,7 +52,7 @@ therefore the operation should be awaited until completed unless cancellation is
def cancel(self) -> None:
```
Instructs the NIXL subsystem to cancel the operation.
Completed operations cannot be cancelled.
### `wait_for_completion`
......
......@@ -21,10 +21,10 @@ An operation which enables a remote worker to read data from the local worker.
To create the operation, a set of local [`Descriptor`](descriptor.md) objects must be provided that reference memory intended to be transferred to a remote worker.
Once created, the memory referenced by the provided descriptors becomes immediately readable by a remote worker with the necessary metadata.
The NIXL metadata ([RdmaMetadata](rdma_metadata.md)) required to access the memory referenced by the provided descriptors is accessible via the operation's `.metadata()` method.
Once acquired, the metadata needs to be provided to a remote worker via a secondary channel, most likely HTTP or TCP+NATS.
Disposal of the object will instruct the NIXL subsystem to cancel the operation;
therefore, the operation should be awaited until completed unless cancellation is intended.
......@@ -56,7 +56,7 @@ therefore the operation should be awaited until completed unless cancellation is
def metadata(self) -> RdmaMetadata:
```
Generates and returns the NIXL metadata ([RdmaMetadata](rdma_metadata.md)) required for a remote worker to read from the operation.
Once acquired, the metadata needs to be provided to a remote worker via a secondary channel, most likely HTTP or TCP+NATS.
### `wait_for_completion`
......
......@@ -21,10 +21,10 @@ An operation which enables a remote worker to write data to the local worker.
To create the operation, a set of local [`Descriptor`](descriptor.md) objects must be provided which reference memory intended to receive data from a remote worker.
Once created, the memory referenced by the provided descriptors becomes immediately writable by a remote worker with the necessary metadata.
The NIXL metadata ([RdmaMetadata](rdma_metadata.md)) required to access the memory referenced by the provided descriptors is accessible via the operation's `.metadata()` method.
Once acquired, the metadata needs to be provided to a remote worker via a secondary channel, most likely HTTP or TCP+NATS.
Disposal of the object will instruct the NIXL subsystem to cancel the operation;
therefore, the operation should be awaited until completed unless cancellation is intended.
Cancellation is handled asynchronously.
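
Asynchronous cancellation means that a cancel request returns immediately and the operation's state changes later, once the request has been processed. A toy model using an `asyncio` task as a hypothetical in-flight transfer; `ToyCancellableOperation` and its `status` strings are assumptions, not the library's class or status values.

```python
import asyncio


class ToyCancellableOperation:
    """Toy model: cancel() requests cancellation; it takes effect later."""

    def __init__(self) -> None:
        # A long sleep stands in for an in-flight transfer.
        self._task = asyncio.create_task(asyncio.sleep(10))

    def cancel(self) -> None:
        self._task.cancel()  # returns immediately without waiting

    async def wait_for_completion(self) -> None:
        # Returns once the transfer finishes or the cancellation is processed.
        await asyncio.wait({self._task})

    @property
    def status(self) -> str:
        if self._task.cancelled():
            return "cancelled"
        return "complete" if self._task.done() else "pending"


async def demo() -> list[str]:
    op = ToyCancellableOperation()
    seen = [op.status]
    op.cancel()
    seen.append(op.status)  # requested, but not yet processed
    await op.wait_for_completion()
    seen.append(op.status)
    return seen
```

Reading the state right after `cancel()` can therefore still report the operation as in progress; awaiting completion is what observes the final state.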
......@@ -57,7 +57,7 @@ Cancellation is handled asynchronously.
def metadata(self) -> RdmaMetadata:
```
Generates and returns the NIXL metadata ([RdmaMetadata](rdma_metadata.md)) required for a remote worker to write to the operation.
Once acquired, the metadata needs to be provided to a remote worker via a secondary channel, most likely HTTP or TCP+NATS.
### `wait_for_completion`
......
......@@ -19,12 +19,12 @@ limitations under the License.
An operation which transfers data from the local worker to a remote worker.
To create the operation, NIXL metadata ([RdmaMetadata](rdma_metadata.md)) from a remote worker's [`WritableOperation`](writable_operation.md)
along with a matching set of local [`Descriptor`](descriptor.md) objects which reference memory to be transferred to the remote worker must be provided.
The NIXL metadata must be transferred from the remote to the local worker via a secondary channel, most likely HTTP or TCP+NATS.
Once created, data transfer will begin immediately.
Disposal of the object will instruct the NIXL subsystem to cancel the operation;
therefore, the operation should be awaited until completed unless cancellation is intended.
Cancellation is handled asynchronously.
......@@ -53,7 +53,7 @@ Cancellation is handled asynchronously.
def cancel(self) -> None:
```
Instructs the NIXL subsystem to cancel the operation.
Completed operations cannot be cancelled.
### `wait_for_completion`
......