Commit 2c3066bd authored by dagil-nvidia, committed by GitHub

docs: full migration of docs/ to fern format in fern/ (#6050)


Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
parent d59b9d72
@@ -16,8 +16,7 @@ Dynamo's coordination layer adapts to the deployment environment:
 | **Kubernetes** (with operator) | Native K8s (CRDs, EndpointSlices) | NATS (optional) | TCP |
 | **Bare metal / Local** (default) | etcd | NATS (optional) | TCP |
-> [!NOTE]
-> The runtime always defaults to `kv_store` (etcd) for service discovery. Kubernetes deployments must explicitly set `DYN_DISCOVERY_BACKEND=kubernetes` - the Dynamo operator handles this automatically.
+> **Note:** The runtime always defaults to `kv_store` (etcd) for service discovery. Kubernetes deployments must explicitly set `DYN_DISCOVERY_BACKEND=kubernetes` - the Dynamo operator handles this automatically.
 ```
 ┌─────────────────────────────────────────────────────────────────────┐
@@ -51,8 +50,7 @@ The operator explicitly sets:
 DYN_DISCOVERY_BACKEND=kubernetes
 ```
-> [!WARNING]
-> This must be explicitly configured. The runtime defaults to `kv_store` in all environments.
+> **Important:** This must be explicitly configured. The runtime defaults to `kv_store` in all environments.
 ### How It Works
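
Reviewer note: the default described in this hunk can be illustrated with a minimal sketch. It is not Dynamo's implementation. The function name `resolve_discovery_backend` and the set of accepted values are assumptions made for illustration; only the variable `DYN_DISCOVERY_BACKEND` and the values `kv_store` and `kubernetes` come from the text.

```python
import os

# Default named in the note; the set of accepted values is an assumption.
DEFAULT_BACKEND = "kv_store"
KNOWN_BACKENDS = {"kv_store", "kubernetes"}


def resolve_discovery_backend(env=None):
    """Return the discovery backend name, falling back to kv_store when unset."""
    env = os.environ if env is None else env
    value = env.get("DYN_DISCOVERY_BACKEND", "").strip()
    if not value:
        # Nothing set: the default applies in every environment.
        return DEFAULT_BACKEND
    if value not in KNOWN_BACKENDS:
        raise ValueError(f"unsupported discovery backend: {value!r}")
    return value
```

Calling `resolve_discovery_backend({})` yields `kv_store`, which is why a Kubernetes deployment without the operator has to set the variable itself.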
@@ -461,5 +459,5 @@ This provides KV-aware routing with reduced accuracy but no NATS dependency.
 ## Related Documentation
 - [Distributed Runtime](distributed-runtime.md) - Runtime architecture
-- [Request Plane](../guides/request-plane.md) - Request transport configuration
-- [Fault Tolerance](../fault-tolerance/request-cancellation.md) - Failure handling
+- [Request Plane](request-plane.md) - Request transport configuration
+- [Fault Tolerance](../fault-tolerance/README.md) - Failure handling
@@ -72,7 +72,6 @@ The `model_type` can be:
 - `model_name`: The name to call the model. Your incoming HTTP requests model name must match this. Defaults to the hugging face repo name or the folder name.
 - `context_length`: Max model length in tokens. Defaults to the model's set max. Only set this if you need to reduce KV cache allocation to fit into VRAM.
 - `kv_cache_block_size`: Size of a KV block for the engine, in tokens. Defaults to 16.
-- `migration_limit`: Maximum number of times a request may be [migrated to another Instance](../fault-tolerance/request-migration.md). Defaults to 0.
 - `user_data`: Optional dictionary containing custom metadata for worker behavior (e.g., LoRA configuration). Defaults to None.
 See `examples/backends` for full code examples.
......
@@ -61,7 +61,7 @@ be operating within your distributed runtime.
 The current examples use a hard-coded `namespace`. We will address the `namespace` collisions later.
-All examples require the `etcd` and `nats.io` pre-requisites to be running and available.
+Most examples require `etcd` for service discovery. `nats.io` is required for KV-aware routing with event tracking; for approximate mode (`--no-kv-events`), NATS is optional.
 #### Rust `hello_world`
......
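
Reviewer note: the revised prerequisite wording in this hunk can be sketched as a preflight check. This is a hypothetical helper, not part of Dynamo. The names `is_reachable` and `missing_prerequisites` and the `localhost` host are assumptions, as is the use of the upstream default ports for etcd (2379) and NATS (4222). Since the text says "most" examples need etcd, treating etcd as always required is also a simplification.

```python
import socket


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_prerequisites(use_kv_events=True, host="localhost", check=is_reachable):
    """List the services that should be running but are not reachable."""
    # Assumed default client ports: etcd 2379, NATS 4222.
    required = {"etcd": 2379}
    if use_kv_events:
        # NATS is only needed when KV events are tracked (no --no-kv-events).
        required["nats"] = 4222
    return [name for name, port in required.items() if not check(host, port)]
```

Passing `use_kv_events=False` mirrors the approximate mode, where NATS is left out of the check.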