Commit 2c3066bd (unverified), authored by dagil-nvidia, committed by GitHub

docs: full migration of docs/ to fern format in fern/ (#6050)


Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
parent d59b9d72
@@ -16,8 +16,7 @@ Dynamo's coordination layer adapts to the deployment environment:
| **Kubernetes** (with operator) | Native K8s (CRDs, EndpointSlices) | NATS (optional) | TCP |
| **Bare metal / Local** (default) | etcd | NATS (optional) | TCP |
-> [!NOTE]
-> The runtime always defaults to `kv_store` (etcd) for service discovery. Kubernetes deployments must explicitly set `DYN_DISCOVERY_BACKEND=kubernetes` - the Dynamo operator handles this automatically.
+> **Note:** The runtime always defaults to `kv_store` (etcd) for service discovery. Kubernetes deployments must explicitly set `DYN_DISCOVERY_BACKEND=kubernetes` - the Dynamo operator handles this automatically.
```
┌─────────────────────────────────────────────────────────────────────┐
@@ -51,8 +50,7 @@ The operator explicitly sets:
DYN_DISCOVERY_BACKEND=kubernetes
```
-> [!WARNING]
-> This must be explicitly configured. The runtime defaults to `kv_store` in all environments.
+> **Important:** This must be explicitly configured. The runtime defaults to `kv_store` in all environments.
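
The default-versus-override rule stated in this note can be illustrated with a minimal Python sketch. This is not Dynamo's implementation: the function name `resolve_discovery_backend` and the choice to treat an empty value as unset are assumptions made for illustration. Only the variable name `DYN_DISCOVERY_BACKEND` and the values `kv_store` and `kubernetes` come from the text above.

```python
import os

# Documented default: the etcd-backed key-value store.
DEFAULT_DISCOVERY_BACKEND = "kv_store"


def resolve_discovery_backend(env=None):
    """Return the backend named by DYN_DISCOVERY_BACKEND, else the default.

    Hypothetical helper: an empty or whitespace-only value is treated as
    unset, which is an assumption and not documented behavior.
    """
    env = os.environ if env is None else env
    value = env.get("DYN_DISCOVERY_BACKEND", "").strip()
    return value or DEFAULT_DISCOVERY_BACKEND


if __name__ == "__main__":
    # With nothing set, the default applies even on Kubernetes.
    print(resolve_discovery_backend({}))
    # The operator (or a user) must set the variable explicitly.
    print(resolve_discovery_backend({"DYN_DISCOVERY_BACKEND": "kubernetes"}))
```

The point of the sketch is that nothing in the lookup inspects the environment the process runs in, so a Kubernetes pod without the variable still resolves to `kv_store`.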
### How It Works
@@ -461,5 +459,5 @@ This provides KV-aware routing with reduced accuracy but no NATS dependency.
## Related Documentation
- [Distributed Runtime](distributed-runtime.md) - Runtime architecture
-- [Request Plane](../guides/request-plane.md) - Request transport configuration
-- [Fault Tolerance](../fault-tolerance/request-cancellation.md) - Failure handling
+- [Request Plane](request-plane.md) - Request transport configuration
+- [Fault Tolerance](../fault-tolerance/README.md) - Failure handling
@@ -72,7 +72,6 @@ The `model_type` can be:
- `model_name`: The name to call the model. Your incoming HTTP requests model name must match this. Defaults to the hugging face repo name or the folder name.
- `context_length`: Max model length in tokens. Defaults to the model's set max. Only set this if you need to reduce KV cache allocation to fit into VRAM.
- `kv_cache_block_size`: Size of a KV block for the engine, in tokens. Defaults to 16.
-- `migration_limit`: Maximum number of times a request may be [migrated to another Instance](../fault-tolerance/request-migration.md). Defaults to 0.
- `user_data`: Optional dictionary containing custom metadata for worker behavior (e.g., LoRA configuration). Defaults to None.
See `examples/backends` for full code examples.
......
@@ -61,7 +61,7 @@ be operating within your distributed runtime.
The current examples use a hard-coded `namespace`. We will address the `namespace` collisions later.
-All examples require the `etcd` and `nats.io` pre-requisites to be running and available.
+Most examples require `etcd` for service discovery. `nats.io` is required for KV-aware routing with event tracking; for approximate mode (`--no-kv-events`), NATS is optional.
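
The revised prerequisite rule can be read as a small decision function. The Python sketch below is one interpretation under stated assumptions: `required_services` and its parameters are hypothetical names, and it assumes `etcd` is always needed, whereas the text only says "most examples".

```python
def required_services(kv_aware_routing, kv_events=True):
    """Return the set of services to start before running an example.

    Hypothetical helper. `kv_events=False` stands in for running with
    the `--no-kv-events` flag (approximate mode).
    """
    # Assumption: etcd is always required for service discovery.
    services = {"etcd"}
    # NATS is needed only when KV-aware routing tracks KV events.
    if kv_aware_routing and kv_events:
        services.add("nats")
    return services


if __name__ == "__main__":
    print(sorted(required_services(kv_aware_routing=True)))
    print(sorted(required_services(kv_aware_routing=True, kv_events=False)))
```

In approximate mode the function returns only `etcd`, matching the statement that NATS is optional there.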
#### Rust `hello_world`
......