feat: KV Block Manager Python bindings (#1022)

Unverified commit 437cae0a · authored May 19, 2025 by Jacky · committed by GitHub May 19, 2025
13 changed files
--- a/ATTRIBUTIONS-Rust.md
+++ b/ATTRIBUTIONS-Rust.md
@@ -720,6 +720,184 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.

+```
+## dlpark - 0.5.x
+
+- **Repository URL**: https://github.com/SunDoge/dlpark
+- **License URL**: https://github.com/SunDoge/dlpark/blob/main/LICENSE
+### License Text:
+
+```
+Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+   1. Definitions.
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+   END OF TERMS AND CONDITIONS
+   APPENDIX: How to apply the Apache License to your work.
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "{}"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+   Copyright 2017 by dlpack Contributors
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+       http://www.apache.org/licenses/LICENSE-2.0
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+
 ```
 ## educe - 0.6.0


--- a/lib/bindings/python/Cargo.lock
+++ b/lib/bindings/python/Cargo.lock
@@ -419,6 +419,26 @@ version = "1.7.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "89e25b6adfb930f02d1981565a6e5d9c547ac15a96606256d3b59040e5cd4ca3"

+[[package]]
+name = "bindgen"
+version = "0.71.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5f58bf3d7db68cfbac37cfc485a8d711e87e064c3d0fe0435b92f7a407f9d6b3"
+dependencies = [
+ "bitflags 2.9.0",
+ "cexpr",
+ "clang-sys",
+ "itertools 0.11.0",
+ "log",
+ "prettyplease",
+ "proc-macro2",
+ "quote",
+ "regex",
+ "rustc-hash",
+ "shlex",
+ "syn 2.0.100",
+]
+
 [[package]]
 name = "bit-set"
 version = "0.8.0"
@@ -553,6 +573,15 @@ dependencies = [
 "shlex",
 ]

+[[package]]
+name = "cexpr"
+version = "0.6.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6fac387a98bb7c37292057cffc56d62ecb629900026402633ae9160df93a8766"
+dependencies = [
+ "nom",
+]
+
 [[package]]
 name = "cfg-expr"
 version = "0.15.8"
@@ -594,6 +623,17 @@ dependencies = [
 "windows-link",
 ]

+[[package]]
+name = "clang-sys"
+version = "1.8.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0b023947811758c97c59bf9d1c188fd619ad4718dcaa767947df1cadb14f39f4"
+dependencies = [
+ "glob",
+ "libc",
+ "libloading",
+]
+
 [[package]]
 name = "clap"
 version = "4.5.37"
@@ -777,6 +817,15 @@ dependencies = [
 "typenum",
 ]

+[[package]]
+name = "cudarc"
+version = "0.16.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f9574894139a982bf26fbb44473a9d416c015e779c51ef0fbc0789f1a1c17b25"
+dependencies = [
+ "libloading",
+]
+
 [[package]]
 name = "curve25519-dalek"
 version = "4.1.3"
@@ -986,6 +1035,16 @@ dependencies = [
 "syn 2.0.100",
 ]

+[[package]]
+name = "dlpark"
+version = "0.5.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "dc178fc3bf4ce54c26ccffcf271ff574954ac4b940f15121be3d69f277194537"
+dependencies = [
+ "half",
+ "pyo3",
+]
+
 [[package]]
 name = "dyn-stack"
 version = "0.10.0"
@@ -1044,6 +1103,7 @@ dependencies = [
 "bytes",
 "candle-core",
 "chrono",
+ "cudarc",
 "derive-getters",
 "derive_builder",
 "dynamo-runtime",
@@ -1058,6 +1118,8 @@ dependencies = [
 "memmap2",
 "minijinja",
 "minijinja-contrib",
+ "ndarray",
+ "nixl-sys",
 "oneshot",
 "prometheus",
 "rand 0.9.1",
@@ -1086,6 +1148,7 @@ dependencies = [
 name = "dynamo-py3"
 version = "0.2.1"
 dependencies = [
+ "dlpark",
 "dynamo-engine-python",
 "dynamo-llm",
 "dynamo-runtime",
@@ -1817,6 +1880,12 @@ version = "0.31.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "07e28edb80900c19c28f1072f2e8aeca7fa06b23cd4169cefe1af5aa3260783f"

+[[package]]
+name = "glob"
+version = "0.3.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a8d1add55171497b4705a648c6b583acafb01d58050a51727785f0b2c8e0a2b2"
+
 [[package]]
 name = "h2"
 version = "0.4.9"
@@ -2463,6 +2532,16 @@ version = "0.8.4"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "47e1ffaa40ddd1f3ed91f717a33c8c0ee23fff369e3aa8772b9605cc1d22f4c3"

+[[package]]
+name = "matrixmultiply"
+version = "0.3.9"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9380b911e3e96d10c1f415da0876389aaf1b56759054eeb0de7df940c456ba1a"
+dependencies = [
+ "autocfg",
+ "rawpointer",
+]
+
 [[package]]
 name = "memchr"
 version = "2.7.4"
@@ -2615,6 +2694,21 @@ version = "0.10.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "defc4c55412d89136f966bbb339008b474350e5e6e78d2714439c386b3137a03"

+[[package]]
+name = "ndarray"
+version = "0.16.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "882ed72dce9365842bf196bdeedf5055305f11fc8c03dee7bb0194a6cad34841"
+dependencies = [
+ "matrixmultiply",
+ "num-complex",
+ "num-integer",
+ "num-traits",
+ "portable-atomic",
+ "portable-atomic-util",
+ "rawpointer",
+]
+
 [[package]]
 name = "neli"
 version = "0.6.5"
@@ -2674,6 +2768,21 @@ dependencies = [
 "libc",
 ]

+[[package]]
+name = "nixl-sys"
+version = "0.2.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "84bf333c75733cad60b29873d84168f841c6bd5207ae9dfbda7490a99c1ebe94"
+dependencies = [
+ "bindgen",
+ "cc",
+ "libc",
+ "pkg-config",
+ "serde",
+ "thiserror 2.0.12",
+ "tracing",
+]
+
 [[package]]
 name = "nkeys"
 version = "0.4.4"
@@ -3043,6 +3152,15 @@ version = "1.11.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "350e9b48cbc6b0e028b0473b114454c6316e57336ee184ceab6e53f72c178b3e"

+[[package]]
+name = "portable-atomic-util"
+version = "0.2.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d8a2f0d8d040d7848a709caf78912debcc3f33ee4b3cac47d73d1e1069e83507"
+dependencies = [
+ "portable-atomic",
+]
+
 [[package]]
 name = "powerfmt"
 version = "0.2.0"
@@ -3491,6 +3609,12 @@ dependencies = [
 "bitflags 2.9.0",
 ]

+[[package]]
+name = "rawpointer"
+version = "0.2.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "60a357793950651c4ed0f3f52338f53b2f809f32d83a07f72909fa13e4c6c1e3"
+
 [[package]]
 name = "rayon"
 version = "1.10.0"

--- a/lib/bindings/python/Cargo.toml
+++ b/lib/bindings/python/Cargo.toml
@@ -33,8 +33,11 @@ name = "_core"
 # "rlib" is necessary to support doctests.
 crate-type = ["cdylib", "rlib"]

-[dependencies]
+[features]
+default = []
+block-manager = ["dynamo-llm/block-manager", "dep:dlpark"]

+[dependencies]
 dynamo-llm = { path = "../../llm" }
 dynamo-runtime = { path = "../../runtime" }
 dynamo-engine-python = { path = "../../engines/python" }
@@ -67,3 +70,5 @@ pyo3-async-runtimes = { version = "0.23.0", default-features = false, features =
 ] }

 pythonize = "0.23"
+
+dlpark = { version = "0.5", features = ["pyo3", "half"], optional = true }
--- a/lib/bindings/python/pyproject.toml
+++ b/lib/bindings/python/pyproject.toml
@@ -51,6 +51,7 @@ module-name = "dynamo._core"
 manifest-path = "Cargo.toml"
 python-packages = ["dynamo"]
 python-source = "src"
+features = ["block-manager"]

 [build-system]
 requires = ["maturin>=1.0,<2.0", "patchelf"]

--- a/lib/bindings/python/rust/lib.rs
+++ b/lib/bindings/python/rust/lib.rs
@@ -83,6 +83,9 @@ fn _core(m: &Bound<'_, PyModule>) -> PyResult<()> {

    engine::add_to_module(m)?;

+    #[cfg(feature = "block-manager")]
+    llm::block_manager::add_to_module(m)?;
+
    Ok(())
 }
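
The `#[cfg(feature = "block-manager")]` attribute above means the block manager registration is compiled only when the Cargo feature is enabled (the `pyproject.toml` change turns it on for maturin builds). A minimal, std-only sketch of the same gating pattern follows; `registered_submodules` and the string names are hypothetical stand-ins for the real `add_to_module` calls, not part of the crate.

```rust
/// Hypothetical stand-in for the `_core` module initializer: returns the
/// names of the submodules that would be registered.
#[allow(unused_mut)] // `names` is only mutated when the feature is enabled
pub fn registered_submodules() -> Vec<&'static str> {
    let mut names = vec!["engine"];

    // This statement is compiled only when the crate is built with
    // `--features block-manager`; otherwise it is removed entirely.
    #[cfg(feature = "block-manager")]
    names.push("block_manager");

    names
}

fn main() {
    // Prints one or two names depending on how the crate was built.
    println!("{:?}", registered_submodules());
}
```

Because the gated statement disappears at compile time, a build without the feature needs neither `dlpark` nor the `dynamo-llm/block-manager` code paths.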


--- a/lib/bindings/python/rust/llm.rs
+++ b/lib/bindings/python/rust/llm.rs
@@ -39,6 +39,7 @@
 use super::*;

 pub mod backend;
+pub mod block_manager;
 pub mod disagg_router;
 pub mod kv;
 pub mod model_card;

--- a/lib/bindings/python/rust/llm/block_manager.rs
+++ b/lib/bindings/python/rust/llm/block_manager.rs
+// SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// SPDX-License-Identifier: Apache-2.0
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#![cfg(feature = "block-manager")]
+// Silence warnings about deprecated features (like pyo3::IntoPy::into_py)
+#![allow(deprecated)]
+
+use super::*;
+use pyo3::PyResult;
+use tokio;
+
+mod block;
+mod block_list;
+
+/// Add bindings from this crate to the provided module
+pub fn add_to_module(m: &Bound<'_, PyModule>) -> PyResult<()> {
+    m.add_class::<block::Block>()?;
+    m.add_class::<block_list::BlockList>()?;
+    m.add_class::<BlockManager>()?;
+    Ok(())
+}
+
+#[pyclass]
+pub struct BlockManager {
+    // TODO: Can this be implicitly created and referenced?
+    tokio_runtime: tokio::runtime::Runtime,
+    // Block manager
+    inner: Arc<dynamo_llm::block_manager::ReferenceBlockManager>,
+    // TODO: Metadata should be stored in the block manager?
+    dtype: dynamo_llm::common::dtype::DType,
+    device_id: usize,
+}
+
+#[pymethods]
+impl BlockManager {
+    #[new]
+    #[pyo3(signature = (worker_id, num_layer, page_size, inner_dim, dtype=None, host_num_blocks=None, device_num_blocks=None, device_id=0))]
+    fn new(
+        worker_id: u64,
+        num_layer: usize,
+        page_size: usize,
+        inner_dim: usize,
+        dtype: Option<String>,
+        host_num_blocks: Option<usize>,
+        device_num_blocks: Option<usize>,
+        device_id: usize,
+    ) -> PyResult<Self> {
+        let mut config = dynamo_llm::block_manager::KvBlockManagerConfig::builder().runtime(
+            dynamo_llm::block_manager::KvManagerRuntimeConfig::builder()
+                .worker_id(worker_id)
+                .build()
+                .unwrap(),
+        );
+        let mut model_config = dynamo_llm::block_manager::KvManagerModelConfig::builder()
+            .num_layers(num_layer)
+            .page_size(page_size)
+            .inner_dim(inner_dim);
+        let mut dtype_ = dynamo_llm::common::dtype::DType::FP16; // Default in block_manager config
+        if let Some(dtype_str) = dtype {
+            dtype_ = match dtype_str.as_str() {
+                "fp8" | "FP8" => dynamo_llm::common::dtype::DType::FP8,
+                "fp16" | "FP16" => dynamo_llm::common::dtype::DType::FP16,
+                "bf16" | "BF16" => dynamo_llm::common::dtype::DType::BF16,
+                "fp32" | "FP32" => dynamo_llm::common::dtype::DType::FP32,
+                "u8" | "U8" => dynamo_llm::common::dtype::DType::U8,
+                "u16" | "U16" => dynamo_llm::common::dtype::DType::U16,
+                "u32" | "U32" => dynamo_llm::common::dtype::DType::U32,
+                "u64" | "U64" => dynamo_llm::common::dtype::DType::U64,
+                "i8" | "I8" => dynamo_llm::common::dtype::DType::I8,
+                "i16" | "I16" => dynamo_llm::common::dtype::DType::I16,
+                "i32" | "I32" => dynamo_llm::common::dtype::DType::I32,
+                "i64" | "I64" => dynamo_llm::common::dtype::DType::I64,
+                _ => {
+                    return Err(pyo3::exceptions::PyValueError::new_err(format!(
+                        "Unsupported dtype: {}",
+                        dtype_str
+                    )))
+                }
+            };
+        }
+        model_config = model_config.dtype(dtype_.clone());
+        config = config.model(model_config.build().unwrap());
+        if let Some(host_num_blocks) = host_num_blocks {
+            config = config.host_layout(
+                dynamo_llm::block_manager::KvManagerLayoutConfig::builder()
+                    .num_blocks(host_num_blocks)
+                    .allocator(dynamo_llm::block_manager::storage::PinnedAllocator::new().unwrap())
+                    .build()
+                    .unwrap(),
+            );
+        }
+        if let Some(device_num_blocks) = device_num_blocks {
+            config = config.device_layout(
+                dynamo_llm::block_manager::KvManagerLayoutConfig::builder()
+                    .num_blocks(device_num_blocks)
+                    .allocator(
+                        dynamo_llm::block_manager::storage::DeviceAllocator::new(device_id)
+                            .unwrap(),
+                    )
+                    .build()
+                    .unwrap(),
+            );
+        }
+        let config = config.build().unwrap();
+        let tokio_runtime = tokio::runtime::Builder::new_multi_thread()
+            .enable_all()
+            .build()
+            .unwrap();
+        let block_manager = tokio_runtime.block_on(async {
+            dynamo_llm::block_manager::ReferenceBlockManager::new(config).unwrap()
+        });
+        Ok(BlockManager {
+            tokio_runtime,
+            inner: Arc::from(block_manager),
+            dtype: dtype_,
+            device_id,
+        })
+    }
+
+    fn allocate_host_blocks_blocking(&self, count: usize) -> PyResult<block_list::BlockList> {
+        let blocks = self
+            .inner
+            .host()
+            .unwrap()
+            .allocate_blocks_blocking(count)
+            .unwrap();
+        // Wrap each block in the BlockType enum, which covers both Pinned and Device blocks
+        let blocks = blocks
+            .into_iter()
+            .map(|b| block::BlockType::Pinned(b))
+            .collect();
+        Ok(block_list::BlockList::from_rust(
+            blocks,
+            self.dtype.clone(),
+            self.device_id,
+        ))
+    }
+
+    fn allocate_device_blocks_blocking(&self, count: usize) -> PyResult<block_list::BlockList> {
+        let blocks = self
+            .inner
+            .device()
+            .unwrap()
+            .allocate_blocks_blocking(count)
+            .unwrap();
+        // Wrap each block in the BlockType enum, which covers both Pinned and Device blocks
+        let blocks = blocks
+            .into_iter()
+            .map(|b| block::BlockType::Device(b))
+            .collect();
+        Ok(block_list::BlockList::from_rust(
+            blocks,
+            self.dtype.clone(),
+            self.device_id,
+        ))
+    }
+}
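
The `dtype` handling in `BlockManager::new` accepts an all-lowercase or all-uppercase name, defaults to FP16 when no name is given, and rejects anything else with an "Unsupported dtype" error. The sketch below reproduces that behaviour with the standard library only. The local `DType` enum is a stand-in that copies the variant names from the diff, and `resolve_dtype` is a hypothetical helper; neither is the crate's actual API.

```rust
use std::str::FromStr;

/// Local stand-in for `dynamo_llm::common::dtype::DType` (variant names
/// copied from the diff; the real type lives in the `dynamo-llm` crate).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DType {
    FP8, FP16, BF16, FP32,
    U8, U16, U32, U64,
    I8, I16, I32, I64,
}

impl FromStr for DType {
    type Err = String;

    /// Accepts the same spellings as the constructor in the diff:
    /// all-lowercase or all-uppercase, nothing in between.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "fp8" | "FP8" => Ok(DType::FP8),
            "fp16" | "FP16" => Ok(DType::FP16),
            "bf16" | "BF16" => Ok(DType::BF16),
            "fp32" | "FP32" => Ok(DType::FP32),
            "u8" | "U8" => Ok(DType::U8),
            "u16" | "U16" => Ok(DType::U16),
            "u32" | "U32" => Ok(DType::U32),
            "u64" | "U64" => Ok(DType::U64),
            "i8" | "I8" => Ok(DType::I8),
            "i16" | "I16" => Ok(DType::I16),
            "i32" | "I32" => Ok(DType::I32),
            "i64" | "I64" => Ok(DType::I64),
            other => Err(format!("Unsupported dtype: {}", other)),
        }
    }
}

/// Hypothetical helper mirroring the constructor's defaulting:
/// `None` falls back to FP16.
pub fn resolve_dtype(dtype: Option<&str>) -> Result<DType, String> {
    match dtype {
        Some(name) => name.parse(),
        None => Ok(DType::FP16),
    }
}

fn main() {
    println!("{:?}", resolve_dtype(Some("bf16")));
    println!("{:?}", resolve_dtype(None));
    println!("{:?}", resolve_dtype(Some("float16")));
}
```

Factoring the match into a `FromStr` implementation is only one possible design; it keeps the string table in one place and makes the defaulting rule testable without constructing a block manager.
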
--- a/lib/bindings/python/rust/llm/block_manager/block.rs
+++ b/lib/bindings/python/rust/llm/block_manager/block.rs
+// SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// SPDX-License-Identifier: Apache-2.0
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#![cfg(feature = "block-manager")]
+// Silence warnings about deprecated features (like pyo3::IntoPy::into_py)
+#![allow(deprecated)]
+
+use super::*;
+
+use dlpark::prelude::{DataType, Device, ManagerCtx, ShapeAndStrides, ToTensor};
+use pyo3::{ffi::c_str, prelude::IntoPy, types::PyTuple, PyObject, PyResult, Python};
+use std::sync::{Arc, Mutex};
+
+use dynamo_llm::block_manager::block::BlockDataExt;
+
+pub enum BlockType {
+    Pinned(
+        dynamo_llm::block_manager::block::MutableBlock<
+            dynamo_llm::block_manager::storage::PinnedStorage,
+            dynamo_llm::block_manager::block::BasicMetadata,
+        >,
+    ),
+    Device(
+        dynamo_llm::block_manager::block::MutableBlock<
+            dynamo_llm::block_manager::storage::DeviceStorage,
+            dynamo_llm::block_manager::block::BasicMetadata,
+        >,
+    ),
+}
+
+struct DlPackTensor {
+    block: Arc<Mutex<BlockType>>,
+    // TODO: Metadata should be stored in the block manager?
+    dtype: dynamo_llm::common::dtype::DType,
+    device_id: usize,
+}
+
+impl ToTensor for DlPackTensor {
+    fn data_ptr(&self) -> *mut std::ffi::c_void {
+        let mut mutable_block = self.block.lock().unwrap();
+        let ptr = match &mut *mutable_block {
+            BlockType::Pinned(block) => {
+                let mut block_view_mut = block
+                    .block_view_mut()
+                    .expect("Failed to get mutable Pinned block view");
+                unsafe { block_view_mut.as_mut_ptr() }
+            }
+            BlockType::Device(block) => {
+                let mut block_view_mut = block
+                    .block_view_mut()
+                    .expect("Failed to get mutable Device block view");
+                unsafe { block_view_mut.as_mut_ptr() }
+            }
+        };
+        ptr as *mut std::ffi::c_void
+    }
+
+    fn byte_offset(&self) -> u64 {
+        0
+    }
+
+    fn device(&self) -> Device {
+        let mutable_block = self.block.lock().unwrap();
+        match &*mutable_block {
+            BlockType::Pinned(_) => {
+                // TODO: Why does torch not support CPU_PINNED here?
+                /*Device {
+                    device_type: DeviceType::CudaHost,
+                    device_id: 0,
+                }*/
+                Device::CPU
+            }
+            BlockType::Device(_) => Device::cuda(self.device_id),
+        }
+    }
+
+    fn dtype(&self) -> DataType {
+        // Map from dynamo_llm::common::dtype::DType to dlpark::prelude::DataType
+        match self.dtype {
+            dynamo_llm::common::dtype::DType::FP8 => {
+                // No direct FP8 equivalent, use U8 as closest alternative
+                DataType::U8
+            }
+            dynamo_llm::common::dtype::DType::FP16 => DataType::F16,
+            dynamo_llm::common::dtype::DType::BF16 => DataType::BF16,
+            dynamo_llm::common::dtype::DType::FP32 => DataType::F32,
+            dynamo_llm::common::dtype::DType::U8 => DataType::U8,
+            dynamo_llm::common::dtype::DType::U16 => DataType::U16,
+            dynamo_llm::common::dtype::DType::U32 => DataType::U32,
+            dynamo_llm::common::dtype::DType::U64 => DataType::U64,
+            dynamo_llm::common::dtype::DType::I8 => DataType::I8,
+            dynamo_llm::common::dtype::DType::I16 => DataType::I16,
+            dynamo_llm::common::dtype::DType::I32 => DataType::I32,
+            dynamo_llm::common::dtype::DType::I64 => DataType::I64,
+        }
+    }
+
+    fn shape_and_strides(&self) -> ShapeAndStrides {
+        let mutable_block = self.block.lock().unwrap();
+        let (num_blocks, num_layers, page_size, inner_dim) = match &*mutable_block {
+            BlockType::Pinned(block) => (
+                block.num_blocks(),
+                block.num_layers(),
+                block.page_size(),
+                block.inner_dim(),
+            ),
+            BlockType::Device(block) => (
+                block.num_blocks(),
+                block.num_layers(),
+                block.page_size(),
+                block.inner_dim(),
+            ),
+        };
+        let shape_i64: Vec<i64> = vec![
+            num_blocks as i64,
+            num_layers as i64,
+            page_size as i64,
+            inner_dim as i64,
+        ];
+        ShapeAndStrides::new_contiguous(&shape_i64)
+    }
+}
+
+/*impl Drop for DlPackTensor {
+    fn drop(&mut self) {
+        println!("Dropping DlPackTensor");
+    }
+}*/
+
+#[pyclass]
+pub struct Block {
+    inner: Arc<Mutex<BlockType>>,
+    // TODO: Metadata should be stored in the block manager?
+    dtype: dynamo_llm::common::dtype::DType,
+    device_id: usize,
+}
+
+impl Block {
+    pub fn from_rust(
+        block: Arc<Mutex<BlockType>>,
+        dtype: dynamo_llm::common::dtype::DType,
+        device_id: usize,
+    ) -> Self {
+        Self {
+            inner: block,
+            dtype,
+            device_id,
+        }
+    }
+}
+
+#[pymethods]
+impl Block {
+    #[pyo3(signature = (stream=None, max_version=None, dl_device=None, copy=None))]
+    fn __dlpack__(
+        &self,
+        stream: Option<PyObject>,
+        max_version: Option<PyObject>,
+        dl_device: Option<PyObject>,
+        copy: Option<bool>,
+    ) -> PyResult<PyObject> {
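+        // None of the optional DLPack arguments is supported yet. Raise a Python
+        // exception instead of panicking: a Rust panic reaches Python as
+        // PanicException, which derives from BaseException.
+        if stream.is_some() {
+            return Err(pyo3::exceptions::PyNotImplementedError::new_err(
+                "stream argument is not supported",
+            ));
+        }
+        if max_version.is_some() {
+            return Err(pyo3::exceptions::PyNotImplementedError::new_err(
+                "max_version argument is not supported",
+            ));
+        }
+        if dl_device.is_some() {
+            return Err(pyo3::exceptions::PyNotImplementedError::new_err(
+                "dl_device argument is not supported",
+            ));
+        }
+        if copy.is_some() {
+            return Err(pyo3::exceptions::PyNotImplementedError::new_err(
+                "copy argument is not supported",
+            ));
+        }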
+
+        // Create DLPack PyCapsule
+        let manager_ctx = ManagerCtx::new(DlPackTensor {
+            block: self.inner.clone(),
+            dtype: self.dtype.clone(),
+            device_id: self.device_id,
+        });
+        let py_capsule = Python::with_gil(|py| manager_ctx.into_py(py));
+        Ok(py_capsule)
+    }
+
+    fn __dlpack_device__(&self) -> PyResult<Py<PyTuple>> {
+        let dlpack_device = Python::with_gil(|py| {
+            let device_type_list = py.eval(c_str!("[('CPU', 1), ('CUDA', 2), ('CPU_PINNED', 3), ('OPENCL', 4), ('VULKAN', 7), ('METAL', 8), ('VPI', 9), ('ROCM', 10)]"), None, None).unwrap();
+            let device_type_enum = py
+                .import("enum")
+                .unwrap()
+                .getattr("Enum")
+                .unwrap()
+                .call1(("DLDeviceType", device_type_list))
+                .unwrap();
+            let block = self.inner.lock().unwrap();
+            let device_type = match &*block {
+                BlockType::Pinned(_) => device_type_enum.getattr("CPU_PINNED").unwrap(),
+                BlockType::Device(_) => device_type_enum.getattr("CUDA").unwrap(),
+            };
+            let device_id = self.device_id.into_py(py).into_bound(py);
+            let device = vec![device_type, device_id];
+            PyTuple::new(py, device).unwrap().unbind()
+        });
+        Ok(dlpack_device)
+    }
+}
+
+/*impl Drop for Block {
+    fn drop(&mut self) {
+        println!("Dropping Block");
+    }
+}*/
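
Review note: `__dlpack_device__` above builds an `enum.Enum` on every call and returns a `(device_type, device_id)` tuple. The sketch below is a pure-Python rendering of that logic, meant only as a reading aid. The helper `dlpack_device` and its `kind` strings are hypothetical names and not part of the bindings; the numeric codes are copied from the string literal in the Rust code.

```python
import enum

# Same name/value pairs as the literal evaluated in the Rust code.
DLDeviceType = enum.Enum(
    "DLDeviceType",
    [
        ("CPU", 1),
        ("CUDA", 2),
        ("CPU_PINNED", 3),
        ("OPENCL", 4),
        ("VULKAN", 7),
        ("METAL", 8),
        ("VPI", 9),
        ("ROCM", 10),
    ],
)


def dlpack_device(kind: str, device_id: int) -> tuple:
    """Hypothetical helper: map a block kind to (device_type, device_id)."""
    if kind == "pinned":
        device_type = DLDeviceType.CPU_PINNED
    elif kind == "device":
        device_type = DLDeviceType.CUDA
    else:
        raise ValueError(f"unknown block kind: {kind!r}")
    return (device_type, device_id)


pinned = dlpack_device("pinned", 0)
cuda = dlpack_device("device", 0)
```

Because the members come from a plain `Enum` rather than an `IntEnum`, `cuda[0] == 2` is false; a consumer that compares against integer device codes would need `cuda[0].value`.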
--- a/lib/bindings/python/rust/llm/block_manager/block_list.rs
+++ b/lib/bindings/python/rust/llm/block_manager/block_list.rs
+// SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// SPDX-License-Identifier: Apache-2.0
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#![cfg(feature = "block-manager")]
+// Silence warnings about deprecated features (like pyo3::IntoPy::into_py)
+#![allow(deprecated)]
+
+use super::*;
+
+use pyo3::{types::PyList, PyResult, Python};
+use std::sync::{Arc, Mutex};
+
+#[pyclass]
+pub struct BlockList {
+    inner: Vec<Arc<Mutex<block::BlockType>>>,
+    // TODO: Metadata should be stored in the block manager?
+    dtype: dynamo_llm::common::dtype::DType,
+    device_id: usize,
+    // Python iterator state
+    py_itr_idx: usize,
+}
+
+impl BlockList {
+    pub fn from_rust(
+        block_list: Vec<block::BlockType>,
+        dtype: dynamo_llm::common::dtype::DType,
+        device_id: usize,
+    ) -> Self {
+        Self {
+            inner: block_list
+                .into_iter()
+                .map(|b| Arc::new(Mutex::new(b)))
+                .collect(),
+            dtype,
+            device_id,
+            py_itr_idx: 0,
+        }
+    }
+}
+
+#[pymethods]
+impl BlockList {
+    fn to_list(&self) -> PyResult<Py<PyList>> {
+        let py_list = Python::with_gil(|py| {
+            let blocks: Vec<block::Block> = self
+                .inner
+                .iter()
+                .map(|b| block::Block::from_rust(b.clone(), self.dtype.clone(), self.device_id))
+                .collect();
+            PyList::new(py, blocks).unwrap().unbind()
+        });
+        Ok(py_list)
+    }
+
+    fn __len__(&self) -> PyResult<usize> {
+        Ok(self.inner.len())
+    }
+
+    fn __getitem__(&self, index: usize) -> PyResult<block::Block> {
+        if index >= self.inner.len() {
+            return Err(pyo3::exceptions::PyIndexError::new_err(format!(
+                "Index {} out of range for BlockList of length {}",
+                index,
+                self.inner.len()
+            )));
+        }
+        let block = block::Block::from_rust(
+            self.inner[index].clone(),
+            self.dtype.clone(),
+            self.device_id,
+        );
+        Ok(block)
+    }
+
+    fn __iter__(slf: Py<Self>) -> PyResult<Py<Self>> {
+        Python::with_gil(|py| {
+            let mut slf = slf.borrow_mut(py);
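+            // Iteration state is stored on the BlockList itself, so every call to
+            // __iter__() restarts from the first block. Nested or concurrent loops
+            // over the same BlockList would share this index; iterate over
+            // to_list() instead in those cases.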
+            slf.py_itr_idx = 0;
+        });
+        Ok(slf)
+    }
+
+    fn __next__(&mut self) -> PyResult<block::Block> {
+        if self.py_itr_idx >= self.inner.len() {
+            return Err(pyo3::exceptions::PyStopIteration::new_err(
+                "No more items in BlockList",
+            ));
+        }
+        let block = block::Block::from_rust(
+            self.inner[self.py_itr_idx].clone(),
+            self.dtype.clone(),
+            self.device_id,
+        );
+        self.py_itr_idx += 1;
+        Ok(block)
+    }
+}
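
Review note: `BlockList` returns itself from `__iter__` and keeps the cursor in `py_itr_idx`, which is why the comment points to `to_list()` for concurrent iteration. The pure-Python sketch below mimics that design to show the consequence. `SharedCursorList` is a hypothetical stand-in, not the binding itself.

```python
class SharedCursorList:
    """Hypothetical stand-in for BlockList: the cursor lives on the container."""

    def __init__(self, items):
        self._items = list(items)
        self._idx = 0

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        # Mirrors the Rust __iter__: reset the shared cursor and return self.
        self._idx = 0
        return self

    def __next__(self):
        if self._idx >= len(self._items):
            raise StopIteration
        item = self._items[self._idx]
        self._idx += 1
        return item

    def to_list(self):
        # A fresh list hands out independent iterators.
        return list(self._items)


blocks = SharedCursorList(["b0", "b1", "b2"])

# Sequential loops are fine: each one restarts from the first element.
first_pass = [b for b in blocks]
second_pass = [b for b in blocks]

# Nested loops share one cursor, so the inner loop exhausts it for the outer one.
nested_pairs = [(a, b) for a in blocks for b in blocks]

# Iterating over to_list() gives every loop its own cursor.
safe_pairs = [(a, b) for a in blocks.to_list() for b in blocks.to_list()]
```

With three items, the nested loop over the shared cursor yields only three pairs, while the `to_list()` variant yields all nine.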
--- a/lib/bindings/python/src/dynamo/_core.pyi
+++ b/lib/bindings/python/src/dynamo/_core.pyi
@@ -13,7 +13,16 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-from typing import AsyncGenerator, AsyncIterator, Callable, Dict, List, Optional, Union
+from typing import (
+    Any,
+    AsyncGenerator,
+    AsyncIterator,
+    Callable,
+    Dict,
+    List,
+    Optional,
+    Union,
+)

 def log_message(level: str, message: str, module: str, file: str, line: int) -> None:
    """
@@ -663,3 +672,130 @@ class NatsQueue:
        """
        ...

+class Block:
+    """
+    A KV cache block
+    """
+
+    ...
+
+    def __dlpack__(self, stream: Optional[Any] = None, max_version: Optional[Any] = None, dl_device: Optional[Any] = None, copy: Optional[bool] = None) -> Any:
+        """
+        Get a dlpack capsule from the block
+        """
+        ...
+
+    def __dlpack_device__(self) -> Any:
+        """
+        Get the dlpack device of the block
+        """
+        ...
+
+class BlockList:
+    """
+    A list of KV cache blocks
+    """
+
+    ...
+
+    def __len__(self) -> int:
+        """
+        Get the number of blocks in the list
+        """
+        ...
+
+    def __getitem__(self, index: int) -> Block:
+        """
+        Get a block by index
+        """
+        ...
+
+    def __iter__(self) -> 'BlockList':
+        """
+        Get an iterator over the blocks
+        """
+        ...
+
+    def __next__(self) -> Block:
+        """
+        Get the next block in the iterator
+        """
+        ...
+
+    def to_list(self) -> List[Block]:
+        """
+        Get a list of blocks
+        """
+        ...
+
+class BlockManager:
+    """
+    A KV cache block manager
+    """
+
+    def __init__(
+        self,
+        worker_id: int,
+        num_layer: int,
+        page_size: int,
+        inner_dim: int,
+        dtype: Optional[str] = None,
+        host_num_blocks: Optional[int] = None,
+        device_num_blocks: Optional[int] = None,
+        device_id: int = 0
+    ) -> None:
+        """
+        Create a `BlockManager` object
+
+        Parameters:
+        -----------
+        worker_id: int
+            The worker ID for this block manager
+        num_layer: int
+            Number of layers in the model
+        page_size: int
+            Page size for blocks
+        inner_dim: int
+            Inner dimension size
+        dtype: Optional[str]
+            Data type (e.g., 'fp16', 'bf16', 'fp32'), defaults to 'fp16' if None
+        host_num_blocks: Optional[int]
+            Number of host blocks to allocate, None means no host blocks
+        device_num_blocks: Optional[int]
+            Number of device blocks to allocate, None means no device blocks
+        device_id: int
+            CUDA device ID, defaults to 0
+        """
+        ...
+
+    def allocate_host_blocks_blocking(self, count: int) -> BlockList:
+        """
+        Allocate a list of host blocks (blocking call)
+
+        Parameters:
+        -----------
+        count: int
+            Number of blocks to allocate
+
+        Returns:
+        --------
+        BlockList
+            List of allocated blocks
+        """
+        ...
+
+    def allocate_device_blocks_blocking(self, count: int) -> BlockList:
+        """
+        Allocate a list of device blocks (blocking call)
+
+        Parameters:
+        -----------
+        count: int
+            Number of blocks to allocate
+
+        Returns:
+        --------
+        BlockList
+            List of allocated blocks
+        """
+        ...
--- a/lib/bindings/python/src/dynamo/llm/__init__.py
+++ b/lib/bindings/python/src/dynamo/llm/__init__.py
@@ -14,6 +14,7 @@
 # limitations under the License.

 from dynamo._core import AggregatedMetrics as AggregatedMetrics
+from dynamo._core import BlockManager as BlockManager
 from dynamo._core import DisaggregatedRouter as DisaggregatedRouter
 from dynamo._core import HttpAsyncEngine as HttpAsyncEngine
 from dynamo._core import HttpError as HttpError

--- a/lib/bindings/python/tests/test_block_manager.py
+++ b/lib/bindings/python/tests/test_block_manager.py
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+import asyncio
+
+import pytest
+import torch
+
+from dynamo.llm import BlockManager
+
+pytestmark = pytest.mark.pre_merge
+
+
+WORKER_ID = 0
+NUM_LAYER = 5
+PAGE_SIZE = 4
+INNER_DIM = 13
+DTYPE, TORCH_DTYPE = "FP32", torch.float32
+HOST_NUM_BLOCKS = 16
+DEVICE_NUM_BLOCKS = 16
+DEVICE_ID = 0
+
+
+@pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA unavailable")
+async def test_block_manager_initialization():
+    # None of these instances is bound to a name, so each becomes unreachable right
+    # after construction. When it is actually freed depends on the garbage collector.
+    BlockManager(WORKER_ID, NUM_LAYER, PAGE_SIZE, INNER_DIM)
+    BlockManager(WORKER_ID, NUM_LAYER, PAGE_SIZE, INNER_DIM, DTYPE)
+    BlockManager(WORKER_ID, NUM_LAYER, PAGE_SIZE, INNER_DIM, DTYPE, HOST_NUM_BLOCKS)
+    BlockManager(
+        WORKER_ID,
+        NUM_LAYER,
+        PAGE_SIZE,
+        INNER_DIM,
+        DTYPE,
+        device_num_blocks=DEVICE_NUM_BLOCKS,
+    )
+    BlockManager(
+        WORKER_ID,
+        NUM_LAYER,
+        PAGE_SIZE,
+        INNER_DIM,
+        DTYPE,
+        HOST_NUM_BLOCKS,
+        DEVICE_NUM_BLOCKS,
+    )
+    BlockManager(
+        WORKER_ID,
+        NUM_LAYER,
+        PAGE_SIZE,
+        INNER_DIM,
+        DTYPE,
+        device_num_blocks=DEVICE_NUM_BLOCKS,
+        device_id=DEVICE_ID,
+    )
+    BlockManager(
+        WORKER_ID,
+        NUM_LAYER,
+        PAGE_SIZE,
+        INNER_DIM,
+        DTYPE,
+        HOST_NUM_BLOCKS,
+        DEVICE_NUM_BLOCKS,
+        DEVICE_ID,
+    )
+
+
+@pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA unavailable")
+async def test_cpu_block_access():
+    block_manager = BlockManager(
+        WORKER_ID,
+        NUM_LAYER,
+        PAGE_SIZE,
+        INNER_DIM,
+        DTYPE,
+        HOST_NUM_BLOCKS,
+        DEVICE_NUM_BLOCKS,
+        DEVICE_ID,
+    )
+    block_count = 2
+    block_list = block_manager.allocate_host_blocks_blocking(block_count)
+    py_blocks = block_list.to_list()
+    assert len(py_blocks) == block_count
+    tensors = [torch.from_dlpack(b) for b in py_blocks]
+    for tensor in tensors:
+        assert tensor.get_device() == -1  # CPU
+        assert tensor.shape == (1, NUM_LAYER, PAGE_SIZE, INNER_DIM)
+        assert tensor.dtype == TORCH_DTYPE
+    # print(tensors)
+    for tensor in tensors:
+        tensor[0][0][0][0] = 1.0
+        tensor[0][NUM_LAYER - 1][PAGE_SIZE - 1][INNER_DIM - 1] = 1.0
+    # print(tensors)
+    py_blocks_ = block_list.to_list()
+    assert py_blocks is not py_blocks_
+    assert len(py_blocks) == len(py_blocks_)
+    tensors_ = [torch.from_dlpack(b) for b in py_blocks_]
+    for tensor, tensor_ in zip(tensors, tensors_):
+        assert tensor is not tensor_
+        assert tensor.shape == tensor_.shape
+        assert tensor.dtype == tensor_.dtype
+        assert torch.allclose(tensor, tensor_)
+
+
+@pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA unavailable")
+async def test_gpu_block_access():
+    block_manager = BlockManager(
+        WORKER_ID,
+        NUM_LAYER,
+        PAGE_SIZE,
+        INNER_DIM,
+        DTYPE,
+        HOST_NUM_BLOCKS,
+        DEVICE_NUM_BLOCKS,
+        DEVICE_ID,
+    )
+    block_count = 6
+    block_list = block_manager.allocate_device_blocks_blocking(block_count)
+    py_blocks = block_list.to_list()
+    assert len(py_blocks) == block_count
+    tensors = [torch.from_dlpack(b) for b in py_blocks]
+    for tensor in tensors:
+        assert tensor.get_device() == DEVICE_ID  # GPU
+        assert tensor.shape == (1, NUM_LAYER, PAGE_SIZE, INNER_DIM)
+        assert tensor.dtype == TORCH_DTYPE
+    # print(tensors)
+    for tensor in tensors:
+        tensor[0][0][0][0] = 1.0
+        tensor[0][NUM_LAYER - 1][PAGE_SIZE - 1][INNER_DIM - 1] = 1.0
+    # print(tensors)
+    py_blocks_ = block_list.to_list()
+    assert py_blocks is not py_blocks_
+    assert len(py_blocks) == len(py_blocks_)
+    tensors_ = [torch.from_dlpack(b) for b in py_blocks_]
+    for tensor, tensor_ in zip(tensors, tensors_):
+        assert tensor is not tensor_
+        assert tensor.shape == tensor_.shape
+        assert tensor.dtype == tensor_.dtype
+        assert torch.allclose(tensor, tensor_)
+
+
+@pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA unavailable")
+async def test_block_list_iteration():
+    block_manager = BlockManager(
+        WORKER_ID,
+        NUM_LAYER,
+        PAGE_SIZE,
+        INNER_DIM,
+        DTYPE,
+        HOST_NUM_BLOCKS,
+        DEVICE_NUM_BLOCKS,
+        DEVICE_ID,
+    )
+    block_count = 4
+    block_list = block_manager.allocate_host_blocks_blocking(block_count)
+    # Test __len__()
+    assert len(block_list) == block_count
+    # Test __getitem__()
+    for i in range(block_count):
+        block = block_list[i]
+        tensor = torch.from_dlpack(block)
+        tensor[0][0][0][0] = 1.0 + i
+    # Test __iter__() and __next__()
+    idx = 1.0
+    for block in block_list:
+        tensor = torch.from_dlpack(block)
+        assert tensor[0][0][0][0] == idx
+        tensor[0][0][0][0] += 0.5
+        idx += 1.0
+    assert idx == 1.0 + block_count
+    # A second loop must start from the first block again: __iter__() resets the index
+    idx = 1.0
+    for block in block_list:
+        tensor = torch.from_dlpack(block)
+        assert tensor[0][0][0][0] == idx + 0.5
+        idx += 1.0
+    assert idx == 1.0 + block_count
+
+
+@pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA unavailable")
+async def test_block_copy_g1_g2():
+    block_manager = BlockManager(
+        WORKER_ID,
+        NUM_LAYER,
+        PAGE_SIZE,
+        INNER_DIM,
+        DTYPE,
+        HOST_NUM_BLOCKS,
+        DEVICE_NUM_BLOCKS,
+        DEVICE_ID,
+    )
+    # Allocate one host (G2) block and one device (G1) block
+    host_block_list = block_manager.allocate_host_blocks_blocking(1)
+    device_block_list = block_manager.allocate_device_blocks_blocking(1)
+    # Populate host block with unique values
+    host_tensor = torch.from_dlpack(host_block_list[0])
+    for i in range(NUM_LAYER):
+        for j in range(PAGE_SIZE):
+            for k in range(INNER_DIM):
+                host_tensor[0][i][j][k] = i * PAGE_SIZE * INNER_DIM + j * INNER_DIM + k
+    # Copy host block to device block after permuting
+    permute_dims = (0, 2, 3, 1)
+    device_tensor_ = torch.from_dlpack(device_block_list[0]).permute(*permute_dims)
+    device_tensor_.copy_(host_tensor.permute(*permute_dims))
+    # Assert device block is contiguous and updated in block manager
+    device_tensor = torch.from_dlpack(device_block_list[0])
+    for i in range(NUM_LAYER):
+        for j in range(PAGE_SIZE):
+            for k in range(INNER_DIM):
+                assert (
+                    device_tensor[0][i][j][k]
+                    == i * PAGE_SIZE * INNER_DIM + j * INNER_DIM + k
+                )
+    # Set host block to zero and assert updated in block manager
+    host_tensor_ = torch.from_dlpack(host_block_list[0]).permute(*permute_dims)
+    host_tensor_.zero_()
+    assert torch.all(host_tensor == 0)
+    # Copy device block back to host block
+    host_tensor_.copy_(device_tensor_)
+    # Assert host block is updated in block manager
+    for i in range(NUM_LAYER):
+        for j in range(PAGE_SIZE):
+            for k in range(INNER_DIM):
+                assert (
+                    host_tensor[0][i][j][k]
+                    == i * PAGE_SIZE * INNER_DIM + j * INNER_DIM + k
+                )
+
+
+async def main():
+    await test_block_manager_initialization()
+    await test_cpu_block_access()
+    await test_gpu_block_access()
+    await test_block_list_iteration()
+    await test_block_copy_g1_g2()
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
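
Review note: `test_block_copy_g1_g2` fills the host block with `i * PAGE_SIZE * INNER_DIM + j * INNER_DIM + k`, which is the row-major element offset of `[0][i][j][k]` in a contiguous `(1, NUM_LAYER, PAGE_SIZE, INNER_DIM)` tensor. The sketch below works through that arithmetic in plain Python, assuming the exported tensor uses row-major contiguous strides (as the name `new_contiguous` suggests). The helpers `contiguous_strides` and `offset` are hypothetical and exist only for this illustration.

```python
NUM_LAYER, PAGE_SIZE, INNER_DIM = 5, 4, 13
shape = (1, NUM_LAYER, PAGE_SIZE, INNER_DIM)


def contiguous_strides(shape):
    """Row-major strides in elements: the last dimension varies fastest."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def offset(index, strides):
    """Flat element offset of a multi-dimensional index."""
    return sum(i * s for i, s in zip(index, strides))


strides = contiguous_strides(shape)

# permute(0, 2, 3, 1) reorders shape and strides; the storage stays in place.
dims = (0, 2, 3, 1)
permuted_shape = tuple(shape[d] for d in dims)
permuted_strides = tuple(strides[d] for d in dims)

i, j, k = 3, 2, 7
original_offset = offset((0, i, j, k), strides)
permuted_offset = offset((0, j, k, i), permuted_strides)
```

Both offsets address the same element, which is why copying between permuted views in the test still updates the underlying blocks in their original layout.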
--- a/lib/llm/src/block_manager/block.rs
+++ b/lib/llm/src/block_manager/block.rs
@@ -217,7 +217,7 @@ impl<S: Storage, M: BlockMetadata> Block<S, M> {

    /// Get the number of blocks in the block
    pub fn num_blocks(&self) -> usize {
-        self.data.layout.num_blocks()
+        1
    }

    /// Get the number of layers in the block
@@ -617,6 +617,32 @@ impl<S: Storage, M: BlockMetadata> DerefMut for MutableBlock<S, M> {
    }
 }

+impl<S: Storage + NixlDescriptor, M: BlockMetadata> BlockDataExt<S> for MutableBlock<S, M> {
+    fn is_fully_contiguous(&self) -> bool {
+        self.data.is_fully_contiguous()
+    }
+
+    fn num_layers(&self) -> usize {
+        self.data.num_layers()
+    }
+
+    fn layer_view(&self, layer_idx: usize) -> BlockResult<view::LayerView<S>> {
+        self.data.layer_view(layer_idx)
+    }
+
+    fn layer_view_mut(&mut self, layer_idx: usize) -> BlockResult<view::LayerViewMut<S>> {
+        self.data.layer_view_mut(layer_idx)
+    }
+
+    fn block_view(&self) -> BlockResult<view::BlockView<S>> {
+        self.data.block_view()
+    }
+
+    fn block_view_mut(&mut self) -> BlockResult<view::BlockViewMut<S>> {
+        self.data.block_view_mut()
+    }
+}
+
 impl<S: Storage + NixlDescriptor, M: BlockMetadata> BlockDataProvider for MutableBlock<S, M> {
    type StorageType = S;

@@ -720,6 +746,40 @@ impl<S: Storage, M: BlockMetadata> Deref for ImmutableBlock<S, M> {
    }
 }

+impl<S: Storage + NixlDescriptor, M: BlockMetadata> BlockDataExt<S> for ImmutableBlock<S, M> {
+    fn is_fully_contiguous(&self) -> bool {
+        self.block.is_fully_contiguous()
+    }
+
+    fn num_layers(&self) -> usize {
+        self.block.num_layers()
+    }
+
+    fn layer_view(&self, layer_idx: usize) -> BlockResult<view::LayerView<S>> {
+        self.block.layer_view(layer_idx)
+    }
+
+    fn layer_view_mut(&mut self, _: usize) -> BlockResult<view::LayerViewMut<S>> {
+        // This should never be called since ImmutableBlock is immutable,
+        // but we need to implement the full trait
+        Err(BlockError::InvalidState(
+            "Cannot get mutable layer view from immutable block".to_string(),
+        ))
+    }
+
+    fn block_view(&self) -> BlockResult<view::BlockView<S>> {
+        self.block.block_view()
+    }
+
+    fn block_view_mut(&mut self) -> BlockResult<view::BlockViewMut<S>> {
+        // This should never be called since ImmutableBlock is immutable,
+        // but we need to implement the full trait
+        Err(BlockError::InvalidState(
+            "Cannot get mutable block view from immutable block".to_string(),
+        ))
+    }
+}
+
 impl<S: Storage + NixlDescriptor, M: BlockMetadata> BlockDataProvider for ImmutableBlock<S, M> {
    type StorageType = S;

@@ -1711,4 +1771,123 @@ mod tests {
        // drop(layout);
        tracing::info!("Layout dropped");
    }
+
+    #[test]
+    fn test_mutable_block_data_ext() {
+        init_logging();
+
+        // Create a layout with multiple layers and blocks for testing all methods
+        let config = LayoutConfig::builder()
+            .num_blocks(10)
+            .num_layers(2)
+            .page_size(4)
+            .inner_dim(13)
+            .build()
+            .unwrap();
+
+        let layout = FullyContiguous::allocate(config, &SystemAllocator).unwrap();
+        let layout = Arc::new(layout);
+
+        // Create a channel for returning blocks
+        let (return_tx, _return_rx) = tokio::sync::mpsc::unbounded_channel();
+
+        // Create a block and wrap it in a MutableBlock
+        let block_data = BlockData::new(layout.clone(), 0, 42, 0);
+        let block = Block::new(block_data, BasicMetadata::default()).unwrap();
+        let mut mutable_block = MutableBlock::new(block, return_tx.clone());
+
+        // Test is_fully_contiguous()
+        assert!(mutable_block.is_fully_contiguous());
+
+        // Test num_layers()
+        assert_eq!(mutable_block.num_layers(), 2);
+
+        // Test layer_view()
+        let layer_view = mutable_block.layer_view(0).unwrap();
+        assert_eq!(layer_view.size(), 4 * 13 * 2); // page_size x inner_dim x dtype_bytes
+        assert!(!unsafe { layer_view.as_ptr() }.is_null());
+
+        // Test layer_view_mut()
+        let mut layer_view_mut = mutable_block.layer_view_mut(1).unwrap();
+        assert_eq!(layer_view_mut.size(), 4 * 13 * 2); // page_size x inner_dim x dtype_bytes
+        assert!(!unsafe { layer_view_mut.as_mut_ptr() }.is_null());
+
+        // Test block_view()
+        let block_view = mutable_block.block_view().unwrap();
+        assert_eq!(block_view.size(), 2 * 4 * 13 * 2); // num_layers x page_size x inner_dim x dtype_bytes
+        assert!(!unsafe { block_view.as_ptr() }.is_null());
+
+        // Test block_view_mut()
+        let mut block_view_mut = mutable_block.block_view_mut().unwrap();
+        assert_eq!(block_view_mut.size(), 2 * 4 * 13 * 2); // num_layers x page_size x inner_dim x dtype_bytes
+        assert!(!unsafe { block_view_mut.as_mut_ptr() }.is_null());
+
+        tracing::info!("MutableBlock BlockDataExt tests completed successfully");
+    }
+
+    #[test]
+    fn test_immutable_block_data_ext() {
+        init_logging();
+
+        // Create a layout with multiple layers and blocks for testing all methods
+        let config = LayoutConfig::builder()
+            .num_blocks(10)
+            .num_layers(2)
+            .page_size(4)
+            .inner_dim(13)
+            .build()
+            .unwrap();
+
+        let layout = FullyContiguous::allocate(config, &SystemAllocator).unwrap();
+        let layout = Arc::new(layout);
+
+        // Create a channel for returning blocks
+        let (return_tx, _return_rx) = tokio::sync::mpsc::unbounded_channel();
+
+        // Create a block and wrap it in a MutableBlock
+        let block_data = BlockData::new(layout.clone(), 0, 42, 0);
+        let block = Block::new(block_data, BasicMetadata::default()).unwrap();
+        let mutable_block = MutableBlock::new(block, return_tx.clone());
+
+        // Wrap the mutable block in an Arc and create an ImmutableBlock from it
+        let arc_mutable_block = Arc::new(mutable_block);
+        let immutable_block = ImmutableBlock::new(arc_mutable_block);
+
+        // Test is_fully_contiguous()
+        assert!(immutable_block.is_fully_contiguous());
+
+        // Test num_layers()
+        assert_eq!(immutable_block.num_layers(), 2);
+
+        // Test layer_view()
+        let layer_view = immutable_block.layer_view(0).unwrap();
+        assert_eq!(layer_view.size(), 4 * 13 * 2); // page_size x inner_dim x dtype_bytes
+        assert!(!unsafe { layer_view.as_ptr() }.is_null());
+
+        // Test block_view()
+        let block_view = immutable_block.block_view().unwrap();
+        assert_eq!(block_view.size(), 2 * 4 * 13 * 2); // num_layers x page_size x inner_dim x dtype_bytes
+        assert!(!unsafe { block_view.as_ptr() }.is_null());
+
+        // Test that mutable methods return errors
+        let mut mut_immutable_block = immutable_block; // Rebind as mutable: the *_mut methods take &mut self
+
+        let layer_view_mut_res = mut_immutable_block.layer_view_mut(0);
+        assert!(layer_view_mut_res.is_err());
+        if let Err(BlockError::InvalidState(msg)) = layer_view_mut_res {
+            assert!(msg.contains("immutable block"));
+        } else {
+            panic!("Expected InvalidState error");
+        }
+
+        let block_view_mut_res = mut_immutable_block.block_view_mut();
+        assert!(block_view_mut_res.is_err());
+        if let Err(BlockError::InvalidState(msg)) = block_view_mut_res {
+            assert!(msg.contains("immutable block"));
+        } else {
+            panic!("Expected InvalidState error");
+        }
+
+        tracing::info!("ImmutableBlock BlockDataExt tests completed successfully");
+    }
 }
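
Review note: the size assertions in the two Rust tests follow `page_size x inner_dim x dtype_bytes` for a layer view and `num_layers x page_size x inner_dim x dtype_bytes` for a block view. The Python sketch below reproduces that arithmetic. The function names and the lowercase dtype keys are hypothetical and do not claim to match the spellings the bindings accept; the byte widths are the standard ones for these formats.

```python
# Bytes per element for the formats named in the BlockManager docstring.
DTYPE_BYTES = {"fp16": 2, "bf16": 2, "fp32": 4}


def layer_nbytes(page_size, inner_dim, dtype="fp16"):
    """Size in bytes of one layer of one block."""
    return page_size * inner_dim * DTYPE_BYTES[dtype]


def block_nbytes(num_layers, page_size, inner_dim, dtype="fp16"):
    """Size in bytes of one block across all layers."""
    return num_layers * layer_nbytes(page_size, inner_dim, dtype)


# Values from the Rust tests: 2 layers, page_size 4, inner_dim 13, 2-byte dtype.
rust_layer = layer_nbytes(4, 13)
rust_block = block_nbytes(2, 4, 13)

# Values from the Python tests: 5 layers, page_size 4, inner_dim 13, FP32.
py_block = block_nbytes(5, 4, 13, "fp32")
```

The first two results match the `4 * 13 * 2` and `2 * 4 * 13 * 2` literals asserted in the Rust tests.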