Commit 0ed05516 authored by huteng.ht

feat: upgrade to sdk v1 latest version



* 70b2701 on master
Signed-off-by: huteng.ht <huteng.ht@bytedance.com>
parent 61d052cb
# veTurboIO

[En](./README.md) | [中文](./README.zh.md)

A Python library developed by Volcano Engine for high-performance reading and writing of PyTorch model files. It is built mainly on the safetensors file format to store and read tensor data efficiently.

## Install

It can be installed directly in the following way:

```bash
pip install veturboio -f https://veturbo-cn-beijing.tos-cn-beijing.volces.com/veturboio/index.html
```

Tip: this command prefers a prebuilt whl that matches the current Python and PyTorch versions; if no matching whl is found, it automatically downloads the source code and compiles it. When installing from source, add `--no-build-isolation` to compile and install in the current environment (otherwise pip will try to create an isolated build environment).

If the installation fails, you can also try downloading the source code, then compiling and installing it manually:

```bash
cd veturboio
python setup.py get_libcfs

# CUDA ops, default
python setup.py install --cuda_ext
# NPU ops
python setup.py install --npu_ext
# CPU only
python setup.py install --cpu_ext
```
## Quick Start

### Read and write model files

```python
import torch
import veturboio

tensors = {
    "weight1": torch.zeros((1024, 1024)),
    "weight2": torch.zeros((1024, 1024)),
}
veturboio.save_file(tensors, "model.safetensors")

reloaded_tensor = veturboio.load("model.safetensors", map_location="cpu")

# check if the tensors are the same
for k, v in tensors.items():
    assert torch.allclose(v, reloaded_tensor[k])
```
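Besides `save_file`, the package also exports `save_model` and `save_pt` (see `veturboio/__init__.py` later in this diff). A minimal sketch of saving a module directly with `save_model`, assuming it round-trips a state dict the same way `save_file` does:

```python
import torch
import veturboio

# save the module's state dict straight from the model object
model = torch.nn.Linear(1024, 1024)
veturboio.save_model(model, "linear.safetensors")

# load it back and restore into a fresh module
state_dict = veturboio.load("linear.safetensors", map_location="cpu")
new_model = torch.nn.Linear(1024, 1024)
new_model.load_state_dict(state_dict)
```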
### Use pinned memory to accelerate consecutive loads to the GPU

```python
import torch
import veturboio

tensors1 = {
    "weight1": torch.zeros((1024, 1024)),
    "weight2": torch.zeros((1024, 1024)),
}
veturboio.save_file(tensors1, "model1.safetensors")

tensors2 = {
    "weight1": torch.zeros((1024, 1024)),
    "weight2": torch.zeros((1024, 1024)),
}
veturboio.save_file(tensors2, "model2.safetensors")

helper = veturboio.init_io_helper()
reloaded_tensor1 = veturboio.load("model1.safetensors", map_location="cuda:0", use_pinmem=True, helper=helper)
# the map_location may be different
reloaded_tensor2 = veturboio.load("model2.safetensors", map_location="cuda:0", use_pinmem=True, helper=helper)

# check if the tensors are the same
for k, v in tensors1.items():
    assert torch.allclose(v.cuda(), reloaded_tensor1[k])
for k, v in tensors2.items():
    assert torch.allclose(v.cuda(), reloaded_tensor2[k])
```
### Convert existing PyTorch files

```bash
python -m veturboio.convert -i model.pt -o model.safetensors
```
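The converter defined in `veturboio/convert.py` (updated later in this diff) also accepts optional flags such as `--dry-run`, `--overwrite`, `--validate-result`, and `--use-direct-io`; for example:

```bash
# rehearse the conversion without writing anything
python -m veturboio.convert -i model.pt -o model.safetensors --dry-run

# convert for real, replacing any existing output and checking that
# every tensor in the output matches the input
python -m veturboio.convert -i model.pt -o model.safetensors --overwrite --validate-result
```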
## Performance test

Run directly:

```bash
bash bench/io_bench.sh
```

Then you will get results like the following:

```
fs_name  tensor_size  veturboio load_time(s)  torch load_time(s)
shm      1073741824   0.08                    0.63
shm      2147483648   0.19                    1.26
shm      4294967296   0.36                    2.32
```

You can also tune the benchmark parameters; run the following command for the full list of options:

```bash
python bench/io_bench.py -h
```
## Features

- [x] Multi-threaded, high-performance file reads
- [x] Zero-copy reads with no extra memory cost
- [x] Loading directly to CUDA devices
- [x] bfloat16 data type support
- [x] Pinned memory support, letting the GPU repeatedly read large files quickly
- [x] Compatible with the standard PyTorch format (no performance gain)
- [x] Compatible with the safetensors format
- [x] A dedicated encrypted storage format

## Advanced Features

### Using veMLP to accelerate reading and writing

Volcano Engine Machine Learning Platform (veMLP) provides a distributed cache file system based on the physical disks of the GPU cluster.

<p align="center">
<img src="./docs/imgs/SFCS.png" style="zoom:15%;">
</p>

When a cluster-level task needs to read a model file, the caching system can efficiently distribute the model file between GPU machines via RDMA transfer, avoiding network transfer bottlenecks. On this system, veTurboIO can maximize its performance advantage.

### Encrypt and decrypt model files

veTurboIO supports encryption and decryption of model files. You can read the [tutorial](./docs/encrypt_model.md) to learn how to keep your model files secure. When you use a GPU as the target device, veTurboIO can decrypt the model file on the fly.

## Benefits

A standard PyTorch model file goes through two operations, zip and pickle, both of which severely limit read speed, and unpickling also carries potential security risks. We use a custom model format to store tensor data in order to address these problems with the standard PyTorch format. The advantages realized so far:

- Multi-threaded reads: model files are now mostly stored in the cloud, and a single process cannot reach the bandwidth ceiling of cloud storage; multi-threaded reads are required to achieve maximum read speed. Reading the standard PyTorch format is bottlenecked by pickle parsing and falls far short of the cloud-storage bandwidth limit.
- Cloud adaptation: built on the characteristics of Volcano Engine cloud storage (vePFS, SFCS) to make maximum use of its bandwidth.
- Security: pickle objects are no longer used, avoiding pickle's security issues.
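Since faster loading is the central claim, here is a minimal timing sketch for comparing the two formats on your own storage. It assumes `model.safetensors` and `model.pt` were written as in the Quick Start; absolute numbers will vary with hardware and file system:

```python
import time

import torch
import veturboio

def timed(fn, *args, **kwargs):
    # run one call and return its wall-clock duration in seconds
    start = time.perf_counter()
    fn(*args, **kwargs)
    return time.perf_counter() - start

st_time = timed(veturboio.load, "model.safetensors", map_location="cpu")
pt_time = timed(torch.load, "model.pt", map_location="cpu")
print(f"veturboio.load: {st_time:.2f}s, torch.load: {pt_time:.2f}s")
```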
## Changelog

See [CHANGELOG](./CHANGELOG.md) for more.

## License

[Apache License 2.0](./LICENSE)
# API
::: veturboio.io
# Encrypting and decrypting model files

Under the hood, this library reads and writes through two interfaces: the SFCS SDK and POSIX. A file path prefixed with `sfcs://` is treated as using the SFCS SDK. The required credentials can be obtained from the `unix domain socket` of the Volcano Engine trusted service, or by setting the following three environment variables:

| Environment variable | Meaning |
| ------------------------------ | --------------------------------- |
| SFCS_ACCESS_KEY | AK of the SFCS file system |
| SFCS_SECRET_KEY | SK of the SFCS file system |
| SFCS_NAMENODE_ENDPOINT_ADDRESS | NameNode address of the SFCS file system |

Encrypting and decrypting model files requires a data key and iv. There are three ways to obtain them, tried in the following order of priority:

- [1] The encrypted data key and iv are stored in the header of the encrypted model file, and Volcano Engine KMS decrypts them into the plaintext data key.
  - [1.1] The AK/SK/ST needed to access KMS is obtained from the unix domain socket of the Volcano Engine trusted service, which requires an extra mount.
  - [1.2] The AK/SK/ST needed to access KMS is read from environment variables.
- [2] The data key and iv are fetched directly from the unix domain socket of the Volcano Engine trusted service, which requires an extra mount.
- [3] The data key and iv are set directly through environment variables.

The environment variables required by each method are as follows:

| Environment variable | Meaning |
| ------------------------------ | --------------------------------- |
| VETURBOIO_KMS_HOST | [1] KMS service address; defaults to open.volcengineapi.com |
| VETURBOIO_KMS_REGION | [1] Region of the KMS service; defaults to cn-beijing |
| VETURBOIO_KMS_KEYRING_NAME | [1] Keyring name used by KMS to decrypt the data key |
| VETURBOIO_KMS_KEY_NAME | [1] Master key name used by KMS to decrypt the data key |
| DATAPIPE_SOCKET_PATH | [1.1][2] Path of the trusted-service unix domain socket |
| VETURBOIO_KMS_ACCESS_KEY | [1.2] AK for KMS authentication |
| VETURBOIO_KMS_SECRET_KEY | [1.2] SK for KMS authentication |
| VETURBOIO_KMS_SESSION_TOKEN | [1.2] Temporary token for KMS authentication; optional |
| VETURBOIO_KEY | [3] base64 encoding of the 128-bit data key |
| VETURBOIO_IV | [3] base64 encoding of the 128-bit initialization vector |
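For instance, a minimal setup for method [3] could look like the following; the key material here is generated on the spot purely for illustration, so use your own managed 16-byte key and iv in practice:

```bash
# Method [3]: pass the 128-bit data key and iv directly as base64.
# /dev/urandom is used only to produce placeholder values.
export VETURBOIO_KEY=$(head -c 16 /dev/urandom | base64)
export VETURBOIO_IV=$(head -c 16 /dev/urandom | base64)
```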
After setting things up in one of the three ways above, you can enable encryption and decryption when reading and writing model files, as in the following code:
```python
import torch
import veturboio

tensors = {
    "weight1": torch.zeros((1024, 1024)),
    "weight2": torch.zeros((1024, 1024)),
}

# use cpu to encrypt
veturboio.save_file(tensors, "sfcs://model.safetensors", use_cipher=True)

# use cpu to decrypt if map_location is cpu
reloaded_tensor1 = veturboio.load("sfcs://model.safetensors", map_location="cpu", use_cipher=True)

# use gpu to decrypt if map_location is cuda
reloaded_tensor2 = veturboio.load("sfcs://model.safetensors", map_location="cuda:0", use_cipher=True)

# check if the tensors are the same
for k, v in tensors.items():
    assert torch.allclose(v, reloaded_tensor1[k])
for k, v in tensors.items():
    assert torch.allclose(v, reloaded_tensor2[k])
```
......@@ -16,19 +16,35 @@ limitations under the License.
import os
import platform
import sys
import requests
import setuptools
import torch
from pkg_resources import parse_version
from setuptools import find_packages, setup
from torch.utils.cpp_extension import BuildExtension, CppExtension, CUDAExtension
from setuptools import Extension, find_packages, setup
from torch.utils.cpp_extension import BuildExtension, CppExtension, include_paths
# initialize variables for compilation
IS_LINUX = platform.system() == "Linux"
IS_DARWIN = platform.system() == "Darwin"
IS_WINDOWS = platform.system() == "Windows"
this_dir = os.path.dirname(os.path.abspath(__file__))
def get_option():
if os.getenv("NPU_EXTENSION_ENABLED", "0") == "1":
sys.argv.append("--npu_ext")
elif "--cuda_ext" not in sys.argv and "--npu_ext" not in sys.argv and "--cpu_ext" not in sys.argv:
print(
'''No known extension specified, default to use --cuda_ext. Currently supported:
--cuda_ext
--npu_ext
--cpu_ext'''
)
sys.argv.append("--cuda_ext")
def get_version():
import importlib.util
......@@ -37,7 +53,12 @@ def get_version():
m = importlib.util.module_from_spec(spec)
spec.loader.exec_module(m)
return m.__version__
if "--cpu_ext" in sys.argv:
return m.__version__ + "+cpu"
elif "--npu_ext" in sys.argv:
return m.__version__ + "+npu"
else:
return m.__version__
def make_relative_rpath(path):
......@@ -50,6 +71,7 @@ def make_relative_rpath(path):
def get_veturboio_extension():
get_option()
# prevent ninja from using too many resources
try:
import psutil
......@@ -71,41 +93,108 @@ def get_veturboio_extension():
# Since PyTorch1.8.0, it has a default value so users do not need
# to pass an empty list anymore.
# More details at https://github.com/pytorch/pytorch/pull/45956
extra_compile_args = {'cxx': [], 'nvcc': ['-O3']}
extra_compile_args = {'cxx': ['-fvisibility=hidden'], 'nvcc': ['-O3']}
if parse_version(torch.__version__) <= parse_version('1.12.1'):
extra_compile_args['cxx'] = ['-std=c++14']
extra_compile_args['cxx'].append('-std=c++14')
else:
extra_compile_args['cxx'] = ['-std=c++17']
extra_compile_args['cxx'].append('-std=c++17')
name = "veturboio_ext"
sources = [
"veturboio/ops/csrc/pybind.cpp",
"veturboio/ops/csrc/posix.cpp",
"veturboio/ops/csrc/sfcs.cpp",
"veturboio/ops/csrc/io_helper_cpu_common.cpp",
"veturboio/ops/csrc/cipher.cpp",
]
include_dirs = include_paths()
include_dirs.append("veturboio/ops/csrc/include")
torch_dir = os.path.join(os.path.dirname(torch.__file__), "lib")
library_dirs = [torch_dir]
library_dirs.append("veturboio/ops/csrc/lib")
libraries = ["cloudfs", ":libfastcrypto_gpu.so.0.3"]
include_dirs = ["veturboio/ops/csrc/include"]
library_dirs = ["veturboio/ops/csrc/lib"]
libraries = ["cfs", ":libfastcrypto_gpu.so.0.3"]
extra_link_args = [make_relative_rpath("veturboio/ops/csrc/lib")]
return CUDAExtension(
name="veturboio_ext",
sources=[
"veturboio/ops/csrc/pybind.cpp",
"veturboio/ops/csrc/load_utils.cpp",
"veturboio/ops/csrc/sfcs.cpp",
"veturboio/ops/csrc/io_helper.cu",
"veturboio/ops/csrc/cipher.cpp",
],
define_macros=define_macros,
include_dirs=include_dirs,
library_dirs=library_dirs,
libraries=libraries,
extra_compile_args=extra_compile_args,
extra_link_args=extra_link_args,
)
# Refer to: https://github.com/pytorch/pytorch/blob/main/torch/utils/cpp_extension.py#L918
# In torch 2.0 this flag is False, and the *.so lib was built with it set to False.
# In newer torch versions the flag is True; to stay compatible with the *.so lib we
# force it to False so that g++ gets '-D_GLIBCXX_USE_CXX11_ABI=0' when building
# veturboio_ext, otherwise 'undefined symbol' errors involving std::string are thrown.
torch._C._GLIBCXX_USE_CXX11_ABI = False
if "--cuda_ext" in sys.argv:
sys.argv.remove("--cuda_ext")
extra_compile_args['nvcc'].append('-O3')
sources.append("veturboio/ops/csrc/io_helper.cu")
define_macros.append(("USE_CUDA", "1"))
from torch.utils.cpp_extension import CUDAExtension
return CUDAExtension(
name=name,
sources=sources,
define_macros=define_macros,
include_dirs=include_dirs,
library_dirs=library_dirs,
libraries=libraries,
extra_compile_args=extra_compile_args,
extra_link_args=extra_link_args,
)
else:
extra_compile_args['cxx'].append('-O3')
libraries.append("torch_cpu")
libraries.append("torch_python")
extra_link_args.append(f"-Wl,--rpath={torch_dir},--enable-new-dtags")
if "--npu_ext" in sys.argv:
sys.argv.remove("--npu_ext")
sources.append("veturboio/ops/csrc/io_helper_npu.cpp")
define_macros.append(("USE_NPU", "1"))
return Extension(
name=name,
sources=sources,
define_macros=define_macros,
include_dirs=include_dirs,
library_dirs=library_dirs,
libraries=libraries,
extra_compile_args=extra_compile_args,
extra_link_args=extra_link_args,
)
elif "--cpu_ext" in sys.argv:
sys.argv.remove("--cpu_ext")
sources.append("veturboio/ops/csrc/io_helper_cpu.cpp")
return Extension(
name=name,
sources=sources,
define_macros=define_macros,
include_dirs=include_dirs,
library_dirs=library_dirs,
libraries=libraries,
extra_compile_args=extra_compile_args,
extra_link_args=extra_link_args,
)
class GetLibCfsCommand(setuptools.Command):
"""get libcfs from url"""
description = 'get libcfs from url'
user_options = [('src=', 's', 'source url of libcfs.so'), ('dst=', 'd', 'dest filepath of libcfs.so')]
user_options = [('src=', 's', 'source url of libcloudfs.so'), ('dst=', 'd', 'dest filepath of libcloudfs.so')]
def initialize_options(self):
from veturboio.utils.load_veturboio_ext import LIBCFS_DEFAULT_PATH, LIBCFS_DEFAULT_URL
......@@ -117,7 +206,7 @@ class GetLibCfsCommand(setuptools.Command):
pass
def run(self):
print(f"download libcfs.so from {self.src}, save to {self.dst}")
print(f"download libcloudfs.so from {self.src}, save to {self.dst}")
r = requests.get(self.src, timeout=60)
with open(self.dst, 'wb') as f:
f.write(r.content)
......@@ -133,10 +222,12 @@ setup(
install_requires=[
"safetensors",
"numpy",
"netifaces",
"loguru",
"requests-unixsocket",
"requests",
],
include_package_data=True,
cmdclass={"get_libcfs": GetLibCfsCommand, "build_ext": BuildExtension},
dependency_links=['https://mirrors.ivolces.com/pypi/'],
)
......@@ -168,7 +168,6 @@ class TestCipherInfo(TestCase):
os.environ.pop(ENV_KMS_SK, None)
DataPipeClient.DATAPIPE_SOCKET_PATH = self.server_address
info = CipherInfo(True, header_bytes)
info = CipherInfo(True, header_bytes)
self.assertTrue(info.use_cipher)
self.assertTrue(info.use_header)
self.assertTrue(np.array_equal(info.key, self.target_key_2))
......@@ -176,7 +175,8 @@ class TestCipherInfo(TestCase):
def test_fetch_from_datapipe(self):
DataPipeClient.DATAPIPE_SOCKET_PATH = self.server_address
info = CipherInfo(True)
DataPipeClient.ENCRYPT_HEADER['X-Encrypt-Caller-Pod'] = 'test-pod-name'
info = CipherInfo(True, None, '/maas_model/test_path')
self.assertTrue(info.use_cipher)
self.assertTrue(np.array_equal(info.key, self.target_key))
self.assertTrue(np.array_equal(info.iv, self.target_iv))
......@@ -190,12 +190,12 @@ class TestCipherInfo(TestCase):
self.assertTrue(np.array_equal(info.key, self.target_key))
self.assertTrue(np.array_equal(info.iv, self.target_iv))
def test_fallback(self):
def test_raise_error(self):
DataPipeClient.DATAPIPE_SOCKET_PATH = '/path/not/exist'
os.environ['VETURBOIO_KEY'] = base64.b64encode(b'abcdefgh12').decode('ascii')
os.environ['VETURBOIO_IV'] = base64.b64encode(b'1234567887').decode('ascii')
info = CipherInfo(True)
self.assertFalse(info.use_cipher)
with self.assertRaises(RuntimeError):
info = CipherInfo(True)
@classmethod
def tearDownClass(cls):
......@@ -232,19 +232,9 @@ class TestCredentials(TestCase):
self.assertEqual(cred['SessionToken'], 'ST' * 12)
def test_sfcs_conf(self):
# case 1: a xml file already exists, do nothing
sfcs_conf = os.path.join(os.getcwd(), 'base_model.xml')
generate_sfcs_conf_xml(sfcs_conf, {'test': 'test'})
init_sfcs_conf('/base_model/tensor.pt')
self.assertEqual(os.environ['LIBCFS_CONF'], sfcs_conf)
self.assertEqual(len(credentials_helper.threads), 0)
self.assertEqual(len(credentials_helper.running), 0)
os.remove(sfcs_conf)
for e in SFCS_REQ_ENV_LIST:
os.environ[e] = 'test-value'
# case 2: env SFCS_ACCESS_KEY and SFCS_SECRET_KEY and SFCS_NAMENODE_ENDPOINT_ADDRESS exists
# case 1: env SFCS_ACCESS_KEY and SFCS_SECRET_KEY and SFCS_NAMENODE_ENDPOINT_ADDRESS exists
os.environ['SFCS_ACCESS_KEY'] = 'A' * 12
os.environ['SFCS_SECRET_KEY'] = 'S' * 12
os.environ['SFCS_NAMENODE_ENDPOINT_ADDRESS'] = '100.67.19.231'
......@@ -252,13 +242,13 @@ class TestCredentials(TestCase):
if os.path.exists(sfcs_conf):
os.remove(sfcs_conf)
init_sfcs_conf('/base_model2/tensor.pt')
self.assertEqual(os.environ['LIBCFS_CONF'], sfcs_conf)
self.assertEqual(os.environ['LIBCLOUDFS_CONF'], sfcs_conf)
self.assertEqual(len(credentials_helper.threads), 0)
self.assertEqual(len(credentials_helper.running), 0)
self.assertTrue(os.path.exists(sfcs_conf))
os.remove(sfcs_conf)
# case 3: use datapipe socket to get and refresh ak, sk, st and namenode_ip
# case 2: use datapipe socket to get and refresh ak, sk, st and namenode_ip
DataPipeClient.DATAPIPE_SOCKET_PATH = self.server_address
os.environ.pop('SFCS_ACCESS_KEY', None)
os.environ.pop('SFCS_SECRET_KEY', None)
......@@ -277,12 +267,15 @@ class TestCredentials(TestCase):
self.assertTrue(credentials_helper.running['base_model4'])
self.assertTrue(os.path.exists(sfcs_conf3))
self.assertTrue(os.path.exists(sfcs_conf4))
for i in range(5):
os.remove(sfcs_conf3)
os.remove(sfcs_conf4)
sleep(3)
self.assertTrue(os.path.exists(sfcs_conf3))
self.assertTrue(os.path.exists(sfcs_conf4))
print(credentials_helper.threads)
os.remove(sfcs_conf3)
os.remove(sfcs_conf4)
sleep(3)
self.assertTrue(os.path.exists(sfcs_conf3))
self.assertTrue(os.path.exists(sfcs_conf4))
print(credentials_helper.threads)
def test_sfcs_conf_json(self):
for e in SFCS_REQ_ENV_LIST:
......@@ -308,17 +301,18 @@ class TestCredentials(TestCase):
self.assertTrue(credentials_helper.running['base_model2'])
self.assertTrue(os.path.exists(sfcs_conf1))
self.assertTrue(os.path.exists(sfcs_conf2))
for i in range(5):
sleep(3)
self.assertTrue(os.path.exists(sfcs_conf1))
self.assertTrue(os.path.exists(sfcs_conf2))
print(credentials_helper.threads)
os.remove(sfcs_conf1)
os.remove(sfcs_conf2)
sleep(3)
self.assertTrue(os.path.exists(sfcs_conf1))
self.assertTrue(os.path.exists(sfcs_conf2))
print(credentials_helper.threads)
@classmethod
def tearDownClass(cls):
credentials_helper.stop()
os.environ.pop('LIBCFS_CONF', None)
os.environ.pop('LIBCLOUDFS_CONF', None)
for e in SFCS_REQ_ENV_LIST:
os.environ.pop(e, None)
for e in SFCS_OPT_ENV_LIST:
......
......@@ -46,19 +46,19 @@ class TestLoad(TestCase):
cls.tensors_0 = {
"weight1": torch.randn(2000, 10),
"weight2": torch.randn(2000, 10),
"weight2": torch.IntTensor(2000, 10),
}
cls.tensors_1 = {
"weight1": torch.randn(2000, 10),
"weight2": torch.randn(2000, 10),
"weight3": torch.randn(2000, 10),
"weight2": torch.IntTensor(2000, 10),
"weight3": torch.BoolTensor(2000, 10),
}
cls.filepath_0 = os.path.join(cls.tempdir.name, "model_0.safetensors")
cls.filepath_1 = os.path.join(cls.tempdir.name, "model_1.safetensors")
veturboio.save_file(cls.tensors_0, cls.filepath_0)
veturboio.save_file(cls.tensors_1, cls.filepath_1)
veturboio.save_file(cls.tensors_1, cls.filepath_1, enable_fast_mode=True)
cls.pt_filepath = os.path.join(cls.tempdir.name, "model.pt")
torch.save(cls.tensors_0, cls.pt_filepath)
......@@ -70,7 +70,7 @@ class TestLoad(TestCase):
cls.filepath_0_enc = os.path.join(cls.tempdir.name, "model_0_enc.safetensors")
cls.filepath_1_enc = os.path.join(cls.tempdir.name, "model_1_enc.safetensors")
veturboio.save_file(cls.tensors_0, cls.filepath_0_enc, use_cipher=True)
veturboio.save_file(cls.tensors_1, cls.filepath_1_enc, use_cipher=True)
veturboio.save_file(cls.tensors_1, cls.filepath_1_enc, use_cipher=True, enable_fast_mode=True)
cls.pt_filepath_enc = os.path.join(cls.tempdir.name, "model_enc.pt")
veturboio.save_pt(cls.tensors_0, cls.pt_filepath_enc, use_cipher=True)
......@@ -82,6 +82,7 @@ class TestLoad(TestCase):
cls.pt_filepath_enc_h = os.path.join(cls.tempdir.name, "model_enc_h.pt")
veturboio.save_pt(cls.tensors_0, cls.pt_filepath_enc_h, use_cipher=True)
del os.environ["VETURBOIO_CIPHER_HEADER"]
if torch.cuda.is_available():
cls.cuda_tensors_0 = deepcopy(cls.tensors_0)
......@@ -94,12 +95,15 @@ class TestLoad(TestCase):
@classmethod
def tearDownClass(cls):
# cls.tempdir.cleanup()
pass
cls.tempdir.cleanup()
def _run_pipeline(self, tensors, filepath, map_location, use_cipher, enable_fast_mode=True):
def _run_pipeline(self, tensors, filepath, map_location, use_cipher, enable_fast_mode=True, state_dict=None):
loaded_tensors = veturboio.load(
filepath, map_location=map_location, use_cipher=use_cipher, enable_fast_mode=enable_fast_mode
filepath,
map_location=map_location,
use_cipher=use_cipher,
enable_fast_mode=enable_fast_mode,
state_dict=state_dict,
)
for key in tensors.keys():
self.assertTrue(torch.allclose(tensors[key], loaded_tensors[key]))
......@@ -110,6 +114,30 @@ class TestLoad(TestCase):
self._run_pipeline(self.tensors_0, self.filepath_0_enc, "cpu", use_cipher=True)
self._run_pipeline(self.tensors_0, self.filepath_0, "cpu", use_cipher=False, enable_fast_mode=False)
self._run_pipeline(self.tensors_0, self.filepath_0_enc, "cpu", use_cipher=True, enable_fast_mode=False)
pre_allocated_tensors = {
"weight1": torch.randn(2000, 10),
"weight2": torch.IntTensor(2000, 10),
}
self._run_pipeline(self.tensors_0, self.filepath_0, "cpu", use_cipher=False, state_dict=pre_allocated_tensors)
self._run_pipeline(
self.tensors_0, self.filepath_0_enc, "cpu", use_cipher=True, state_dict=pre_allocated_tensors
)
self._run_pipeline(
self.tensors_0,
self.filepath_0,
"cpu",
use_cipher=False,
enable_fast_mode=False,
state_dict=pre_allocated_tensors,
)
self._run_pipeline(
self.tensors_0,
self.filepath_0_enc,
"cpu",
use_cipher=True,
enable_fast_mode=False,
state_dict=pre_allocated_tensors,
)
@unittest.skipIf(not torch.cuda.is_available(), "CUDA not available")
def test_pipeline_cuda(self):
......@@ -117,6 +145,32 @@ class TestLoad(TestCase):
self._run_pipeline(self.cuda_tensors_0, self.filepath_0_enc, "cuda:0", use_cipher=True)
self._run_pipeline(self.cuda_tensors_0, self.filepath_0, "cuda:0", use_cipher=False, enable_fast_mode=False)
self._run_pipeline(self.cuda_tensors_0, self.filepath_0_enc, "cuda:0", use_cipher=True, enable_fast_mode=False)
pre_allocated_tensors = {
"weight1": torch.randn(2000, 10).cuda(),
"weight2": torch.IntTensor(2000, 10).cuda(),
}
self._run_pipeline(
self.cuda_tensors_0, self.filepath_0, "cuda:0", use_cipher=False, state_dict=pre_allocated_tensors
)
self._run_pipeline(
self.cuda_tensors_0, self.filepath_0_enc, "cuda:0", use_cipher=True, state_dict=pre_allocated_tensors
)
self._run_pipeline(
self.cuda_tensors_0,
self.filepath_0,
"cuda:0",
use_cipher=False,
enable_fast_mode=False,
state_dict=pre_allocated_tensors,
)
self._run_pipeline(
self.cuda_tensors_0,
self.filepath_0_enc,
"cuda:0",
use_cipher=True,
enable_fast_mode=False,
state_dict=pre_allocated_tensors,
)
def test_read_multi_state_dict_cpu(self):
load_tensor_0 = self._run_pipeline(self.tensors_0, self.filepath_0, "cpu", use_cipher=False)
......@@ -165,16 +219,13 @@ class TestLoad(TestCase):
self.assertTrue(torch.allclose(self.cuda_tensors_0[key], loaded_tensors_enc[key]))
def test_load_cipher_header_cpu(self):
os.environ["VETURBOIO_CIPHER_HEADER"] = "1"
self._run_pipeline(self.tensors_0, self.filepath_0_enc_h, "cpu", use_cipher=True)
self._run_pipeline(self.tensors_0, self.pt_filepath_enc_h, "cpu", use_cipher=True)
self._run_pipeline(self.tensors_0, self.filepath_0_enc_h, "cpu", use_cipher=True, enable_fast_mode=False)
self._run_pipeline(self.tensors_0, self.pt_filepath_enc_h, "cpu", use_cipher=True, enable_fast_mode=False)
del os.environ["VETURBOIO_CIPHER_HEADER"]
@unittest.skipIf(not torch.cuda.is_available(), "CUDA not available")
def test_load_cipher_header_cuda(self):
os.environ["VETURBOIO_CIPHER_HEADER"] = "1"
self._run_pipeline(self.cuda_tensors_0, self.filepath_0_enc_h, "cuda:0", use_cipher=True)
self._run_pipeline(self.cuda_tensors_0, self.pt_filepath_enc_h, "cuda:0", use_cipher=True)
self._run_pipeline(
......@@ -183,12 +234,30 @@ class TestLoad(TestCase):
self._run_pipeline(
self.cuda_tensors_0, self.pt_filepath_enc_h, "cuda:0", use_cipher=True, enable_fast_mode=False
)
del os.environ["VETURBOIO_CIPHER_HEADER"]
def test_load_directIO_fall_back(self):
with tempfile.NamedTemporaryFile(dir="/dev/shm") as tmpFile:
veturboio.save_file(self.tensors_0, tmpFile.file.name)
veturboio.save_file(self.tensors_0, tmpFile.name)
tmpFile.flush()
loaded_tensors = veturboio.load(tmpFile.name, map_location="cpu", use_direct_io=True)
for key in self.tensors_0.keys():
self.assertTrue(torch.allclose(self.tensors_0[key], loaded_tensors[key]))
def test_load_to_shmem(self):
shmem = veturboio.load_to_shmem(self.filepath_0, use_cipher=False)
loaded_tensors = veturboio.load(
os.path.join("/dev/shm/", shmem.name), map_location="cpu", enable_fast_mode=False, use_cipher=False
)
for key in self.tensors_0.keys():
self.assertTrue(torch.allclose(self.tensors_0[key], loaded_tensors[key]))
shmem.close()
shmem.unlink()
shmem = veturboio.load_to_shmem(self.filepath_0_enc, use_cipher=True)
loaded_tensors = veturboio.load(
os.path.join("/dev/shm/", shmem.name), map_location="cpu", enable_fast_mode=False, use_cipher=False
)
for key in self.tensors_0.keys():
self.assertTrue(torch.allclose(self.tensors_0[key], loaded_tensors[key]))
shmem.close()
shmem.unlink()
......@@ -31,7 +31,8 @@ class TestSave(TestCase):
def setUpClass(cls):
cls.tensors_0 = {
"weight1": torch.randn(2000, 10),
"weight2": torch.randn(2000, 10),
"weight2": torch.IntTensor(2000, 10),
"weight3": torch.BoolTensor(2000, 10),
}
class MockModel(torch.nn.Module):
......@@ -46,6 +47,7 @@ class TestSave(TestCase):
cls.tempdir = tempfile.TemporaryDirectory()
cls.filepath_0 = os.path.join(cls.tempdir.name, "model_0.safetensors")
cls.filepath_1 = os.path.join(cls.tempdir.name, "model_0.pt")
cls.filepath_2 = os.path.join(cls.tempdir.name, "model_0_fast.safetensors")
cls.filepath_3 = os.path.join(cls.tempdir.name, "model_1.safetensors")
@classmethod
......@@ -55,7 +57,14 @@ class TestSave(TestCase):
def test_save_file(self):
veturboio.save_file(self.tensors_0, self.filepath_0)
with safe_open(self.filepath_0, framework="pt", device="cpu") as f:
assert len(f.keys()) == 2
assert len(f.keys()) == 3
for key in f.keys():
self.assertTrue(torch.allclose(self.tensors_0[key], f.get_tensor(key)))
# enable fast mode
veturboio.save_file(self.tensors_0, self.filepath_2, enable_fast_mode=True)
with safe_open(self.filepath_2, framework="pt", device="cpu") as f:
assert len(f.keys()) == 3
for key in f.keys():
self.assertTrue(torch.allclose(self.tensors_0[key], f.get_tensor(key)))
......
......@@ -14,7 +14,7 @@ See the License for the specific language governing permissions and
limitations under the License.
'''
from veturboio.io import load, save_file, save_model, save_pt
from veturboio.ops.load_utils import init_io_helper
from veturboio.io import load, load_to_shmem, save_file, save_model, save_pt
from veturboio.ops.io_utils import init_io_helper
__all__ = ["load", "save_file", "save_model", "init_io_helper", "save_pt"]
__all__ = ["load", "load_to_shmem", "save_file", "save_model", "init_io_helper", "save_pt"]
......@@ -15,21 +15,228 @@ limitations under the License.
'''
import argparse
import gc
import logging
import os
import sys
import traceback
from datetime import datetime
import torch
from safetensors.torch import _find_shared_tensors, _is_complete
from veturboio import save_file
import veturboio
parser = argparse.ArgumentParser()
parser.add_argument("--input", "-i", type=str, required=True)
parser.add_argument("--output", "-o", type=str, required=True)
def to_valid_state_dict(state_dict: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
invalid_key = [k for k, v in state_dict.items() if not isinstance(v, torch.Tensor)]
if len(invalid_key) > 0:
logger.warning(f"invalid keys to removed: {invalid_key}")
state_dict = {k: v for k, v in state_dict.items() if k not in invalid_key}
result = {}
shared_tensor_groups = _find_shared_tensors(state_dict)
for group in shared_tensor_groups:
# check that all shared tensors have the same data ptr and the same shape
shared_tensors = [state_dict[k] for k in group]
data_ptrs = [t.data_ptr() for t in shared_tensors]
shapes = [t.shape for t in shared_tensors]
if len(set(data_ptrs)) != 1 or len(set(shapes)) != 1:
raise Exception(f"shared tensors {group} are not equal")
# make sure these tensors are complete and identical
converted_tensor = shared_tensors[0]
if not _is_complete(converted_tensor):
converted_tensor = converted_tensor.clone()
for t in group:
result[t] = converted_tensor
for k, v in state_dict.items():
if k not in result:
result[k] = v
return result
def add_handlers(logger: logging.Logger):
"""
Add handlers to logger
"""
handler = logging.StreamHandler(stream=sys.stdout)
formatter = logging.Formatter(fmt="[%(levelname)s %(asctime)s] %(filename)s: %(lineno)d %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)
def validate_result(input_state_dict: dict[str, torch.Tensor], output_state_dict: dict[str, torch.Tensor]):
input_state_dict = {k: v for k, v in input_state_dict.items() if isinstance(v, torch.Tensor)}
output_state_dict = {k: v for k, v in output_state_dict.items() if isinstance(v, torch.Tensor)}
input_key_set = set(input_state_dict.keys())
output_key_set = set(output_state_dict.keys())
if input_key_set != output_key_set:
not_in_output_key_set = input_key_set - output_key_set
not_in_input_key_set = output_key_set - input_key_set
raise Exception(
f"key set not equal, not in output key set: {not_in_output_key_set}, not in input key set: {not_in_input_key_set}"
)
not_equal_tensor = []
for key in input_state_dict:
if not torch.allclose(input_state_dict[key], output_state_dict[key]):
not_equal_tensor.append(key)
if len(not_equal_tensor) > 0:
raise Exception(f"result is not valid, not equal tensors: {not_equal_tensor}")
logger.info(f"all {len(input_key_set)} keys in state dict are equal")
def _get_available_cpu() -> int:
avail_cpu = os.cpu_count()
if os.path.isfile('/sys/fs/cgroup/cpu/cpu.cfs_quota_us'):
cpu_quota = int(open('/sys/fs/cgroup/cpu/cpu.cfs_quota_us').read().rstrip())
if cpu_quota != -1 and os.path.isfile('/sys/fs/cgroup/cpu/cpu.cfs_period_us'):
cpu_period = int(open('/sys/fs/cgroup/cpu/cpu.cfs_period_us').read().rstrip())
avail_cpu = int(cpu_quota / cpu_period)
logger.info(f"get veturboio thread {avail_cpu} from cgroup info")
return avail_cpu
class Pt2SafeTensorConverter:
def __init__(
self,
input_path: str,
output_path: str,
dry_run: bool,
enable_to_valid_state_dict: bool,
overwrite: bool,
use_direct_io: bool,
):
self.input_path = input_path
self.output_path = output_path
self.dry_run = dry_run
self.enable_to_valid_state_dict = enable_to_valid_state_dict
self.use_direct_io = use_direct_io
if self.input_path.startswith("sfcs://"):
try:
self.input_file_size = veturboio.ops.sfcs_utils.sfcs_get_file_size(self.input_path)
except BaseException as Exp:
raise FileNotFoundError("can't get size of sfcs file", Exp)
else:
if not os.path.exists(self.input_path):
raise Exception(f"file not exist: {self.input_path}")
# convert to abs path
if not os.path.isabs(self.input_path):
self.input_path = os.path.abspath(self.input_path)
self.input_file_size = os.path.getsize(self.input_path)
if not self.input_path.endswith(".pt"):
raise Exception("input file must end with .pt")
if self.output_path is None:
self.output_path = self.input_path.replace(".pt", ".safetensors")
elif not self.output_path.startswith("sfcs://") and not os.path.isabs(self.output_path):
self.output_path = os.path.abspath(self.output_path)
if not self.output_path.endswith(".safetensors"):
raise Exception("output file must end with .safetensors")
if overwrite:
if self.output_path.startswith("sfcs://"):
raise Exception("overwrite flag cannot be set when using sfcs")
if os.path.exists(self.output_path):
logger.info(f"overwrite output file {self.output_path}")
if not dry_run:
os.remove(self.output_path)
elif not self.output_path.startswith("sfcs://") and os.path.exists(self.output_path):
raise Exception(f"output file {self.output_path} already exists")
def convert(self):
logger.info(f"converting {self.input_path} to {self.output_path}")
available_cpus = _get_available_cpu()
ext_name = self.output_path.split(".")[-1]
state_dict = {}
if ext_name != "safetensors":
raise ValueError("output file should be safetensors file")
logger.info(f"start loading the pt file, the pt file has size of {self.input_file_size // 1000 // 1000}MB")
start_time = datetime.now()
if self.dry_run:
logger.info("dry run finished for veturboio.load_pt_file")
else:
state_dict = veturboio.load(
self.input_path, num_thread=available_cpus, use_direct_io=self.use_direct_io, enable_fast_mode=True
)
end_time = datetime.now()
logger.info(f"finish loading the pt file with duration {end_time - start_time}")
logger.info("start saving the safetensors file")
start_time = datetime.now()
if self.dry_run:
logger.info("dry run finished for veturboio.save_safetensors_file")
else:
if self.enable_to_valid_state_dict:
state_dict = to_valid_state_dict(state_dict)
veturboio.save_file(state_dict, self.output_path, force_save_shared_tensor=True)
end_time = datetime.now()
logger.info(f"finish saving the safetensors file with duration {end_time - start_time}")
del state_dict
gc.collect()
logger.info(f"gc finished")
def validate(self):
available_cpus = _get_available_cpu()
logger.info(f"validating if {self.input_path} in equal to {self.output_path}")
input_state_dict = veturboio.load(
self.input_path, num_thread=available_cpus, use_direct_io=self.use_direct_io, enable_fast_mode=True
)
logger.info(f"{self.input_path} loaded")
output_state_dict = veturboio.load(
self.output_path, num_thread=available_cpus, use_direct_io=self.use_direct_io, enable_fast_mode=True
)
logger.info(f"{self.output_path} loaded")
validate_result(input_state_dict, output_state_dict)
if __name__ == "__main__":
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
add_handlers(logger)
parser = argparse.ArgumentParser(description="converter used to convert a .pt model to .safetensors")
parser.add_argument(
"--input",
"-i",
type=str,
required=True,
help="indicate the path of .pt file, both posix path" "and sfcs prefix are supported",
)
parser.add_argument(
"--output",
"-o",
type=str,
required=False,
help="indicate the path of .safeTensor file, both "
"posix path and sfcs prefix are supported."
"will be placed into the same dir of the .pt "
"file if left empty",
)
parser.add_argument("--dry-run", "-d", action="store_true", help="just dry run, not really convert")
parser.add_argument("--overwrite", action="store_true", help="overwrite the output file if it exists")
parser.add_argument(
"--enable-to-valid-state-dict",
action="store_true",
help="execute to_valid_state_dict function before save to .safetensors",
)
parser.add_argument("--validate-result", action="store_true", help="validate result", default=False)
parser.add_argument("--use-direct-io", action="store_true", help="use direct io to load file", default=False)
args = parser.parse_args()
print(f"convert {args.input} to {args.output}")
ext_name = args.output.split(".")[-1]
if ext_name != "safetensors":
raise ValueError("output file should be safetensors file")
state_dict = torch.load(args.input)
save_file(state_dict, args.output, force_save_shared_tensor=True)
instance = Pt2SafeTensorConverter(
args.input, args.output, args.dry_run, args.enable_to_valid_state_dict, args.overwrite, args.use_direct_io
)
try:
instance.convert()
if args.validate_result:
instance.validate()
except Exception as e:
logger.error(f"convert failed.")
traceback.print_exc()
exit(1)
......@@ -15,16 +15,15 @@ limitations under the License.
'''
import os
from multiprocessing import shared_memory
from typing import Dict, Optional
import torch
from loguru import logger
from safetensors.torch import _remove_duplicate_names
from safetensors.torch import save_file as safetenors_save_file
from safetensors.torch import save_model as safetensors_save_model
from veturboio.loader import FasterPosixLoader, PosixLoader, SfcsClientLoader
from veturboio.ops.load_utils import IOHelper
from veturboio.ops.io_utils import IOHelper
from veturboio.safetensors import SafetensorsFile
from veturboio.saver import PosixSaver, SfcsClientSaver
from veturboio.types import FILE_PATH
......@@ -33,6 +32,8 @@ from veturboio.types import FILE_PATH
def is_sfcs_path(file: FILE_PATH):
if len(file) > 7 and file[:7] == "sfcs://":
return True, file[6:]
elif len(file) > 9 and file[:9] == "/dev/shm/":
return False, file
elif os.environ.get("VETURBOIO_USE_SFCS_SDK", "0") == "1":
return True, file
else:
......@@ -47,7 +48,8 @@ def load(
helper: Optional[IOHelper] = None,
use_pinmem: Optional[bool] = False,
use_direct_io: Optional[bool] = False,
use_cipher: Optional[bool] = False,
use_cipher: Optional[bool] = None,
state_dict: Dict[str, torch.Tensor] = None,
) -> Dict:
"""Load state dict object from checkpoint file. The file can be both safetensors file and pytorch file.
If the file is safetensors file, it will be loaded by veturboio and the loading speed will be accelerated.
......@@ -56,10 +58,14 @@ def load(
file (FILE_PATH): file path
map_location (str, optional): map location. Defaults to "cpu".
enable_fast_mode (bool, optional): enable fast mode. Defaults to True.
helper (IOHelper, optional): use IOHelper. Defaults to None.
use_pinmem (bool, optional): use pin memory. Defaults to False.
num_thread (int, optional): number of threads. Defaults to 32.
use_direct_io (bool, optional): open file in direct io mode. Defaults to False.
use_cipher (bool, optional): decrypt file. Defaults to False.
use_cipher (bool, optional): decrypt file. Defaults to None. Note: the cipher is
forcibly disabled when use_cipher is set to False; otherwise it is enabled when
use_cipher is set to True or the environment variable VETURBOIO_USE_CIPHER is set to '1'.
state_dict (Dict, optional): pre-allocated state dict. Defaults to None.
Returns:
state_dict (Dict): state dict
......@@ -97,7 +103,56 @@ def load(
)
safetensors_file = SafetensorsFile(file, loader, use_cipher)
return safetensors_file.load(map_location=map_location)
return safetensors_file.load(map_location=map_location, state_dict=state_dict)
def load_to_shmem(
file: FILE_PATH,
num_thread: Optional[int] = 32,
helper: Optional[IOHelper] = None,
use_direct_io: Optional[bool] = False,
use_cipher: Optional[bool] = None,
) -> shared_memory.SharedMemory:
"""Load checkpoint file to shmem.
Args:
file (FILE_PATH): file path
num_thread (int, optional): number of threads. Defaults to 32.
helper (IOHelper, optional): use IOHelper. Defaults to None.
use_cipher (bool, optional): decrypt file. Defaults to None. Note: the cipher is
forcibly disabled when use_cipher is set to False; otherwise it is enabled when
use_cipher is set to True or the environment variable VETURBOIO_USE_CIPHER is set to '1'.
Returns:
shmem (shared_memory.SharedMemory): shared memory object.
Examples:
```
import veturboio
shmem_file = veturboio.load_to_shmem("sfcs://model.safetensors")
```
"""
if helper is None:
helper = IOHelper()
use_sfcs_sdk, file = is_sfcs_path(file)
if use_sfcs_sdk:
loader = SfcsClientLoader(
helper=helper,
file=file,
num_thread=num_thread,
)
else:
loader = FasterPosixLoader(
file,
helper,
num_thread=num_thread,
use_direct_io=use_direct_io,
)
safetensors_file = SafetensorsFile(file, loader, use_cipher)
return safetensors_file.load_to_shmem()
def save_file(
......@@ -108,6 +163,8 @@ def save_file(
force_clone_shared_tensor: bool = False,
metadata: Dict[str, str] = None,
use_cipher: Optional[bool] = False,
helper: Optional[IOHelper] = None,
enable_fast_mode: Optional[bool] = False,
) -> None:
"""Save state dict object to safetensors file.
......@@ -120,6 +177,8 @@ def save_file(
when force_save_shared_tensor is enabled. Defaults to False.
metadata (Dict[str, str], optional): metadata. Defaults to None.
use_cipher (bool, optional): decrypt file. Defaults to False.
helper (IOHelper, optional): use IOHelper. Defaults to None.
enable_fast_mode (bool, optional): enable fast mode. Defaults to False.
Examples:
```
......@@ -130,18 +189,21 @@ def save_file(
veturboio.save_file(state_dict, "model.safetensors")
```
"""
if helper is None:
helper = IOHelper()
use_sfcs_sdk, file = is_sfcs_path(file)
if use_sfcs_sdk:
saver = SfcsClientSaver(file=file, use_cipher=use_cipher)
saver = SfcsClientSaver(file=file, use_cipher=use_cipher, helper=helper)
else:
saver = PosixSaver(file=file, use_cipher=use_cipher)
saver = PosixSaver(file=file, use_cipher=use_cipher, helper=helper)
# TODO: there are some bugs while state_dict is loaded from veturboio
if not force_save_shared_tensor:
if force_clone_shared_tensor:
logger.warning("force_clone_shared_tensor won't take any effect while force_save_shared_tensor is False;")
try:
saver.save_file(state_dict, metadata=metadata)
saver.save_file(state_dict, metadata=metadata, enable_fast_mode=enable_fast_mode)
except ValueError as e:
msg = str(e)
raise ValueError(msg)
......@@ -165,7 +227,7 @@ def save_file(
if force_contiguous:
state_dict = {k: v.contiguous() for k, v in state_dict.items()}
return saver.save_file(state_dict, metadata=metadata)
return saver.save_file(state_dict, metadata=metadata, enable_fast_mode=enable_fast_mode)
def save_model(model: torch.nn.Module, file: FILE_PATH, use_cipher: Optional[bool] = False) -> None:
......
......@@ -37,7 +37,12 @@ class BaseLoader:
def load_to_bytes(self, offset: int, count: int, cipher_info: CipherInfo = CipherInfo(False)) -> bytes:
raise NotImplementedError
def load_safetensors(self, safetensors_file: Any, map_location: str = "cpu") -> Dict[str, torch.Tensor]:
def load_safetensors(
self,
safetensors_file: Any,
map_location: str = "cpu",
state_dict: Dict[str, torch.Tensor] = None,
) -> Dict[str, torch.Tensor]:
raise NotImplementedError
def init_aligned_tensor(self, device, device_id: int, file_size, base_offset: int) -> torch.Tensor:
......@@ -74,20 +79,24 @@ class PosixLoader(BaseLoader):
decrypt(cipher_info, arr, arr, offset - h_off)
return arr.tobytes()
def load_safetensors(self, safetensors_file: Any, map_location: str = "cpu") -> Dict[str, torch.Tensor]:
state_dict = {}
def load_safetensors(
self,
safetensors_file: Any,
map_location: str = "cpu",
state_dict: Dict[str, torch.Tensor] = None,
) -> Dict[str, torch.Tensor]:
if not state_dict:
state_dict = {}
base_offset = safetensors_file.tensor_offset
device = torch.device(map_location)
cipher_info = safetensors_file._cipher_info
mp_mode = "c" if cipher_info.use_cipher else "r"
for tensor_meta in safetensors_file.meta.values():
tensor_bytes = np.memmap(
safetensors_file.file,
dtype=np.uint8,
mode=mp_mode,
mode="c",
offset=base_offset + tensor_meta.data_offsets[0],
shape=tensor_meta.data_offsets[1] - tensor_meta.data_offsets[0],
)
......
......@@ -16,13 +16,17 @@ limitations under the License.
import io
import os
import random
import string
from multiprocessing import shared_memory
from typing import Dict
import numpy as np
import torch
from veturboio.ops.cipher import CipherInfo, decrypt
from veturboio.ops.load_utils import IOHelper, load_file_to_tensor
from veturboio.ops.io_utils import IOHelper, load_file_to_tensor
from veturboio.ops.posix_utils import posix_read_file
from veturboio.safetensors import SafetensorsFile
from veturboio.types import FILE_PATH
......@@ -45,7 +49,10 @@ class FasterPosixLoader(PosixLoader):
self.use_direct_io = use_direct_io
def load_safetensors(
self, safetensors_file: SafetensorsFile, map_location: str = "cpu"
self,
safetensors_file: SafetensorsFile,
map_location: str = "cpu",
state_dict: Dict[str, torch.Tensor] = None,
) -> Dict[str, torch.Tensor]:
file_size = os.path.getsize(safetensors_file.file)
base_offset = safetensors_file.tensor_offset
......@@ -55,22 +62,70 @@ class FasterPosixLoader(PosixLoader):
else:
device_id = -1
total_tensor = self.init_aligned_tensor(device, device_id, file_size, base_offset)
load_file_to_tensor(
file_path=safetensors_file.file,
total_tensor=total_tensor,
sample_tensor=torch.ones([], dtype=torch.uint8),
offset=base_offset,
helper=self.helper,
device_id=device_id,
if state_dict:
for tensor_meta in safetensors_file._meta.values():
tensor = state_dict[tensor_meta.name]
if not tensor.is_contiguous():
raise RuntimeError("allocated tensor not contiguous")
if not tensor.dtype == tensor_meta.dtype:
raise RuntimeError("allocated tensor dtype not match")
offset = tensor_meta.data_offsets[0]
length = tensor_meta.data_offsets[1] - tensor_meta.data_offsets[0]
tensor_length = torch.numel(tensor) * tensor.element_size()
if tensor_length < length:
raise RuntimeError("allocated tensor size not enough")
load_file_to_tensor(
file_path=safetensors_file.file,
total_tensor=tensor,
length=length,
offset=base_offset + offset,
helper=self.helper,
device_id=device_id,
num_thread=self.num_thread,
use_pinmem=self.use_pinmem,
use_sfcs_sdk=False,
use_direct_io=self.use_direct_io,
cipher_info=safetensors_file._cipher_info,
)
tensor = tensor.resize_(tensor_meta.shape)
state_dict[tensor_meta.name] = tensor
return state_dict
else:
total_tensor = self.init_aligned_tensor(device, device_id, file_size, base_offset)
load_file_to_tensor(
file_path=safetensors_file.file,
total_tensor=total_tensor,
offset=base_offset,
helper=self.helper,
device_id=device_id,
num_thread=self.num_thread,
use_pinmem=self.use_pinmem,
use_sfcs_sdk=False,
use_direct_io=self.use_direct_io,
cipher_info=safetensors_file._cipher_info,
)
return SafetensorsFile.split_tensor_to_state_dict(total_tensor, safetensors_file)
def load_to_shmem(self, cipher_info: CipherInfo = CipherInfo(False)) -> shared_memory.SharedMemory:
file_size = os.path.getsize(self.file)
file_name = ''.join(random.sample(string.ascii_lowercase + string.ascii_uppercase, 10))
shm = shared_memory.SharedMemory(name=file_name, create=True, size=file_size)
h_off = CipherInfo.HEADER_SIZE if cipher_info.use_header else 0
candidate = np.frombuffer(shm.buf, dtype=np.byte)
posix_read_file(
self.file,
candidate,
length=file_size - h_off,
offset=h_off,
num_thread=self.num_thread,
use_pinmem=self.use_pinmem,
use_sfcs_sdk=False,
cipher_info=cipher_info,
use_direct_io=self.use_direct_io,
cipher_info=safetensors_file._cipher_info,
)
return SafetensorsFile.split_tensor_to_state_dict(total_tensor, safetensors_file)
return shm
def load_pt(
self, map_location: str = "cpu", cipher_info: CipherInfo = CipherInfo(False)
......
......@@ -15,7 +15,10 @@ limitations under the License.
'''
import os
import random
import string
from io import BytesIO
from multiprocessing import shared_memory
from typing import Dict
import numpy as np
......@@ -24,8 +27,14 @@ from numpy import ndarray
from veturboio.loader.base_loader import BaseLoader
from veturboio.ops.cipher import CipherInfo
from veturboio.ops.load_utils import IOHelper, load_file_to_tensor
from veturboio.ops.sfcs_utils import init_sfcs_conf, sfcs_get_file_size, sfcs_read_file
from veturboio.ops.io_utils import IOHelper, load_file_to_tensor
from veturboio.ops.sfcs_utils import (
init_sfcs_conf,
path_mapper,
sfcs_default_config,
sfcs_get_file_size,
sfcs_read_file,
)
from veturboio.safetensors import SafetensorsFile
from veturboio.types import FILE_PATH
......@@ -46,52 +55,110 @@ class SfcsClientLoader(BaseLoader):
self.num_thread = num_thread
self.use_pinmem = use_pinmem
self.use_direct_io = use_direct_io
init_sfcs_conf(file)
self._mount_path = init_sfcs_conf(file)
self._sfcs_valid_path = path_mapper(self.file, self._mount_path)
def load_to_bytes(self, offset: int, count: int, cipher_info: CipherInfo = CipherInfo(False)) -> bytes:
file_size = sfcs_get_file_size(self.file)
file_size = sfcs_get_file_size(self._sfcs_valid_path)
if offset + count > file_size:
count = file_size - offset
file_bytes = bytes(count)
candidate = np.frombuffer(file_bytes, dtype=np.byte)
sfcs_read_file(
self.file, candidate, length=count, offset=offset, num_thread=self.num_thread, cipher_info=cipher_info
self._sfcs_valid_path,
candidate,
length=count,
offset=offset,
num_thread=self.num_thread,
cipher_info=cipher_info,
)
return file_bytes
def load_to_shmem(self, cipher_info: CipherInfo = CipherInfo(False)) -> shared_memory.SharedMemory:
file_size = sfcs_get_file_size(self._sfcs_valid_path)
file_name = ''.join(random.sample(string.ascii_lowercase + string.ascii_uppercase, 10))
shm = shared_memory.SharedMemory(name=file_name, create=True, size=file_size)
h_off = CipherInfo.HEADER_SIZE if cipher_info.use_header else 0
candidate = np.frombuffer(shm.buf, dtype=np.byte)
sfcs_read_file(
self._sfcs_valid_path,
candidate,
length=file_size - h_off,
offset=h_off,
num_thread=self.num_thread,
cipher_info=cipher_info,
)
return shm
def load_safetensors(
self, safetensors_file: SafetensorsFile, map_location: str = "cpu"
self,
safetensors_file: SafetensorsFile,
map_location: str = "cpu",
state_dict: Dict[str, torch.Tensor] = None,
) -> Dict[str, torch.Tensor]:
file_size = sfcs_get_file_size(safetensors_file.file)
# TODO should be the same as self.loader
sfcs_valid_path = path_mapper(safetensors_file.file, self._mount_path)
file_size = sfcs_get_file_size(sfcs_valid_path)
base_offset = safetensors_file.tensor_offset
device = torch.device(map_location)
if device.type == "cuda":
device_id = device.index if device.index is not None else torch.cuda.current_device()
else:
device_id = -1
total_tensor = self.init_aligned_tensor(device, device_id, file_size, base_offset)
load_file_to_tensor(
file_path=safetensors_file.file,
total_tensor=total_tensor,
sample_tensor=torch.ones([], dtype=torch.uint8),
offset=base_offset,
helper=self.helper,
device_id=device_id,
num_thread=self.num_thread,
use_pinmem=self.use_pinmem,
use_sfcs_sdk=True,
use_direct_io=self.use_direct_io,
cipher_info=safetensors_file._cipher_info,
)
if state_dict:
for tensor_meta in safetensors_file._meta.values():
tensor = state_dict[tensor_meta.name]
if not tensor.is_contiguous():
raise RuntimeError("allocated tensor not contiguous")
if not tensor.dtype == tensor_meta.dtype:
raise RuntimeError("allocated tensor dtype not match")
offset = tensor_meta.data_offsets[0]
length = tensor_meta.data_offsets[1] - tensor_meta.data_offsets[0]
tensor_length = torch.numel(tensor) * tensor.element_size()
if tensor_length < length:
raise RuntimeError("allocated tensor size not enough")
load_file_to_tensor(
file_path=sfcs_valid_path,
total_tensor=tensor,
length=length,
offset=base_offset + offset,
helper=self.helper,
device_id=device_id,
num_thread=self.num_thread,
use_pinmem=self.use_pinmem,
use_sfcs_sdk=True,
use_direct_io=self.use_direct_io,
cipher_info=safetensors_file._cipher_info,
)
tensor = tensor.resize_(tensor_meta.shape)
state_dict[tensor_meta.name] = tensor
return state_dict
else:
total_tensor = self.init_aligned_tensor(device, device_id, file_size, base_offset)
load_file_to_tensor(
file_path=sfcs_valid_path,
total_tensor=total_tensor,
offset=base_offset,
helper=self.helper,
device_id=device_id,
num_thread=self.num_thread,
use_pinmem=self.use_pinmem,
use_sfcs_sdk=True,
use_direct_io=self.use_direct_io,
cipher_info=safetensors_file._cipher_info,
)
return SafetensorsFile.split_tensor_to_state_dict(total_tensor, safetensors_file)
def load_pt(
self, map_location: str = "cpu", cipher_info: CipherInfo = CipherInfo(False)
) -> Dict[str, torch.Tensor]:
file_size = sfcs_get_file_size(self.file)
file_size = sfcs_get_file_size(self._sfcs_valid_path)
h_off = CipherInfo.HEADER_SIZE if cipher_info.use_header else 0
file_bytes = self.load_to_bytes(offset=h_off, count=file_size - h_off, cipher_info=cipher_info)
return torch.load(BytesIO(file_bytes), map_location=map_location)