# veTurboIO

[En](./README.md) | [中文](./README.zh.md)

veTurboIO is a Python library developed by Volcano Engine for high-performance reading and writing of PyTorch model files. It is built primarily on the safetensors file format to store and load tensor data efficiently.
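
Because veTurboIO builds on safetensors, it helps to know what that format looks like on disk: an 8-byte little-endian header length, a UTF-8 JSON header describing each tensor's `dtype`, `shape` and `data_offsets`, then the raw tensor bytes. The standard-library sketch below follows that published layout. It is not part of veTurboIO's API: `write_safetensors`, `read_safetensors_header` and `read_tensor_bytes` are hypothetical helpers, and the sketch does not handle encrypted files.

```python
import json
import struct
import tempfile
from pathlib import Path


def write_safetensors(path, tensors):
    """Write a minimal safetensors file (hypothetical helper, illustration only).

    `tensors` maps a name to (dtype, shape, raw_bytes), e.g. ("F32", [2], b"...").
    """
    header = {}
    offset = 0
    for name, (dtype, shape, data) in tensors.items():
        # data_offsets are [begin, end) relative to the start of the byte buffer
        header[name] = {"dtype": dtype, "shape": list(shape),
                        "data_offsets": [offset, offset + len(data)]}
        offset += len(data)
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        # 8-byte little-endian unsigned integer holding the JSON header size
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        # Tensor bytes follow in the same order as their offsets
        for _dtype, _shape, data in tensors.values():
            f.write(data)


def read_safetensors_header(path):
    """Return the parsed JSON header without reading any tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # The optional "__metadata__" entry holds string metadata, not a tensor
    header.pop("__metadata__", None)
    return header


def read_tensor_bytes(path, name):
    """Seek straight to one tensor's bytes using the offsets in the header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        begin, end = header[name]["data_offsets"]
        # The byte buffer starts right after the length field and the header
        f.seek(8 + header_len + begin)
        return f.read(end - begin)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo.safetensors"
        write_safetensors(path, {
            "weight1": ("F32", [2], struct.pack("<2f", 1.0, 2.0)),
            "bias": ("F32", [1], struct.pack("<f", 0.5)),
        })
        print(read_safetensors_header(path))
        print(struct.unpack("<f", read_tensor_bytes(path, "bias")))
```

Because every tensor's byte range is listed up front, a reader can locate any tensor with a single seek instead of deserializing the whole file.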

## Install

You can install veTurboIO directly with the following commands:

```bash
cd veturboio
python setup.py get_libcfs
python setup.py install
```

Tip: These commands first try to download a prebuilt wheel (`.whl`) that matches your current Python and PyTorch versions. If no matching wheel is found, they automatically download the source code, then compile and install it.
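
To see which versions a prebuilt wheel would have to match, you can print them before installing. The sketch below uses only the standard library; `wheel_relevant_versions` is a hypothetical helper that just reports versions and does not reproduce the installer's actual matching logic. The `cpXY` tag follows the CPython wheel tag convention from PEP 425.

```python
import sys
from importlib import metadata


def wheel_relevant_versions():
    """Report the Python and PyTorch versions (hypothetical helper).

    This only reads versions; the real wheel selection happens in
    veTurboIO's setup.py and may consider more than this.
    """
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None  # PyTorch is not installed in this environment
    major, minor = sys.version_info.major, sys.version_info.minor
    return {
        "python": f"{major}.{minor}",
        # CPython interpreter tag as defined by PEP 425, e.g. "cp310"
        "python_tag": f"cp{major}{minor}",
        "torch": torch_version,
    }


if __name__ == "__main__":
    print(wheel_relevant_versions())
```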

If the installation fails, download the source code and compile and install it manually, choosing the build option that matches your hardware:

```bash
# CUDA ops (default)
python setup.py install --cuda_ext

# NPU ops
python setup.py install --npu_ext

# CPU only
python setup.py install --cpu_ext
```
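
If you are unsure which flag to pass, a small helper can suggest one from what is available on the build machine. This is a hypothetical sketch, not part of veTurboIO: it assumes that an `nvcc` compiler on `PATH` indicates a CUDA build environment and that an importable `torch_npu` package indicates an Ascend NPU environment.

```python
import importlib.util
import shutil


def pick_build_flag(has_cuda_toolkit, has_torch_npu):
    """Map detected capabilities to one of the setup.py flags (hypothetical helper)."""
    if has_cuda_toolkit:
        return "--cuda_ext"  # CUDA is the documented default
    if has_torch_npu:
        return "--npu_ext"
    return "--cpu_ext"


def detect_build_flag():
    """Heuristic detection (assumption): nvcc on PATH => CUDA, torch_npu => NPU."""
    has_cuda_toolkit = shutil.which("nvcc") is not None
    has_torch_npu = importlib.util.find_spec("torch_npu") is not None
    return pick_build_flag(has_cuda_toolkit, has_torch_npu)


if __name__ == "__main__":
    print(f"python setup.py install {detect_build_flag()}")
```

CUDA is checked first because it is the default build; the detection heuristics are deliberately simple and may not match every environment.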


## Quick Start

### Read and write model files

```python
import torch
import veturboio

tensors = {
    "weight1": torch.zeros((1024, 1024)),
    "weight2": torch.zeros((1024, 1024)),
}

veturboio.save_file(tensors, "model.safetensors")

new_tensors = veturboio.load("model.safetensors")

# check that the loaded tensors match the originals
for k, v in tensors.items():
    assert torch.allclose(v, new_tensors[k])
```

### Convert existing PyTorch files

```bash
python -m veturboio.convert -i model.pt -o model.safetensors
```
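
A safetensors file stores a flat mapping from names to tensors, while `.pt` checkpoints are often nested (for example, a `state_dict` inside a larger dictionary that also holds an epoch counter). How `veturboio.convert` handles such checkpoints is not described here; the sketch below is a hypothetical helper, `flatten_checkpoint`, showing one way to flatten a nested checkpoint into dot-separated names. The `is_tensor` predicate is a parameter so the sketch runs without PyTorch; with PyTorch installed you could pass `torch.is_tensor`.

```python
def flatten_checkpoint(obj, is_tensor, prefix=""):
    """Flatten nested dicts/lists into {"a.b.0": tensor} (hypothetical helper).

    Leaves for which is_tensor(leaf) is False (step counters, strings, ...)
    are returned separately, because safetensors can only store tensors.
    """
    tensors, skipped = {}, {}

    def visit(node, key):
        if isinstance(node, dict):
            items = node.items()
        elif isinstance(node, (list, tuple)):
            items = enumerate(node)
        elif is_tensor(node):
            tensors[key] = node
            return
        else:
            skipped[key] = node
            return
        for k, v in items:
            visit(v, f"{key}.{k}" if key else str(k))

    visit(obj, prefix)
    return tensors, skipped


if __name__ == "__main__":
    # bytes objects stand in for tensors so the example runs without PyTorch
    ckpt = {"state_dict": {"layer.weight": b"w", "layer.bias": b"b"}, "epoch": 3}
    tensors, skipped = flatten_checkpoint(ckpt, lambda x: isinstance(x, bytes))
    print(sorted(tensors))  # ['state_dict.layer.bias', 'state_dict.layer.weight']
    print(skipped)          # {'epoch': 3}
```

With real tensors, the flattened dictionary has the same shape as the `tensors` dictionary passed to `veturboio.save_file` in the Quick Start.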

## Performance test

Run the benchmark script directly:

```bash
bash bench/io_bench.sh
```
Example output (`tensor_size` is in bytes and load times are in seconds; your numbers will depend on your hardware):

```
fs_name    tensor_size     veturboio load_time(s)             torch load_time(s)
shm        1073741824      0.08                               0.63
shm        2147483648      0.19                               1.26
shm        4294967296      0.36                               2.32
```
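
To put the example numbers in perspective, the short script below turns each row of the example output into throughput (GiB/s) and a speedup over `torch`. The figures are copied from that output, not new measurements; `summarize` is a hypothetical helper name.

```python
GIB = 1024 ** 3

# (fs_name, tensor_size in bytes, veturboio seconds, torch seconds), from the example output
ROWS = [
    ("shm", 1073741824, 0.08, 0.63),
    ("shm", 2147483648, 0.19, 1.26),
    ("shm", 4294967296, 0.36, 2.32),
]


def summarize(rows):
    """Return (size_gib, veturboio_gib_per_s, torch_gib_per_s, speedup) per row."""
    out = []
    for _fs, size, t_vt, t_torch in rows:
        size_gib = size / GIB
        out.append((size_gib, size_gib / t_vt, size_gib / t_torch, t_torch / t_vt))
    return out


if __name__ == "__main__":
    for size_gib, vt, th, speedup in summarize(ROWS):
        print(f"{size_gib:.0f} GiB: veturboio {vt:.1f} GiB/s, "
              f"torch {th:.1f} GiB/s, speedup {speedup:.1f}x")
```

For these rows the speedup works out to roughly 7.9×, 6.6× and 6.4×, with veTurboIO sustaining about 10.5 to 12.5 GiB/s on shared memory.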

To see more options, run:

```bash
python bench/io_bench.py -h
```

## Advanced Features

### Using veMLP to accelerate reading and writing
The Volcano Engine Machine Learning Platform (veMLP) provides a distributed cache file system
built on the physical disks of the GPU cluster.

<p align="center">
    <img src="./docs/imgs/SFCS.png" style="zoom:15%;">
</p>

When a cluster-level task needs to read a model file, the cache system can efficiently
distribute the file among GPU machines over RDMA, avoiding network transfer bottlenecks.
veTurboIO delivers its best performance when used together with this system.

### Encrypt and decrypt model files
veTurboIO supports encrypting and decrypting model files. See the [tutorial](./docs/encrypt_model.md)
to learn how to keep your model files secure. When the target device is a GPU, veTurboIO can decrypt the model file on the fly.

## License

[Apache License 2.0](./LICENSE)