# BLOOM Inference

A Rust router and Python gRPC server for BLOOM inference.

## Install

Install the Python model server:

```shell
cd server
pip install .
```

Build the Rust router:

```shell
cd router
cargo build --release
```

## Run

Launch the sharded model server (here with 8 GPUs, storing the model shards in `/dev/shm/models`):

```shell
python server/bloom_inference/main.py bigscience/bloom --num-gpus 8 --shard-directory /dev/shm/models
```

Then start the router:

```shell
./router/target/release/router
```

## TODO:

- [ ] Improve model download
  - Store "shardable" layers separately and layer by layer
- [ ] Add batching args to router CLI 
- [ ] Add docstrings + comments everywhere as the codebase is fairly complicated
- [ ] Add tests
- [ ] Add shutdown logic in router and server
- [ ] Improve multi-processing logic in server
- [ ] Improve error handling everywhere
- [ ] Improve past key layer indexing?