Commits · fb103cfd6e91dc1f72e884099aaed84ce5e198cf · OpenDAS / ColossalAI

20 Nov, 2023 2 commits
- [inference] update examples and engine (#5073) · fb103cfd
  Xu Kai authored Nov 20, 2023
```
* update examples and engine

* fix choices

* update example
```
- [hotfix/hybridengine] fix bug when tp*pp size = 1 (#5069) · 0c7d8beb
  Bin Jia authored Nov 20, 2023
  
19 Nov, 2023 1 commit

[inference] Refactor inference architecture (#5057) · fd6482ad

Xu Kai authored Nov 19, 2023



* [inference] support only TP (#4998)

* support only tp

* enable tp

* add support for bloom (#5008)

* [refactor] refactor gptq and smoothquant llama (#5012)

* refactor gptq and smoothquant llama

* fix import error

* fix linear import torch-int

* fix smoothquant llama import error

* fix import accelerate error

* fix bug

* fix import smooth cuda

* fix smoothcuda

* [Inference Refactor] Merge chatglm2 with pp and tp (#5023)

* merge chatglm with pp and tp

* [Refactor] remove useless inference code (#5022)

* remove useless code

* fix quant model

* fix test import bug

* move original inference to legacy

* fix chatglm2

* [Refactor] refactor policy search and quant type controlling in inference (#5035)

* [Refactor] refactor policy search and quant type controlling in inference

* [inference] update readme (#5051)

* update readme

* update readme

* fix architecture

* fix table

* fix table

* [inference] update example (#5053)

* update example

* fix run.sh

* fix rebase bug

* fix some errors

* update readme

* add some features

* update interface

* update readme

* update benchmark

* add requirements-infer

---------
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>


06 Nov, 2023 1 commit
- [format] applied code formatting on changed files in pull request 4926 (#5007) · c36e782d
  github-actions[bot] authored Nov 06, 2023
```
Co-authored-by: github-actions <github-actions@github.com>
```
03 Nov, 2023 1 commit

[hotfix] Add layer norm gradients all-reduce for sequence parallel (#4926) · 1a3315e3

littsk authored Nov 03, 2023



* [hotfix] Add layer norm gradients all-reduce for sequence parallel. (#4915)

* Add layer norm gradients all-reduce for sequence parallel.

* skip pipeline inference test

* [hotfix] fixing policies of sequence parallel (#4922)

* Add layer norm gradients all-reduce for sequence parallel.

* fix parameter passing when calling get_autopolicy

---------
Co-authored-by: littsk <1214689160@qq.com>

* Hotfix/add grad all reduce for sequence parallel (#4927)

* Add layer norm gradients all-reduce for sequence parallel.


* fix parameter passing when calling get_autopolicy

* fix bug using wrong variables

---------
Co-authored-by: littsk <1214689160@qq.com>

* fix policy initialization

* fix bloom and chatglm policies

* polish code of handling layernorm

* fix moe module

* polish code of class initializing

---------
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>


01 Nov, 2023 1 commit

[Pipeline Inference] Merge pp with tp (#4993) · b6696beb

Bin Jia authored Nov 01, 2023

* refactor pipeline into new CaiInferEngine

* update llama modeling forward

* merge tp with pp

* update docstring

* optimize test workflow and example

* fix typo

* add assert and todo


27 Oct, 2023 1 commit

[Pipeline inference] Combine kvcache with pipeline inference (#4938) · 1db67276

Bin Jia authored Oct 27, 2023

* merge kvcache with pipeline inference and refactor the code structure

* support ppsize > 2

* refactor pipeline code

* do pre-commit

* modify benchmark

* fix benchmark

* polish code

* add docstring and update readme

* refactor the code

* fix some logic bugs in ppinfer

* polish readme

* fix typo

* skip infer test


18 Oct, 2023 1 commit
- [format] applied code formatting on changed files in pull request 4820 (#4886) · 486d06a2
  github-actions[bot] authored Oct 18, 2023
```
Co-authored-by: github-actions <github-actions@github.com>
```
11 Oct, 2023 1 commit

[Pipeline Inference] Sync pipeline inference branch to main (#4820) · 08a9f76b

Bin Jia authored Oct 11, 2023

* [pipeline inference] pipeline inference (#4492)

* add pp stage manager as circle stage

* fix a bug when creating process group

* add ppinfer basic framework

* add micro batch manager and support kvcache-pp gpt2 fwd

* add generate schedule

* use mb size to control mb number

* support generate with kv cache

* add output, remove unused code

* add test

* reuse shardformer to build model

* refactor some code and use the same attribute name of hf

* fix review and add test for generation

* remove unused file

* fix CI

* add cache clear

* fix code error

* fix typo

* [Pipeline inference] Modify to tieweight (#4599)

* add pp stage manager as circle stage

* fix a bug when creating process group

* add ppinfer basic framework

* add micro batch manager and support kvcache-pp gpt2 fwd

* add generate schedule

* use mb size to control mb number

* support generate with kv cache

* add output, remove unused code

* add test

* reuse shardformer to build model

* refactor some code and use the same attribute name of hf

* fix review and add test for generation

* remove unused file

* modify the way of saving new tokens

* modify to tieweight

* modify test

* remove unused file

* solve review

* add docstring

* [Pipeline inference] support llama pipeline inference (#4647)

* support llama pipeline inference

* remove tie weight operation

* [pipeline inference] Fix the blocking of communication when ppsize is 2 (#4708)

* add benchmark verbose

* fix export tokens

* fix benchmark verbose

* add P2POp style to do p2p communication

* modify schedule as p2p type when ppsize is 2

* remove unused code and add docstring

* [Pipeline inference] Refactor code, add docstring, fix bug (#4790)

* add benchmark script

* update argparse

* fix fp16 load

* refactor code style

* add docstring

* polish code

* fix test bug

* [Pipeline inference] Add pipeline inference docs (#4817)

* add readme doc

* add an icon

* Add performance

* update table of contents

* refactor code (#4873)
