Commits · d882d18c6544d024dd181c04fbb8c10893d3a653 · OpenDAS / ColossalAI

27 Feb, 2024 1 commit
- [example] reuse flash attn patch (#5400) · d882d18c
  Hongxin Liu authored Feb 27, 2024
  
30 Jan, 2024 1 commit
- fix typo change dosen't to doesn't (#5308) · 71321a07
  digger yu authored Jan 30, 2024
  
25 Jan, 2024 2 commits
- [feat] refactored extension module (#5298) · 7cfed5f0
  Frank Lee authored Jan 25, 2024
```
* [feat] refactored extension module

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish

* polish
```
- fix some typo (#5307) · bce9499e
  digger yu authored Jan 25, 2024
  
19 Jan, 2024 1 commit
- fix llama pretrain (#5287) · f7e3f82a
  flybird11111 authored Jan 19, 2024
  
15 Jan, 2024 1 commit
- [hotfix]: add pp sanity check and fix mbs arg (#5268) · ef4f0ee8
  Wenhao Chen authored Jan 15, 2024
```
* fix: fix misleading mbs arg

* feat: add pp sanity check

* fix: fix 1f1b sanity check
```
11 Jan, 2024 1 commit
- [doc] fix doc typo (#5256) · c174c4fc
  binmakeswell authored Jan 11, 2024
```
* [doc] fix annotation display

* [doc] fix llama2 doc
```
09 Jan, 2024 1 commit

[npu] change device to accelerator api (#5239) · d202cc28

Hongxin Liu authored Jan 09, 2024



* update accelerator

* fix timer

* fix amp

* update

* fix

* update bug

* add error raise

* fix autocast

* fix set device

* remove doc accelerator

* update doc

* update doc

* update doc

* use nullcontext

* update cpu

* update null context

* change time limit for example

* update

* update

* update

* update

* [npu] polish accelerator code

---------
Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>


08 Jan, 2024 1 commit

[npu] use extension for op builder (#5172) · dd2c28a3

Xuanlei Zhao authored Jan 08, 2024

* update extension

* update cpu adam

* update is

* add doc for cpu adam

* update kernel

* update commit

* update flash

* update memory efficient

* update flash attn

* update flash attention loader

* update api

* fix

* update doc

* update example time limit

* reverse change

* fix doc

* remove useless kernel

* fix

* not use warning

* update

* update


02 Jan, 2024 1 commit

[pipeline]: support arbitrary batch size in forward_only mode (#5201) · 3c0d82b1

Wenhao Chen authored Jan 02, 2024

* fix: remove drop last in val & test dataloader

* feat: add run_forward_only, support arbitrary bs

* chore: modify ci script


22 Dec, 2023 1 commit

[pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) · 4fa689fc

Wenhao Chen authored Dec 22, 2023

* test: add more p2p tests

* fix: remove send_forward_recv_forward as p2p op list need to use the same group

* fix: make send and receive atomic

* feat: update P2PComm fn

* feat: add metadata cache in 1f1b

* feat: add metadata cache in interleaved pp

* feat: modify is_xx_stage fn

* revert: add _broadcast_object_list

* feat: add interleaved pp in llama policy

* feat: set NCCL_BUFFSIZE in HybridParallelPlugin


08 Dec, 2023 1 commit
- [gemini] hotfix NaN loss while using Gemini + tensor_parallel (#5150) · 21aa5de0
  flybird11111 authored Dec 08, 2023
```
* fix

aaa

fix

fix

fix

* fix

* fix

* test ci

* fix ci

fix
```
28 Nov, 2023 2 commits

[doc] add moe news (#5128) · 177c79f2
binmakeswell authored Nov 28, 2023
```
* [doc] add moe news

* [doc] add moe news

* [doc] add moe news
```

[shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) · 7172459e

Wenhao Chen authored Nov 28, 2023



* [shardformer] implement policy for all GPT-J models and test

* [shardformer] support interleaved pipeline parallel for bert finetune

* [shardformer] shardformer support falcon (#4883)

* [shardformer]: fix interleaved pipeline for bert model (#5048)

* [hotfix]: disable seq parallel for gptj and falcon, and polish code (#5093)

* Add Mistral support for Shardformer (#5103)

* [shardformer] add tests to mistral (#5105)

---------
Co-authored-by: Pengtai Xu <henryxu880@gmail.com>
Co-authored-by: ppt0011 <143150326+ppt0011@users.noreply.github.com>
Co-authored-by: flybird11111 <1829166702@qq.com>
Co-authored-by: eric8607242 <e0928021388@gmail.com>


27 Nov, 2023 1 commit
- [nfc] fix typo change directoty to directory (#5111) · d5661f0f
  digger yu authored Nov 27, 2023
  
22 Nov, 2023 2 commits
- [npu] add npu support for hybrid plugin and llama (#5090) · 3acbf6d4
  Xuanlei Zhao authored Nov 22, 2023
```
* llama 3d

* update

* fix autocast
```
- [shardformer]fix flash attention, when mask is causal, just don't unpad it (#5084) · aae49663
  flybird11111 authored Nov 22, 2023
```
* fix flash attn

* fix

fix
```
21 Nov, 2023 1 commit

[inference] refactor examples and fix schedule (#5077) · 1cd7efc5

Hongxin Liu authored Nov 21, 2023

* [setup] refactor infer setup

* [hotfix] fix inference behavior on 1 1 gpu

* [example] refactor inference examples


20 Nov, 2023 5 commits

[hotfix/hybridengine] Fix init model with random parameters in benchmark (#5074) · 4e3959d3
Bin Jia authored Nov 20, 2023
```
* fix init model with random parameters

* fix example
```
[format] applied code formatting on changed files in pull request 5067 (#5072) · 8921a73c
github-actions[bot] authored Nov 20, 2023
```
Co-authored-by: github-actions <github-actions@github.com>
```
[inference] update examples and engine (#5073) · fb103cfd
Xu Kai authored Nov 20, 2023
```
* update examples and engine

* fix choices

* update example
```

[npu] add npu support for gemini and zero (#5067) · e5ce4c8e

Hongxin Liu authored Nov 20, 2023

* [npu] setup device utils (#5047)

* [npu] add npu device support

* [npu] support low level zero

* [test] update npu zero plugin test

* [hotfix] fix import

* [test] recover tests

* [npu] gemini support npu (#5052)

* [npu] refactor device utils

* [gemini] support npu

* [example] llama2+gemini support npu

* [kernel] add arm cpu adam kernel (#5065)

* [kernel] add arm cpu adam

* [optim] update adam optimizer

* [kernel] arm cpu adam remove bf16 support


[Kernels]added flash-decoding of triton (#5063) · bce91970

Cuiqing Li (李崔卿) authored Nov 20, 2023



* added flash-decoding of triton based on lightllm kernel

* add req

* clean

* clean

* delete build.sh

---------
Co-authored-by: cuiqing.li <lixx336@gmail.com>


19 Nov, 2023 1 commit

[inference] Refactor inference architecture (#5057) · fd6482ad

Xu Kai authored Nov 19, 2023



* [inference] support only TP (#4998)

* support only tp

* enable tp

* add support for bloom (#5008)

* [refactor] refactor gptq and smoothquant llama (#5012)

* refactor gptq and smoothquant llama

* fix import error

* fix linear import torch-int

* fix smoothquant llama import error

* fix import accelerate error

* fix bug

* fix import smooth cuda

* fix smoothcuda

* [Inference Refactor] Merge chatglm2 with pp and tp (#5023)

merge chatglm with pp and tp

* [Refactor] remove useless inference code (#5022)

* remove useless code

* fix quant model

* fix test import bug

* mv original inference legacy

* fix chatglm2

* [Refactor] refactor policy search and quant type controlling in inference (#5035)

* [Refactor] refactor policy search and quant type controlling in inference

* [inference] update readme (#5051)

* update readme

* update readme

* fix architecture

* fix table

* fix table

* [inference] update example (#5053)

* update example

* fix run.sh

* fix rebase bug

* fix some errors

* update readme

* add some features

* update interface

* update readme

* update benchmark

* add requirements-infer

---------
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: Zhongkai Zhao <kanezz620@gmail.com>


18 Nov, 2023 1 commit
- [example] fix llama example's loss error when using gemini plugin (#5060) · bc09b95f
  flybird11111 authored Nov 18, 2023
```
fix llama example
```
16 Nov, 2023 2 commits

[pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when `strict=False`, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017) · b2ad0d9e

Elsa Granger authored Nov 16, 2023

* Use p2p

* Cannot bidirectional send p2p

* Refactor tensor creation and serialization in P2P communication

* Fix llama forward args in flash attention

* Add flop estimate from megatron

* Support loading weight not in weight_map when strict=False in hybrid_parallel

* Use send_forward_recv_backward, etc in 1f1b

* Use dataclass for metadata
Remove torch.cuda.synchronize() as suggested

* Add comment about the torch.cuda.synchronize for potential error

* Typo

* Update hybrid_parallel_checkpoint_io.py

* Update p2p.py

* Update one_f_one_b.py

* Update p2p.py

---------
Co-authored-by: flybird11111 <1829166702@qq.com>


[Kernels]Update triton kernels into 2.1.0 (#5046) · 28052a71

Cuiqing Li (李崔卿) authored Nov 16, 2023



* update flash-context-attention

* adding kernels

* fix

* reset

* add build script

* add building process

* add llama2 example

* add colossal-llama2 test

* clean

* fall back test setting

* fix test file

* clean

* clean

* clean

---------
Co-authored-by: cuiqing.li <lixx336@gmail.com>


10 Nov, 2023 1 commit

[hotfix] Support extra_kwargs in ShardConfig (#5031) · 70885d70

Zhongkai Zhao authored Nov 10, 2023

* [refactor]: replace inference args with extra_kwargs in ShardConfig

* modify shardconfig

* polish code

* fix policy bug in llama

* fix bug in auto policy

* remove setattr in ShardConfig


09 Nov, 2023 1 commit

[moe]: fix ep/tp tests, add hierarchical all2all (#4982) · 72444127

Wenhao Chen authored Nov 09, 2023

* fix: add warning for EP different behavior

* fix: use shard_data in ep & tp model

* to: add used_capacity

* fix: fix router test

* feat: add create_ep_node_group

* feat: add create_ep_hierarchical_group fn

* feat: add HierarchicalAllToAll

* test: add hierarchical all2all test

* fix: fix test errors

* fix: simplify create_ep_hierarchical_group

* fix: add hierarchical_alltoall arg

* fix: fix environ typo

* revert: revert process mesh order

* to: add todo mark

* fix: skip hierarchical_comm if torch < 1.13.1


08 Nov, 2023 1 commit

[moe] support optimizer checkpoint (#5015) · f71e63b0

Xuanlei Zhao authored Nov 08, 2023

* Refactor MoE Manager setup method

* unshard optim ckpt

* optim io

* update transformer version

* update requirements

* update ckpt

* update ckpt

* update ckpt

* fix engine

* fix engine


02 Nov, 2023 1 commit
- [moe] merge moe into main (#4978) · dc003c30
  Xuanlei Zhao authored Nov 02, 2023
```
* update moe module
* support openmoe
```
30 Oct, 2023 1 commit

[Kernels]Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (#4965) · 459a88c8

Cuiqing Li authored Oct 30, 2023

* adding flash-decoding

* clean

* adding kernel

* adding flash-decoding

* add integration

* add

* adding kernel

* adding kernel

* adding triton 2.1.0 features for inference

* update bloom triton kernel

* remove useless vllm kernels

* clean codes

* fix

* adding files

* fix readme

* update llama flash-decoding

---------
Co-authored-by: cuiqing.li <lixx336@gmail.com>


27 Oct, 2023 1 commit
- updated c++17 compiler flags (#4983) · 4e4a10c9
  アマデウス authored Oct 27, 2023
  
24 Oct, 2023 1 commit

[Inference]ADD Bench Chatglm2 script (#4963) · c6cd629e

Jianghai authored Oct 24, 2023

* add bench chatglm

* fix bug and make utils

---------

Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>


20 Oct, 2023 1 commit

[inference] add reference and fix some bugs (#4937) · 785802e8

Xu Kai authored Oct 20, 2023



* add reference and fix some bugs

* update gptq init

---------
Co-authored-by: Xu Kai <xukai16@foxamil.com>


19 Oct, 2023 1 commit

[Refactor] Integrated some lightllm kernels into token-attention (#4946) · 3a41e830

Cuiqing Li authored Oct 19, 2023



* add some req for inference

* clean codes

* add codes

* add some lightllm deps

* clean codes

* hello

* delete rms files

* add some comments

* add comments

* add doc

* add lightllm deps

* add lightllm chatglm2 kernels

* add lightllm chatglm2 kernels

* replace rotary embedding with lightllm kernel

* add some comments

* add some comments

* add some comments

* add

* replace fwd kernel att1

* fix an arg

* add

* add

* fix token attention

* add some comments

* clean codes

* modify comments

* fix readme

* fix bug

* fix bug

---------
Co-authored-by: cuiqing.li <lixx336@gmail.com>
Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>


16 Oct, 2023 1 commit

[inference] Add smoothquant for llama (#4904) · 611a5a80

Xu Kai authored Oct 16, 2023

* [inference] add int8 rotary embedding kernel for smoothquant (#4843)

* [inference] add smoothquant llama attention (#4850)

* add smoothquant llama attention

* remove useless code

* remove useless code

* fix import error

* rename file name

* [inference] add silu linear fusion for smoothquant llama mlp  (#4853)

* add silu linear

* update skip condition

* catch smoothquant cuda lib exception

* process exception for tests

* [inference] add llama mlp for smoothquant (#4854)

* add llama mlp for smoothquant

* fix down out scale

* remove duplicate lines

* add llama mlp check

* delete useless code

* [inference] add smoothquant llama (#4861)

* add smoothquant llama

* fix attention accuracy

* fix accuracy

* add kv cache and save pretrained

* refactor example

* delete smooth

* refactor code

* [inference] add smooth function and delete useless code for smoothquant (#4895)

* add smooth function and delete useless code

* update datasets

* remove duplicate import

* delete useless file

* refactor codes (#4902)

* refactor code

* add license

* add torch-int and smoothquant license


07 Oct, 2023 1 commit
- [nfc] fix minor typo in README (#4846) · 8aed02b9
  Blagoy Simandoff authored Oct 07, 2023
  
04 Oct, 2023 2 commits
- [infer] fix test bug (#4838) · d1fcc0fa
  Xu Kai authored Oct 04, 2023
```
* fix test bug

* delete useless code

* fix typo
```
- [inference]fix import bug and delete down useless init (#4830) · 013a4bed
  Jianghai authored Oct 04, 2023
```
* fix import bug and release useless init

* fix

* fix

* fix
```