- 15 Aug, 2023 15 commits
-
-
Hongxin Liu authored
* [shardformer] embedding support inplace sharding
* [shardformer] linear support inplace sharding
* [shardformer] layernorm support inplace sharding
* [shardformer] qkv support inplace sharding
* [test] update shardformer layer test
* [shardformer] fix shared param sharding
* [shardformer] fix bert policy
* [shardformer] fix bloom policy
* [shardformer] fix llama policy
* [shardformer] fix opt policy
* [shardformer] fix t5 policy
* [shardformer] fix fused qkv linear
* [shardformer] fix bugs
* force sync
* [test] fix bugs
* [test] fix transformer version
-
Baizhou Zhang authored
[pipeline] support shardformer for GPT2ForQuestionAnswering & complete pipeline support for GPT2 (#4245)
* change for transformers loggers
* add forward for GPT2ForQuestionAnswering
* fix assert
* fix torchrec test
-
Jianghai authored
* hot fix
* hot fx tracer
-
Jianghai authored
* bloom policy
* llama pipeline forward and tests
* fix the output and attention_mask
* fix name
* bind argument to policy
* finish bloom model
* test shard gpt2
* clear cache
* support all bloom models
* add bloom models policies
* finish bloom pipeline and tests
* add set pipeline
* finish bloom
-
Jianghai authored
* bloom policy
* llama pipeline forward and tests
* fix the output and attention_mask
* fix name
* bind argument to policy
* Revert "bloom policy"
  This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0. This policy should be reverted and copied to feature/bloom.
* revert the bloom changes
* cancel unneeded inputs
* gpt
* finish llama
* causal lm and sequence classification
* revision
* add pure pipeline test
* finish some bert models
* finish all bert models
* finish bert tests
* fix bugs
* fix bugs
* fix test pipeline
* fix data gen for qa
* update the set pipeline forward
* shared params
* fix bugs
-
Baizhou Zhang authored
* add forward for GPTLMHeadModel
* add test for gpt_lm
* arranging get_held_layers method
* arrange forward replacement
* add forward for GPT2ForTokenClassification
* add forward for GPT2ForSequenceClassification
* fix test_shard_gpt2.py
* add GPT2DoubleHeadsModel & fix bugs
* add id checking in get_shared_params
-
Baizhou Zhang authored
* * fix typehint & docstring in sharder.py * * update pipeline forward for GPT2Model * * add test for pipeline forward of GPT2Model * * add cache cleaning in gpt2 test * * change assert to raise command
-
Jianghai authored
* bloom policy
* llama pipeline forward and tests
* fix the output and attention_mask
* fix name
* bind argument to policy
* finish bloom model
* test shard gpt2
* clear cache
-
Jianghai authored
* bloom policy
* llama pipeline forward and tests
* fix the output and attention_mask
* fix name
* bind argument to policy
* Revert "bloom policy"
  This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0. This policy should be reverted and copied to feature/bloom.
* revert the bloom changes
* cancel unneeded inputs
* gpt
* finish llama
* causal lm and sequence classification
* revision
-
Jianghai authored
* bloom policy
* llama pipeline forward and tests
* fix the output and attention_mask
* fix name
* bind argument to policy
* Revert "bloom policy"
  This reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0. This policy should be reverted and copied to feature/bloom.
* revert the bloom changes
* cancel unneeded inputs
* gpt
-
Jianghai authored
* add pipeline forward
* complete pipeline forward check
* fix bert forward without pipeline
* fix comments
* discard useless line
* add todo
* clean prints
* fix distribute layers
-
Hongxin Liu authored
* [shardformer] support lazy init
* [shardformer] linear support lazy init
* [shardformer] embedding support lazy init
* [shardformer] norm support lazy init
* [shardformer] fused linear support lazy init
* [test] update shardformer test layer
* [test] shardformer with lazy init fit ddp
* [lazy] hotfix deepcopy of param
* [shardformer] fix bert policy and update test
* [shardformer] fix bloom policy and update test
* [shardformer] fix opt policy and update test
* [shardformer] fix t5 policy and update test
* [shardformer] fix gpt2 policy and update test
* [shardformer] fix llama policy and update test
-
Jianghai authored
* move bert related pipeline components to shardformer
* fix bugs
* revision
* fix bert model tests
* fix bert_lm_head model tests
* fix tests
* fix tests
* done checks
* skip bloom
-
ver217 authored
-
ver217 authored
-
- 04 Jul, 2023 25 commits
-
-
github-actions[bot] authored
Co-authored-by: github-actions <github-actions@github.com>
-
Frank Lee authored
* [shardformer] made tensor parallelism configurable
* polish code
-
Frank Lee authored
* [shardformer] refactored some doc and api
* polish code
-
Frank Lee authored
-
Frank Lee authored
-
Frank Lee authored
-
Frank Lee authored
-
Kun Lin authored
* first version of vit shardformer
* keep vit
* update
* vit shard: add ViTAttention and ViTLayer
* update num head shard param
* finish test for vit
* add new_model_class & postprocess
* add vit readme
* delete old files & fix the conflict
* fix sth
-
jiangmingyan authored
* [shardformer] shardformer support opt models
* [shardformer] shardformer support opt models, fix
* [shardformer] shardformer support opt models, fix
* [shardformer] shardformer support opt models, fix
-
Frank Lee authored
-
FoolPlayer authored
* add layernorm to bert
* add layernorm test
* add layernorm test with load state dict
* add use_mixedfusedLN in shard config
* refactor policy to support fused_layernorm
-
Frank Lee authored
-
FoolPlayer authored
* add linearconv1d test
* add linearconv1d test
-
Frank Lee authored
* [shardformer] support module saving and loading
* polish code
-
FoolPlayer authored
* support kit use for bert test
* support kit test for gpt2
-
Frank Lee authored
-
Frank Lee authored
* [shardformer] adapted T5 and LLaMa test to use kit
* polish code
-
FoolPlayer authored
* add gpt2 test and layer class refactor
* add dropout in gpt2 policy
-
Frank Lee authored
-
Frank Lee authored
-
FoolPlayer authored
* fix bert downstream with new api
* remove comment line
-
FoolPlayer authored
-
Frank Lee authored
* [shardformer] refactored embedding and dropout to parallel module
* polish code
-
FoolPlayer authored
-
Frank Lee authored
* [shardformer] integrated linear 1D with dtensor
* polish code
-