- 27 Feb, 2024 1 commit
Hongxin Liu authored
-
- 15 Jan, 2024 1 commit
Wenhao Chen authored
* fix: fix misleading mbs arg
* feat: add pp sanity check
* fix: fix 1f1b sanity check
-
- 09 Jan, 2024 1 commit
Hongxin Liu authored
* update accelerator
* fix timer
* fix amp
* update
* fix
* update bug
* add error raise
* fix autocast
* fix set device
* remove doc accelerator
* update doc
* update doc
* update doc
* use nullcontext
* update cpu
* update null context
* change time limit for example
* update
* update
* update
* update
* [npu] polish accelerator code
---------
Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>
-
- 22 Dec, 2023 1 commit
Wenhao Chen authored
* test: add more p2p tests
* fix: remove send_forward_recv_forward as p2p op list need to use the same group
* fix: make send and receive atomic
* feat: update P2PComm fn
* feat: add metadata cache in 1f1b
* feat: add metadata cache in interleaved pp
* feat: modify is_xx_stage fn
* revert: add _broadcast_object_list
* feat: add interleaved pp in llama policy
* feat: set NCCL_BUFFSIZE in HybridParallelPlugin
-
- 08 Dec, 2023 1 commit
flybird11111 authored
* fix aaa fix fix fix
* fix
* fix
* test ci
* fix ci fix
-
- 22 Nov, 2023 1 commit
Xuanlei Zhao authored
* llama 3d
* update
* fix autocast
-
- 20 Nov, 2023 2 commits
github-actions[bot] authored
Co-authored-by: github-actions <github-actions@github.com>
-
Hongxin Liu authored
* [npu] setup device utils (#5047)
* [npu] add npu device support
* [npu] support low level zero
* [test] update npu zero plugin test
* [hotfix] fix import
* [test] recover tests
* [npu] gemini support npu (#5052)
* [npu] refactor device utils
* [gemini] support npu
* [example] llama2+gemini support npu
* [kernel] add arm cpu adam kernel (#5065)
* [kernel] add arm cpu adam
* [optim] update adam optimizer
* [kernel] arm cpu adam remove bf16 support
-
- 16 Nov, 2023 1 commit
Elsa Granger authored
[pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when `strict=False`, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017)
* Use p2p
* Cannot bidirectional send p2p
* Refactor tensor creation and serialization in P2P communication
* Fix llama forward args in flash attention
* Add flop estimate from megatron
* Support loading weight not in weight_map when strict=False in hybrid_parallel
* Use send_forward_recv_backward, etc in 1f1b
* Use dataclass for metadata; remove torch.cuda.synchronize() as suggested
* Add comment about the torch.cuda.synchronize for potential error
* Typo
* Update hybrid_parallel_checkpoint_io.py
* Update p2p.py
* Update one_f_one_b.py
* Update p2p.py
---------
Co-authored-by: flybird11111 <1829166702@qq.com>
-
- 19 Sep, 2023 1 commit
Hongxin Liu authored
* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format
-
- 28 Aug, 2023 1 commit
Hongxin Liu authored
* [example] transfer llama-1 example
* [example] fit llama-2
* [example] refactor scripts folder
* [example] fit new gemini plugin
* [cli] fix multinode runner
* [example] fit gemini optim checkpoint
* [example] refactor scripts
* [example] update requirements
* [example] update requirements
* [example] rename llama to llama2
* [example] update readme and pretrain script
* [example] refactor scripts
-