- 09 Jan, 2024 1 commit
Hongxin Liu authored

* update accelerator
* fix timer
* fix amp
* update
* fix
* update bug
* add error raise
* fix autocast
* fix set device
* remove doc accelerator
* update doc
* update doc
* update doc
* use nullcontext
* update cpu
* update null context
* change time limit for example
* update
* update
* update
* update
* [npu] polish accelerator code

Co-authored-by: Xuanlei Zhao <xuanlei.zhao@gmail.com>
Co-authored-by: zxl <43881818+oahzxl@users.noreply.github.com>

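The "use nullcontext" item above reflects a common pattern for making a context manager optional (e.g. falling back to a no-op when autocast or a timer is disabled). A minimal stdlib sketch of that pattern, with hypothetical names not taken from the ColossalAI codebase:

```python
from contextlib import nullcontext

def ctx_or_noop(enabled: bool, ctx_factory):
    """Return the real context manager when enabled, a no-op otherwise."""
    return ctx_factory() if enabled else nullcontext()

# usage sketch:
#   with ctx_or_noop(use_amp, make_autocast_ctx):
#       ...  # runs identically whether or not amp is enabled
```

Using `nullcontext()` keeps the calling code free of `if enabled:` branches around every `with` block.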
- 17 Nov, 2023 1 commit
Wenhao Chen authored

* feat: modify create_ep_hierarchical_group args
* test: add ep tests
* fix: remove get_process_group_ranks
* fix: fix src_rank

- 09 Nov, 2023 1 commit
Wenhao Chen authored

* fix: add warning for EP different behavior
* fix: use shard_data in ep & tp model
* to: add used_capacity
* fix: fix router test
* feat: add create_ep_node_group
* feat: add create_ep_hierarchical_group fn
* feat: add HierarchicalAllToAll
* test: add hierarchical all2all test
* fix: fix test errors
* fix: simplify create_ep_hierarchical_group
* fix: add hierarchical_alltoall arg
* fix: fix environ typo
* revert: revert process mesh order
* to: add todo mark
* fix: skip hierarchical_comm if torch < 1.13.1

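The last item above gates hierarchical communication on the installed torch version. A minimal stdlib sketch of such a version gate (function name hypothetical; assumes a numeric `X.Y.Z` version string, optionally with a local suffix such as `+cu118`):

```python
def version_geq(version: str, minimum: tuple) -> bool:
    """Check whether a version string like '1.13.1+cu117' is >= minimum.

    The local suffix after '+' is discarded; the first three numeric
    components are compared as a tuple.
    """
    parts = version.split("+")[0].split(".")
    nums = tuple(int(p) for p in parts[:3])
    return nums >= minimum

# e.g. skip hierarchical all-to-all unless torch >= 1.13.1:
#   if version_geq(torch.__version__, (1, 13, 1)): ...
```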
- 08 Nov, 2023 1 commit
Xuanlei Zhao authored

* Refactor MoE Manager setup method
* unshard optim ckpt
* optim io
* update transformer version
* update requirements
* update ckpt
* update ckpt
* update ckpt
* fix engine
* fix engine

- 02 Nov, 2023 1 commit
Xuanlei Zhao authored

* update moe module
* support openmoe

- 19 Sep, 2023 1 commit
Hongxin Liu authored

* [misc] update pre-commit
* [misc] run pre-commit
* [misc] remove useless configuration files
* [misc] ignore cuda for clang-format

- 18 Sep, 2023 1 commit
Hongxin Liu authored

* [legacy] remove outdated codes of pipeline (#4692)
* [legacy] remove cli of benchmark and update optim (#4690)
* [legacy] remove cli of benchmark and update optim
* [doc] fix cli doc test
* [legacy] fix engine clip grad norm
* [legacy] remove outdated colo tensor (#4694)
* [legacy] remove outdated colo tensor
* [test] fix test import
* [legacy] move outdated zero to legacy (#4696)
* [legacy] clean up utils (#4700)
* [legacy] clean up utils
* [example] update examples
* [legacy] clean up amp
* [legacy] fix amp module
* [legacy] clean up gpc (#4742)
* [legacy] clean up context
* [legacy] clean core, constants and global vars
* [legacy] refactor initialize
* [example] fix examples ci
* [example] fix examples ci
* [legacy] fix tests
* [example] fix gpt example
* [example] fix examples ci
* [devops] fix ci installation
* [example] fix examples ci

- 05 Sep, 2023 1 commit
Hongxin Liu authored

* [legacy] move engine to legacy
* [example] fix seq parallel example
* [example] fix seq parallel example
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [test] test gemini plugin hang
* [example] update seq parallel requirements

- 11 May, 2023 1 commit
digger-yu authored

* fix spelling error with examples/comminity/
* fix spelling error with tests/
* fix some spelling error with tests/ colossalai/ etc.
* fix spelling error with tests/ etc. date:2023.5.10

- 06 Apr, 2023 2 commits
- 04 Apr, 2023 1 commit
ver217 authored

* [zero] refactor low-level zero folder structure
* [zero] fix legacy zero import path
* [zero] fix legacy zero import path
* [zero] remove useless import
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor gemini folder structure
* [zero] refactor legacy zero import path
* [zero] fix test import path
* [zero] fix test
* [zero] fix circular import
* [zero] update import

- 31 Mar, 2023 1 commit
HELSON authored

* [moe] add checkpoint for moe models
* [hotfix] fix bugs in unit test

- 30 Nov, 2022 1 commit
Jiarui Fang authored

- 23 Sep, 2022 2 commits
- 22 Sep, 2022 1 commit
HELSON authored

* remove forced FP32 modules
* correct no_shard-contexts' positions

- 14 Apr, 2022 1 commit
Frank Lee authored

* [test] refactored with the new rerun decorator
* polish test case

- 13 Apr, 2022 2 commits
- 12 Apr, 2022 2 commits
Jiarui Fang authored

HELSON authored

- 11 Apr, 2022 2 commits
Jiarui Fang authored

HELSON authored

* adapt post grad hooks for not-shard parameters
* adapt optimizer for not-shard parameters
* offload gradients for not-replicated parameters

- 08 Apr, 2022 1 commit
HELSON authored

- 07 Apr, 2022 1 commit
HELSON authored

* adapt model weight initialization for methods in PyTorch nn.init

- 03 Apr, 2022 1 commit
Jiarui Fang authored

- 02 Apr, 2022 2 commits
- 01 Apr, 2022 1 commit
HELSON authored

- 31 Mar, 2022 2 commits
HELSON authored

* support existing sharded and unsharded parameters in zero
* add unit test for moe-zero model init
* polish moe gradient handler

Jiarui Fang authored

- 29 Mar, 2022 2 commits
HELSON authored

Liang Bowen authored

- 25 Mar, 2022 1 commit
Frank Lee authored

- 23 Mar, 2022 1 commit
Jiarui Fang authored

- 21 Mar, 2022 2 commits
Jiarui Fang authored

HELSON authored

- 18 Mar, 2022 1 commit
HELSON authored

- 11 Mar, 2022 1 commit
1SAA authored