- 03 Apr, 2023 1 commit
-
-
Mostofa Patwary authored
-
- 31 Mar, 2023 1 commit
-
-
Mostofa Patwary authored
-
- 22 Feb, 2023 1 commit
-
-
Lawrence McAfee authored
-
- 24 Sep, 2022 1 commit
-
-
Jared Casper authored
-
- 23 Sep, 2022 2 commits
-
-
Jared Casper authored
-
Jared Casper authored
The LICENSE file says everything is 3-clause BSD, which is what we want, but at some point the Apache license was added to the top of some files and that proliferated. This commit removes the Apache license from any files that we own the copyright to. Also updates the copyright year and removes the unnecessary coding=utf-8 line.
-
- 21 Jul, 2022 1 commit
-
-
Jared Casper authored
Remove the deprecated destination argument from state_dict functions and pass all arguments by keyword to avoid warnings.
-
- 24 May, 2022 2 commits
-
-
Vijay Korthikanti authored
-
Vijay Korthikanti authored
-
- 20 May, 2022 2 commits
-
-
Vijay Korthikanti authored
-
Vijay Korthikanti authored
-
- 17 May, 2022 1 commit
-
-
Vijay Korthikanti authored
-
- 28 Apr, 2022 1 commit
-
-
Vijay Korthikanti authored
-
- 30 Mar, 2022 1 commit
-
-
Vijay Korthikanti authored
-
- 15 Mar, 2022 2 commits
-
-
Lawrence McAfee authored
-
Lawrence McAfee authored
-
- 07 Mar, 2022 2 commits
-
-
Vijay Korthikanti authored
-
Vijay Korthikanti authored
-
- 02 Mar, 2022 1 commit
-
-
Vijay Korthikanti authored
-
- 19 Feb, 2022 1 commit
-
-
Vijay Korthikanti authored
-
- 16 Feb, 2022 1 commit
-
-
Sangkug Lym authored
remove redundant linear layer class definition; add fuse_gradient_accumulation attribute to weights for simple targeting; reflect feedback and clean up the code; arg change
-
- 17 Dec, 2021 1 commit
-
-
Vijay Korthikanti authored
-
- 01 Oct, 2021 1 commit
-
-
Jared Casper authored
Adds some backward compatibility code so old inference code still works.
-
- 29 Sep, 2021 1 commit
-
-
mshoeybi authored
-
- 20 Sep, 2021 1 commit
-
-
Mohammad Shoeybi authored
-
- 02 Sep, 2021 1 commit
-
-
Jared Casper authored
-
- 18 Aug, 2021 1 commit
-
-
Jared Casper authored
-
- 30 Jul, 2021 3 commits
-
-
Deepak Narayanan authored
-
Deepak Narayanan authored
-
Deepak Narayanan authored
  - Accumulate encoder hidden state gradient to handle skip connection
  - Correctly compute the number of layers in encoder / decoder for T5 model
  - Ensure e weights are initialized the same way in embeddings
  - Synchronize embedding gradients across encoder and decoder for T5 model
  - Support for checkpoint loading and saving
-
- 18 May, 2021 1 commit
-
-
Mostofa Patwary authored
-
- 12 May, 2021 1 commit
-
-
Mostofa Patwary authored
-
- 11 May, 2021 1 commit
-
-
Mostofa Patwary authored
-
- 02 Apr, 2021 1 commit
-
-
Jared Casper authored
-
- 24 Mar, 2021 1 commit
-
-
Vijay Korthikanti authored
-
- 29 Jan, 2021 1 commit
-
-
Mostofa Patwary authored
-
- 22 Jan, 2021 1 commit
-
-
Vijay Korthikanti authored
-
- 13 Jan, 2021 1 commit
-
-
Vijay Korthikanti authored
-
- 12 Jan, 2021 2 commits
-
-
Vijay Korthikanti authored
-
Vijay Korthikanti authored
-