- 01 Jun, 2024 1 commit
-
-
Paweł Gadziński authored
* Llama 3 update Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Times update Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Times update Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* utils.py fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* utils.py fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* utils.py fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* update te llama tutorial to allow running with llama 3 weights Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
* small fixes Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
* small fix Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
* small fix Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
* add llama 3 vs llama 2 distinctions Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
* paraphrasing and corrected facts Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
* fix Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
* fix Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
---------
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
-
- 28 May, 2024 1 commit
-
-
Tim Moon authored
* Use correct FP8 group in multi-GPU docs. FP8 process group should be tensor-parallel group. Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Synchronize FP8 scales over world group in multi-GPU docs Signed-off-by: Tim Moon <tmoon@nvidia.com>
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
-
- 25 May, 2024 1 commit
-
-
Paweł Gadziński authored
* Fixed Llama tutorial. Changed batch size and added fused=True. Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by: root <root@ipp2-0037.nvidia.com>
* Tutorial updated but not complete yet. Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by: root <root@ipp2-0037.nvidia.com>
* Tutorial notebook reseted - removed fuse=true Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by: root <root@ipp2-0037.nvidia.com>
* Removed fused=true Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by: root <root@ipp2-0037.nvidia.com>
* Batch size back to 8 Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by: root <root@ipp2-0037.nvidia.com>
* Typo and commented out line Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by: root <root@ipp2-0037.nvidia.com>
* fixed whitespace Signed-off-by: root <root@ipp2-0037.nvidia.com>
* fixed whitespace Signed-off-by: root <root@ipp2-0037.nvidia.com>
* Added comment to attention line. Fixed potential bug with loading weights - now loading works correctly, confirmed by the generation code. Signed-off-by: root <root@ipp2-1661.nvidia.com>
* Comments Signed-off-by: root <root@ipp2-1661.nvidia.com>
* Models cast added again Signed-off-by: root <root@ipp2-1661.nvidia.com>
* Weight download info Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Moved parameter gate_proj_size to config Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* gate_proj_size removed and put immediate_size instead Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Llama 3 added to tutorial Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Typos fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Typos fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Fixed model loading Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Loading fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Different dim for attention Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Reversed other commit Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Changed name to kv_channels Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Fixed typo Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Back to kv_channels in transformer layer Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Back to kv_channels in transformer layer Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Small bug fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Small bug fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Test fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* changed file modes Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* lint fix and resolved conflict Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* lint fix and resolved conflict Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* Lint fix, hopefully last Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
---------
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: root <root@ipp2-0037.nvidia.com>
Signed-off-by: root <root@ipp2-1661.nvidia.com>
Co-authored-by: root <root@ipp2-2373.nvidia.com>
Co-authored-by: root <root@ipp2-1588.nvidia.com>
Co-authored-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: root <root@ipp2-0037.nvidia.com>
Co-authored-by: root <root@ipp2-1661.nvidia.com>
Co-authored-by: root <root@ipp2-2371.nvidia.com>
Co-authored-by: root <root@ipp2-1589.nvidia.com>
Co-authored-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 31 Mar, 2024 1 commit
-
-
Paweł Gadziński authored
Llama tutorial fixes - all
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: Pawel Gadzinski <pgadzinski@nvidia.com>
-
- 20 Mar, 2024 1 commit
-
-
Sudhakar Singh authored
* tutorial and doc fixes Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
* remove extra code Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
* fix typos Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
---------
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
-
- 01 Mar, 2024 1 commit
-
-
Sudhakar Singh authored
-
- 08 Feb, 2024 1 commit
-
-
Quentin Anthony authored
Signed-off-by: Quentin Anthony <qganthony@yahoo.com>
-
- 19 Jan, 2024 1 commit
-
-
hugo-syn authored
Signed-off-by: hugo-syn <hugo.vincent@synacktiv.com>
-
- 03 Jan, 2024 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
-
- 06 Dec, 2023 1 commit
-
-
Santosh Bhavani authored
* Add H200 perf non-alpha image Signed-off-by: Santosh Bhavani <santosh@semantic.md>
* Update README.rst - non-transparent H200 plot Signed-off-by: Santosh Bhavani <santosh@semantic.md>
---------
Signed-off-by: Santosh Bhavani <santosh@semantic.md>
-
- 24 Feb, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* Remove redundant amax AR for SP case Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* update advanced docs Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 04 Jan, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* docs: remove build warnings and add FP8 caching note Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* add comment about amax history Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Jan, 2023 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
-
- 02 Dec, 2022 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by: Przemyslaw Tredak <ptredak@nvidia.com>
Signed-off-by: Przemyslaw Tredak <ptredak@nvidia.com>
-
- 18 Nov, 2022 1 commit
-
-
Tim Moon authored
* Documentation for advanced perf optimizations. Fix bug where we were doing backward passes inside fp8_autocast in example notebooks. Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Minor tweaks to advanced perf optimization docs. Review suggestions from @ptrendx. Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Rewording sequence parallelism in advanced perf optimization docs. Review suggestion from @ksivaman. Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 28 Sep, 2022 1 commit
-
-
Przemek Tredak authored
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
-