1. 10 Jul, 2024 2 commits
  2. 02 Jul, 2024 2 commits
  3. 12 Jun, 2024 1 commit
  4. 09 Jun, 2024 1 commit
  5. 07 Jun, 2024 2 commits
  6. 05 Jun, 2024 1 commit
  7. 03 Jun, 2024 2 commits
  8. 02 Jun, 2024 1 commit
  9. 01 Jun, 2024 3 commits
  10. 31 May, 2024 4 commits
  11. 29 May, 2024 1 commit
  12. 25 May, 2024 2 commits
  13. 23 May, 2024 2 commits
  14. 22 May, 2024 3 commits
  15. 20 May, 2024 1 commit
  16. 16 May, 2024 4 commits
  17. 10 May, 2024 2 commits
  18. 09 May, 2024 2 commits
  19. 08 May, 2024 1 commit
  20. 07 May, 2024 2 commits
      [Kernel] Make static FP8 scaling more robust (#4570) · a98187cf
      Philipp Moritz authored
      Previously, FP8 static scaling only worked if the scales overestimated the maxima of all activation tensors during computation. However, this will not always be the case, even if the scales were calibrated very carefully. For example, with the activation scales in my checkpoint
      
      https://huggingface.co/pcmoritz/Mixtral-8x7B-v0.1-fp8-act-scale
      
      (which was calibrated on https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k), I get the following near-random performance on MMLU (each question has four answer choices, so chance level is 0.25):
      
      |      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
      |------------------|-------|------|-----:|------|-----:|---|-----:|
      |mmlu              |N/A    |none  |     0|acc   |0.2295|±  |0.0035|
      | - humanities     |N/A    |none  |     5|acc   |0.2421|±  |0.0062|
      | - other          |N/A    |none  |     5|acc   |0.2398|±  |0.0076|
      | - social_sciences|N/A    |none  |     5|acc   |0.2171|±  |0.0074|
      | - stem           |N/A    |none  |     5|acc   |0.2125|±  |0.0073|
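
The failure mode described above can be sketched in a few lines of Python. This illustrates the arithmetic only and is not the PR's CUDA kernel. It assumes a per-tensor static scale computed during calibration as max |activation| / 448, where 448 is the largest finite float8 e4m3fn value. `calibrate_static_scale` and `unsaturated_fp8_cast` are hypothetical helpers, and the cast is crudely modelled as turning any out-of-range value into NaN while ignoring rounding onto the fp8 grid.

```python
import math

# Largest finite float8 e4m3fn value, i.e. std::numeric_limits<c10::Float8_e4m3fn>::max().
FP8_E4M3_MAX = 448.0

def calibrate_static_scale(calibration_activations):
    """Per-tensor static scale: map the largest |activation| observed during
    calibration onto FP8_E4M3_MAX (a convention assumed for this sketch)."""
    amax = max(abs(a) for a in calibration_activations)
    return amax / FP8_E4M3_MAX

def unsaturated_fp8_cast(x):
    """Crude model of a non-saturating e4m3fn cast: the format has no
    infinity, so an out-of-range value is modelled as NaN. Rounding onto
    the fp8 grid is ignored (simplifying assumption)."""
    return math.nan if abs(x) > FP8_E4M3_MAX else x

scale = calibrate_static_scale([0.5, -3.0, 7.0])  # hypothetical calibration data
print(scale)                                # 0.015625
print(unsaturated_fp8_cast(7.0 / scale))    # 448.0: the calibrated maximum fits
print(unsaturated_fp8_cast(10.5 / scale))   # nan: a larger, unseen activation overflows
```

Any activation larger than the calibration maximum lands outside the fp8 range after scaling, which is exactly the case where static scaling without clamping breaks down.
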
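
      With the fix in this PR, the scaled activations are clamped to [`-std::numeric_limits<c10::Float8_e4m3fn>::max()`, `std::numeric_limits<c10::Float8_e4m3fn>::max()`] before conversion, so that no NaNs are produced (e4m3fn has no representation for infinity, so an out-of-range value cannot saturate to ±inf on its own). The performance is then: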
      
      |      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
      |------------------|-------|------|-----:|------|-----:|---|-----:|
      |mmlu              |N/A    |none  |     0|acc   |0.7008|±  |0.0036|
      | - humanities     |N/A    |none  |     5|acc   |0.6453|±  |0.0065|
      | - other          |N/A    |none  |     5|acc   |0.7692|±  |0.0072|
      | - social_sciences|N/A    |none  |     5|acc   |0.8083|±  |0.0070|
      | - stem           |N/A    |none  |     5|acc   |0.6115|±  |0.0083|

      This is not perfect yet, but it is very close to the FP16 / dynamic activation scale performance.
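
A minimal Python sketch of the clamping fix, again assuming a per-tensor static scale and ignoring rounding onto the fp8 grid. `quantize_static_clamped` and `dequantize` are hypothetical names; the actual change lives in the PR's CUDA kernel. Clamping after dividing by the scale keeps every value inside the finite e4m3fn range, so outliers beyond the calibrated range saturate at ±448 instead of becoming NaN.

```python
# Largest finite float8 e4m3fn value, i.e. std::numeric_limits<c10::Float8_e4m3fn>::max().
FP8_E4M3_MAX = 448.0

def quantize_static_clamped(values, scale):
    """Divide by a fixed (static) scale, then clamp into the finite e4m3fn
    range so a non-saturating fp8 cast never sees an out-of-range value.
    Rounding onto the fp8 grid is omitted for brevity."""
    return [min(max(v / scale, -FP8_E4M3_MAX), FP8_E4M3_MAX) for v in values]

def dequantize(q_values, scale):
    # Multiply back by the same static scale.
    return [q * scale for q in q_values]

scale = 7.0 / FP8_E4M3_MAX          # hypothetical calibrated max |activation| of 7.0
acts = [1.0, -3.5, 10.5, -14.0]     # 10.5 and -14.0 exceed the calibrated range
q = quantize_static_clamped(acts, scale)
print(q)                            # [64.0, -224.0, 448.0, -448.0]
print(dequantize(q, scale))         # [1.0, -3.5, 7.0, -7.0]
```

Saturated outliers lose magnitude (10.5 comes back as 7.0 here), but a bounded error is presumably far less damaging than a NaN propagating through every later layer.
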
  21. 03 May, 2024 1 commit