1. 27 Jun, 2025 3 commits
  2. 26 Jun, 2025 2 commits
  3. 25 Jun, 2025 3 commits
  4. 23 Jun, 2025 1 commit
    • NourFahmy's avatar
      Fix Anthropic API compatibility issues in chat completions (#3054) · 8bc46207
      NourFahmy authored
      
      
      * Fix Anthropic API compatibility issues in chat completions
      
      solves two important compatibility issues between the LM Eval Harness and Anthropic's API:
      
      1) The type field issue - Anthropic's Messages API doesn't accept the type field that other APIs might expect, that was previously included
      2) The stop sequences issue - Anthropic requires stop sequences to contain non-whitespace characters
      
      tested with most recent models from anthopic; claude-sonnet-4-0, claude-opus-4-0, resolved my local api errors
      
      * pacufy pre-commit
      
      * add type
      
      ---------
      Co-authored-by: default avatarBaber <baber@hey.com>
      8bc46207
  5. 20 Jun, 2025 1 commit
  6. 19 Jun, 2025 3 commits
  7. 16 Jun, 2025 2 commits
  8. 12 Jun, 2025 1 commit
  9. 08 Jun, 2025 1 commit
    • Baber Abbasi's avatar
      [longbench] fix metric calculation (#2983) · 147e9d61
      Baber Abbasi authored
      * use all answers
      
      * use middle truncation
      
      * maybe fix classification score
      
      * strip classification preds
      
      * [vllm] remove stop tokens post-hoc
      
      * strip all preds
      
      * pacify pre-commit
      
      * start on truncation utility
      
      * add to readme
      
      * add a footgun doc
      
      * fix newline in yaml templates
      
      * do not strip code_sim preds!
      
      * fix pre-commit config
      
      * fix instruction warning
      
      * add not to longbench readme
      147e9d61
  10. 03 Jun, 2025 4 commits
  11. 02 Jun, 2025 2 commits
  12. 26 May, 2025 2 commits
  13. 23 May, 2025 2 commits
  14. 22 May, 2025 1 commit
  15. 21 May, 2025 6 commits
  16. 19 May, 2025 3 commits
  17. 17 May, 2025 1 commit
  18. 15 May, 2025 2 commits