1. 16 Oct, 2025 1 commit
  2. 15 Oct, 2025 9 commits
  3. 14 Oct, 2025 2 commits
  4. 02 Oct, 2025 1 commit
  5. 21 Sep, 2025 1 commit
  6. 12 Sep, 2025 1 commit
  7. 08 Sep, 2025 2 commits
  8. 27 Aug, 2025 1 commit
  9. 26 Aug, 2025 1 commit
    • Support for AIME dataset (#3248) · 5ac7cdf8
      Janna authored
      * add AIME tasks
      
      * standardize the repeats
      
      * fix task naming
      
      * aime25 only has test set
      
      * edit readme
      
      * add utils
      
      * standardize
      
      * fix case sensitivity
      
      * repeat once
      
      * lint
      
      * more linting
      
      * lint huggingface.py
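The "standardize the repeats" and "repeat once" bullets refer to sampling each problem multiple times. A minimal sketch of what repeat-aware scoring can look like; the function name and scoring rule here are illustrative assumptions, not the harness's actual code:

```python
def mean_accuracy_over_repeats(samples, answer):
    """Hypothetical scorer: fraction of repeated generations whose final
    answer matches the reference, compared case-insensitively (the commit
    also mentions a case-sensitivity fix)."""
    if not samples:
        return 0.0
    hits = sum(1 for s in samples if s.strip().lower() == answer.strip().lower())
    return hits / len(samples)
```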
  10. 25 Aug, 2025 1 commit
  11. 21 Aug, 2025 2 commits
  12. 13 Aug, 2025 1 commit
  13. 02 Aug, 2025 1 commit
  14. 24 Jul, 2025 2 commits
  15. 23 Jul, 2025 3 commits
  16. 18 Jul, 2025 2 commits
  17. 16 Jul, 2025 1 commit
    • truncate thinking tags in generations (#3145) · 51ede33c
      Baber Abbasi authored
      * feat: add postprocessing for generated text to strip stop sequences and thinking tokens
      
      * nit
      
      * fix: trim leading whitespace after stripping thinking tokens from generation
      
      * feat: add think_end_token to model_args
      
      * nit
      
      * nit
      
      * nit
      
      * add to readme
      
      * nit
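The postprocessing this commit describes, stripping everything through the thinking end token and then trimming leading whitespace, can be sketched as follows. The function name and the default token value are assumptions for illustration; the actual token is configurable via `think_end_token` in `model_args`:

```python
def strip_thinking(text: str, think_end: str = "</think>") -> str:
    """Drop everything up to and including the last occurrence of the
    thinking end token, then trim the leading whitespace left behind."""
    idx = text.rfind(think_end)
    if idx != -1:
        text = text[idx + len(think_end):]
    return text.lstrip()
```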
  18. 15 Jul, 2025 1 commit
  19. 14 Jul, 2025 1 commit
  20. 06 Jul, 2025 1 commit
  21. 03 Jul, 2025 1 commit
    • Bugfix/hf tokenizer gguf override (#3098) · ff41a856
      Ankush authored
      * fix(hf-gguf): skip gguf_file if external tokenizer is provided
      
      * docs(readme): add instructions for evaluating GGUF models with Hugging Face backend
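The shape of the fix, skipping `gguf_file` when an external tokenizer is supplied, can be sketched with a hypothetical helper (the function and parameter names are illustrative, not the harness's actual API):

```python
def resolve_gguf_file(gguf_file, tokenizer_name):
    """Hypothetical helper mirroring the fix: when an external Hugging Face
    tokenizer is provided, do not forward gguf_file to the tokenizer loader;
    the external tokenizer takes precedence."""
    if tokenizer_name is not None:
        return None  # skip gguf_file; load the named tokenizer instead
    return gguf_file  # no override: extract tokenizer data from the GGUF
```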
  22. 30 Jun, 2025 1 commit
    • [HF] fix quantization config (#3039) · fea4d11d
      Baber Abbasi authored
      * Try fixing issue 3026, which is caused by the quantization_config argument introduced in commit 758c5ed8.
      The argument is a dict, but for a GPTQ-quantized model it conflicts with the Hugging Face interface, which expects a QuantizationConfigMixin.
      The current solution removes the quantization_config argument in HFLM._create_model() in lm_eval/models/huggingface.py.
      Further modification is required to restore the functionality provided by the previous commit.
      
      * wrap quantization_config in AutoQuantizationConfig
      
      * handle quantization config not dict
      
      * wrap quantization_config in AutoQuantizationConfig if dict
      
      ---------
      Co-authored-by: shanhx2000 <hs359@duke.edu>
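The final fix, "wrap quantization_config in AutoQuantizationConfig if dict", amounts to normalizing the argument type before handing it to the model loader. A runnable sketch of that pattern; in the real code the wrapper is transformers' `AutoQuantizationConfig`, while `QuantConfigStub` below is a stand-in so the sketch runs without transformers installed:

```python
class QuantConfigStub:
    """Stand-in for a QuantizationConfigMixin-style class (assumption:
    the real wrapper exposes a from_dict-style constructor)."""
    @classmethod
    def from_dict(cls, d):
        obj = cls()
        obj.__dict__.update(d)
        return obj

def normalize_quantization_config(cfg, wrapper=QuantConfigStub):
    """Wrap a plain-dict quantization config; pass config objects through."""
    if isinstance(cfg, dict):
        return wrapper.from_dict(cfg)
    return cfg  # already a config object; leave untouched
```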
  23. 25 Jun, 2025 2 commits
  24. 23 Jun, 2025 1 commit
    • Fix Anthropic API compatibility issues in chat completions (#3054) · 8bc46207
      NourFahmy authored
      
      
      * Fix Anthropic API compatibility issues in chat completions
      
      Solves two important compatibility issues between the LM Eval Harness and Anthropic's API:
      
      1) The type field: Anthropic's Messages API doesn't accept the type field that other APIs might expect, and that was previously included.
      2) Stop sequences: Anthropic requires stop sequences to contain non-whitespace characters.
      
      Tested with the most recent Anthropic models (claude-sonnet-4-0, claude-opus-4-0); this resolved my local API errors.
      
      * pacify pre-commit
      
      * add type
      
      ---------
      Co-authored-by: Baber <baber@hey.com>
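The two fixes described above, dropping the rejected `type` field and filtering whitespace-only stop sequences, can be sketched together. The function name and return shape are illustrative assumptions, not the harness's actual interface:

```python
def sanitize_anthropic_request(messages, stop):
    """Hypothetical pre-send cleanup for Anthropic's Messages API:
    1) strip the 'type' key from each message dict, and
    2) keep only stop sequences containing non-whitespace characters."""
    cleaned = [{k: v for k, v in m.items() if k != "type"} for m in messages]
    stops = [s for s in (stop or []) if s.strip()]
    return cleaned, stops
```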