1. 02 Jan, 2024 1 commit
  2. 29 Dec, 2023 1 commit
    • Don't silence errors when loading tasks (#1148) · 34b563b1
      Paul McCann authored
      
      
      * Add example failing task
      
      This task includes an invalid import. This will cause an exception and
      the task will not be loaded. But this just results in a DEBUG level log
      message, so in normal usage you'll see no error, and will be told the
      task doesn't exist.
      
      Here's an example command line to run the task:
      
          python -m lm_eval --model hf --model_args pretrained=rinna/japanese-gpt-1b --tasks fail
      
      This task is based on a Japanese Winograd task, but that's not
      important, and was just used due to familiarity.
      
      * Do not ignore errors when loading tasks
      
      * Change how task errors are logged
      
      This makes the changes proposed in the PR discussion.
      
      1. Exceptions not related to missing modules/imports are logged as
         warnings.
      
      2. Module/import-related exceptions are still logged at debug level, but
         if any of them occur, a warning is emitted with instructions on how
         to show the debug logs.
      
      * Remove intentionally failing task
      
      ---------
      Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
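
The logging policy described in #1148 can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `load_tasks`, the `loaders` mapping, and the logger name are hypothetical names chosen for the example. It assumes each task is loaded by calling a zero-argument function.

```python
import logging

# Hypothetical logger name; the real project uses its own logger.
logger = logging.getLogger("task_loading_sketch")


def load_tasks(loaders):
    """Call each loader; return (loaded_tasks, number_of_import_failures).

    `loaders` maps a task name to a zero-argument callable (an assumption
    made for this sketch).
    """
    loaded = {}
    import_failures = 0
    for name, loader in loaders.items():
        try:
            loaded[name] = loader()
        except ImportError as exc:
            # ModuleNotFoundError is a subclass of ImportError, so missing
            # modules land here and stay at DEBUG level.
            import_failures += 1
            logger.debug("Could not import task %r: %s", name, exc)
        except Exception as exc:
            # Anything unrelated to imports is surfaced as a warning.
            logger.warning("Failed to load task %r: %s", name, exc)
    if import_failures:
        # A single visible hint so the DEBUG messages are not silently lost.
        logger.warning(
            "%d task(s) skipped because of import errors; "
            "enable DEBUG logging to see details.",
            import_failures,
        )
    return loaded, import_failures
```

Counting the import failures and emitting a single summary warning keeps normal output short while still telling the user that something was skipped.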
  3. 27 Dec, 2023 1 commit
    • nits + fix siqa (#1216) · 6a1c19ed
      Baber Abbasi authored
      * fix group
      
      * siqa: default.yml -> default.yaml
      
      * max_gen_toks -> self.max_gen_toks
      
      * add ids to task tests
      
      * fix siqa
      
      * fix gen_kwargs for openai-chat
  4. 24 Dec, 2023 1 commit
  5. 21 Dec, 2023 1 commit
  6. 20 Dec, 2023 2 commits
    • Error in --num_fewshot option for K-MMLU Evaluation Harness (#1178) · 12f2c5ea
      GUIJIN SON authored
      * update kmmlu default formatting
      
      * Update _default_kmmlu_yaml
      
      * Delete lm_eval/tasks/kmmlu/utils.py
    • Switch Linting to `ruff` (#1166) · 65b8761d
      Baber Abbasi authored
      * add ruff and isort. remove black and flake8
      
      * remove unnecessary dependencies
      
      * remove dependency from table
      
      * change order
      
      * ran ruff
      
      * check 3.9
      
      * exclude evaluator
      
      * update CI workflow
      
      * use ruff config in pyproject.toml
      
      * test
      
      * add isort rules to ruff
      
      * sort imports
      
      * import `make_table`
      
      * try stages for no-commit-to-branch
      
      * turn on mypy for pre-commit
      
      * test
      
      * test
      
      * test
      
      * change no-commit-to-branch to default
      
      * nits
      
      * fixed dependency
  7. 19 Dec, 2023 1 commit
  8. 18 Dec, 2023 1 commit
  9. 17 Dec, 2023 1 commit
    • [WIP] Add IFEval / Instruction-Following Eval (#1087) · aa61f940
      Wis Kojohnjaratkul authored
      * Add IFEval task
      
      * Check and download nltk punkt if not already downloaded
      
      * Update max_gen_toks to 2048 to support "900 words+" instructions
      
      * Resolve pre-commit linting issues
      
      * Reduce max_gen_toks to 1280 to conserve token usage
      
      * Add warning message in `process_results` call for non chat-finetuned models
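
The "check and download if not already downloaded" step in #1087 follows a common pattern, sketched below under assumptions. `ensure_resource` and its parameters are hypothetical names, and the lookup and download functions are passed in so the sketch runs without NLTK installed. With NLTK, one would presumably pass `nltk.data.find` (which raises `LookupError` for a missing resource) and `nltk.download`.

```python
from typing import Callable


def ensure_resource(
    resource_path: str,
    package_id: str,
    find: Callable[[str], object],
    download: Callable[[str], object],
) -> bool:
    """Download `package_id` only when `find(resource_path)` fails.

    Returns True if a download was triggered, False if the resource was
    already available. All names here are illustrative assumptions.
    """
    try:
        find(resource_path)
        return False
    except LookupError:
        # The resource is missing locally, so fetch it once.
        download(package_id)
        return True


# Assumed usage with NLTK (not executed here):
#   import nltk
#   ensure_resource("tokenizers/punkt", "punkt", nltk.data.find, nltk.download)
```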
  10. 15 Dec, 2023 1 commit
    • Add benchmark FLD (#1122) · 755bf6e8
      MorishT authored
      
      
      * [fix] loading a dataset from the hub fails when the dataset name includes '.', because the program assumes it is on the local filesystem
      
      * add FLD benchmark
      
      * Update task.py
      
      * [update] add group 'fld'
      
      * [update] rename fld -> fld_default. add explanation to the readme
      
      * Update README.md
      
      ---------
      Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
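
The first fix in #1122 concerns dataset names containing '.' being mistaken for local paths. The commit message does not show how it was resolved. One possible approach, given here only as an assumption, is to treat a name as local only when it actually exists on disk. `looks_local` is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path


def looks_local(dataset_name: str) -> bool:
    """Return True only if `dataset_name` exists on the local filesystem.

    Checking for existence avoids guessing from the characters in the
    name, so a hub identifier that contains '.' is not misclassified.
    """
    return Path(dataset_name).exists()
```

Under this check, a name such as `some-org/dataset.v1` would be treated as a hub identifier unless a file or directory with that path is present.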
  11. 14 Dec, 2023 2 commits
  12. 13 Dec, 2023 5 commits
  13. 11 Dec, 2023 5 commits
  14. 10 Dec, 2023 6 commits
  15. 08 Dec, 2023 1 commit
  16. 07 Dec, 2023 8 commits
  17. 04 Dec, 2023 2 commits