1. 10 Mar, 2024 1 commit
  2. 09 Mar, 2024 2 commits
  3. 06 Mar, 2024 7 commits
  4. 05 Mar, 2024 2 commits
  5. 04 Mar, 2024 4 commits
  6. 03 Mar, 2024 2 commits
    • Setting trust_remote_code to True for HuggingFace datasets compatibility (#1487) · 95167926
      Vicki Boykis authored
      * setting trust_remote_code
      
      * dataset list no notebooks
      
      * respect trust remote code
      
      * Address changes, move cli options and change datasets
      
      * fix task for tests
      
      * headqa
      
      * remove kobest
      
      * pin datasets and address comments
      
      * clean up space
    • vLLM update DP+TP (#1508) · e5e35fca
      Baber Abbasi authored
      * use `@ray.remote` with distributed vLLM
      
      * update versions
      
      * bugfix
      
      * unpin vllm
      
      * fix pre-commit
      
      * added version assertion error
      
      * Revert "added version assertion error"
      
      This reverts commit 8041e9b78e95eea9f4f4d0dc260115ba8698e9cc.
      
      * added version assertion for DP
      
      * expand DP note
      
      * add warning
      
      * nit
      
      * pin vllm
      
      * fix typos
  7. 01 Mar, 2024 4 commits
  8. 28 Feb, 2024 1 commit
  9. 27 Feb, 2024 4 commits
    • Fix AttributeError in huggingface.py When 'model_type' is Missing (#1489) · cc771eca
      Rich authored
      
      
      * model_type attribute error
      
      Getting attribute error when using a model without a 'model_type'
      
      * fix w/ and w/out the 'model_type' specification
      
      * use getattr(), also fix other config.model_type reference
      
      * Update huggingface.py
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
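The `getattr()` fix mentioned in #1489 can be illustrated with a minimal sketch. This is not the code from `huggingface.py`; the helper name `get_model_type` and the `SimpleNamespace` stand-ins for a model config are hypothetical and used only to show how a default value avoids the `AttributeError`.

```python
from types import SimpleNamespace


def get_model_type(config, default=None):
    """Return config.model_type, or `default` when the attribute is absent.

    Hypothetical helper for illustration; the real change lives in huggingface.py.
    """
    # getattr with a third argument never raises AttributeError
    return getattr(config, "model_type", default)


# Stand-ins for model configs (assumption: a config is any attribute-bearing object)
with_type = SimpleNamespace(model_type="gpt2")
without_type = SimpleNamespace()

print(get_model_type(with_type))
print(get_model_type(without_type))
```

Direct attribute access (`config.model_type`) would raise on the second object, whereas the `getattr()` form lets the caller decide what a missing `model_type` should mean.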
    • Hailey Schoelkopf's avatar
    • add multilingual mmlu eval (#1484) · 7cd004c4
      Zehan Li authored
    • Refactor `evaluator.evaluate` (#1441) · 5ccd65d4
      Baber Abbasi authored
      
      
      * change `all_gather` to `gather`
      
      * add TaskOutput utility class
      
      * Add FilterResults class and refactor task handling.
      
      * Rename `key` to `filter_key` for clarity
      
      * Add `print_writeout` function in utils.py
      
      * Add function to calculate limit size.
      
      * Add doc_iterator method to Task class
      
      * Refactor `doc_iterator` and cleanup in Task class
      
      * remove superfluous bits
      
      * change `all_gather` to `gather`
      
      * bugfix
      
      * bugfix
      
      * fix `gather`
      
      * Refactor `gather` loop
      
      * Refactor aggregate metrics calculation
      
      * Refactor and simplify aggregate metrics calculation
      Removed unused code
      
      * Simplify metrics calculation and remove unused code.
      
      * simplify the metrics calculation in `utils.py` and `evaluator.py`.
      
      * Fix group metric
      
      * change evaluate to hf_evaluate
      
      * change evaluate to hf_evaluate
      
      * add docs
      
      * add docs
      
      * nits
      
      * make isslice keyword only
      
      * nit
      
      * add todo
      
      * nit
      
      * nit
      
      * nit: swap order samples_metrics tuple
      
      * move instance sorting outside loop
      
      * nit
      
      * nit
      
      * Add __repr__ for ConfigurableTask
      
      * nit
      
      * nit
      
      * Revert "nit"
      
      This reverts commit dab8d9977a643752a17f840fd8cf7e4b107df28f.
      
      * fix some logging
      
      * nit
      
      * fix `predict_only` bug. thanks to `@LSinev`!
      
      * change `print_tasks` to `prepare_print_tasks`
      
      * nits
      
      * move eval utils
      
      * move eval utils
      
      * nit
      
      * add comment
      
      * added tqdm descriptions
      
      * Update lm_eval/evaluator_utils.py
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * fix mgsm bug
      
      * nit
      
      * fix `build_all_requests`
      
      * pre-commit
      
      * add ceil to limit
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
  10. 26 Feb, 2024 7 commits
  11. 24 Feb, 2024 1 commit
    • Add environment and transformers version logging in results dump (#1464) · f78e2da4
      LSinev authored
      * Save git_hash to results even if git is not available to call as subprocess
      
      * Store more info about environment and transformers version in results to help researchers track inconsistencies
      
      * moved added logging to logging_utils
      
      * moved get_git_commit_hash to logging_utils.py
      
      * moved add_env_info inside evaluator
  12. 23 Feb, 2024 2 commits
  13. 22 Feb, 2024 3 commits
    • Fixed generation args issue affecting OpenAI completion model (#1458) · 75ac1f47
      Amine Elhattami authored
      
      
      * Fixed generation args issue affecting OpenAI completion model
      
      * Fixed hf unit test; removed pop attributes in OpenAI completion.
      
      * fix format
      
      * fix format
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
    • feat: Add Weights and Biases support (#1339) · 2683fbbb
      Ayush Thakur authored
      
      
      * add wandb as extra dependency
      
      * wandb metrics logging
      
      * refactor
      
      * log samples as tables
      
      * fix linter
      
      * refactor: put in a class
      
      * change dir
      
      * add panels
      
      * log eval as table
      
      * improve tables logging
      
      * improve reports logging
      
      * precommit run
      
      * ruff check
      
      * handle importing reports api gracefully
      
      * ruff
      
      * compare results
      
      * minor pre-commit fixes
      
      * build comparison report
      
      * ruff check
      
      * log results as artifacts
      
      * remove comparison script
      
      * update dependency
      
      * type annotate and docstring
      
      * add example
      
      * update readme
      
      * fix typo
      
      * teardown
      
      * handle outside wandb run
      
      * gracefully fail reports creation
      
      * precommit checks
      
      * add report url to summary
      
      * use wandb printer for better url stdout
      
      * fix ruff
      
      * handle N/A and groups
      
      * fix eval table
      
      * remove unused var
      
      * update wandb version req + disable reports stdout
      
      * remove reports feature to TODO
      
      * add label to multi-choice question data
      
      * log model predictions
      
      * lints
      
      * loglikelihood_rolling
      
      * log eval result for groups
      
      * log tables by group for better handling
      
      * precommit
      
      * choices column for multi-choice
      
      * gracefully fail wandb
      
      * remove reports feature
      
      * track system metrics + total eval time + stdout
      
      ---------
      Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
    • PR fixing the issue #1391 (wrong contexts in the mgsm task) (#1440) · a72babbf
      Lei Chen authored
      
      
      * fix the issue #1391, wrong contexts in mgsm tasks
      
      * fix yaml issue of having two target_delimiter lines. For CoT tasks, keep the one with a space (default)
      
      * regenerate all task yaml files
      - change naming so that each file name matches its task name
      - task and file names follow a consistent scheme, mgsm_(mode)_(lang), for three modes: direct, en_cot, and native_cot
      
      * English CoTs should have a space as target_delimiter
      
      * Update utils.py
      
      * Apply suggestions from code review
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>