1. 22 Feb, 2024 1 commit
    • Add TemplateLM boilerplate LM class (#1279) · ba5cdf0f
      Anjor Kanekar authored
      * loglikelihood refactor using template lm
      
      * linter
      
      * fix whitespace in target + prompt for CoT gsm8k (#1275)
      
      * Make `parallelize=True` vs. `accelerate launch` distinction clearer in docs (#1261)
      
      * Make parallelize=True distinction clearer in documentation.
      
      * run linter
      
      * Allow parameter edits for registered tasks when listed in a benchmark (#1273)
      
      * benchmark yamls allow minor edits of already registered tasks
      
      * add documentation
      
      * removed print
      
      * Fix data-parallel evaluation with quantized models (#1270)
      
      * add WIP device_map overrides
      
      * update handling outside of accelerate launcher
      
      * change .to(device) log to debug level
      
      * run linter
      
      * Rework documentation for explaining local dataset (#1284)
      
      * rework documentation for explaining local dataset
      
      * fix typo
      
      * Update new_task_guide.md
      
      * Re-add citation
      
      It looks like Google Scholar has [already noticed](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9...
  2. 21 Feb, 2024 1 commit
    • Added KMMLU evaluation method and changed ReadMe (#1447) · c26a6ac7
      Hanwool Albert Lee authored
      
      
      * update kmmlu default formatting
      
      * Update _default_kmmlu_yaml
      
      * Delete lm_eval/tasks/kmmlu/utils.py
      
      * new tasks implemented
      
      * add direct tasks
      
      * update direct evaluate
      
      * update direct eval
      
      * add cot sample
      
      * update cot
      
      * add cot
      
      * Update _cot_kmmlu_yaml
      
      * add kmmlu90
      
      * Update and rename _cot_kmmlu.yaml to _cot_kmmlu_yaml
      
      * Create kmmlu90.yaml
      
      * Update _cot_kmmlu_yaml
      
      * add direct
      
      * Update _cot_kmmlu_yaml
      
      * Update and rename kmmlu90.yaml to kmmlu90_cot.yaml
      
      * Update kmmlu90_direct.yaml
      
      * add kmmlu hard
      
      * Update _cot_kmmlu_yaml
      
      * Update _cot_kmmlu_yaml
      
      * update cot
      
      * update cot
      
      * erase typo
      
      * Update _cot_kmmlu_yaml
      
      * update cot
      
      * Rename dataset to match k-mmlu-hard
      
      * removed kmmlu90
      
      * fixed name 'kmmlu_cot' to 'kmmlu_hard_cot' and revised README
      
      * applied pre-commit before pull requests
      
      * rename datasets and add notes
      
      * Remove DS_Store cache
      
      * Update lm_eval/tasks/kmmlu/README.md
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * Change citations and reflect reviews on version
      
      * Added kmmlu_hard and fixed other errors
      
      * fixing minor errors
      
      * remove duplicated
      
      * Rename files
      
      * try ".index"
      
      * minor fix
      
      * minor fix again
      
      * fix revert.
      
      * minor fix. thanks to Hailey
      
      ---------
      Co-authored-by: GUIJIN SON <spthsrbwls123@yonsei.ac.kr>
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
  3. 20 Feb, 2024 3 commits
  4. 19 Feb, 2024 2 commits
  5. 18 Feb, 2024 1 commit
  6. 15 Feb, 2024 1 commit
  7. 14 Feb, 2024 1 commit
  8. 13 Feb, 2024 1 commit
  9. 12 Feb, 2024 2 commits
  10. 11 Feb, 2024 3 commits
    • Add multilingual TruthfulQA task (#1420) · 7397b965
      Uanu authored
    • Add multilingual ARC task (#1419) · 0256c682
      Uanu authored
    • Evaluate (#1385) · 1ff84897
      Baber Abbasi authored
      * un-exclude `evaluate.py` from linting
      
      * readability
      
      * readability
      
      * add task name to build info message
      
      * fix link
      
      * nit
      
      * add functions for var and mean pooling
      
      * add functions for var and mean pooling
      
      * metadata compatibility with task
      
      * rename `override_config` to `set_config` and move to `Task`
      
      * add unit test
      
      * nit
      
      * nit
      
      * bugfix
      
      * nit
      
      * nit
      
      * nit
      
      * add docstrings
      
      * fix metadata-fewshot
      
      * revert metric refactor
      
      * nit
      
      * type checking
      
      * type hints
      
      * type hints
      
      * move `override_metric` to `Task`
      
      * change metadata
      
      * change name
      
      * pre-commit
      
      * rename
      
      * remove
      
      * remove
      
      * `override_metric` backwards compatible with `Task`
      
      * type hints
      
      * use generic
      
      * type hint
  11. 10 Feb, 2024 2 commits
  12. 09 Feb, 2024 1 commit
  13. 07 Feb, 2024 1 commit
  14. 06 Feb, 2024 4 commits
  15. 05 Feb, 2024 1 commit
  16. 02 Feb, 2024 2 commits
  17. 01 Feb, 2024 4 commits
    • Faster Task and Group Loading, Allow Recursive Groups (#1321) · d714fc95
      Lintang Sutawika authored
      * add trust_remote_code as default
      
      * task for testing recursive
      
      * changed source of ALL_TASKS
      
      * tasks should only accept TaskObjects
      
      * initialize_tasks returns list of tasks and groups
      
      * remove trust_remote_code for now
      
      * moved constructor process to inside load_yaml_config
      
      * more comprehensive way to index tasks and groups
      
      * pre-commit format
      
      * add exit after error
      
      * adjust how task objects are called
      
      * no need to use get_task_dict
      
      * load_task_or_group works but only for tasks
      
      * pre-commit format
      
      * half working for nested groups
      
      * changed variable names
      
      * allow groups and tasks to work
      
      * temp save
      
      * indexing and loading are part of a task_manager object
      
      * adapted initialize_tasks
      
      * iron out bugs
      
      * fixed typo
      
      * fixed typo
      
      * simplified code
      
      * further tidy up
      
      * remove lines for testing
      
      * removed test lines
      
      * removed unused code
      
      * remove unused import
      
      * fixed bu...
    • Enable override of printed `n-shot` in table (#1379) · 17191063
      Hailey Schoelkopf authored
      * allow tasks to specify printed fewshot val
      
      * fix to belebele
      
      * update metadata field's documentation
    • Hf: minor edge cases (#1380) · 994bdb3f
      Baber Abbasi authored
      * edge cases where variable might not be assigned.
      
      * type hint
    • Expand docs, update CITATION.bib (#1227) · f5408b6b
      Hailey Schoelkopf authored
      
      
      * Update CITATION.bib
      
      * Create CONTRIBUTING.md
      
      * add disclaimer re: multi node
      
      * flesh out some sections more
      
      * Flesh out contributor guide
      
      * revert CITATION.bib
      
      * appease pre-commit
      
      ---------
      Co-authored-by: lintangsutawika <lintang@eleuther.ai>
  18. 31 Jan, 2024 5 commits
    • add bypass metric (#1156) · f8203de1
      Baber Abbasi authored
      * add bypass metric
      
      * fixed `bypass` metric.
      
      * add task attributes if predict_only
      
      * add `predict_only` checks
      
      * add docs
      
      * added `override_metric`, `override_config` to `Task`
      
      * nits
      
      * nit
      
      * changed --predict_only to generations; nits
      
      * nits
      
      * nits
      
      * change gen_kwargs warning
      
      * add note about `--predict_only` in README.md
      
      * added `predict_only`
      
      * move table to bottom
      
      * nit
      
      * change null aggregation to bypass (conflict)
      
      * bugfix; default `temp=0.0`
      
      * typo
    • Add support for RWKV models with World tokenizer (#1374) · 084b7050
      Eugene Cheah authored
      
      
      * Add support for RWKV models with World tokenizer
      
      The RWKV line of models with the World tokenizer does not allow the padding token to be configured and has its value preset to 0.
      
      However, this value fails all the "if set" checks, which would cause the tokenizer to crash.
      
      A tokenizer class name check was added, in addition to a model type check, because some RWKV models use the NeoX tokenizers.
      
      * Update huggingface.py
      
      Generalized so that this supports any RWKVWorld tokenizer, and added a fallback in case the HF implementation name changes.
      
      * Comply with formatting guidelines
      
      * fix format
      
      ---------
      Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
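The RWKV commit message above says that a padding token id preset to 0 fails the "if set" checks. A minimal sketch of how that can happen, assuming those checks are plain truthiness tests (the actual code in `huggingface.py` is not shown on this page; `StubTokenizer` and both helper functions are hypothetical names used only for illustration):

```python
from typing import Optional


class StubTokenizer:
    """Hypothetical stand-in for a tokenizer; not a real library class."""

    def __init__(self, pad_token_id: Optional[int]) -> None:
        self.pad_token_id = pad_token_id


def has_pad_truthy(tok: StubTokenizer) -> bool:
    # Truthiness test: a pad id of 0 is falsy, so it is treated as "not set".
    return bool(tok.pad_token_id)


def has_pad_explicit(tok: StubTokenizer) -> bool:
    # Explicit None test: a pad id of 0 counts as set.
    return tok.pad_token_id is not None


# Assumed scenario from the commit message: pad id fixed at 0.
world_like = StubTokenizer(pad_token_id=0)
# A tokenizer with no pad token configured at all.
unset = StubTokenizer(pad_token_id=None)
```

Under this assumption, the truthiness test reports the 0-valued pad token as missing, while the explicit `None` comparison distinguishes it from a tokenizer with no pad token.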
    • Make dependencies compatible with PyPI (#1378) · a0a2fec8
      Hailey Schoelkopf authored
      * make deps not point to github urls
      
      * formatting
      
      * try making PyPI only run on tag pushes
    • Publish to pypi (#1194) · 0da0dcba
      Anjor Kanekar authored
      * publish to pypi
      
      * lint
      
      * Update publish.yml
      
      * minor
    • Fix unintuitive `--gen_kwargs` behavior (#1329) · bd7d265a
      Hailey Schoelkopf authored
      * don't override do_sample if no value for it is passed
      
      * Update gen_kwargs override condition
      
      * Update huggingface.py
      
      * Update huggingface.py
      
      * run linters
      
      * silence an erroneous warning
  19. 30 Jan, 2024 1 commit
  20. 29 Jan, 2024 1 commit
  21. 28 Jan, 2024 1 commit
  22. 26 Jan, 2024 1 commit
    • Add causalLM OpenVino models (#1290) · 97a67d27
      NoushNabi authored
      
      
      * added intel optimum
      
      * added intel optimum in readme
      
      * modified intel optimum
      
      * modified intel optimum
      
      * modified intel optimum
      
      * modified install optimum
      
      * modified path of IR file
      
      * added openvino_device
      
      * added openvino_device2
      
      * changed optimum-causal to openvino-causal
      
      * Update README.md
      
      * Update README.md
      
      * remove `lm_eval.base` import
      
      * update openvino-causal -> openvino ; pass device through super().__init__()
      
      * Update README.md
      
      * Add optimum to tests dependencies
      
      * apply pre-commit
      
      * fix so tests pass
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>