- 18 Apr, 2024 1 commit
-
-
lintangsutawika authored
-
- 08 Apr, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 07 Apr, 2024 1 commit
-
-
nicho2 authored
* fix bug EleutherAI#1664
* add any invalid characters for Windows filenames and Unix-like systems
  see: https://gist.github.com/doctaphred/d01d05291546186941e1b7ddc02034d3?permalink_comment_id=3958715
* Update lm_eval/__main__.py
* Update scripts/zeno_visualize.py
* fix format
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
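The filename fix in this commit can be pictured with a minimal sketch. It is not the code from the commit: the helper name `sanitize_filename`, the underscore replacement, and the exact character set are assumptions chosen for illustration, based on the characters Windows rejects in file names plus `/` for Unix-like systems.

```python
import re

# Assumed character set (not taken from the commit): characters Windows forbids
# in file names, "/" for Unix-like systems, and ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(name: str, replacement: str = "_") -> str:
    """Hypothetical helper: replace every invalid character with `replacement`."""
    return _INVALID_CHARS.sub(replacement, name)


if __name__ == "__main__":
    # A model name containing "/" and ":" becomes safe to use as a file name.
    print(sanitize_filename("org/model:v1"))
```

Replacing rather than dropping the characters keeps names such as `org/model` and `orgmodel` distinct after sanitization.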
-
- 05 Apr, 2024 2 commits
-
-
Seungwoo Ryu authored
* claude3
* supply for anthropic claude3
* supply for anthropic claude3
* anthropic config changes
* add callback options on anthropic
* line passed
* claude3 tiny change
* help anthropic installation
* mention sysprompt / being careful with format in readme
---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
-
ZoneTwelve authored
* implementation of TMMLU+
* implemented: TMMLU+

**TMMLU+: large-scale Traditional Chinese Massive Multitask Language Understanding**

- 4 categories
  - STEM
  - Social Science
  - Humanities
  - Other

The TMMLU+ dataset, encompassing over 67 subjects and 20160 tasks, is six times larger and more balanced than its predecessor, TMMLU, and includes benchmark results from both closed-source and 20 open-weight Chinese large language models with 1.8B to 72B parameters. However, Traditional Chinese variants continue to underperform compared to major Simplified Chinese models.

```markdown
Total number of tasks in the 'test' sets: 20160
Total number of tasks in the 'validation' sets: 2247
Total number of tasks in the 'train' sets: 335
```

* Remove print from __init__.py
  I mistakenly forgot to remove the debug print from the code.
* update: move TMMLU+ config generation program into default
* fix: we should use the training set for few-shot examples
* update: README for TMMLU+
* update: small changes to the TMMLU+ README file
* pre-commit run thought
* Add README for TMMLU+ dataset
* run precommit
* trigger precommit again
* trigger precommit again
* isort is fussy
* isort is fussy
* format, again
* oops
* oops
---------
Co-authored-by: lintang <lintang@eleuther.ai>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
-
- 04 Apr, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 01 Apr, 2024 2 commits
-
-
Michael Goin authored
The OpenAI interface supports batch size as an argument to the completions API, but does not seem to support specification of this on the CLI i.e. `lm_eval --model openai-completions --batch_size 16 ...` because of a simple lack of str->int conversion.

This is confirmed by my usage and stacktrace from running `OPENAI_API_KEY=dummy lm_eval --model local-completions --tasks gsm8k --batch_size 16 --model_args model=nm-testing/zephyr-beta-7b-gptq-g128,tokenizer_backend=huggingface,base_url=http://localhost:8000/v1`:

```
Traceback (most recent call last):
  File "/home/michael/venv/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/home/michael/code/lm-evaluation-harness/lm_eval/__main__.py", line 341, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/home/michael/code/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/home/michael/code/lm-evaluation-harness/lm_eval/evaluator.py", line 251, in simple_evaluate
    results = evaluate(
  File "/home/michael/code/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/home/michael/code/lm-evaluation-harness/lm_eval/evaluator.py", line 390, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
  File "/home/michael/code/lm-evaluation-harness/lm_eval/models/openai_completions.py", line 263, in generate_until
    list(sameuntil_chunks(re_ord.get_reordered(), self.batch_size)),
  File "/home/michael/code/lm-evaluation-harness/lm_eval/models/openai_completions.py", line 251, in sameuntil_chunks
    if len(ret) >= size or x[1] != lastuntil:
TypeError: '>=' not supported between instances of 'int' and 'str'
```
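The failure described above comes from comparing an `int` length against a batch size that is still a `str` from the command line. A minimal sketch of the kind of conversion that avoids it follows. The names `coerce_batch_size` and `chunks` are hypothetical and do not appear in the repository, and the actual change in the commit may be placed and written differently.

```python
from typing import Iterator, List, Union


def coerce_batch_size(batch_size: Union[int, str]) -> int:
    """Hypothetical helper: turn a CLI-style batch size into a positive int."""
    size = int(batch_size)  # raises ValueError for non-numeric strings
    if size < 1:
        raise ValueError(f"batch_size must be positive, got {size}")
    return size


def chunks(items: List[str], size: Union[int, str]) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items (illustrative only)."""
    size = coerce_batch_size(size)
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # "2" arrives as a string, as it would from an unconverted CLI argument.
    print(list(chunks(["a", "b", "c"], "2")))
```

Converting once at the boundary, where the argument enters the model class, keeps every later comparison such as `len(ret) >= size` between two integers.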
-
Julen Etxaniz authored
* add basqueglue
* add eus_exams
* add eus_proficiency
* add eus_reading
* add eus_trivia
* run pre-commit
-
- 28 Mar, 2024 1 commit
-
-
Or Sharir authored
-
- 27 Mar, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 26 Mar, 2024 1 commit
-
-
Sergio Perez authored
* Integration of NeMo models into LM Evaluation Harness library
* rename nemo model as nemo_lm
* move nemo section in readme after hf section
* use self.eot_token_id in get_until()
* improve progress bar showing loglikelihood requests
* data replication or tensor/pipeline replication working fine within one node
* run pre-commit on modified files
* check whether dependencies are installed
* clarify usage of torchrun in README
-
- 25 Mar, 2024 3 commits
-
-
Lintang Sutawika authored
* fix on --task list
* add fixes to tokenization
* differentiate encoding for seq2seq and decoder
* return token setting
* format for pre-commit
* Seq2seq fix, pt2 (#1630)
* getting model class only when defined
* encode_pair handles None, add_special_tokens turned into dict with default value
---------
Co-authored-by: achervyakov <77295913+artemorloff@users.noreply.github.com>
-
WoosungMyung authored
* peft Version Assertion
* fix the linter issue
-
Hailey Schoelkopf authored
-
- 22 Mar, 2024 1 commit
-
-
Baber Abbasi authored
* add logging of model args
* nit
* Add warnings.
* nit
* add warning
* nit
-
- 21 Mar, 2024 2 commits
-
-
Hailey Schoelkopf authored
-
Haonan Li authored
* Add task ACLUE
* fix minor bug
* fix code style
* fix code style
-
- 20 Mar, 2024 1 commit
-
-
Hailey Schoelkopf authored
* make vllm use prefix_token_id; have prefix_token_id be optional method to define
* custom_prefix_token_id wasn't set if not passed
-
- 19 Mar, 2024 3 commits
-
-
achervyakov authored
-
achervyakov authored
-
Hailey Schoelkopf authored
This reverts commit b7923a84.
-
- 18 Mar, 2024 3 commits
-
-
kwrobel.eth authored
* use BOS token in loglikelihood
* improve comments
* add model arg
* log prefix token id
* log prefix token id
* Update lm_eval/api/model.py
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* change name to prefix_token_id
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Nouf M. Alotaibi authored
* Fix eval_logger import for mmlu/_generate_configs.py
* linter
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Hailey Schoelkopf authored
* Update interface.md
* fix: make caching reqs always work with accelerate launch
* remove stale task migration checklist
* remove deprecation warnings
* make informative TypeErrors for get_task_dict
* bump version metadata
* fix num_fewshot printing bug
* add fewshot value to cache key
-
- 17 Mar, 2024 3 commits
-
-
kwrobel.eth authored
-
Lintang Sutawika authored
* Differentiate _encode_pair setting for decoder and enc-dec models
* tok_decode to not skip special token so that eos doesn't become empty string
* Update model.py
* Update model.py
* Update huggingface.py
* Update lm_eval/models/huggingface.py
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* Update model.py
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Vicki Boykis authored
* New tests for CLI args
* fix spacing
* change tests for parsing
* add tests, fix parser
* remove defaults for store_true
-
- 15 Mar, 2024 2 commits
-
-
Rylan Schaeffer authored
-
Eitan Turok authored
* Link to vllm integration
* add pip install .[vllm] cmd
-
- 13 Mar, 2024 1 commit
-
-
achervyakov authored
* add manual tqdm disabling management
* add typing to all new args
* apply precommit changes
---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
-
- 12 Mar, 2024 1 commit
-
-
Wongboo authored
-
- 11 Mar, 2024 4 commits
-
-
Hailey Schoelkopf authored
* add agieval
* fix typo
* add cloze / math exactmatch agieval tasks, rename
* update exact-match agieval tasks, allow for multiple-correct answers
* add more detail to readme
* don't parse_math_answer twice
---------
Co-authored-by: Alex Bäuerle <alex@a13x.io>
-
khalil authored
* add Arabic EXAMS benchmark
* fixed the linter issue, and added more information to the readme
* Update README.md
---------
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
-
Hailey Schoelkopf authored
-
Hailey Schoelkopf authored
-
- 10 Mar, 2024 1 commit
-
-
Hisham Alyahya authored
* Support jinja templating for "description"
* Update task_guide.md
* Update lm_eval/api/task.py
* fix format?
* whitespace errors
* fix whitespace
* fix bad variable reference
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
-
- 09 Mar, 2024 2 commits
-
-
Piyush Thakur authored
* update gen_kwargs in code2-text-go.yaml
* update gen_kwargs in rest code2-text
-
Antoni Baum authored
* Add compatibility for vLLM's new Logprob object
* Fix
* Update lm_eval/models/vllm_causallms.py
* fix format?
* trailing whitespace
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 06 Mar, 2024 2 commits
-
-
Sungho Park authored
Update installation commands in openai_completions.py and the contributing document, and update the wandb_args description (#1536)
* Update openai completions and docs/CONTRIBUTING.md
* Update wandb args description
* Update docs/interface.md
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
LSinev authored
* Remove unused `decontamination_ngrams_path` and all mentions (still no alternative path provided)
* Fix improper import of LM and usage of evaluator in one of scripts
* update type hints in instance and task api
* raising errors in task.py instead of asserts
* Fix warnings from ruff
* raising errors in __main__.py instead of asserts
* raising errors in tasks/__init__.py instead of asserts
* raising errors in evaluator.py instead of asserts
* evaluator: update type hints and remove unused variables in code
* Update lm_eval/__main__.py
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* Update lm_eval/__main__.py
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* Update lm_eval/api/task.py
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* Update lm_eval/api/task.py
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* Update lm_eval/api/task.py
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* Update lm_eval/evaluator.py
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* pre-commit induced fixes
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-