- 26 May, 2024 1 commit
Yen-Ting Lin authored
- 23 May, 2024 2 commits
Yen-Ting Lin authored
Yen-Ting Lin authored
- 07 May, 2024 8 commits
Yen-Ting Lin authored
Yen-Ting Lin authored
Yen-Ting Lin authored
Yen-Ting Lin authored
Yen-Ting Lin authored
Yen-Ting Lin authored
Yen-Ting Lin authored
Hailey Schoelkopf authored
- 06 May, 2024 2 commits
aditya thomas authored
LSinev authored
* Added fewshot sampling seeds to the evaluator.simple_evaluate signature; a way to control the seed of fewshot sampling may help with #1591
* Added the ability to set a custom sampler for ConfigurableTask, which may be set in the config like:

  ```
  fewshot_config:
    sampler: !function utils.MyFewshotSampler
  ```

* Explicitly set the fewshot random generator seed for the HFLM generate_until_task test
* Added backward compatibility for the three-args seed setup
* Save seeds info to logs/reports
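The `!function utils.MyFewshotSampler` hook above points the config at a user-defined sampler class. A minimal sketch of what such a class could look like, assuming the hook only needs an object that can draw `n` few-shot docs reproducibly (the constructor and method names here are illustrative, not the harness's actual API):

```python
import random

class MyFewshotSampler:
    """Illustrative few-shot sampler with a controllable seed.

    The shape of this class is an assumption about the hook, not
    the harness's real sampler interface.
    """

    def __init__(self, docs, seed=1234):
        self.docs = list(docs)
        # Dedicated generator: draws depend only on `seed`, not global state.
        self.rnd = random.Random(seed)

    def sample(self, n):
        # Sample without replacement so no doc repeats within one context.
        return self.rnd.sample(self.docs, n)
```

With a fixed seed, two samplers built over the same docs produce identical few-shot draws, which is the reproducibility the commit is after.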
- 05 May, 2024 4 commits
ciaranby authored
Muhammad Bin Usman authored
fix `----hf_hub_log_args` to `--hf_hub_log_args`
kwrobel.eth authored
* Remove echo parameter in OpenAI completions API
* Remove context length parameter doc string
KonradSzafer authored
- 03 May, 2024 2 commits
KonradSzafer authored
KonradSzafer authored
* Evaluation tracker implementation
* OVModelForCausalLM test fix
* Typo fix
* Moved methods args
* Multiple args in one flag
* Loggers moved to dedicated dir
* Improved filename sanitization
- 02 May, 2024 2 commits
Helena Kloosterman authored
* Add option to set OpenVINO config
* Use utils.eval_logger for logging
bcicc authored
* vllm lora support
* Remove print
* Version check, rename lora kwarg
- 01 May, 2024 4 commits
Simran Arora authored
* Upload new tasks
* Add readmes
* Run linters

Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
Zehan Li authored
* Update utils.py: this is a 4-choice task, option_e is null for all but 3 samples
* Fix options: adaptive choices
* Add option e
* Bump multilingual arc version

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Gabriel Mukobi authored
* Add Pile-10k readme
* Add Pile-10k task configuration file
Chujie Zheng authored
- 26 Apr, 2024 2 commits
Nikita Lozhnikov authored
* Add register_filter decorator
* Add register_filter docs
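A registration decorator of this kind typically stores the decorated class in a module-level dict under a given name. A sketch of the pattern, assuming a dict-backed registry (the registry variable and signature here are illustrative, not the harness's actual `register_filter` code):

```python
# Module-level registry mapping names to filter classes.
FILTER_REGISTRY = {}

def register_filter(name):
    """Register the decorated class under `name` for lookup from configs."""
    def decorate(cls):
        FILTER_REGISTRY[name] = cls
        return cls  # hand the class back unchanged so normal use still works
    return decorate

@register_filter("lowercase")
class LowercaseFilter:
    def apply(self, resps):
        # Normalize each model response to lowercase.
        return [r.lower() for r in resps]
```

After import, `FILTER_REGISTRY["lowercase"]` resolves to the class, so task configs can refer to filters by string name.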
giorgossideris authored
* Support individual scrolls datasets
* Add qmsum context
* Fix formatting
- 25 Apr, 2024 3 commits
Lintang Sutawika authored
* Update task.py
* Update __init__.py
Julen Etxaniz authored
* Add xnli_eu tasks
* Update tasks readme
* Update readme
- 18 Apr, 2024 2 commits
sator-labs authored
Sergio Perez authored
- 16 Apr, 2024 2 commits
Michael Goin authored
* Add neuralmagic models for SparseML and DeepSparse
* Update to latest and add test
* Format
* Fix list to List
* Format
* Add deepsparse/sparseml to automated testing
* Update pyproject.toml
* Update pyproject.toml
* Update README
* Fixes for dtype and device
* Format
* Fix test
* Apply suggestions from code review
* Address review comments!

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
KonradSzafer authored
* Added delta weights
* Removed debug
* Readme update
* Better error handling
* Autogptq warn
* Warn update
* Peft and delta error, explicitly deleting _model_delta
* Linter fix
- 08 Apr, 2024 1 commit
Hailey Schoelkopf authored
- 07 Apr, 2024 1 commit
nicho2 authored
* Correction of bug EleutherAI#1664
* Add all invalid characters for Windows filenames and Unix-like systems; see: https://gist.github.com/doctaphred/d01d05291546186941e1b7ddc02034d3?permalink_comment_id=3958715
* Update lm_eval/__main__.py
* Update scripts/zeno_visualize.py
* Fix format

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
- 05 Apr, 2024 2 commits
Seungwoo Ryu authored
* claude3
* Supply for anthropic claude3
* Supply for anthropic claude3
* Anthropic config changes
* Add callback options on anthropic
* Line passed
* claude3 tiny change
* Help anthropic installation
* Mention sysprompt / being careful with format in readme

Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
ZoneTwelve authored
* Implementation of TMMLU+
* Implemented TMMLU+ (**TMMLU+: large-scale Traditional Chinese Massive Multitask Language Understanding**), with 4 categories:
  - STEM
  - Social Science
  - Humanities
  - Other

  The TMMLU+ dataset, encompassing over 67 subjects and 20160 tasks, is six times larger and more balanced than its predecessor TMMLU, and includes benchmark results from both closed-source and 20 open-weight Chinese large language models with 1.8B to 72B parameters. However, Traditional Chinese variants continue to underperform compared to major Simplified Chinese models.

  ```markdown
  Total number of tasks in the 'test' sets: 20160
  Total number of tasks in the 'validation' sets: 2247
  Total number of tasks in the 'train' sets: 335
  ```

* Remove debug print from __init__.py, left in by mistake
* Update: move TMMLU+ config generation program into default
* Fix: use the training set for few-shot examples
* Update: README for TMMLU+
* Update: small changes to the TMMLU+ README file
* pre-commit run through
* Add README for TMMLU+ dataset
* Run pre-commit
* Trigger pre-commit again
* Trigger pre-commit again
* isort is fussy
* isort is fussy
* Format, again
* Oops
* Oops

Co-authored-by: lintang <lintang@eleuther.ai>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
- 04 Apr, 2024 1 commit
Hailey Schoelkopf authored
- 01 Apr, 2024 1 commit
Michael Goin authored
The OpenAI interface supports batch size as an argument to the completions API, but does not seem to support specifying it on the CLI, i.e. `lm_eval --model openai-completions --batch_size 16 ...`, because of a simple lack of str->int conversion.

This is confirmed by my usage and the stacktrace from running `OPENAI_API_KEY=dummy lm_eval --model local-completions --tasks gsm8k --batch_size 16 --model_args model=nm-testing/zephyr-beta-7b-gptq-g128,tokenizer_backend=huggingface,base_url=http://localhost:8000/v1`:

```
Traceback (most recent call last):
  File "/home/michael/venv/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/home/michael/code/lm-evaluation-harness/lm_eval/__main__.py", line 341, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/home/michael/code/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/home/michael/code/lm-evaluation-harness/lm_eval/evaluator.py", line 251, in simple_evaluate
    results = evaluate(
  File "/home/michael/code/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/home/michael/code/lm-evaluation-harness/lm_eval/evaluator.py", line 390, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
  File "/home/michael/code/lm-evaluation-harness/lm_eval/models/openai_completions.py", line 263, in generate_until
    list(sameuntil_chunks(re_ord.get_reordered(), self.batch_size)),
  File "/home/michael/code/lm-evaluation-harness/lm_eval/models/openai_completions.py", line 251, in sameuntil_chunks
    if len(ret) >= size or x[1] != lastuntil:
TypeError: '>=' not supported between instances of 'int' and 'str'
```
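The failure reduces to `batch_size` arriving from the CLI as the string `"16"` and then being compared against an int in `len(ret) >= size`. A minimal sketch of the kind of coercion that resolves it (the helper name is illustrative, not the actual fix's code; passing `"auto"` through is an assumption, since some backends accept that value):

```python
def coerce_batch_size(value):
    """Coerce a CLI-supplied batch size to int.

    argparse hands values over as strings, so "16" must become 16
    before it is used in numeric comparisons; "auto" passes through
    untouched.
    """
    if isinstance(value, str) and value != "auto":
        return int(value)
    return value
```

With this in place, `len(ret) >= coerce_batch_size("16")` compares two ints and the TypeError above disappears.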