Commits · 0dffdbb462fe495e29db1f29241a038e5907fca4 · gaoqiong / lm-evaluation-harness

27 Mar, 2024 1 commit
- Fix conditional import for Nemo LM class (#1641) · 0dffdbb4
  Hailey Schoelkopf authored Mar 27, 2024
  
  0dffdbb4
26 Mar, 2024 1 commit

Integration of NeMo models into LM Evaluation Harness library (#1598) · e9d429e1

Sergio Perez authored Mar 26, 2024

* Integration of NeMo models into LM Evaluation Harness library

* rename nemo model to nemo_lm

* move nemo section in readme after hf section

* use self.eot_token_id in get_until()

* improve progress bar showing loglikelihood requests

* data replication or tensor/pipeline replication working fine within one node

* run pre-commit on modified files

* check whether dependencies are installed

* clarify usage of torchrun in README

e9d429e1

25 Mar, 2024 3 commits

Seq2seq fix (#1604) · 262f879a

Lintang Sutawika authored Mar 25, 2024

* fix on --task list

* add fixes to tokenization

* differentiate encoding for seq2seq and decoder

* return token setting

* format for pre-commit

* Seq2seq fix, pt2 (#1630)

* getting model class only when defined

* encode_pair handles None, add_special_tokens turned into dict with default value

---------
Co-authored-by: achervyakov <77295913+artemorloff@users.noreply.github.com>

262f879a

peft Version Assertion (#1635) · 8e72f267
WoosungMyung authored Mar 26, 2024
```
* peft Version Assertion

* fix the linter issue
```
8e72f267
Add vLLM FAQs to README (#1625) (#1633) · a97fde23
Hailey Schoelkopf authored Mar 25, 2024

a97fde23

22 Mar, 2024 1 commit
- add logging of model args (#1619) · cffc1bd3
  Baber Abbasi authored Mar 22, 2024
```
* add logging of model args

* nit

* Add warnings.

* nit

* add warning

* nit
```
  cffc1bd3
21 Mar, 2024 2 commits
- OpenAI Completions -- fix passing of unexpected 'until' arg (#1612) · 34c9b7e4
  Hailey Schoelkopf authored Mar 21, 2024
  
  34c9b7e4
- Add ACLUE task (#1614) · 65546905
  Haonan Li authored Mar 21, 2024
```
* Add task ACLUE

* fix minor bug

* fix code style

* fix code style
```
  65546905
20 Mar, 2024 1 commit

Fixes to Loglikelihood prefix token / VLLM (#1611) · c7b03ad4

Hailey Schoelkopf authored Mar 20, 2024

* make vllm use prefix_token_id; have prefix_token_id be an optional method to define

* custom_prefix_token_id wasn't set if not passed

c7b03ad4

19 Mar, 2024 3 commits
- fix until arg processing (#1608) · d4b8fc13
  achervyakov authored Mar 20, 2024
  
  d4b8fc13
- fix gen_kwargs arg reading (#1607) · 0d920e82
  achervyakov authored Mar 20, 2024
  
  0d920e82
- Revert "Patch for Seq2Seq Model predictions (#1584)" (#1601) · f871646f
  Hailey Schoelkopf authored Mar 19, 2024
```
This reverts commit b7923a84.
```
  f871646f
18 Mar, 2024 3 commits

use BOS token in loglikelihood (#1588) · a4192489

kwrobel.eth authored Mar 18, 2024

* use BOS token in loglikelihood

* improve comments

* add model arg

* log prefix token id

* log prefix token id

* Update lm_eval/api/model.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* change name to prefix_token_id

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

a4192489

Fix eval_logger import for mmlu/_generate_configs.py (#1593) · 4600d6bf

Nouf M. Alotaibi authored Mar 18, 2024

* Fix eval_logger import for mmlu/_generate_configs.py

* linter

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

4600d6bf

Cleanup for v0.4.2 release (#1573) · 5627e819

Hailey Schoelkopf authored Mar 18, 2024

* Update interface.md

* fix: make caching reqs always work with accelerate launch

* remove stale task migration checklist

* remove deprecation warnings

* make informative TypeErrors for get_task_dict

* bump version metadata

* fix num_fewshot printing bug

* add fewshot value to cache key

5627e819

17 Mar, 2024 3 commits

Add start date in results.json (#1592) · 6fae67a6
kwrobel.eth authored Mar 17, 2024

6fae67a6

Patch for Seq2Seq Model predictions (#1584) · b7923a84

Lintang Sutawika authored Mar 18, 2024

* Differentiate _encode_pair setting for decoder and enc-dec models

* make tok_decode not skip special tokens so that eos doesn't become an empty string

* Update model.py

* Update model.py

* Update huggingface.py

* Update lm_eval/models/huggingface.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update model.py

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

b7923a84

Proposed approach for testing CLI arg parsing (#1566) · 92f30afd

Vicki Boykis authored Mar 17, 2024

* New tests for CLI args

* fix spacing

* change tests for parsing

* add tests, fix parser

* remove defaults for store_true

92f30afd

15 Mar, 2024 2 commits
- Fix Jinja template for Advanced AI Risk (#1587) · dc90fecc
  Rylan Schaeffer authored Mar 15, 2024
  
  dc90fecc
- Fix README section on vllm integration (#1579) · 7d9922c8
  Eitan Turok authored Mar 15, 2024
```
* Link to vllm integration

* add pip install .[vllm] cmd
```
  7d9922c8
13 Mar, 2024 1 commit

add manual tqdm disabling management (#1569) · e74ec966

achervyakov authored Mar 13, 2024

* add manual tqdm disabling management

* add typing to all new args

* apply precommit changes

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

e74ec966

12 Mar, 2024 1 commit
- cli_evaluate calls simple_evaluate with the same verbosity. (#1563) · 49695e8d
  Wongboo authored Mar 12, 2024
  
  49695e8d
11 Mar, 2024 4 commits

AGIEval (#1359) · a3e56afe

Hailey Schoelkopf authored Mar 11, 2024

* add agieval

* fix typo

* add cloze / math exactmatch agieval tasks, rename

* update exact-match agieval tasks, allow for multiple correct answers

* add more detail to readme

* don't parse_math_answer twice

---------
Co-authored-by: Alex Bäuerle <alex@a13x.io>

a3e56afe

add Arabic EXAMS benchmark (#1498) · 4ab07597

khalil authored Mar 11, 2024

* add Arabic EXAMS benchmark

* fix the linter issue and add more information to the README

* Update README.md

---------
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>

4ab07597

Update ifeval.yaml (#1506) · 282b9e76
Hailey Schoelkopf authored Mar 11, 2024

282b9e76
Update generate_until_template_yaml (#1546) · a79a7c33
Hailey Schoelkopf authored Mar 11, 2024

a79a7c33

10 Mar, 2024 1 commit

Support jinja templating for task descriptions (#1553) · 3bdf25ec

Hisham Alyahya authored Mar 10, 2024

* Support jinja templating for "description"

* Update task_guide.md

* Update lm_eval/api/task.py

* fix format?

* whitespace errors

* fix whitespace

* fix bad variable reference

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

3bdf25ec

09 Mar, 2024 2 commits

Fix incorrect `max_gen_toks` generation kwarg default in code2_text. (#1551) · f518228f
Piyush Thakur authored Mar 09, 2024
```
* update gen_kwargs in code2-text-go.yaml

* update gen_kwargs in the remaining code2-text yamls
```
f518228f

Add compatibility for vLLM's new Logprob object (#1549) · 8051d954

Antoni Baum authored Mar 09, 2024

* Add compatibility for vLLM's new Logprob object

* Fix

* Update lm_eval/models/vllm_causallms.py

* fix format?

* trailing whitespace

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

8051d954

06 Mar, 2024 7 commits

Update installation commands in openai_completions.py and contributing... · 9e6e2402

Sungho Park authored Mar 07, 2024

Update installation commands in openai_completions.py and the contributing document, and update wandb_args description (#1536)

* Update openai completions and docs/CONTRIBUTING.md

* Update wandb args description

* Update docs/interface.md

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

9e6e2402

Cleanup and fixes (Task, Instance, and a little bit of *evaluate) (#1533) · 4ee1b386

LSinev authored Mar 06, 2024

* Remove unused `decontamination_ngrams_path` and all mentions (still no alternative path provided)

* Fix improper import of LM and usage of evaluator in one of the scripts

* update type hints in instance and task api

* raising errors in task.py instead of asserts

* Fix warnings from ruff

* raising errors in __main__.py instead of asserts

* raising errors in tasks/__init__.py instead of asserts

* raising errors in evaluator.py instead of asserts

* evaluator: update type hints and remove unused variables in code

* Update lm_eval/__main__.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/__main__.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/api/task.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/api/task.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/api/task.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/evaluator.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* pre-commit induced fixes

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

4ee1b386

update printed num-fewshot ; prevent fewshots from erroneously being used by... · 02705057
Hailey Schoelkopf authored Mar 06, 2024
```
update printed num-fewshot ; prevent fewshots from erroneously being used by cot which hardcodes fewshot prompt (#1502)
```
02705057
Update docs on LM.loglikelihood_rolling abstract method (#1532) · 525b8f5d
Hailey Schoelkopf authored Mar 06, 2024

525b8f5d
Adding new task: KorMedMCQA (#1530) · faee1adf
sean0042 authored Mar 06, 2024

faee1adf

Add WMDP Multiple-choice (#1534) · 29b2b013

Long Phan authored Mar 05, 2024

* init wmdp yaml file

* Add WMDP Multiple-choice

* fix linter issues

* Delete lm_eval/tasks/wmdp/_wmdp.yaml

---------
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>

29b2b013

Add EQ-Bench as per #1459 (#1511) · c5acce0c

Peter Bevan authored Mar 06, 2024

* Start adding eq-bench

* Start adding to yaml and utils

* Get metric working

* Add README

* Handle cases where answer is not parseable

* Deal with unparseable answers and add percent_parseable metric

* Update README

c5acce0c

05 Mar, 2024 2 commits

Add a new task GPQA (the CoT and generative parts) (#1482) · 01108aca

Uanu authored Mar 06, 2024

* Add new GPQA tasks

* Add README

* Remove unused functions

* Remove unused functions

* Linters

* Add flexible match

* update

* Remove duplicate function

* Linter

* update

* Update lm_eval/filters/extraction.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* register multi_choice_regex

* Update

* run precommit

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

01108aca

Openllm benchmark (#1526) · 8a875e9a
Baber Abbasi authored Mar 05, 2024

8a875e9a

04 Mar, 2024 2 commits
- Fix minor edge cases (#951 #1503) (#1520) · 292e5814
  Hailey Schoelkopf authored Mar 04, 2024
```
* Fix padding

* Fix elif in model loading

* format
```
  292e5814
- Hotfix: fix TypeError in `--trust_remote_code` (#1517) · 45823914
  Hailey Schoelkopf authored Mar 04, 2024
  
  45823914