Commits · e548d94d4c57bc5bd1ed6efea3203edb9e22eaee · gaoqiong / lm-evaluation-harness

21 Dec, 2023 1 commit
- Update README.md (#1184) · e548d94d
  Anjor Kanekar authored Dec 21, 2023
  
20 Dec, 2023 4 commits

Implementing local OpenAI API-style chat completions on any given inference server (#1174) · fcfc0c60

Vicki Boykis authored Dec 20, 2023

* LocalChatCompletionsLM add

* clean up completions class

* clean up completions class

* update tokens

* README

* fix constructor

* eos token

* folding local-chat-completions into OpenAIChatCompletions

* refactoring to include gen_kwargs as passable option

* add todo on chat completion kwarg validation

* Ruff and README fix

* generalize to **kwargs

* remove unnecessary kwargs

* README and remove kwargs

* README


Error in --num_fewshot option for K-MMLU Evaluation Harness (#1178) · 12f2c5ea
GUIJIN SON authored Dec 21, 2023
```
* update kmmlu default formatting

* Update _default_kmmlu_yaml

* Delete lm_eval/tasks/kmmlu/utils.py
```

Switch Linting to `ruff` (#1166) · 65b8761d

Baber Abbasi authored Dec 20, 2023

* add ruff and isort. remove black and flake8

* remove unnecessary dependencies

* remove dependency from table

* change order

* ran ruff

* check 3.9

* exclude evaluator

* update CI workflow

* use ruff config in pyproject.toml

* test

* add isort rules to ruff

* sort imports

* import `make_table`

* try stages for no-commit-to-branch

* turn on mypy for pre-commit

* test

* test

* test

* change no-commit-to-branch to default

* nits

* fixed dependency


feat: add option to upload results to Zeno (#990) · 21d4ae98

Alex Bäuerle authored Dec 20, 2023



* feat: add option to upload results to Zeno

* config-based upload supporting different task types and metrics

* upload tasks as individual projects

* wording

* readme

* add example notebook

* Update documentation for Zeno integration

* Make zeno deps an extra

* Update README.md

* Document extra deps installation

* Update zeno_visualize.py

* fix: balance parens

* fix typo

* fix merge commit I botched

* Update zeno_visualize.py

* Update logger warning stmt

* fix whitespace

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>


19 Dec, 2023 5 commits

Fix Column Naming and Dataset Naming Conventions in K-MMLU Evaluation (#1171) · 9e03d9d0

seungduk.kim.2304 authored Dec 20, 2023



* Correct column names and dataset names

* Remove kmmlu_general_physics.yaml and kmmlu_korean_language.yaml

* Update _default_kmmlu_yaml

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>


self.device in huggingface.py line 210 treated as torch.device but might be a string (#1172) · 78545d42

Pasquale Minervini authored Dec 19, 2023



* self.device in huggingface.py line 210

In huggingface.py line 210, `self.device` is a `str` and has no `type` attribute

* Update huggingface.py

This handles both cases: `self.device` being a `torch.device` and `self.device` being a string

* Update huggingface.py

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
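
The fix described in #1172 can be illustrated with a minimal sketch. This is not the code from `huggingface.py`: the helper name `device_type` is hypothetical, and `SimpleNamespace` stands in for `torch.device` so the example runs without PyTorch installed. It only shows the general idea of accepting either a device object with a `type` attribute or a plain string such as `"cuda:0"`.

```python
from types import SimpleNamespace


def device_type(device):
    """Return the device family ("cuda", "cpu", "mps", ...) for either input kind.

    Hypothetical helper for illustration; not part of lm-evaluation-harness.
    """
    # Objects such as torch.device expose the family through a `.type` attribute.
    if hasattr(device, "type"):
        return device.type
    # Plain strings such as "cuda:0" do not, so take the part before the index.
    return str(device).split(":", 1)[0]


# SimpleNamespace stands in for torch.device so the sketch runs without torch.
fake_torch_device = SimpleNamespace(type="cuda", index=0)

print(device_type(fake_torch_device))
print(device_type("cuda:0"))
print(device_type("cpu"))
```

Checking for the attribute instead of the concrete class keeps the sketch free of a hard dependency on PyTorch; an `isinstance(device, torch.device)` check would be the stricter alternative.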


generalize qwen pad token fix (#1146) · 3f0a3611
Hailey Schoelkopf authored Dec 19, 2023


Simplify evaluator (#1126) · 42730d90

Lintang Sutawika authored Dec 19, 2023

* save progress

* fixed issue with table only showing 1 group

* store aliases directly in results_agg

* removed unused parts


Add docs on adding a multiple choice metric (#1147) · 8e87eff4
Paul McCann authored Dec 19, 2023
```
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
```

18 Dec, 2023 5 commits
- Update CODEOWNERS · 13fbfef7
  Stella Biderman authored Dec 18, 2023
  
- Remove GooseAI docs and change no-commit-to-branch precommit hook (#1154) · c2fad099
  Vicki Boykis authored Dec 18, 2023
```
* remove gooseAI

* Modify preconfig to specify commit branch

* precommit

* remove openai alias for completions
```
- bugfix (#1150) · 6f9630c8
  Baber Abbasi authored Dec 18, 2023
  
- Add shorthand flags (#1149) · 3ab5a262
  Baber Abbasi authored Dec 18, 2023
  
- set `--gen_kwargs` arg to None (#1145) · 08fcf1fe
  Baber Abbasi authored Dec 18, 2023
```
* set `--gen_kwargs` to None + add help to CLI

* add logging metavar

* fix verbosity help messages

* Reorder severity levels.
```
17 Dec, 2023 1 commit

[WIP] Add IFEval / Instruction-Following Eval (#1087) · aa61f940

Wis Kojohnjaratkul authored Dec 17, 2023

* Add IFEval task

* Check and download nltk punkt if not already downloaded

* Update max_gen_toks to 2048 to support "900 words+" instructions

* Resolve pre-commit linting issues

* Reduce max_gen_toks to 1280 to conserve token usage

* Add warning message in `process_results` call for non chat-finetuned models


16 Dec, 2023 2 commits
- openai nits (#1139) · 8f5b2295
  Baber Abbasi authored Dec 16, 2023
```
* fixed syntactic nits

* fix temperature and seed

* fix logprobs

* fixup merge
```
- `use_tqdm=False` if batch_size != auto (#1144) · f7c67f0e
  Baber Abbasi authored Dec 16, 2023
  
15 Dec, 2023 8 commits
- Enabling OpenAI completions via gooseai (#1141) · bd0f2414
  Vicki Boykis authored Dec 15, 2023
```
* enabling OpenAI completions via gooseai

* openai-completions and pin openai
```
- add utils.clear_torch_cache() (#1142) · 35a65ba0
  Baber Abbasi authored Dec 16, 2023
  
- Update Linter CI Job (#1130) · b0d155d3
  Hailey Schoelkopf authored Dec 15, 2023
```
* add ignoring of no-commit-to-branch

* fix method of skipping pre-commit step
```
- add correct openai api key to README.md (#1138) · e65e5bbd
  Lenni Justen authored Dec 15, 2023
  
- fix typo in README.md (#1136) · 38c36613
  Lenni Justen authored Dec 15, 2023
  
- Add benchmark FLD (#1122) · 755bf6e8
  MorishT authored Dec 15, 2023
```
* [fix] loading dataset from hub fails when the dataset name includes '.', as the program assumes it is on the local filesystem

* add FLD benchmark

* Update task.py

* [update] add group 'fld'

* [update] rename fld -> fld_default. add explanation to the readme

* Update README.md

---------
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
```
- place device onto `mps` (#1133) · 57c3b1a2
  Baber Abbasi authored Dec 15, 2023
  
- fixed how to check if dataset_path is a local directory or not (#1127) · 04707a2d
  Lintang Sutawika authored Dec 15, 2023
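
The FLD commit (#1122) notes that loading from the hub failed when a dataset name contained a `.`, because the name was assumed to be a local path. The sketch below contrasts a name-based guess with an explicit filesystem check. Both function names and the hub-style name are hypothetical, and the dot-based heuristic is only a guess at the kind of check that misfires; the actual change in #1127 may differ.

```python
import os
import tempfile


def looks_local_by_dot(dataset_path):
    """Fragile heuristic (assumed for illustration): any '.' means a local path."""
    return "." in dataset_path


def is_local_dataset(dataset_path):
    """True only when dataset_path names an existing directory on disk."""
    return os.path.isdir(dataset_path)


# Hypothetical hub-style dataset name that happens to contain a dot.
hub_style_name = "some-org/dataset.v1"

# A real directory, created only for the duration of the check.
with tempfile.TemporaryDirectory() as local_dir:
    local_result = is_local_dataset(local_dir)

hub_result = is_local_dataset(hub_style_name)
heuristic_result = looks_local_by_dot(hub_style_name)

print(local_result, hub_result, heuristic_result)
```

Asking the filesystem directly avoids misclassifying hub names, at the cost of treating a mistyped local path as a hub name.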
  
14 Dec, 2023 7 commits

fix: passing max_length to vllm engine args (#1124) · 2a47159c
NanoCode012 authored Dec 15, 2023
```
* fix: passing max_length to vllm engine args

* feat: add `max_model_len`

* chore: lint
```
Fix vllm `batch_size` type (#1128) · c4f8c40e
Yuliang Li authored Dec 15, 2023


doc_to_decontamination_query can use function (#1082) · fcb39a5a

Lintang Sutawika authored Dec 14, 2023

* doc_to_decontamination_query can use function

* add option for doc_to_decontamination_query to follow doc_to_text

* added documentation for doc_to_decontamination_query

* adjust description

* format


Additional process for doc_to_choice (#1093) · a2ed953f
Lintang Sutawika authored Dec 14, 2023
```
* Additional process for doc_to_choice

* doc_to_choice can also parse a string
```

Refactor `hf` modeling code (#1096) · e0eda4d3

Hailey Schoelkopf authored Dec 14, 2023

* modularize HFLM code

* pass through extra kwargs to AutoModel.from_pretrained call

* remove explicit model_kwargs

* rename gptq -> autogptq

* fix tokenizer pad token errors

* ensure model always respects device_map and autogptq's selected devices

* add a _get_config helper fn


Merge pull request #1118 from Momo-Tori/main · 5133c9c4
Lintang Sutawika authored Dec 14, 2023
```
fix: bug of BBH_cot_fewshot
```
fix: _generate_configs.py · c314246d
momotori authored Dec 14, 2023


13 Dec, 2023 7 commits
- `qqp`, `mnli_mismatch`: remove unlabeled test sets (#1114) · 057dc2d7
  Baber Abbasi authored Dec 14, 2023
```
* remove unlabeled test sets

* add note to readme
```
- bump version on cot_fewshot tasks · a7707c76
  haileyschoelkopf authored Dec 13, 2023
  
- update regeneration script, bump bbh_cot_fewshot version · 3fbdfea1
  haileyschoelkopf authored Dec 13, 2023
  
- Revert "Simplified `evaluator.py`" (#1116) · 9ef853ac
  Lintang Sutawika authored Dec 13, 2023
  
- fix: enlarge max_gen_toks to make output of bbh_cot_fewshot complete · 7ec42165
  momotori authored Dec 13, 2023
  
- Unpack group in `write_out` (#1113) · 72e583d5
  Baber Abbasi authored Dec 13, 2023
```
* unpack group; add output_path to arg

* Add `vllm` to overview
```
- fix: fix bug in the "doc_to_text" of BBH_cot_fewshot · 33dcbd49
  momotori authored Dec 13, 2023
  