- 17 Jul, 2023 2 commits
-
-
lintangsutawika authored
-
lintangsutawika authored
-
- 14 Jul, 2023 2 commits
-
-
lintangsutawika authored
-
haileyschoelkopf authored
-
- 03 Jul, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 23 Jun, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 21 Jun, 2023 1 commit
-
-
nikuya3 authored
-
- 20 Jun, 2023 1 commit
-
-
nikuya3 authored
-
- 19 Jun, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 16 Jun, 2023 1 commit
-
-
lintangsutawika authored
-
- 15 Jun, 2023 1 commit
-
-
lintangsutawika authored
-
- 13 Jun, 2023 3 commits
-
-
FarzanehNakhaee authored
-
haileyschoelkopf authored
-
lintangsutawika authored
Fixes some minor issues with tasks. For YAML configs, the task name should be `<dataset_path>_<dataset_name>:<task>`.
-
- 12 Jun, 2023 1 commit
-
-
Hailey Schoelkopf authored
* add wip gsm8k yaml
* cleanup tasks dir
* push gsm8k yaml changes
* rename gpt2.py
* add updated gsm8k, triviaqa baseline
* add new cot yaml
* allow for multiple filter pipelines, new filter types
* updated gsm8k + sampling gen configs
* cleanup self-consistency yaml
* push outline for advanced docs
* push docs checklist
* switch to inheritance for many tasks
* acc_norm and acc_mutual_info fixed
* fix missing newline in error msg
* remove many .py tasks
* updated GSM8k
* added more doc
* Update advanced_task_guide.md Added list of parameters
* Update advanced_task_guide.md
* Added details on listing metrics
* Update advanced_task_guide.md
* Added more explanation
* modify current default filter name
* add new tags to tasks
* remove a lingering print()
* add rest of param docs, cleanup deprecated fields
* push docs update
* move ALL_TASKS definition location
* confirm write_out.py works if no description dict passed

---------

Co-authored-by: lintangsutawika <lintang@sutawika.com>
-
- 11 Jun, 2023 1 commit
-
-
gk authored
-
- 07 Jun, 2023 1 commit
-
-
FarzanehNakhaee authored
-
- 06 Jun, 2023 1 commit
-
-
lintangsutawika authored
-
- 01 Jun, 2023 1 commit
-
-
gakada authored
* Fix tokenization issue in BaseLM.loglikelihood
* Add a regression script
* Use entire non-continuation length as context

---------

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 19 May, 2023 1 commit
-
-
lintangsutawika authored
-
- 18 May, 2023 1 commit
-
-
lintangsutawika authored
-
- 16 May, 2023 2 commits
-
-
lintangsutawika authored
-
lintangsutawika authored
-
- 13 May, 2023 1 commit
-
-
lintangsutawika authored
-
- 11 May, 2023 2 commits
-
-
Julen Etxaniz authored
-
lintangsutawika authored
-
- 10 May, 2023 3 commits
-
-
Julen Etxaniz authored
-
lintangsutawika authored
-
Benjamin Fattori authored
-
- 08 May, 2023 3 commits
-
-
haileyschoelkopf authored
-
janEbert authored
-
janEbert authored
-
- 07 May, 2023 1 commit
-
-
Ken Tsui authored
When `limit` is <1, it is the fraction of the total number of examples to use. If it is >=1, it is the number of examples per task (only use this for testing).
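
A minimal sketch of the `limit` rule described in this commit message. The helper name `resolve_limit`, the handling of `None` (no limit), and capping the count at the dataset size are assumptions made for illustration; they are not taken from the commit itself.

```python
def resolve_limit(limit, num_examples):
    """Hypothetical helper: turn a `limit` setting into an example count."""
    if limit is None:
        # Assumption: no limit means every example is used.
        return num_examples
    if limit < 1:
        # Values below 1 are read as a fraction of all examples.
        return int(num_examples * limit)
    # Values of 1 or more are read as an absolute count per task.
    # Assumption: never return more than the examples available.
    return min(int(limit), num_examples)
```

For a task with 200 examples, `resolve_limit(0.25, 200)` keeps a quarter of them, while `resolve_limit(10, 200)` keeps ten, which is the quick-test case the message warns about.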
-
- 05 May, 2023 3 commits
-
-
Julen Etxaniz authored
This makes comparing the results of different models easier because tasks are ordered in the same way.
-
Benjamin Fattori authored
-
Benjamin Fattori authored
-
- 24 Apr, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 19 Apr, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 09 Mar, 2023 1 commit
-
-
Benjamin Fattori authored
-
- 03 May, 2022 1 commit
-
-
Fabrizio Milo authored
-