- 25 Sep, 2025 3 commits
- 27 Aug, 2025 1 commit
-
-
Baber Abbasi authored
-
- 25 Aug, 2025 1 commit
-
-
Nikita Savelyev authored
* Add support for OVModelForSeq2SeqLM * Add test
-
- 04 Aug, 2025 1 commit
-
-
parkhs21 authored
* improve include-path precedence handling * test: add task for test * add test for include path precedence handling * Refactor `test_include_path.py` --------- Co-authored-by: Baber <baber@hey.com>
-
- 26 Jul, 2025 3 commits
- 25 Jul, 2025 3 commits
- 19 Jul, 2025 1 commit
-
-
Avelina Asada Hadji-Kyriacou authored
* Added missing fixture in test_unitxt_tasks.py * pacify pre-commit --------- Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
-
- 13 Jul, 2025 1 commit
-
-
Baber authored
-
- 11 Jul, 2025 3 commits
- 10 Jul, 2025 1 commit
-
-
Baber Abbasi authored
* check for chat for warning * add test * remove yaml extension from some evalita configs * move unitxt to own test script * fix CI test
-
- 06 Jul, 2025 1 commit
-
-
Baber Abbasi authored
-
- 05 Jul, 2025 1 commit
-
-
Debjyoti Ray authored
* git push --force correctly processes both formats of model_args: string and dictionary both * extract to function for better test * nit --------- Co-authored-by: Baber <baber@hey.com>
-
- 04 Jul, 2025 1 commit
-
-
Neel Gupta authored
* [FIX] Initial code to disable multi-proc for stderr * add docs; align no-mp bootstrap with mp --------- Co-authored-by: Baber <baber@hey.com>
-
- 03 Jun, 2025 1 commit
-
-
Baber Abbasi authored
* fix: bug in acc_mutual_info slicing; add `target_delimiter` to uncond choices * add tests
-
- 16 Apr, 2025 1 commit
-
-
Baber Abbasi authored
* switch MMLU to cais/mmlu * switch back to tj-actions/changed-files * cache HF folder
-
- 17 Mar, 2025 1 commit
-
-
Angelika Romanou authored
* Add INCLUDE tasks * pacify pre-commit --------- Co-authored-by: Baber <baber@hey.com>
-
- 14 Mar, 2025 1 commit
-
-
daniel-salib authored
-
- 04 Mar, 2025 2 commits
-
-
Kiersten Stokes authored
* Add a test for a custom unitxt task * Update task.py to bring in line with breaking change in v1.17.2 * Fix lint
-
Lucia Quirke authored
* Enable steering HF models Co-authored-by: Matthew Khoriaty <matthewkhoriaty2026@u.northwestern.edu> * increase HF download timeout * Update readme; improve steering vector device handling * Update latest news * remove HF timeout increase * fix tests * ignore sae lens test * fix accidental force push --------- Co-authored-by: Matthew Khoriaty <matthewkhoriaty2026@u.northwestern.edu>
-
- 25 Feb, 2025 1 commit
-
-
Jinwei authored
* initial components to support sglang * init of class SGLangLM * draft for generate_until of SGLang model * mock loglikelihood * initial loglikelihood_tokens * todo: fix bug of sglang engine init * implement generation tasks and test * support output type loglikelihood and loglikelihood_rolling (#1) * . * loglikelihood_rolling * / * support dp_size>1 * typo * add tests and clean code * skip tests of sglang for now * fix OOM error of sglang pytest * finish test for sglang * add sglang to readme * fix OOM of tests and clean SGLang model * update readme * clean pyproject and add tests for evaluator * add accuracy tests and it passed locally * add notes for test * Update README.md update readme * pre-commit --------- Co-authored-by: Xiaotong Jiang <xiaotong.jiang@databricks.com> Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com> Co-authored-by: Baber <baber@hey.com>
-
- 19 Jan, 2025 1 commit
-
-
Baber Abbasi authored
* update pre-commit
-
- 04 Dec, 2024 1 commit
-
-
Baber Abbasi authored
-
- 30 Nov, 2024 1 commit
-
-
Baber Abbasi authored
* make utility function to handle `until` * fix text
-
- 20 Nov, 2024 1 commit
-
-
Baber Abbasi authored
* fix test task * don't call lm.chat_template each time
-
- 18 Nov, 2024 1 commit
-
-
Kozzy Voudouris authored
* Add metabench (Kipnis et al. 2024) * Update metabench tasks for full replication of original benchmarks, using publicly available datasets * Remove unnecessary import * Add permute versions of each task, where the answer orders are randomly shuffled. * Add metabench group for easier evaluations * Fix mmlu counts after removing duplicate * Add secondary datasets * Fix f-string error * Fix f-string error for permute processing * Add original hash to outputs for easy matching to original results * Add line break at end of utils files * Remove extra line from winogrande * Reformat for linters * fix multiple input test * appease pre-commit * Add metabench to tasks README * fix multiple input `test_doc_to_text` --------- Co-authored-by: Baber <baber@hey.com>
-
- 09 Nov, 2024 1 commit
-
-
Baber Abbasi authored
* switch `max_tokens` for `max_completion_tokens`. OpenAI ChatCompletions * remove stop, temp=1 for o1 * add chat assertion * HF_DATASETS_TRUST_REMOTE_CODE = True for task tests * move warning
-
- 31 Oct, 2024 1 commit
-
-
Qubitium-ModelCloud authored
* support gptqmodel * code opt * add gptqmodel option * Update huggingface.py * Update pyproject.toml * gptqmodel version upgraded to 1.0.6 * GPTQModel version upgraded to 1.0.8 * Update pyproject.toml * fix ruff-format error * add gptqmodel test * Update gptqmodel test model * skip cuda * python3.8 compatible * Update README.md * Update README.md --------- Co-authored-by: CL-ModelCloud <cl@modelcloud.ai>
-
- 04 Oct, 2024 1 commit
-
-
Baber Abbasi authored
-
- 26 Sep, 2024 3 commits
-
-
Baber Abbasi authored
* add newlines to task descriptions; increment versions * fix task tests (with groups) * Apply suggestions from code review --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Baber Abbasi authored
* change glianorex to test set * nit * fix test; doc_to_target can be str for multiple_choice * nit
-
Giulio Lovisotto authored
* Treat python tasks same as yaml tasks. * Add tests. * Re-add fixture decorators. * Fix typing specification error for Python 3.9.
-
- 18 Sep, 2024 1 commit
-
-
David Corvoysier authored
* feat(neuron): align with latest optimum-neuron * feat(neuron): support pre-exported neuron models * fix(neuron): correctly use max_length * fix(neuron): adapt loglikelihood The evaluation of log likelihood was not working for neuron models using continuous batching, such as all cached neuron LLama models. * refactor(neuron): remove dead code
-