1. 04 Oct, 2025 1 commit
    • Fewshot refactor (#3227) · 003e5852
      Baber Abbasi authored
      
      
      * overhaul `ContextSampler`
      
      * refactor masakhapos
      
      * move multi_target to `exact_match`
      
      * remove doc_to_choice from `boolq-seq2seq`
      
      * remove doc_to_choice in generation process_results
      
      * Remove unused `doc_to_choice` and fix superglue whitespaces
      
      * require multiple_inputs and multiple_targets to be explicitly set in taskconfig
      
      * fix copa; better logging in task init
      
      * fix doc_to_target to return int rather than str (deprecated)
      
      * fix processing regression; recursively parse lists from template
      
      * remove redundant jinja parsing logic
      
      * remove promptsource
      
      * for multiple_inputs use `doc_to_text: list[str]`
      
      * Refactor `ContextSampler` `fewshot_context`
      
      * fix multiple_input context
      
      * fix `target_delimiter` with `gen_prefix`
      
      * `doc_to_text` is list for multiple_inputs
      
      * Refactor `count_bytes` and `count_words` methods to `@staticmethod`
      
      * make has_*(train/test/validation) properties
      
      * remove `multi_target` `generate_until`
      
      * fix `doc_to_target`/`multiple_targets` handling; add tests
      
      * rename `multi_target` to `multiple_targets`
      
      * evaluate list when multiple targets
      
      * allow doc_to_target to return list
      
      * Remove gen_prefix space and add warning (#3239)
      
      * Remove gen_prefix space and add warning
      
      * fix null gen_prefix bug again
      
      * use git tests
      
      ---------
      Co-authored-by: Boaz Ben-Dov <bendboaz@gmail.com>
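Several items in the commit above concern multiple targets (`doc_to_target` may return a list, and `exact_match` evaluates against that list). The following is a minimal sketch of that idea, not the harness's actual implementation: the function name `exact_match_any` is hypothetical, and stripping surrounding whitespace before comparison is an assumption made for illustration.

```python
from typing import List, Union


def exact_match_any(prediction: str, targets: Union[str, List[str]]) -> float:
    """Return 1.0 if the prediction equals any target, else 0.0.

    Hypothetical helper: accepts either a single target string or a list
    of acceptable targets, mirroring the idea that doc_to_target may
    return a list when multiple targets are enabled.
    """
    # Normalise the single-target case to a one-element list.
    if isinstance(targets, str):
        targets = [targets]
    # Assumption: surrounding whitespace is ignored on both sides.
    pred = prediction.strip()
    return float(any(pred == target.strip() for target in targets))
```

With a list of targets, a match against any single entry counts as correct; a plain string behaves like a one-element list.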
  2. 25 Sep, 2025 3 commits
  3. 27 Aug, 2025 1 commit
  4. 25 Aug, 2025 1 commit
  5. 04 Aug, 2025 1 commit
  6. 26 Jul, 2025 3 commits
  7. 25 Jul, 2025 3 commits
  8. 19 Jul, 2025 1 commit
  9. 13 Jul, 2025 1 commit
  10. 11 Jul, 2025 3 commits
  11. 10 Jul, 2025 1 commit
  12. 06 Jul, 2025 1 commit
  13. 05 Jul, 2025 1 commit
  14. 04 Jul, 2025 1 commit
  15. 03 Jun, 2025 1 commit
  16. 16 Apr, 2025 1 commit
  17. 17 Mar, 2025 1 commit
  18. 14 Mar, 2025 1 commit
  19. 04 Mar, 2025 2 commits
  20. 25 Feb, 2025 1 commit
    • Support SGLang as Potential Backend for Evaluation (#2703) · 29971faa
      Jinwei authored
      
      
      * initial components to support sglang
      
      * init of class SGLangLM
      
      * draft for generate_until of SGLang model
      
      * mock loglikelihood
      
      * initial loglikelihood_tokens
      
      * todo: fix bug of sglang engine init
      
      * implement generation tasks and test
      
      * support output type loglikelihood and loglikelihood_rolling (#1)
      
      * .
      
      * loglikelihood_rolling
      
      * /
      
      * support dp_size>1
      
      * typo
      
      * add tests and clean code
      
      * skip tests of sglang for now
      
      * fix OOM error of sglang pytest
      
      * finish test for sglang
      
      * add sglang to readme
      
      * fix OOM of tests and clean SGLang model
      
      * update readme
      
      * clean pyproject and add tests for evaluator
      
      * add accuracy tests and it passed locally
      
      * add notes for test
      
      * Update README.md
      
      update readme
      
      * pre-commit
      
      ---------
      Co-authored-by: Xiaotong Jiang <xiaotong.jiang@databricks.com>
      Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
      Co-authored-by: Baber <baber@hey.com>
  21. 19 Jan, 2025 1 commit
  22. 04 Dec, 2024 1 commit
  23. 30 Nov, 2024 1 commit
  24. 20 Nov, 2024 1 commit
    • Nits (#2500) · 867413f8
      Baber Abbasi authored
      * fix test task
      
      * don't call lm.chat_template each time
  25. 18 Nov, 2024 1 commit
    • Add metabench task to LM Evaluation Harness (#2357) · 62b4364d
      Kozzy Voudouris authored
      
      
      * Add metabench (Kipnis et al. 2024)
      
      * Update metabench tasks for full replication of original benchmarks, using publicly available datasets
      
      * Remove unnecessary import
      
      * Add permute versions of each task, where the answer orders are randomly shuffled.
      
      * Add metabench group for easier evaluations
      
      * Fix mmlu counts after removing duplicate
      
      * Add secondary datasets
      
      * Fix f-string error
      
      * Fix f-string error for permute processing
      
      * Add original hash to outputs for easy matching to original results
      
      * Add line break at end of utils files
      
      * Remove extra line from winogrande
      
      * Reformat for linters
      
      * fix multiple input test
      
      * appease pre-commit
      
      * Add metabench to tasks README
      
      * fix multiple input `test_doc_to_text`
      
      ---------
      Co-authored-by: Baber <baber@hey.com>
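The metabench commit above mentions "permute" task versions in which the answer order is randomly shuffled. A minimal sketch of such a shuffle is shown below; it is not the code from that commit. The function name `permute_choices` and the document fields `choices` and `answer` (an index into `choices`) are hypothetical, chosen only to illustrate that the gold index has to be remapped together with the options.

```python
import random
from typing import Any, Dict


def permute_choices(doc: Dict[str, Any], seed: int = 0) -> Dict[str, Any]:
    """Return a copy of doc with its answer options shuffled.

    Assumes (hypothetically) that doc["choices"] is a list of option
    strings and doc["answer"] is the index of the correct option.
    """
    # A seeded generator keeps the permutation reproducible across runs.
    rng = random.Random(seed)
    order = list(range(len(doc["choices"])))
    rng.shuffle(order)
    # order[new_position] is the old index placed at that position.
    shuffled = [doc["choices"][old_index] for old_index in order]
    # The correct option moves to wherever its old index landed.
    new_answer = order.index(doc["answer"])
    return {**doc, "choices": shuffled, "answer": new_answer}
```

The input document is left unmodified, so the original and permuted versions can be evaluated side by side.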
  26. 09 Nov, 2024 1 commit
  27. 31 Oct, 2024 1 commit
    • Add GPTQModel support for evaluating GPTQ models (#2217) · 4f8e479e
      Qubitium-ModelCloud authored
      
      
      * support gptqmodel
      
      * code opt
      
      * add gptqmodel option
      
      * Update huggingface.py
      
      * Update pyproject.toml
      
      * gptqmodel version upgraded to 1.0.6
      
      * GPTQModel version upgraded to 1.0.8
      
      * Update pyproject.toml
      
      * fix ruff-format error
      
      * add gptqmodel test
      
      * Update gptqmodel test model
      
      * skip cuda
      
      * python3.8 compatible
      
      * Update README.md
      
      * Update README.md
      
      ---------
      Co-authored-by: CL-ModelCloud <cl@modelcloud.ai>
  28. 04 Oct, 2024 1 commit
  29. 26 Sep, 2024 3 commits