1. 06 Jul, 2025 1 commit
  2. 05 Jul, 2025 2 commits
  3. 03 Jul, 2025 1 commit
    • Bugfix/hf tokenizer gguf override (#3098) · ff41a856
      Ankush authored
      * fix(hf-gguf): skip gguf_file if external tokenizer is provided
      
      * docs(readme): add instructions for evaluating GGUF models with Hugging Face backend
      ff41a856
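A minimal sketch of the behaviour the first bullet describes, assuming the loader forwards a `gguf_file` keyword to the tokenizer only when no external tokenizer is named. The function name `tokenizer_load_args` and its parameters are hypothetical and are not taken from the harness code.

```python
def tokenizer_load_args(pretrained, tokenizer=None, gguf_file=None):
    """Return (name_or_path, kwargs) to use when loading a tokenizer.

    Hypothetical helper: if an external tokenizer is given, gguf_file is
    skipped so the tokenizer is not rebuilt from the GGUF checkpoint.
    """
    kwargs = {}
    if tokenizer is not None:
        # External tokenizer wins; do not pass gguf_file along.
        return tokenizer, kwargs
    if gguf_file is not None:
        # No external tokenizer: load it from the GGUF file itself.
        kwargs["gguf_file"] = gguf_file
    return pretrained, kwargs


# Usage: an external tokenizer suppresses gguf_file.
name, kwargs = tokenizer_load_args("some/repo", tokenizer="some/tokenizer", gguf_file="model.gguf")
print(name, kwargs)
```

The returned pair could then be passed to whatever tokenizer loader is in use; the repository and file names above are placeholders.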
  4. 06 May, 2025 1 commit
    • Change citation name (#2956) · a96085f1
      Stella Biderman authored
      This hasn't been a library for few-shot language model evaluation in quite a while. Let's update the citation to use "the Language Model Evaluation Harness" as the title.
      a96085f1
  5. 01 Apr, 2025 1 commit
  6. 20 Mar, 2025 1 commit
  7. 19 Mar, 2025 1 commit
  8. 10 Mar, 2025 1 commit
  9. 04 Mar, 2025 1 commit
  10. 03 Mar, 2025 1 commit
    • [Readme change for SGLang] fix error in readme and add OOM solutions for sglang (#2738) · 529f4805
      Jinwei authored
      
      
      * initial components to support sglang
      
      * init of class SGLangLM
      
      * draft for generate_until of SGLang model
      
      * mock loglikelihood
      
      * initial loglikelihood_tokens
      
      * todo: fix bug of sglang engine init
      
      * implement generation tasks and test
      
      * support output type loglikelihood and loglikelihood_rolling (#1)
      
      * .
      
      * loglikelihood_rolling
      
      * /
      
      * support dp_size>1
      
      * typo
      
      * add tests and clean code
      
      * skip tests of sglang for now
      
      * fix OOM error of sglang pytest
      
      * finish test for sglang
      
      * add sglang to readme
      
      * fix OOM of tests and clean SGLang model
      
      * update readme
      
      * clean pyproject and add tests for evaluator
      
      * add accuracy tests and it passed locally
      
      * add notes for test
      
      * Update README.md
      
      update readme
      
      * pre-commit
      
      * add OOM guideline for sglang and fix readme error
      
      * fix typo
      
      * fix typo
      
      * add readme
      
      ---------
      Co-authored-by: Xiaotong Jiang <xiaotong.jiang@databricks.com>
      Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
      Co-authored-by: Baber <baber@hey.com>
      529f4805
  11. 25 Feb, 2025 1 commit
    • Support SGLang as Potential Backend for Evaluation (#2703) · 29971faa
      Jinwei authored
      
      
      * initial components to support sglang
      
      * init of class SGLangLM
      
      * draft for generate_until of SGLang model
      
      * mock loglikelihood
      
      * initial loglikelihood_tokens
      
      * todo: fix bug of sglang engine init
      
      * implement generation tasks and test
      
      * support output type loglikelihood and loglikelihood_rolling (#1)
      
      * .
      
      * loglikelihood_rolling
      
      * /
      
      * support dp_size>1
      
      * typo
      
      * add tests and clean code
      
      * skip tests of sglang for now
      
      * fix OOM error of sglang pytest
      
      * finish test for sglang
      
      * add sglang to readme
      
      * fix OOM of tests and clean SGLang model
      
      * update readme
      
      * clean pyproject and add tests for evaluator
      
      * add accuracy tests and it passed locally
      
      * add notes for test
      
      * Update README.md
      
      update readme
      
      * pre-commit
      
      ---------
      Co-authored-by: Xiaotong Jiang <xiaotong.jiang@databricks.com>
      Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
      Co-authored-by: Baber <baber@hey.com>
      29971faa
  12. 14 Feb, 2025 1 commit
  13. 13 Dec, 2024 1 commit
  14. 09 Dec, 2024 1 commit
  15. 05 Dec, 2024 1 commit
  16. 04 Dec, 2024 2 commits
  17. 01 Dec, 2024 1 commit
    • Update Unitxt task to use locally installed unitxt and not download Unitxt... · 1170ef9e
      Yoav Katz authored
      
      Update Unitxt task to use locally installed unitxt and not download Unitxt code from Huggingface (#2514)
      
      * Moved to require unitxt installation and not download unitxt from HF hub.
      
      This has performance benefits and simplifies the code.
      Signed-off-by: Yoav Katz <katz@il.ibm.com>
      
      * Updated watsonx documentation
      
      * Updated installation instructions
      
      * Removed redundant comman
      
      * Allowed unitxt tasks to generate chat APIs
      
      Modified WatsonXI model to support chat apis
      
      * Removed print
      
      * Run precommit formatting
      
      ---------
      Signed-off-by: Yoav Katz <katz@il.ibm.com>
      1170ef9e
  18. 31 Oct, 2024 1 commit
    • Add GPTQModel support for evaluating GPTQ models (#2217) · 4f8e479e
      Qubitium-ModelCloud authored
      
      
      * support gptqmodel
      
      * code opt
      
      * add gptqmodel option
      
      * Update huggingface.py
      
      * Update pyproject.toml
      
      * gptqmodel version upgraded to 1.0.6
      
      * GPTQModel version upgraded to 1.0.8
      
      * Update pyproject.toml
      
      * fix ruff-format error
      
      * add gptqmodel test
      
      * Update gptqmodel test model
      
      * skip cuda
      
      * python3.8 compatible
      
      * Update README.md
      
      * Update README.md
      
      ---------
      Co-authored-by: CL-ModelCloud <cl@modelcloud.ai>
      4f8e479e
  19. 17 Sep, 2024 1 commit
    • Update README.md (#2297) · a5e0adcb
      SYusupov authored
      * Update README.md
      
      I encountered Git buffer size limits when trying to download the full commit history of the repository, such as:
      ```
      error: RPC failed; curl 18 transfer closed with outstanding read data remaining
      error: 5815 bytes of body are still expected
      fetch-pack: unexpected disconnect while reading sideband packet
      fatal: early EOF
      ```

      Downloading only the latest version of the repository is therefore faster and avoids these errors.
      
      * Fix linting issue
      a5e0adcb
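One way to download only the latest version, as the commit message above describes, is a shallow clone with `git clone --depth 1`. The sketch below builds that command in Python; the helper names and the example URL are placeholders and are not taken from the README.

```python
import subprocess


def shallow_clone_command(repo_url, depth=1):
    """Build a git command that fetches only the most recent `depth` commits."""
    return ["git", "clone", "--depth", str(depth), repo_url]


def shallow_clone(repo_url, depth=1):
    """Run the shallow clone; raises CalledProcessError if git fails."""
    subprocess.run(shallow_clone_command(repo_url, depth), check=True)


# Print the command instead of running it (placeholder URL).
print(" ".join(shallow_clone_command("https://example.com/some/repo.git")))
```

Because a shallow clone skips most of the history, much less data is transferred, which is why it can sidestep the transfer errors quoted above.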
  20. 13 Sep, 2024 1 commit
    • Multimodal prototyping (#2243) · fb963f0f
      Lintang Sutawika authored
      
      
      * add WIP hf vlm class
      
      * add doc_to_image
      
      * add mmmu tasks
      
      * fix merge conflicts
      
      * add lintang's changes to hf_vlms.py
      
      * fix doc_to_image
      
      * added yaml_path for config-loading
      
      * revert
      
      * add line to process str type v
      
      * update
      
      * modeling cleanup
      
      * add aggregation for mmmu
      
      * rewrite MMMU processing code based on only MMMU authors' repo (doc_to_image still WIP)
      
      * implemented doc_to_image
      
      * update doc_to_image to accept list of features
      
      * update functions
      
      * readd image processed
      
      * update args process
      
      * bugfix for repeated images fed to model
      
      * push WIP loglikelihood code
      
      * commit most recent code (generative ; qwen2-vl testing)
      
      * preliminary image_token_id handling
      
      * small mmmu update: some qs have >4 mcqa options
      
      * push updated modeling code
      
      * use processor.apply_chat_template
      
      * add mathvista draft
      
      * nit
      
      * nit
      
      * ensure no footguns in text<>multimodal LM<>task incompatibility
      
      * add notification to readme regarding launch of prototype!
      
      * fix compatibility check
      
      * reorganize mmmu configs
      
      * chat_template=None
      
      * add interleave chat_template
      
      * add condition
      
      * add max_images; interleave=true
      
      * nit
      
      * testmini_mcq
      
      * nit
      
      * pass image string; convert img
      
      * add vllm
      
      * add init
      
      * vlm add multi attr
      
      * fixup
      
      * pass max images to vllm model init
      
      * nit
      
      * encoding to device
      
      * fix HFMultimodalLM.chat_template ?
      
      * add mmmu readme
      
      * remove erroneous prints
      
      * use HFMultimodalLM.chat_template ; restore tasks/__init__.py
      
      * add docstring for replace_placeholders in utils
      
      * fix `replace_placeholders`; set image_string=None
      
      * fix typo
      
      * cleanup + fix merge conflicts
      
      * update MMMU readme
      
      * del mathvista
      
      * add some sample scores
      
      * Update README.md
      
      * add log msg for image_string value
      
      ---------
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
      Co-authored-by: Baber Abbasi <baber@eleuther.ai>
      Co-authored-by: Baber <baber@hey.com>
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      fb963f0f
  21. 15 Aug, 2024 1 commit
  22. 05 Aug, 2024 2 commits
  23. 29 Jul, 2024 1 commit
    • bugfix and docs for API (#2139) · b70af4f5
      Baber Abbasi authored
      
      
      * encoding bugfix
      
      * encoding bugfix
      
      * overload loglikelihood rather than loglikelihood_tokens
      
      * add custom tokenizer
      
      * add docs
      
      * Update API_guide.md
      
      fix link; add note
      
      * Update API_guide.md
      
      typo
      
      * pre-commit
      
      * add link in readme
      
      * nit
      
      * nit
      
      * nit
      
      * Update API_guide.md
      
      nits
      
      * Update API_guide.md
      
      * Update API_guide.md
      
      * Update API_guide.md
      
      * Update API_guide.md
      
      * Update README.md
      
      * Update docs/API_guide.md
      
      * Update docs/API_guide.md
      
      * Update API_guide.md
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      b70af4f5
  24. 22 Jul, 2024 1 commit
    • Refactor API models (#2008) · 42dc2448
      Baber Abbasi authored
      
      
      * refactor pad_token handling to fn
      
      * fix docs
      
      * add pad_token_handling to vllm
      
      * start on API superclass
      
      * don't detokenize the returned logits
      
      * streamline vllm tokenizer
      
      * add type hint
      
      * pre-commit
      
      * seems to be in working order
      
      * add model to init
      
      * refactor api models
      
      * nit
      
      * cleanup
      
      * add pbar
      
      * fix type hints
      
      * change optional dependencies
      
      * json encode chat template
      
      * add type hints
      
      * deal with different prompt input requirements
      
      * nits
      
      * fix
      
      * cache inside async
      
      * fix
      
      * fix
      
      * nits
      
      * nits
      
      * nits
      
      * nit
      
      * fixup
      
      * fixup
      
      * nit
      
      * add dummy retry
      
      * add dummy retry
      
      * handle imports; skip failing test
      
      * add type hint
      
      * add tests
      
      * add dependency to tests
      
      * add package names to exception
      
      * nit
      
      * docs; type hints
      
      * handle api key
      
      * nit
      
      * tokenizer bug
      
      * fix tokenizer
      
      * nit
      
      * nit
      
      * add better error messages
      
      * nit
      
      * remove decorator
      
      * CI: install api dep
      
      * revert evaluator.py
      
      * consolidate
      
      * consolidate
      
      * nits
      
      * nit
      
      * fix typealias
      
      * nit
      
      * nit
      
      * nit
      
      * Update lm_eval/models/api_models.py
      
      typo
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * Update lm_eval/models/openai_completions.py
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * Update lm_eval/models/anthropic_llms.py
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * Update lm_eval/models/api_models.py
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * fix typo
      
      * add news section
      
      * add info for API
      
      * pre-commit
      
      * typo
      
      * fix bug: unpack loglikelihood requests
      
      * fix bug: shared gen_kwargs mutated
      
      * nit: handle copy properly
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Update api_models.py
      
      * Update README.md
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      42dc2448
  25. 08 Jul, 2024 1 commit
  26. 03 Jul, 2024 1 commit
  27. 03 Jun, 2024 1 commit
  28. 31 May, 2024 1 commit
    • Add dataset card when pushing to HF hub (#1898) · f4f59251
      KonradSzafer authored
      
      
      * dataset card initial
      
      * few fixes
      
      * adds groups for math, mmlu, gpqa
      
      * added summary args
      
      * moved sanitize_list to utils
      
      * readme update
      
      * recreate metadata moved
      
      * multiple model support
      
      * results latest split fix
      
      * readme update and small refactor
      
      * fix grouping
      
      * add comments
      
      * added pathlib
      
      * corrected pathlib approach
      
      * check whether to create a metadata card
      
      * convert posix paths to str
      
      * default hf org from token
      
      * hf token value error
      
      * Add logs after successful upload
      
      * logging updates
      
      * dataset card example in the readme
      
      ---------
      Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
      Co-authored-by: Alina Lozovskaia <alinailozovskaya@gmail.com>
      f4f59251
  29. 07 May, 2024 2 commits
    • Initial integration of the Unitxt to LM eval harness (#1615) · 885f48d6
      Yoav Katz authored
      * Initial support for Unitxt datasets in LM Eval Harness
      
      See https://github.com/IBM/unitxt

      The script 'generate_yamls.py' creates LM Eval Harness yaml files corresponding to Unitxt datasets specified in the 'unitxt_datasets' file.
      
      The glue code required to register Unitxt metrics is in 'unitxt_wrapper.py'.
      
      * Added dataset loading check to generate_yaml
      
      Improved error messages.
      
      * Speed up generate_yaml
      
      Added printouts and improved error message
      
      * Added output printout
      
      * Simplified integration of unitxt datasets
      
      Store all the common yaml configuration in a yaml include shared by all datasets of the same task.
      
      * Post code review comments - part 1
      
      1. Made sure include files don't end with 'yaml' so they won't be marked as tasks
      2. Added more datasets and tasks (NER, GEC)
      3. Added README
      
      * Post code review comments - part 2
      
      1. Added a unitxt install option in pyproject.toml:
      pip install 'lm_eval[unitxt]'
      2. Added a check that unitxt is installed and print a clear error message if not
      
      * Committed missing pyproject change
      
      * Added documentation on adding datasets
      
      * More doc changes
      
      * add unitxt extra to readme
      
      * run precommit
      
      ---------
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
      885f48d6
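The installation check mentioned in "Post code review comments - part 2" could look roughly like the sketch below, which assumes the check is done by looking the module up with `importlib`. The helper name `require_package` and the error wording are hypothetical and are not the code from the commit.

```python
import importlib.util


def require_package(module_name, extra):
    """Raise a clear error if an optional dependency is not installed.

    Hypothetical helper: `extra` is the optional-dependency group to
    suggest, e.g. "unitxt" for `pip install 'lm_eval[unitxt]'`.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"Package '{module_name}' is not installed. "
            f"Install it with: pip install 'lm_eval[{extra}]'"
        )


# Usage: a standard-library module is always found, so this passes silently.
require_package("json", "unitxt")
```

Calling `require_package("unitxt", "unitxt")` before loading a Unitxt task would then fail early with the install hint instead of a bare import error deeper in the call stack.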
    • KonradSzafer's avatar
      20be169b
  30. 05 May, 2024 1 commit
  31. 03 May, 2024 1 commit
    • evaluation tracker implementation (#1766) · 59cf408a
      KonradSzafer authored
      * evaluation tracker implementation
      
      * OVModelForCausalLM test fix
      
      * typo fix
      
      * moved methods args
      
      * multiple args in one flag
      
      * loggers moved to dedicated dir
      
      * improved filename sanitization
      59cf408a
  32. 25 Apr, 2024 1 commit
  33. 16 Apr, 2024 2 commits
  34. 08 Apr, 2024 1 commit
  35. 05 Apr, 2024 1 commit
    • Anthropic Chat API (#1594) · 27924d77
      Seungwoo Ryu authored
      
      
      * claude3
      
      * supply for anthropic claude3
      
      * supply for anthropic claude3
      
      * anthropic config changes
      
      * add callback options on anthropic
      
      * line passed
      
      * claude3 tiny change
      
      * help anthropic installation
      
      * mention sysprompt / being careful with format in readme
      
      ---------
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
      27924d77