1. 03 Jun, 2024 1 commit
  2. 31 May, 2024 1 commit
    • Add dataset card when pushing to HF hub (#1898) · f4f59251
      KonradSzafer authored
      
      
      * dataset card initial
      
      * few fixes
      
      * adds groups for math, mmlu, gpqa
      
      * added summary args
      
      * moved sanitize_list to utils
      
      * readme update
      
      * recreate metadata moved
      
      * multiple model support
      
      * results latest split fix
      
      * readme update and small refactor
      
      * fix grouping
      
      * add comments
      
      * added pathlib
      
      * corrected pathlib approach
      
      * check whether to create a metadata card
      
      * convert posix paths to str
      
      * default hf org from token
      
      * hf token value error
      
      * Add logs after successful upload
      
      * logging updates
      
      * dataset card example in the readme
      
      ---------
      Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
      Co-authored-by: Alina Lozovskaia <alinailozovskaya@gmail.com>
  3. 07 May, 2024 2 commits
    • Initial integration of the Unitxt to LM eval harness (#1615) · 885f48d6
      Yoav Katz authored
      * Initial support for Unitxt datasets in LM Eval Harness
      
      See https://github.com/IBM/unitxt
      
      
      
      The script 'generate_yamls.py' creates LM Eval Harness yaml files corresponding to Unitxt datasets specified in the 'unitxt_datasets' file.
      
      The glue code required to register Unitxt metrics is in 'unitxt_wrapper.py'.
      
      * Added dataset loading check to generate_yaml
      
      Improved error messages.
      
      * Speed up generate_yaml
      
      Added printouts and improved error message
      
      * Added output printout
      
      * Simplified integration of unitxt datasets
      
      Store all the common yaml configuration in a yaml include shared by all datasets of the same task.
      
      * Post code review comments - part 1
      
      1. Made sure include files don't end with 'yaml' so they won't be marked as tasks
      2. Added more datasets and tasks (NER, GEC)
      3. Added README
      
      * Post code review comments - part 2
      
      1. Added unitxt install option in pyproject.toml:
      pip install 'lm_eval[unitxt]'
      2. Added a check that unitxt is installed and print a clear error message if not
      
      * Committed missing pyproject change
      
      * Added documentation on adding datasets
      
      * More doc changes
      
      * add unitxt extra to readme
      
      * run precommit
      
      ---------
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
    • 20be169b
      KonradSzafer authored
  4. 05 May, 2024 1 commit
  5. 03 May, 2024 1 commit
    • evaluation tracker implementation (#1766) · 59cf408a
      KonradSzafer authored
      * evaluation tracker implementation
      
      * OVModelForCausalLM test fix
      
      * typo fix
      
      * moved methods args
      
      * multiple args in one flag
      
      * loggers moved to dedicated dir
      
      * improved filename sanitization
  6. 25 Apr, 2024 1 commit
  7. 16 Apr, 2024 2 commits
  8. 08 Apr, 2024 1 commit
  9. 05 Apr, 2024 1 commit
    • Anthropic Chat API (#1594) · 27924d77
      Seungwoo Ryu authored
      
      
      * claude3
      
      * supply for anthropic claude3
      
      * supply for anthropic claude3
      
      * anthropic config changes
      
      * add callback options on anthropic
      
      * line passed
      
      * claude3 tiny change
      
      * help anthropic installation
      
      * mention sysprompt / being careful with format in readme
      
      ---------
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
  10. 26 Mar, 2024 1 commit
    • Integration of NeMo models into LM Evaluation Harness library (#1598) · e9d429e1
      Sergio Perez authored
      * Integration of NeMo models into LM Evaluation Harness library
      
      * rename nemo model as nemo_lm
      
      * move nemo section in readme after hf section
      
      * use self.eot_token_id in get_until()
      
      * improve progress bar showing loglikelihood requests
      
      * data replication or tensor/pipeline replication working fine within one node
      
      * run pre-commit on modified files
      
      * check whether dependencies are installed
      
      * clarify usage of torchrun in README
  11. 25 Mar, 2024 1 commit
  12. 15 Mar, 2024 1 commit
  13. 01 Mar, 2024 1 commit
  14. 22 Feb, 2024 1 commit
    • feat: Add Weights and Biases support (#1339) · 2683fbbb
      Ayush Thakur authored
      
      
      * add wandb as extra dependency
      
      * wandb metrics logging
      
      * refactor
      
      * log samples as tables
      
      * fix linter
      
      * refactor: put in a class
      
      * change dir
      
      * add panels
      
      * log eval as table
      
      * improve tables logging
      
      * improve reports logging
      
      * precommit run
      
      * ruff check
      
      * handle importing reports api gracefully
      
      * ruff
      
      * compare results
      
      * minor pre-commit fixes
      
      * build comparison report
      
      * ruff check
      
      * log results as artifacts
      
      * remove comparison script
      
      * update dependency
      
      * type annotate and docstring
      
      * add example
      
      * update readme
      
      * fix typo
      
      * teardown
      
      * handle outside wandb run
      
      * gracefully fail reports creation
      
      * precommit checks
      
      * add report url to summary
      
      * use wandb printer for better url stdout
      
      * fix ruff
      
      * handle N/A and groups
      
      * fix eval table
      
      * remove unused var
      
      * update wandb version req + disable reports stdout
      
      * move reports feature to TODO
      
      * add label to multi-choice question data
      
      * log model predictions
      
      * lints
      
      * loglikelihood_rolling
      
      * log eval result for groups
      
      * log tables by group for better handling
      
      * precommit
      
      * choices column for multi-choice
      
      * gracefully fail wandb
      
      * remove reports feature
      
      * track system metrics + total eval time + stdout
      
      ---------
      Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
  15. 06 Feb, 2024 2 commits
  16. 05 Feb, 2024 1 commit
  17. 01 Feb, 2024 1 commit
  18. 31 Jan, 2024 1 commit
    • add bypass metric (#1156) · f8203de1
      Baber Abbasi authored
      * add bypass metric
      
      * fixed `bypass` metric.
      
      * add task attributes if predict_only
      
      * add `predict_only` checks
      
      * add docs
      
      * added `override_metric`, `override_config` to `Task`
      
      * nits
      
      * nit
      
      * changed --predict_only to generations; nits
      
      * nits
      
      * nits
      
      * change gen_kwargs warning
      
      * add note about `--predict_only` in README.md
      
      * added `predict_only`
      
      * move table to bottom
      
      * nit
      
      * change null aggregation to bypass (conflict)
      
      * bugfix; default `temp=0.0`
      
      * typo
  19. 26 Jan, 2024 1 commit
    • Add causalLM OpenVino models (#1290) · 97a67d27
      NoushNabi authored
      
      
      * added intel optimum
      
      * added intel optimum in readme
      
      * modified intel optimum
      
      * modified intel optimum
      
      * modified intel optimum
      
      * modified install optimum
      
      * modified path of IR file
      
      * added openvino_device
      
      * added openvino_device2
      
      * changed optimum-causal to openvino-causal
      
      * Update README.md
      
      * Update README.md
      
      * remove `lm_eval.base` import
      
      * update openvino-causal -> openvino ; pass device through super().__init__()
      
      * Update README.md
      
      * Add optimum to tests dependencies
      
      * apply pre-commit
      
      * fix so tests pass
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
  20. 25 Jan, 2024 1 commit
  21. 23 Jan, 2024 1 commit
  22. 22 Jan, 2024 2 commits
  23. 16 Jan, 2024 1 commit
  24. 15 Jan, 2024 2 commits
  25. 11 Jan, 2024 1 commit
  26. 08 Jan, 2024 1 commit
    • Revert citation (#1257) · ecb1df28
      Stella Biderman authored
      Over a dozen papers have used the updated citation block, but Google Scholar has noticed none of them. Since it does understand this citation, I think we should use it going forward until we have a way to ensure the newer citations are actually logged.
  27. 30 Dec, 2023 1 commit
  28. 23 Dec, 2023 1 commit
  29. 22 Dec, 2023 2 commits
    • Upstream Mamba Support (`mamba_ssm`) (#1110) · 5503b274
      Hailey Schoelkopf authored
      * modularize HFLM code
      
      * pass through extra kwargs to AutoModel.from_pretrained call
      
      * remove explicit model_kwargs
      
      * rename gptq -> autogptq
      
      * fix tokenizer pad token errors
      
      * ensure model always respects device_map and autogptq's selected devices
      
      * add a _get_config helper fn
      
      * add mambaLMWrapper
      
      * add mamba extra
      
      * add mamba extra
      
      * fix conditional import
      
      * Fix botched merge commit
      
      * Remove beginning-of-file comment for consistency
      
      * Add docstring for mambaLM re: supported kwargs
      
      * Alphabetize extras
      
      * Update extras table
      
      * appease precommit
      
      * run precommit on mamba_lm
    • Refer in README to main branch (#1200) · 25cefbc1
      Bram Vanroy authored
  30. 21 Dec, 2023 3 commits
  31. 20 Dec, 2023 2 commits
    • Implementing local OpenAI API-style chat completions on any given inference server (#1174) · fcfc0c60
      Vicki Boykis authored
      * LocalChatCompletionsLM add
      
      * clean up completions class
      
      * clean up completions class
      
      * update tokens
      
      * README
      
      * fix constructor
      
      * eos token
      
      * folding local-chat-completions into OpenAIChatCompletions
      
      * refactoring to include gen_kwargs as passable option
      
      * add todo on chat completion kwarg validation
      
      * Ruff and README fix
      
      * generalize to **kwargs
      
      * remove unnecessary kwargs
      
      * README and remove kwargs
      
      * README
    • Switch Linting to `ruff` (#1166) · 65b8761d
      Baber Abbasi authored
      * add ruff and isort. remove black and flake8
      
      * remove unnecessary dependencies
      
      * remove dependency from table
      
      * change order
      
      * ran ruff
      
      * check 3.9
      
      * exclude evaluator
      
      * update CI workflow
      
      * use ruff config in pyproject.toml
      
      * test
      
      * add isort rules to ruff
      
      * sort imports
      
      * import `make_table`
      
      * try stages for no-commit-to-branch
      
      * turn on mypy for pre-commit
      
      * test
      
      * test
      
      * test
      
      * change no-commit-to-branch to default
      
      * nits
      
      * fixed dependency