1. 07 Apr, 2024 1 commit
  2. 05 Apr, 2024 1 commit
    • TMMLU+ implementation (#1394) · 9ae96cdf
      ZoneTwelve authored
      
      
      * implementation of TMMLU+
      
      * implemented: TMMLU+
      
      **TMMLU+: Large-Scale Traditional Chinese Massive Multitask Language Understanding**
      
      - 4 categories
          - STEM
          - Social Science
          - Humanities
          - Other
      
      The TMMLU+ dataset covers over 67 subjects and 20,160 test tasks. It is six times larger and more balanced than its predecessor, TMMLU, and includes benchmark results from closed-source models and 20 open-weight Chinese large language models ranging from 1.8B to 72B parameters. In those results, Traditional Chinese variants still underperform major Simplified Chinese models.
      
      ```markdown
      Total number of tasks in the 'test' sets: 20160
      Total number of tasks in the 'validation' sets: 2247
      Total number of tasks in the 'train' sets: 335
      ```
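As a quick check on the split counts above, the following minimal sketch computes the overall total and each split's share. The variable names are illustrative only and are not taken from the TMMLU+ code.

```python
# Split sizes copied from the counts listed above.
split_sizes = {"test": 20160, "validation": 2247, "train": 335}

# Overall number of tasks across all three splits.
total = sum(split_sizes.values())

# Fraction of the whole dataset held by each split.
shares = {name: count / total for name, count in split_sizes.items()}

print(f"total tasks: {total}")
for name, share in shares.items():
    print(f"{name:>10}: {share:.1%}")
```

The train split is by far the smallest, which fits its use as a pool of few-shot examples rather than as training data.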
      
      * Remove print from __init__.py
      
      I mistakenly forgot to remove the debug print from the code.
      
      * update: move TMMLU+ config generation program into default
      
      * fix: use the training set for few-shot examples
      
      * update: README for TMMLU+
      
      * update: small changes to the TMMLU+ README file
      
      * pre-commit run-through
      
      * Add README for TMMLU+ dataset
      
      * run precommit
      
      * trigger precommit again
      
      * trigger precommit again
      
      * isort is fussy
      
      * isort is fussy
      
      * format, again
      
      * oops
      
      * oops
      
      ---------
      Co-authored-by: lintang <lintang@eleuther.ai>
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
  3. 04 Apr, 2024 1 commit
  4. 01 Apr, 2024 2 commits
    • Fix CLI --batch_size arg for openai-completions/local-completions (#1656) · 9516087b
      Michael Goin authored
      The OpenAI interface accepts a batch size as an argument to the completions API, but the CLI does not appear to support setting it (e.g. `lm_eval --model openai-completions --batch_size 16 ...`) because the value is never converted from str to int.
      
      This is confirmed by the stack trace I get when running `OPENAI_API_KEY=dummy lm_eval --model local-completions --tasks gsm8k --batch_size 16 --model_args model=nm-testing/zephyr-beta-7b-gptq-g128,tokenizer_backend=huggingface,base_url=http://localhost:8000/v1`:
      ```
      Traceback (most recent call last):
        File "/home/michael/venv/bin/lm_eval", line 8, in <module>
          sys.exit(cli_evaluate())
        File "/home/michael/code/lm-evaluation-harness/lm_eval/__main__.py", line 341, in cli_evaluate
          results = evaluator.simple_evaluate(
        File "/home/michael/code/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
          return fn(*args, **kwargs)
        File "/home/michael/code/lm-evaluation-harness/lm_eval/evaluator.py", line 251, in simple_evaluate
          results = evaluate(
        File "/home/michael/code/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
          return fn(*args, **kwargs)
        File "/home/michael/code/lm-evaluation-harness/lm_eval/evaluator.py", line 390, in evaluate
          resps = getattr(lm, reqtype)(cloned_reqs)
        File "/home/michael/code/lm-evaluation-harness/lm_eval/models/openai_completions.py", line 263, in generate_until
          list(sameuntil_chunks(re_ord.get_reordered(), self.batch_size)),
        File "/home/michael/code/lm-evaluation-harness/lm_eval/models/openai_completions.py", line 251, in sameuntil_chunks
          if len(ret) >= size or x[1] != lastuntil:
      TypeError: '>=' not supported between instances of 'int' and 'str'
      ```
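The `TypeError` above comes from comparing an integer length against a batch size that is still a string. The following is a minimal sketch of that failure mode and of the kind of str->int conversion the description refers to. The helper names `parse_batch_size` and `chunks` are hypothetical, and this is not the actual patch from the commit.

```python
from typing import Iterable, Iterator, List, Union


def parse_batch_size(value: Union[int, str]) -> int:
    """Coerce a CLI-supplied batch size (possibly a string) to a positive int."""
    size = int(value)  # raises ValueError for non-numeric strings
    if size < 1:
        raise ValueError(f"batch size must be >= 1, got {size}")
    return size


def chunks(items: Iterable[str], size: Union[int, str]) -> Iterator[List[str]]:
    """Yield successive batches of at most `size` items."""
    size = parse_batch_size(size)  # without this, len(batch) >= "16" raises TypeError
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch


# A batch size arriving from the command line as the string "2" now works.
batches = list(chunks(["a", "b", "c"], "2"))
print(batches)
```

Converting once at the boundary where the argument enters the model wrapper keeps every later comparison on plain integers.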
    • Add Latxa paper evaluation tasks for Basque (#1654) · c2c8e238
      Julen Etxaniz authored
      * add basqueglue
      
      * add eus_exams
      
      * add eus_proficiency
      
      * add eus_reading
      
      * add eus_trivia
      
      * run pre-commit
  5. 28 Mar, 2024 1 commit
  6. 27 Mar, 2024 1 commit
  7. 26 Mar, 2024 1 commit
    • Integration of NeMo models into LM Evaluation Harness library (#1598) · e9d429e1
      Sergio Perez authored
      * Integration of NeMo models into LM Evaluation Harness library
      
      * rename nemo model as nemo_lm
      
      * move nemo section in readme after hf section
      
      * use self.eot_token_id in get_until()
      
      * improve progress bar showing loglikelihood requests
      
      * data replication or tensor/pipeline replication working fine within one node
      
      * run pre-commit on modified files
      
      * check whether dependencies are installed
      
      * clarify usage of torchrun in README
  8. 25 Mar, 2024 3 commits
  9. 22 Mar, 2024 1 commit
  10. 21 Mar, 2024 2 commits
  11. 20 Mar, 2024 1 commit
  12. 19 Mar, 2024 3 commits
  13. 18 Mar, 2024 3 commits
  14. 17 Mar, 2024 3 commits
  15. 15 Mar, 2024 2 commits
  16. 13 Mar, 2024 1 commit
  17. 12 Mar, 2024 1 commit
  18. 11 Mar, 2024 4 commits
  19. 10 Mar, 2024 1 commit
  20. 09 Mar, 2024 2 commits
  21. 06 Mar, 2024 5 commits