1. 06 Jul, 2025 1 commit
  2. 04 Jul, 2025 1 commit
    • nit · b9ee592b
      Baber authored
  3. 03 Jul, 2025 2 commits
  4. 08 Jun, 2025 1 commit
    • [longbench] fix metric calculation (#2983) · 147e9d61
      Baber Abbasi authored
      * use all answers
      
      * use middle truncation
      
      * maybe fix classification score
      
      * strip classification preds
      
      * [vllm] remove stop tokens post-hoc
      
      * strip all preds
      
      * pacify pre-commit
      
      * start on truncation utility
      
      * add to readme
      
      * add a footgun doc
      
      * fix newline in yaml templates
      
      * do not strip code_sim preds!
      
      * fix pre-commit config
      
      * fix instruction warning
      
      * add note to longbench readme
  5. 28 Mar, 2025 1 commit
  6. 20 Mar, 2025 1 commit
  7. 18 Mar, 2025 1 commit
    • Add loncxt tasks (#2629) · 80a10075
      Baber Abbasi authored
      support for long-context (and other synthetic tasks)
      * add ruler
      * add longbench
      * pass `metadata` to TaskConfig
  8. 03 Mar, 2025 1 commit
  9. 21 Feb, 2025 1 commit
    • Logging (#2203) · 1ba35e62
      Lintang Sutawika authored
      
      
      * changed source of eval_logger
      
      * allow eval_logger to be set from args
      
      * removed verbosity arg from non-main methods
      
      * fix logging
      
      * pre-commit
      
      * set verbosity in eval logger
      
      * replace utils.eval_logger
      
      * fix logging in main
      
      * add logging to docs
      
      * add logging message
      
      * nit
      
      * add logging to docs
      
      * refactor setup_logging to utils
      
      ---------
      Co-authored-by: Baber <baber@hey.com>
  10. 14 Feb, 2025 1 commit
  11. 07 Feb, 2025 1 commit
  12. 15 Jan, 2025 1 commit
    • assistant prefill (#2615) · 703fbffd
      Baber Abbasi authored
      * add assistant prefix
      
      * add arc_challenge from llama
      
      * nit
      
      * nit
      
      * nit
      
      * add assistant prefix
      
      * add mmlu_llama
      
      * nit
      
      * nit
      
      * Revert "nit"
      
      This reverts commit 6a97f8356237305e375212b966b30e8de59dd4bc.
      
      * fix regex bug
      
      * add assistant_prefix to vllm
      
      * add `Question:`
      
      * add mmlu_pro
      
      * add fewshot assistant_prefix
      
      * use `assistant_prefill`
      
      * typehints
      
      * nits
      
      * nits
      
      * add to docs
      
      * add readme
  13. 20 Dec, 2024 1 commit
  14. 14 Dec, 2024 1 commit
  15. 03 Dec, 2024 1 commit
  16. 11 Nov, 2024 1 commit
  17. 05 Nov, 2024 1 commit
  18. 30 Oct, 2024 1 commit
  19. 20 Aug, 2024 1 commit
  20. 29 Jul, 2024 1 commit
    • bugfix and docs for API (#2139) · b70af4f5
      Baber Abbasi authored
      
      
      * encoding bugfix
      
      * encoding bugfix
      
      * overload loglikelihood rather than loglikelihood_tokens
      
      * add custom tokenizer
      
      * add docs
      
      * Update API_guide.md
      
      fix link; add note
      
      * Update API_guide.md
      
      typo
      
      * pre-commit
      
      * add link in readme
      
      * nit
      
      * nit
      
      * nit
      
      * Update API_guide.md
      
      nits
      
      * Update API_guide.md
      
      * Update API_guide.md
      
      * Update API_guide.md
      
      * Update API_guide.md
      
      * Update README.md
      
      * Update docs/API_guide.md
      
      * Update docs/API_guide.md
      
      * Update API_guide.md
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
  21. 18 Jul, 2024 1 commit
  22. 15 Jul, 2024 1 commit
  23. 13 Jul, 2024 1 commit
  24. 08 Jul, 2024 3 commits
    • Allow gating EvaluationTracker HF Hub results; customizability (#2051) · 563f7971
      Nathan Habib authored
      * batch commit
      
      * Revert "batch commit"
      
      This reverts commit d859d1ca.
      
      * batch commit
      
      * checkout from main
      
      * checkout from main
      
      * checkout from main
      
      * checkout from main
      
      * checkout from main
      
      * cleanup
      
      * cleanup
      
      * cleanup
      
      * cleanup
      
      * cleanup
      
      * cleanup eval results
      
      * cleanup
      
      * add check for gated repo
      
      * fix jsonline issue
      
      * fix
      
      * add try catch when gating the details repo
      
      * add doc
      
      * adds back hub_repo_name
      
      * readds hub repo name
    • Easier unitxt tasks loading and removal of unitxt library dependency (#1933) · ad80f555
      Elron Bandel authored
      
      
      * Updated unitxt loading
      Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
      
      * Revert change to general Readme
      Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
      
      * Adjust fda, squadv2, squad_completion and swde to accept config in the constructor
      Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
      
      * Fix scrolls
      Signed-off-by: elronbandel <elron.bandel@ibm.com>
      
      * Update documentation
      Signed-off-by: elronbandel <elron.bandel@ibm.com>
      
      * Enforce backward compatibility
      Signed-off-by: elronbandel <elron.bandel@ibm.com>
      
      * Format unitxt class
      Signed-off-by: elronbandel <elron.bandel@ibm.com>
      
      ---------
      Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
      Signed-off-by: elronbandel <elron.bandel@ibm.com>
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
    • Group agg rework (#1741) · 517aadc4
      Lintang Sutawika authored
      
      
      * add group_config arg
      
      * add a group config that allows disabling table for group score and group aggregate in general
      
      * fixed size configuration
      
      * adjust config
      
      * add group config
      
      * adjust mmlu to use group_config
      
      * fixed args input in aggregate_subtask_metrics
      
      * fixed issues related to printing alias of group and updated yaml
      
      * update all mmlu variants to include group_config
      
      * edit format
      
      * modify mmlu tasks
      
      * adjust group to also be a configurable group
      
      * add configurable group
      
      * simplify get_task_list
      
      * adjust group scoring with using ConfigurableGroup
      
      * adjust args
      
      * update mmlu
      
      * update mmlu
      
      * update to work with new group and task configuration
      
      * readd group_agg
      
      * readd files
      
      * move prepare_print_tasks to evaluator_utils
      
      * sort set to False by default, fix predict_only arg
      
      * add version for groups
      
      * reversed task list
      
      * update additional condition when loading a group in a group yaml
      
      * update truthfulqa
      
      * add description regarding tags replacing group
      
      * replace group to tag
      
      * fixed conditional statement
      
      * remove warning
      
      * update loading of task group and newly added tags
      
      * reformat with pre-commit
      
      * fixed info log
      
      * update
      
      * fix bug
      
      * fix bug
      
      * use task id to differentiate tasks
      
      * convert all groups to configurable groups
      
      * use task_id
      
      * reformat
      
      * add task_id for python tasks as well
      
      * add task_id for python tasks as well
      
      * add task_id for python tasks as well
      
      * revert truthfulqa
      
      * revert mmlu tasks
      
      * new mmlu config
      
      * new group config parameter `tag_to_task`
      
      * Update truthfulqa_mc2.yaml
      
      * reformat
      
      * add _process_group_config
      
      * adjust task_id
      
      * add get_subtask_list function to get proper subtask list
      
      * group config to_dict update
      
      * remove tag check
      
      * update mmlu
      
      * fix config passing issues
      
      * add test yaml
      
      * format fix
      
      * add documentation
      
      * corner case for single tag being called
      
      * fix indentation
      
      * formatting
      
      * update all mmlu variants
      
      * Update docs/task_guide.md
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * remove group_alias
      
      * Update docs/task_guide.md
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * remove version for metadata
      
      * Update docs/task_guide.md
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * update mmlu/
      
      * removed " " in make_table
      
      * change how aggregate_metric is loaded
      
      * change how aggregate_metric is loaded
      
      * update aggregate_metric arg
      
      * update format
      
      * update format
      
      * some docs fixes
      
      * add groups for agieval, aexams, aclue
      
      * add more explicit aggregation groups
      
      * add more groupings / tags distinctions
      
      * add more groupings
      
      * more groupings
      
      * add many explicit group configs
      
      * add many explicit group configs
      
      * add more explicit group configs
      
      * add more explicit group configs
      
      * add more error msgs, agg_metric -> agg_metric_list
      
      * some docs updates
      
      * update task_id to be updateable and uses group:task format
      
      * make KMMLU a tag for now
      
      * update docs
      
      * don't duplicate task names
      
      * fix merge conflicts?
      
      * giving this a try
      
      * clean up diff
      
      * switch mmlu variants over to using
      
      * don't use to-be-deprecated group: config field in overview notebook
      
      * Python tasks which subclass ConfigurableTask now run
      
      * update mmlu
      
      * pre-commit format
      
      * fixed sorting for multi-level printing
      
      * move group api to separate file
      
      * fix bbh aggregation filter usage
      
      * track api/group.py
      
      * adjust group and tags loading
      
      * make explicit group configs for leaderboard and other newer tasks
      
      * fix arabicmmlu
      
      * update
      
      * change arabicmmlu template name???
      
      * update group alias
      
      * fix printing bugs
      
      * check table printing is correct ; update tests
      
      * use mmlu_stem to have a group included in print tests
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
  25. 25 Jun, 2024 1 commit
  26. 12 Jun, 2024 1 commit
  27. 03 Jun, 2024 2 commits
  28. 31 May, 2024 2 commits
  29. 24 May, 2024 1 commit
  30. 21 May, 2024 1 commit
  31. 14 May, 2024 1 commit
  32. 13 May, 2024 1 commit
  33. 08 May, 2024 1 commit
  34. 06 May, 2024 1 commit
  35. 26 Apr, 2024 1 commit