1. 14 Feb, 2025 2 commits
  2. 13 Feb, 2025 1 commit
  3. 12 Feb, 2025 2 commits
  4. 11 Feb, 2025 2 commits
    • Adding the Evalita-LLM benchmark (#2681) · b7fccef5
      Michele Resta authored
      
      
      * feat: initial commit with templates for evalita evaluation
      
      * fix: change rule for generate_until
      
      * feat: modified yaml to use reduced version of NER test datasets
      
      * feat: added templates to use reduced dataset for summarization (fanpage and ilpost)
      
      * Add Six Prompts for Each Multiple-Choice Task
      
      * feat: modified fewshot split for textual entailment task
      
      * fix: new doc_to_target function for NER tasks
      
      * Update prompt
      
      * Add partition for few-shot evaluation
      
      * Add partition for few-shot evaluation
      
      * Add partition for few-shot evaluation
      
      * Add partition for few-shot evaluation
      
      * Update prompt
      
      * Add partition for few-shot evaluation
      
      * Rename file
      
      Rename file from _evalita-mp_ner_adg_p1 .yaml to _evalita-mp_ner_adg_p1.yaml
      
      * Add partition for few-shot evaluation
      
      * Add partition for few-shot evaluation
      
      * Enhance lexical substitution management
      
      - Improve scorer calculation for better accuracy
      - Update model output postprocessing for clearer results
      - Add support for few-shot relation extraction task
      
      * Add F1 macro measure for the document dating task
      
      * Add F1-macro measure to evaluate document dating
      
      * Use the whole dataset
      
      * Small changes
      
      * Add the two prompts for the task of lexical substitution
      
      * Add few-shot split configuration
      
      * Add few-shot split configuration
      
      * Add function for handling few-shot learning setup
      
      * Fix prompt
      
      * Remove configuration file
      
      * Update dataset from test_same to test_cross for evaluations
      
      * Remove whitespace at end of prompt
      
      * Fix configuration error: corrected parameter name for the dataset used in few-shot
      
      * Fix: Check if results is not empty before processing in lexical substitution task
      
      * added the prompts and functions for correct NER and RE execution
      
      * Add accuracy measure
      
      * Add tasks for the EVALITA-LLM benchmark evaluation
      
      * Small changes
      
      Add the alias of the task name that will be printed in the final table results.
      
      * Updated the prompts to reflect changes made to the extended dataset for the Admission Test task
      
      * chore: cleaned templates before PR; feat: add configuration to run generation/ppl tasks.
      
      * fix: add information on Evalita-LLM for PR
      
      * fix: rename folders and files
      
      * fix: remove unused imports
      
      * chore: run pre-commit
      
      * chore: add task description
      
      ---------
      Co-authored-by: rzanoli <zanoli@fbk.eu>
      Co-authored-by: Marco Madeddu <marco.madeddu.bra@gmail.com>
  5. 07 Feb, 2025 3 commits
  6. 06 Feb, 2025 1 commit
  7. 31 Jan, 2025 1 commit
  8. 29 Jan, 2025 2 commits
  9. 28 Jan, 2025 5 commits
  10. 24 Jan, 2025 1 commit
  11. 21 Jan, 2025 3 commits
  12. 20 Jan, 2025 6 commits
  13. 19 Jan, 2025 1 commit
  14. 17 Jan, 2025 1 commit
  15. 15 Jan, 2025 4 commits
    • assistant prefill (#2615) · 703fbffd
      Baber Abbasi authored
      * add assistant prefix
      
      * add arc_challenge from llama
      
      * nit
      
      * nit
      
      * nit
      
      * add assistant prefix
      
      * add mmlu_llama
      
      * nit
      
      * nit
      
      * Revert "nit"
      
      This reverts commit 6a97f8356237305e375212b966b30e8de59dd4bc.
      
      * fix regex bug
      
      * add assistant_prefix to vllm
      
      * add `Question:`
      
      * add mmlu_pro
      
      * add fewshot assistant_prefix
      
      * use `assistant_prefill`
      
      * typehints
      
      * nits
      
      * nits
      
      * add to docs
      
      * add readme
    • Add MLQA (#2622) · e86cece6
      Shivansh Pachnanda authored
      * Add MLQA
      * add mlqa_common_yaml
      
      * add 49 tests of mlqa family
      
      * update tasks/README.md
      
      ---------
      
      * fix: mlqa ast error
      
      * nit: removed .yaml ext from template_yaml
      
      * nit changes: minor modifications generate_tasks.py
      
      * deleted lm_eval/tasks/mlqa/mlqa_common_yaml.yaml
      
      * tests updated
      
      * nit
    • Add MBPP (#2247) · 5db23e2c
      Hojin Lee authored
      
      
      * add mbpp
      
      * fix some bugs
      
      * add README for mbpp
      
      * update README
      
      * nits
      
      ---------
      Co-authored-by: Hojin Lee <19949034+hjlee1371@users.noreply.github.com>
      Co-authored-by: Baber <baber@hey.com>
    • Add HumanEval (#1992) · 4c11206b
      Hojin Lee authored
      
      
      * add custom filter
      
      * fix type casting of references
      
      * add humaneval
      
      * fix a bug in humaneval
      
      * add greedy version of humaneval
      
      * update tasks README
      
      * test humaneval
      
      * return multiple metrics
      
      * nit
      
      * add confirmation to run code tasks
      
      * nit
      
      * nit
      
      ---------
      Co-authored-by: Hojin Lee <19949034+hjlee1371@users.noreply.github.com>
      Co-authored-by: Baber <baber@hey.com>
  16. 07 Jan, 2025 3 commits
  17. 04 Jan, 2025 1 commit
    • some minor logging nits (#2609) · 888ac292
      Baber Abbasi authored
      * remove yaml extension from phrases_va_common
      
      * remove yaml extension from winogenerated
      
      * remove yaml extension from phrases_es
      
      * no cache debug logging when not used
  18. 02 Jan, 2025 1 commit
    • update scrolls (#2602) · 1044db95
      Baber Abbasi authored
      * update evaluate; update construct requests
      
      * update construct requests to handle `apply_chat_template` kwarg