Adding the Evalita-LLM benchmark (#2681)
* feat: initial commit with templates for evalita evaluation * fix: change rule for generate_until * feat: modified yaml to use reduced version of NER test datasets * feat: added templates to use reduced dataset for summarization (fanpage and ilpost) * Add Six Prompts for Each Multiple-Choice Task * feat: modified fewshot split for textual entailment task * fix: new doc_to_target function for NER tasks * Update prompt * Add partition for few-shot evaluation * Add partition for few-shot evaluation * Add partition for few-shot evaluation * Add partition for few-shot evaluatio * Update prompt * Add partition for few-shot evaluation * Rename file Rename file from _evalita-mp_ner_adg_p1 .yaml to _evalita-mp_ner_adg_p1.yaml * Add partition for few-shot evaluation * Add partition for few-shot evaluation * Enhance lexical substitution management - Improve scorer calculation for better accuracy - Update model output postprocessing for clearer results - Add support for few-shot relation extraction task * Add F1 macro measure for the document dating task * Add F1-macro measure to evaluate document dating * Use the whole dataset * Small changes * Add the two prompts for the task of lexical substitution * Add few-shot split configuration * Add few-shot split configuration * Add function for handling few-shot learning setup * Fix prompt * Remove configuration file * Update dataset from test_same to test_cross for evaluations * Remove whitespace at end of prompt * Fix configuration error: corrected parameter name for the dataset used in few-shot * Fix: Check if results is not empty before processing in lexical substitution task * added the prompts and functions for correct NER and RE execution * Add accuracy measure * Add tasks for the EVALITA-LLM benchmark evaluation * Small changes Add the alias of the task name that will be printed in the final table results. * Updated the prompts to reflect changes made to the extended dataset for the Admission Test task * chore: cleaned templates before PR; feat: add configuration to run generation/ppl tasks. * fix: add information on Evalita-LLM for PR * fix: rename folders and files * fix: remove unused imports * chore: run pre-commit * chore: add task description --------- Co-authored-by:rzanoli <zanoli@fbk.eu> Co-authored-by:
Marco Madeddu <marco.madeddu.bra@gmail.com>
Showing
Please register or sign in to comment