    Adding the Evalita-LLM benchmark (#2681) · b7fccef5
    Michele Resta authored
    
    
    * feat: initial commit with templates for evalita evaluation
    
    * fix: change rule for generate_until
    
    * feat: modified yaml to use reduced version of NER test datasets
    
    * feat: added templates to use reduced dataset for summarization (fanpage and ilpost)
    
    * Add Six Prompts for Each Multiple-Choice Task
    
    * feat: modified fewshot split for textual entailment task
    
    * fix: new doc_to_target function for NER tasks
    
    * Update prompt
    
    * Add partition for few-shot evaluation
    
    * Add partition for few-shot evaluation
    
    * Add partition for few-shot evaluation
    
    * Add partition for few-shot evaluation
    
    * Update prompt
    
    * Add partition for few-shot evaluation
    
    * Rename file
    
    Rename file from _evalita-mp_ner_adg_p1 .yaml to _evalita-mp_ner_adg_p1.yaml
    
    * Add partition for few-shot evaluation
    
    * Add partition for few-shot evaluation
    
    * Enhance lexical substitution management
    
    - Improve scorer calculation for better accuracy
    - Update model output postprocessing for clearer results
    - Add support for few-shot relation extraction task
    
    * Add F1 macro measure for the document dating task
    
    * Add F1-macro measure to evaluate document dating
    
    * Use the whole dataset
    
    * Small changes
    
    * Add the two prompts for the task of lexical substitution
    
    * Add few-shot split configuration
    
    * Add few-shot split configuration
    
    * Add function for handling few-shot learning setup
    
    * Fix prompt
    
    * Remove configuration file
    
    * Update dataset from test_same to test_cross for evaluations
    
    * Remove whitespace at end of prompt
    
    * Fix configuration error: corrected parameter name for the dataset used in few-shot
    
    * Fix: Check if results is not empty before processing in lexical substitution task
    
    * added the prompts and functions for correct NER and RE execution
    
    * Add accuracy measure
    
    * Add tasks for the EVALITA-LLM benchmark evaluation
    
    * Small changes
    
    Add the alias of the task name that will be printed in the final table results.
    
    * Updated the prompts to reflect changes made to the extended dataset for the Admission Test task
    
    * chore: cleaned templates before PR; feat: add configuration to run generation/ppl tasks.
    
    * fix: add information on Evalita-LLM for PR
    
    * fix: rename folders and files
    
    * fix: remove unused imports
    
    * chore: run pre-commit
    
    * chore: add task description
    
    ---------
    Co-authored-by: rzanoli <zanoli@fbk.eu>
    Co-authored-by: Marco Madeddu <marco.madeddu.bra@gmail.com>