### Evaluation example

In this example, we generate predictions with the `--predict_only` argument and then compute the performance metrics in a separate step, as described below.

**Step 1: Generate the predictions**

```bash
lm_eval \
  --model hf \
  --model_args pretrained=AI-Sweden-Models/Llama-3-8B \
  --tasks ask_gec \
  --output_path results/ask_gec/0-shot/ \
  --log_samples \
  --show_config \
  --write_out \
  --predict_only \
  --batch_size auto \
  --num_fewshot 0
```

**Step 2: Evaluate the predictions with ERRANT**

* Install ERRANT by following its [installation instructions](https://github.com/chrisjbryant/errant/tree/main).
* Run the following:

```bash
python3 ask_gec/errant.py \
  --fpath results/ask_gec/0-shot/AI-Sweden-Models__Llama-3-8B/samples_ask_gec_p0_2025-01-28T01-08-13.454441.jsonl \
  --out_fdir results/ask_gec/0-shot/AI-Sweden-Models__Llama-3-8B/
```

* The results will be saved as `results/ask_gec/0-shot/AI-Sweden-Models__Llama-3-8B/samples_ask_gec_p0_2025-01-28T01-08-13.454441_errant.json`.
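
The samples filename in the example above appears to embed a run timestamp, so it will differ on every run. The sketch below is one possible way to pick up the most recent samples file instead of typing the name by hand. The helper `latest_samples` is hypothetical and not part of `lm_eval` or ERRANT, and it assumes that filenames follow the `samples_<task>_*.jsonl` pattern seen above, so that sorting by name matches sorting by time.

```bash
# Hypothetical helper (not part of lm_eval or ERRANT): print the samples
# file for a task whose name sorts last in the given directory.
# Assumes names look like samples_<task>_..._<timestamp>.jsonl.
latest_samples() {
  local dir="$1" task="$2" f latest=""
  # Glob expansion is sorted by name, so the last existing match wins.
  for f in "$dir"/samples_"$task"_*.jsonl; do
    [ -e "$f" ] && latest="$f"
  done
  # Return non-zero when no samples file was found.
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Example usage (paths taken from the commands above):
#   out_dir=results/ask_gec/0-shot/AI-Sweden-Models__Llama-3-8B
#   python3 ask_gec/errant.py \
#     --fpath "$(latest_samples "$out_dir" ask_gec)" \
#     --out_fdir "$out_dir/"
```

The function only matches files ending in `.jsonl`, so a previously written `_errant.json` results file in the same directory should not be picked up by mistake.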