# How to run lm-eval on Megatron-DeepSpeed checkpoint using the original setup

A great portion of this eval harness feature is inherited from https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/212, but with code/doc changes (e.g., to support the case without pipeline parallelism and MoE models). This particular setup uses the normal DeepSpeed checkpoint and requires no conversion to a Megatron-LM checkpoint.

## Prerequisites

1. Install software

On a login console with external network access, get the lm-eval harness (https://github.com/EleutherAI/lm-evaluation-harness) and `best-download==0.0.7`, which is needed to download some tasks. The package versions below are the ones we tested and confirmed to work.

```
# you may need to upgrade pip first: pip install --upgrade pip
pip install best-download==0.0.7 lm-eval==0.2.0 datasets==1.15.1 transformers==4.20.1 huggingface-hub==0.8.1
```

2. Pre-download needed datasets

Some symlinks are needed to work around the lm-eval harness' assumptions about the relative location of the data:

```
mkdir data
cd ../../tasks/eval_harness/
ln -s ../../examples_deepspeed/MoE/data/ data
cd ../../examples_deepspeed/MoE/
```

Then download the datasets for the tasks:

```
python ../../tasks/eval_harness/download.py --task_list hellaswag,lambada,triviaqa,webqs,winogrande,piqa,arc_challenge,arc_easy,openbookqa,race,boolq,cb,copa,rte,wic,wsc,multirc,record,anli_r1,anli_r2,anli_r3,wikitext,logiqa,mathqa,mc_taco,mrpc,prost,pubmedqa,qnli,qqp,sciq,sst,wnli
```

Previously we set `export HF_DATASETS_OFFLINE=1` to use the datasets offline after the manual download above. However, this can now trigger an online-verification error for some of the datasets, so it is recommended to set offline mode only when necessary.

3. Prepare the script

`ds_evalharness.sh` is the example script.

1. Edit:

```
PP_SIZE=1
TP_SIZE=1
NO_PP="true"
EP_PARALLEL_SIZE=1
NUM_NODE=1
NUM_GPU_PER_NODE=1
```

to match the eval topology. Edit:

```
CHECKPOINT_PATH=
CONFIG_PATH=
RESULT_PATH=
```

to point to the checkpoint and DeepSpeed config you want to use, and to where the results should be saved.

2. Adjust the following to fit the chosen GPU. As of the last check, the settings for a 1.3B model are one of:

```
EVAL_MICRO_BATCH_SIZE=6  # 16GB GPU, 1.3B model
EVAL_MICRO_BATCH_SIZE=12 # 32GB GPU, 1.3B model
```

If you get an OOM error, lower it further.

3. If you are not using the DeepSpeed path, disable it by removing:

```
--deepspeed \
--deepspeed_config ds_config.json \
```

If you did not disable it and the program crashes during checkpoint loading because it cannot find some key, disable DeepSpeed as explained above. Note that MoE models and models without pipeline parallelism currently might not work without DeepSpeed.
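Once the script is configured, launch it from this directory. Below is a minimal launch sketch, assuming `ds_evalharness.sh` handles the distributed launch internally; the log file name and the optional offline-mode variable are illustrative choices, not something prescribed by the script:

```
# Run the eval and keep a copy of the console output.
# Only prepend HF_DATASETS_OFFLINE=1 if the node has no external network
# and all datasets were pre-downloaded in the prerequisite step above.
bash ds_evalharness.sh 2>&1 | tee eval_harness_log.txt
```

The eval results are written to the `RESULT_PATH` configured above.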