"Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.8->lm-eval==1.0.0) (1.3.0)\n",
"Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.8->lm-eval==1.0.0) (1.3.0)\n",
"Building wheels for collected packages: lm-eval, rouge-score, sqlitedict\n",
"Building wheels for collected packages: lm-eval, rouge-score, sqlitedict\n",
" Building wheel for lm-eval (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
" Building wheel for lm-eval (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for lm-eval: filename=lm_eval-1.0.0-py3-none-any.whl size=986403 sha256=b7907b5840136fbd6ae948150ebab82776a0deb76359e59da24fbd2259de8094\n",
" Created wheel for lm-eval: filename=lm_eval-1.0.0-py3-none-any.whl size=994254 sha256=88356155b19f2891981ecef948326ad6ce8ca40a6009378410ec20d0e225995a\n",
" Stored in directory: /tmp/pip-ephem-wheel-cache-axf286nt/wheels/17/01/26/599c0779e9858a70a73fa8a306699b5b9a868f820c225457b0\n",
" Stored in directory: /tmp/pip-ephem-wheel-cache-9v6ye7h3/wheels/17/01/26/599c0779e9858a70a73fa8a306699b5b9a868f820c225457b0\n",
" Building wheel for rouge-score (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Building wheel for rouge-score (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for rouge-score: filename=rouge_score-0.1.2-py3-none-any.whl size=24933 sha256=8983ef5e600070817b0114ebe7ad48d72891d918ab2beea00ed6c5c437a6f713\n",
" Created wheel for rouge-score: filename=rouge_score-0.1.2-py3-none-any.whl size=24933 sha256=6bb0d44e4881972c43ce194e7cb65233d309758cb15f0dec54590d3d2efcfc36\n",
" Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4\n",
" Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4\n",
" Building wheel for sqlitedict (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Building wheel for sqlitedict (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for sqlitedict: filename=sqlitedict-2.1.0-py3-none-any.whl size=16863 sha256=146f7dcb0d03ca6c21a08e9d93d74d6c07cd42d0e37ddb7e922127e5ddb5e213\n",
" Created wheel for sqlitedict: filename=sqlitedict-2.1.0-py3-none-any.whl size=16863 sha256=5747f7dd73ddf3d8fbcebf51b5e4f718fabe1e94bccdf16d2f22a2e65ee7fdf4\n",
" Stored in directory: /root/.cache/pip/wheels/79/d6/e7/304e0e6cb2221022c26d8161f7c23cd4f259a9e41e8bbcfabd\n",
" Stored in directory: /root/.cache/pip/wheels/79/d6/e7/304e0e6cb2221022c26d8161f7c23cd4f259a9e41e8bbcfabd\n",
"Successfully built lm-eval rouge-score sqlitedict\n",
"Successfully built lm-eval rouge-score sqlitedict\n",
"2023-11-27:08:14:53,517 INFO [utils.py:160] NumExpr defaulting to 2 threads.\n",
"2023-11-29:11:54:55,156 INFO [utils.py:160] NumExpr defaulting to 2 threads.\n",
"2023-11-27 08:14:54.499605: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
"2023-11-29 11:54:55.942051: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
"2023-11-27 08:14:54.499658: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
"2023-11-29 11:54:55.942108: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
"2023-11-27 08:14:54.499691: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
"2023-11-29 11:54:55.942142: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
"2023-11-27 08:14:56.139266: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
"2023-11-29 11:54:57.066802: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
"2023-11-29:11:55:11,047 WARNING [evaluator.py:93] generation_kwargs specified through cli, these settings will be used over set parameters in yaml tasks.\n",
"2023-11-29:11:56:18,658 WARNING [task.py:614] [Task: demo_boolq] metric acc is defined, but aggregation is not. using default aggregation=mean\n",
"2023-11-29:11:56:18,658 WARNING [task.py:626] [Task: demo_boolq] metric acc is defined, but higher_is_better is not. using default higher_is_better=True\n",
"Oftenly, tasks are part of a larger group used to meausre different capabilities. The dynamism of the field today means new dimensions of evaluation can come about which would mix and match new and older tasks alike. In LM-Eval, We can also group tasks and call that the group name to evaluate on a set of tasks easily. In this instance, let's evaluate the group `yes_or_no_tasks` which comprise of the tasks `demo_boolq` and `demo_cola`; tasks which are multiple choice tasks with options `yes` and `no` as the name suggests.\n",
"Often, tasks are part of a larger group used to meausre different capabilities. The dynamism of the field today means new dimensions of evaluation can come about which would mix and match new and older tasks alike. In LM-Eval, We can also group tasks and call that the group name to evaluate on a set of tasks easily. In this instance, let's evaluate the group `yes_or_no_tasks` which comprise of the tasks `demo_boolq` and `demo_cola`; tasks which are multiple choice tasks with options `yes` and `no` as the name suggests.\n",
"\n",
"\n",
"<!-- making new groups is easier than ever, allowing user to work bottom-up by makiing individual tasks and linking them to a group or Top-Down, making a new group by listing existing tasks.\n",
"<!-- making new groups is easier than ever, allowing user to work bottom-up by makiing individual tasks and linking them to a group or Top-Down, making a new group by listing existing tasks.\n",
"2023-11-27:08:15:04,956 INFO [utils.py:160] NumExpr defaulting to 2 threads.\n",
"2023-11-29:11:56:33,016 INFO [utils.py:160] NumExpr defaulting to 2 threads.\n",
"2023-11-27 08:15:06.059715: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
"2023-11-29 11:56:33.852995: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
"2023-11-27 08:15:06.059774: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
"2023-11-29 11:56:33.853050: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
"2023-11-27 08:15:06.059810: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
"2023-11-29 11:56:33.853087: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
"2023-11-27 08:15:07.648550: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
"2023-11-29 11:56:35.129047: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
"2023-11-27:08:15:10,952 INFO [__main__.py:124] Verbosity set to INFO\n",
"2023-11-29:11:56:38,546 INFO [__main__.py:132] Verbosity set to INFO\n",
"2023-11-27:08:15:18,895 WARNING [__main__.py:130] --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.\n",
"2023-11-29:11:56:47,509 WARNING [__main__.py:138] --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.\n",
"2023-11-27:08:15:18,895 INFO [__main__.py:135] Including path: ./\n",
"2023-11-29:11:56:47,509 INFO [__main__.py:143] Including path: ./\n",
"2023-11-27:08:15:18,915 INFO [__main__.py:197] Selected Tasks: ['yes_or_no_tasks']\n",
"2023-11-29:11:56:47,517 INFO [__main__.py:205] Selected Tasks: ['yes_or_no_tasks']\n",
"2023-11-27:08:15:18,979 INFO [huggingface.py:119] Using device 'cuda'\n",
"2023-11-29:11:56:47,520 WARNING [evaluator.py:93] generation_kwargs specified through cli, these settings will be used over set parameters in yaml tasks.\n",
"2023-11-29:11:57:08,743 WARNING [task.py:626] [Task: demo_cola] metric acc is defined, but higher_is_better is not. using default higher_is_better=True\n",
"2023-11-27:08:16:31,344 WARNING [task.py:624] [Task: demo_boolq] metric acc is defined, but higher_is_better is not. using default higher_is_better=True\n",
"Generating test split: 100% 3245/3245 [00:00<00:00, 20505.42 examples/s]\n",
"2023-11-27:08:16:35,225 WARNING [task.py:612] [Task: demo_cola] metric acc is defined, but aggregation is not. using default aggregation=mean\n",
"2023-11-27:08:16:35,225 WARNING [task.py:624] [Task: demo_cola] metric acc is defined, but higher_is_better is not. using default higher_is_better=True\n",
"2023-11-27:08:16:50,338 INFO [utils.py:160] NumExpr defaulting to 2 threads.\n",
"2023-11-29:11:57:23,598 INFO [utils.py:160] NumExpr defaulting to 2 threads.\n",
"2023-11-27 08:16:51.182014: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
"2023-11-29 11:57:24.719750: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
"2023-11-27 08:16:51.182064: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
"2023-11-29 11:57:24.719806: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
"2023-11-27 08:16:51.182100: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
"2023-11-29 11:57:24.719847: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
"2023-11-27 08:16:52.470541: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
"2023-11-29 11:57:26.656125: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
"2023-11-27:08:16:55,648 INFO [__main__.py:124] Verbosity set to INFO\n",
"2023-11-29:11:57:31,563 INFO [__main__.py:132] Verbosity set to INFO\n",
"2023-11-27:08:17:05,419 WARNING [__main__.py:130] --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.\n",
"2023-11-29:11:57:40,541 WARNING [__main__.py:138] --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.\n",
"2023-11-27:08:17:05,420 INFO [__main__.py:135] Including path: ./\n",
"2023-11-29:11:57:40,541 INFO [__main__.py:143] Including path: ./\n",
"2023-11-27:08:17:05,433 INFO [__main__.py:197] Selected Tasks: ['demo_mmlu_high_school_geography']\n",
"2023-11-29:11:57:40,558 INFO [__main__.py:205] Selected Tasks: ['demo_mmlu_high_school_geography']\n",
"2023-11-27:08:17:05,469 INFO [huggingface.py:119] Using device 'cuda'\n",
"2023-11-29:11:57:40,559 WARNING [evaluator.py:93] generation_kwargs specified through cli, these settings will be used over set parameters in yaml tasks.\n",
"We could also evaluate this task in a different way. For example, instead of observing the loglikelihood of the letters, we can instead evaluate on the choices themselves as the continuation. This is done by simply changing `doc_to_choice` from a list of letters to the corresponding `choices` field from the HF dataset. We write `\"{{choices}}\"` so that the string field is interpreted as jinja string that acquires the list from the HF dataset directly.\n",
"We could also evaluate this task in a different way. For example, instead of observing the loglikelihood of the letters, we can instead evaluate on the choices themselves as the continuation. This is done by simply changing `doc_to_choice` from a list of letters to the corresponding `choices` field from the HF dataset. We write `\"{{choices}}\"` so that the string field is interpreted as jinja string that acquires the list from the HF dataset directly.\n",
"\n",
"\n",
"Another convinient feature here is since we're only modifying the `doc_to_choice` and the rest of config is the same as the task above, we can use the above configuration as a template by using `include: mmlu_high_school_geography.yaml` to load the config from that file. We'll need to add a unique task name as to not colide with the existing yaml config we're including. For this case we'll simply name this one `mmlu_high_school_geography_continuation`. `doc_to_text` is added here just for sake of clarity."
"Another convenient feature here is since we're only modifying the `doc_to_choice` and the rest of config is the same as the task above, we can use the above configuration as a template by using `include: mmlu_high_school_geography.yaml` to load the config from that file. We'll need to add a unique task name as to not colide with the existing yaml config we're including. For this case we'll simply name this one `mmlu_high_school_geography_continuation`. `doc_to_text` is added here just for sake of clarity."
"2023-11-27:08:17:52,429 INFO [utils.py:160] NumExpr defaulting to 2 threads.\n",
"2023-11-29:11:58:21,284 INFO [utils.py:160] NumExpr defaulting to 2 threads.\n",
"2023-11-27 08:17:53.714003: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
"2023-11-29 11:58:22.850159: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
"2023-11-27 08:17:53.714080: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
"2023-11-29 11:58:22.850219: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
"2023-11-27 08:17:53.714119: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
"2023-11-29 11:58:22.850254: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
"2023-11-27 08:17:55.953811: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
"2023-11-29 11:58:24.948103: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
"2023-11-27:08:17:59,577 INFO [__main__.py:124] Verbosity set to INFO\n",
"2023-11-29:11:58:28,460 INFO [__main__.py:132] Verbosity set to INFO\n",
"2023-11-27:08:18:07,654 WARNING [__main__.py:130] --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.\n",
"2023-11-29:11:58:37,935 WARNING [__main__.py:138] --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.\n",
"2023-11-27:08:18:07,654 INFO [__main__.py:135] Including path: ./\n",
"2023-11-29:11:58:37,935 INFO [__main__.py:143] Including path: ./\n",
"2023-11-27:08:18:07,686 INFO [__main__.py:197] Selected Tasks: ['demo_mmlu_high_school_geography_continuation']\n",
"2023-11-29:11:58:37,969 INFO [__main__.py:205] Selected Tasks: ['demo_mmlu_high_school_geography_continuation']\n",
"2023-11-27:08:18:07,735 INFO [huggingface.py:119] Using device 'cuda'\n",
"2023-11-29:11:58:37,972 WARNING [evaluator.py:93] generation_kwargs specified through cli, these settings will be used over set parameters in yaml tasks.\n",
"2023-11-27:08:18:25,609 INFO [task.py:353] Building contexts for task on rank 0...\n",
"2023-11-29:11:58:38,008 INFO [huggingface.py:120] Using device 'cuda'\n",
"2023-11-27:08:18:25,628 INFO [evaluator.py:290] Running loglikelihood requests\n",
"2023-11-29:11:58:59,758 INFO [task.py:355] Building contexts for task on rank 0...\n",
"100% 40/40 [00:05<00:00, 7.68it/s]\n",
"2023-11-29:11:58:59,777 INFO [evaluator.py:319] Running loglikelihood requests\n",
"100% 40/40 [00:02<00:00, 16.23it/s]\n",
"fatal: not a git repository (or any of the parent directories): .git\n",
"fatal: not a git repository (or any of the parent directories): .git\n",
"2023-11-27:08:18:37,991 INFO [utils.py:160] NumExpr defaulting to 2 threads.\n",
"2023-11-29:11:59:08,312 INFO [utils.py:160] NumExpr defaulting to 2 threads.\n",
"2023-11-27 08:18:39.666844: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
"2023-11-29 11:59:09.348327: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
"2023-11-27 08:18:39.666894: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
"2023-11-29 11:59:09.348387: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
"2023-11-27 08:18:39.666949: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
"2023-11-29 11:59:09.348421: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
"2023-11-27 08:18:41.101009: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
"2023-11-29 11:59:10.573752: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
"2023-11-27:08:18:44,452 INFO [__main__.py:124] Verbosity set to INFO\n",
"2023-11-29:11:59:14,044 INFO [__main__.py:132] Verbosity set to INFO\n",
"2023-11-27:08:18:54,278 WARNING [__main__.py:130] --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.\n",
"2023-11-29:11:59:23,654 WARNING [__main__.py:138] --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.\n",
"2023-11-27:08:18:54,279 INFO [__main__.py:135] Including path: ./\n",
"2023-11-29:11:59:23,654 INFO [__main__.py:143] Including path: ./\n",
"2023-11-27:08:18:54,303 INFO [__main__.py:197] Selected Tasks: ['demo_mmlu_high_school_geography_function_prompt']\n",
"2023-11-29:11:59:23,678 INFO [__main__.py:205] Selected Tasks: ['demo_mmlu_high_school_geography_function_prompt']\n",
"2023-11-27:08:18:54,333 INFO [huggingface.py:119] Using device 'cuda'\n",
"2023-11-29:11:59:23,679 WARNING [evaluator.py:93] generation_kwargs specified through cli, these settings will be used over set parameters in yaml tasks.\n",
"2023-11-27:08:19:12,078 INFO [task.py:353] Building contexts for task on rank 0...\n",
"2023-11-29:11:59:23,708 INFO [huggingface.py:120] Using device 'cuda'\n",
"2023-11-27:08:19:12,084 INFO [evaluator.py:290] Running loglikelihood requests\n",
"2023-11-29:11:59:44,516 INFO [task.py:355] Building contexts for task on rank 0...\n",
"100% 40/40 [00:02<00:00, 16.47it/s]\n",
"2023-11-29:11:59:44,524 INFO [evaluator.py:319] Running loglikelihood requests\n",
"100% 40/40 [00:02<00:00, 15.41it/s]\n",
"fatal: not a git repository (or any of the parent directories): .git\n",
"fatal: not a git repository (or any of the parent directories): .git\n",