Unverified commit 602abceb authored by yurodiviy, committed by GitHub

Add non-programmatic BIG-bench-hard tasks (#406)



* Support bigbench-hard json tasks using multiple_choice_grade

* Add support for greedy decoding in bigbench tasks

* move bigbench_resources to datasets

* rectify changes to rf.greedy_until with upstream

* make path to resource import reflect new location

---------
Co-authored-by: haileyschoelkopf <hailey.schoelkopf@yale.edu>
parent e47e01be
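The first bullet above adds JSON-defined BIG-bench-hard tasks scored with multiple_choice_grade. As a rough illustration of what that scoring involves, the sketch below follows the BIG-bench JSON convention of examples with an "input" prompt and a "target_scores" map from choice text to score; the class name and method bodies are hypothetical, not the exact code added in this PR, and other required Task methods are omitted.

# Hypothetical sketch of multiple_choice_grade scoring for a BIG-bench-hard JSON example.
# The doc layout ("input" / "target_scores") follows the BIG-bench JSON task schema;
# everything else is illustrative, not the code introduced by this commit.
from lm_eval.base import Task, rf


class BBHJsonMultipleChoice(Task):  # hypothetical name
    def construct_requests(self, doc, ctx):
        # One loglikelihood request per answer choice listed in target_scores.
        return [
            rf.loglikelihood(ctx, " " + choice)[0] for choice in doc["target_scores"]
        ]

    def process_results(self, doc, results):
        # multiple_choice_grade: 1 if the highest-likelihood choice is a gold choice.
        choices = list(doc["target_scores"])
        gold_best = max(doc["target_scores"].values())
        pred = choices[max(range(len(results)), key=lambda i: results[i])]
        return {"multiple_choice_grade": float(doc["target_scores"][pred] == gold_best)}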
@@ -342,18 +342,25 @@ class BaseLM(LM):
         re_ord = utils.Reorderer(requests, _collate)
-        for context, until in tqdm(re_ord.get_reordered()):
+        for context, request_args in tqdm(re_ord.get_reordered()):
+            until = request_args['until']
             if isinstance(until, str):
                 until = [until]
-            (primary_until,) = self.tok_encode(until[0])
+            if until:
+                (primary_until,) = self.tok_encode(until[0])
+            else:
+                primary_until = None
             context_enc = torch.tensor(
                 [self.tok_encode(context)[self.max_gen_toks - self.max_length :]]
             ).to(self.device)
+            max_gen_tokens = min(
+                self.max_gen_toks, request_args.get('max_length', self.max_gen_toks)
+            )
             cont = self._model_generate(
-                context_enc, context_enc.shape[1] + self.max_gen_toks, primary_until
+                context_enc, context_enc.shape[1] + max_gen_tokens, primary_until
             )
             s = self.tok_decode(cont[0].tolist()[context_enc.shape[1] :])
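The hunk above is the backend side of the greedy-decoding support: requests now arrive as a request_args dict carrying the 'until' stop strings and an optional 'max_length' cap, which is clamped to the model's own max_gen_toks. Below is a minimal caller-side sketch under that assumption; the task name, stop string, and process_results body are hypothetical, and only the request_args keys ('until', 'max_length') are taken from the change itself.

# Hypothetical caller-side sketch for the greedy-decoding path shown in the diff.
from lm_eval.base import Task, rf


class BBHGreedyGeneration(Task):  # hypothetical name
    def construct_requests(self, doc, ctx):
        # Stop sequences plus an optional per-request generation cap; the backend
        # clamps 'max_length' against self.max_gen_toks as in the diff above.
        return rf.greedy_until(ctx, {"until": ["\n\n"], "max_length": 256})

    def process_results(self, doc, results):
        # results[0] is the generated continuation for the single request above.
        completion = results[0].strip()
        return {"exact_match": float(completion == doc["target"])}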
(The diffs for the remaining files changed in this commit are collapsed or too large to display in the web view and are not reproduced here.)