test

c1c44651 · Stella Biderman · 15953e42 · 3c75a2c7 · c1c44651 · c1c44651
Commit c1c44651 authored Dec 04, 2023 by Stella Biderman

Showing with 21 additions and 19 deletions

README.md +21 -0

lm_eval/api/samplers.py +0 -19

--- a/README.md
+++ b/README.md
 # Language Model Evaluation Harness
+<<<<<<< HEAD
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10256836.svg)](https://doi.org/10.5281/zenodo.10256836)
+=======
+## Announcement
+**A new v0.4.0 release of lm-evaluation-harness is available** ! 
+New updates and features include:
+- Internal refactoring
+- Config-based task creation and configuration
+- Easier import of externally-defined task config files (--include_path, passing path to YAML directly, etc)
+- Support for Jinja2 prompt design, easy modification of prompts + prompt imports from Promptsource
+- More advanced configuration options, including output post-processing, answer extraction, and multiple LM generations per document, configurable fewshot settings, and more
+- Speedups and new modeling libraries supported, including: faster data-parallel HF model usage, vLLM support, MPS support with HuggingFace, and more
+- Logging and usability changes
+- New tasks including CoT BIG-Bench-Hard, Belebele, user-defined task groupings, and more
+Please see our updated documentation pages in `docs/` for more details.
+Development will be continuing on the `main` branch, and we encourage you to give us feedback on what features are desired and how to improve the library further, or ask questions, either in issues or PRs on GitHub, or in the [EleutherAI discord](discord.gg/eleutherai)!
+>>>>>>> 3c75a2c77375769f5c483f084e95ea8b8514c9bb
 ## Overview

--- a/lm_eval/api/samplers.py
+++ b/lm_eval/api/samplers.py
@@ -112,22 +112,3 @@ def get_sampler(name):
        raise ValueError(
            f"Attempted to use contextsampler '{name}', but no sampling strategy for this name found! Supported model names: {', '.join(SAMPLER_REGISTRY.keys())}"
        )
-# TODO: how should we do design here? might be better to have a single sampler and pass more kwargs at init.
-# Depends what's easier for new user to add own functionality on top of
-# types of sampler:
-# - class-balanced, randomly shuffled
-# - class-balanced, one particular set of fewshot examples for all evaled instances
-# - hand-specify number of fewshot examples per class?
-# - random, varies per example (check that this is curr. default in old repo)
-# - random, unified per example
-# - enforce a specific fixed fewshot string! (or should we not use this, in favor of including it in prompt template directly)
-# - user-specified doc indices to restrict fewshot doc options to
-# - user specifies split to use for drawing fewshot instances (TODO: manually prevent this from being same split you eval!)
-# - user specifies a prepended "description"/string to add in front of the (prompted) input
-# - user specifies a location to draw fewshot samples from? DO THIS IN TASK CLASS