"vscode:/vscode.git/clone" did not exist on "4a565cbf9417ab6ec4560d334d556b5e97e23be9"
Unverified Commit 42dc2448 authored by Baber Abbasi's avatar Baber Abbasi Committed by GitHub
Browse files

Refactor API models (#2008)



* refactor pad_token handling to fn

* fix docs

* add pad_token_handling to vllm

* start on API superclass

* don't detokenize the returned logits

* streamline vllm tokenizer

* add type hint

* pre-commit

* seems to be in working order

* add model to init

* refactor api models

* nit

* cleanup

* add pbar

* fix type hints

* change optional dependencies

* json encode chat template

* add type hints

* deal with different prompt input requiremnts

* nits

* fix

* cache inside async

* fix

* fix

* nits

* nits

* nits

* nit

* fixup

* fixup

* nit

* add dummy retry

* add dummy retry

* handle imports; skip failing test

* add type hint

* add tests

* add dependency to tests

* add package names to exception

* nit

* docs; type hints

* handle api key

* nit

* tokenizer bug

* fix tokenizer

* nit

* nit

* add better error messages

* nit

* remove decorator

* CI: install api dep

* revert evaluator.py

* consolidate

* consolidate

* nits

* nit

* fix typealias

* nit

* nit

* nit

* Update lm_eval/models/api_models.py

typo
Co-authored-by: default avatarHailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/openai_completions.py
Co-authored-by: default avatarHailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/anthropic_llms.py
Co-authored-by: default avatarHailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/api_models.py
Co-authored-by: default avatarHailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* fix typo

* add news section

* add info for API

* pre-commit

* typo

* fix bug: unpack logliklehood requests

* fix bug: shared gen_kwargs mutated

* nit: handle copy properly

* Update README.md

* Update README.md

* Update README.md

* Update api_models.py

* Update README.md

---------
Co-authored-by: default avatarHailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
parent 4a62757d
...@@ -56,7 +56,7 @@ jobs: ...@@ -56,7 +56,7 @@ jobs:
- name: Install dependencies - name: Install dependencies
run: | run: |
python -m pip install --upgrade pip python -m pip install --upgrade pip
pip install -e '.[dev,anthropic,sentencepiece]' --extra-index-url https://download.pytorch.org/whl/cpu pip install -e '.[dev,sentencepiece,api]' --extra-index-url https://download.pytorch.org/whl/cpu
# Install optional git dependencies # Install optional git dependencies
# pip install bleurt@https://github.com/google-research/bleurt/archive/b610120347ef22b494b6d69b4316e303f5932516.zip#egg=bleurt # pip install bleurt@https://github.com/google-research/bleurt/archive/b610120347ef22b494b6d69b4316e303f5932516.zip#egg=bleurt
# if [ -f requirements.txt ]; then pip install -r requirements.txt; fi # if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
...@@ -84,7 +84,7 @@ jobs: ...@@ -84,7 +84,7 @@ jobs:
- name: Install dependencies - name: Install dependencies
run: | run: |
python -m pip install --upgrade pip python -m pip install --upgrade pip
pip install -e '.[dev,optimum,deepsparse,sparseml]' --extra-index-url https://download.pytorch.org/whl/cpu pip install -e '.[dev,optimum,deepsparse,sparseml,api]' --extra-index-url https://download.pytorch.org/whl/cpu
- name: Test with pytest - name: Test with pytest
run: python -m pytest tests/models --showlocals -s -vv run: python -m pytest tests/models --showlocals -s -vv
- name: Archive artifacts - name: Archive artifacts
......
...@@ -2,6 +2,15 @@ ...@@ -2,6 +2,15 @@
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10256836.svg)](https://doi.org/10.5281/zenodo.10256836) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10256836.svg)](https://doi.org/10.5281/zenodo.10256836)
---
*Latest News 📣*
- [2024/07] API model support has been updated and refactored, introducing support for batched and async requests, and making it significantly easier to customize and use for your own purposes. **To run Llama 405B, we recommend using VLLM's OpenAI-compliant API to host the model, and use the `local-completions` model type to evaluate the model.**
- [2024/07] New Open LLM Leaderboard tasks have been added ! You can find them under the [leaderboard](lm_eval/tasks/leaderboard/README.md) task group.
---
## Announcement ## Announcement
**A new v0.4.0 release of lm-evaluation-harness is available** ! **A new v0.4.0 release of lm-evaluation-harness is available** !
...@@ -21,6 +30,8 @@ Please see our updated documentation pages in `docs/` for more details. ...@@ -21,6 +30,8 @@ Please see our updated documentation pages in `docs/` for more details.
Development will be continuing on the `main` branch, and we encourage you to give us feedback on what features are desired and how to improve the library further, or ask questions, either in issues or PRs on GitHub, or in the [EleutherAI discord](https://discord.gg/eleutherai)! Development will be continuing on the `main` branch, and we encourage you to give us feedback on what features are desired and how to improve the library further, or ask questions, either in issues or PRs on GitHub, or in the [EleutherAI discord](https://discord.gg/eleutherai)!
---
## Overview ## Overview
This project provides a unified framework to test generative language models on a large number of different evaluation tasks. This project provides a unified framework to test generative language models on a large number of different evaluation tasks.
...@@ -112,7 +123,7 @@ For cases where your model can fit on a single GPU, this allows you to evaluate ...@@ -112,7 +123,7 @@ For cases where your model can fit on a single GPU, this allows you to evaluate
The second way of using `accelerate` for multi-GPU evaluation is when your model is *too large to fit on a single GPU.* The second way of using `accelerate` for multi-GPU evaluation is when your model is *too large to fit on a single GPU.*
In this setting, run the library *outside of the `accelerate` launcher*, but passing `parallelize=True` to `--model_args` as follows: In this setting, run the library *outside the `accelerate` launcher*, but passing `parallelize=True` to `--model_args` as follows:
``` ```
lm_eval --model hf \ lm_eval --model hf \
...@@ -217,12 +228,12 @@ lm_eval --model openai-completions \ ...@@ -217,12 +228,12 @@ lm_eval --model openai-completions \
We also support using your own local inference server with servers that mirror the OpenAI Completions and ChatCompletions APIs. We also support using your own local inference server with servers that mirror the OpenAI Completions and ChatCompletions APIs.
```bash ```bash
lm_eval --model local-chat-completions --tasks gsm8k --model_args model=facebook/opt-125m,base_url=http://{yourip}:8000/v1 lm_eval --model local-completions --tasks gsm8k --model_args model=facebook/opt-125m,base_url=http://{yourip}:8000/v1,num_concurrent=1,max_retries=3,tokenized_requests=False
``` ```
Note that for externally hosted models, configs such as `--device` and `--batch_size` should not be used and do not function. Just like you can use `--model_args` to pass arbitrary arguments to the model constructor for local models, you can use it to pass arbitrary arguments to the model API for hosted models. See the documentation of the hosting service for information on what arguments they support. Note that for externally hosted models, configs such as `--device` and `--batch_size` should not be used and do not function. Just like you can use `--model_args` to pass arbitrary arguments to the model constructor for local models, you can use it to pass arbitrary arguments to the model API for hosted models. See the documentation of the hosting service for information on what arguments they support.
| API or Inference Server | Implemented? | `--model <xxx>` name | Models supported: | Request Types: | | API or Inference Server | Implemented? | `--model <xxx>` name | Models supported: | Request Types: |
|---------------------------------------------------------------------------------------------------------------------------|---------------------------------|---------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|------------------------------------------------------------| |---------------------------------------------------------------------------------------------------------------------------|---------------------------------|-----------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------|
| OpenAI Completions | :heavy_check_mark: | `openai-completions`, `local-completions` | All OpenAI Completions API models | `generate_until`, `loglikelihood`, `loglikelihood_rolling` | | OpenAI Completions | :heavy_check_mark: | `openai-completions`, `local-completions` | All OpenAI Completions API models | `generate_until`, `loglikelihood`, `loglikelihood_rolling` |
| OpenAI ChatCompletions | :heavy_check_mark: | `openai-chat-completions`, `local-chat-completions` | [All ChatCompletions API models](https://platform.openai.com/docs/guides/gpt) | `generate_until` (no logprobs) | | OpenAI ChatCompletions | :heavy_check_mark: | `openai-chat-completions`, `local-chat-completions` | [All ChatCompletions API models](https://platform.openai.com/docs/guides/gpt) | `generate_until` (no logprobs) |
| Anthropic | :heavy_check_mark: | `anthropic` | [Supported Anthropic Engines](https://docs.anthropic.com/claude/reference/selecting-a-model) | `generate_until` (no logprobs) | | Anthropic | :heavy_check_mark: | `anthropic` | [Supported Anthropic Engines](https://docs.anthropic.com/claude/reference/selecting-a-model) | `generate_until` (no logprobs) |
...@@ -236,7 +247,7 @@ Note that for externally hosted models, configs such as `--device` and `--batch_ ...@@ -236,7 +247,7 @@ Note that for externally hosted models, configs such as `--device` and `--batch_
| Neuron via AWS Inf2 (Causal LMs) | ✔️ | `neuronx` | Any decoder-only AutoModelForCausalLM supported to run on [huggingface-ami image for inferentia2](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2) | `generate_until`, `loglikelihood`, `loglikelihood_rolling` | ... | | Neuron via AWS Inf2 (Causal LMs) | ✔️ | `neuronx` | Any decoder-only AutoModelForCausalLM supported to run on [huggingface-ami image for inferentia2](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2) | `generate_until`, `loglikelihood`, `loglikelihood_rolling` | ... |
| [Neural Magic DeepSparse](https://github.com/neuralmagic/deepsparse) | ✔️ | `deepsparse` | Any LM from [SparseZoo](https://sparsezoo.neuralmagic.com/) or on [HF Hub with the "deepsparse" tag](https://huggingface.co/models?other=deepsparse) | `generate_until`, `loglikelihood` | ... | | [Neural Magic DeepSparse](https://github.com/neuralmagic/deepsparse) | ✔️ | `deepsparse` | Any LM from [SparseZoo](https://sparsezoo.neuralmagic.com/) or on [HF Hub with the "deepsparse" tag](https://huggingface.co/models?other=deepsparse) | `generate_until`, `loglikelihood` | ... |
| [Neural Magic SparseML](https://github.com/neuralmagic/sparseml) | ✔️ | `sparseml` | Any decoder-only AutoModelForCausalLM from [SparseZoo](https://sparsezoo.neuralmagic.com/) or on [HF Hub](https://huggingface.co/neuralmagic). Especially useful for models with quantization like [`zoo:llama2-7b-gsm8k_llama2_pretrain-pruned60_quantized`](https://sparsezoo.neuralmagic.com/models/llama2-7b-gsm8k_llama2_pretrain-pruned60_quantized) | `generate_until`, `loglikelihood`, `loglikelihood_rolling` | ... | | [Neural Magic SparseML](https://github.com/neuralmagic/sparseml) | ✔️ | `sparseml` | Any decoder-only AutoModelForCausalLM from [SparseZoo](https://sparsezoo.neuralmagic.com/) or on [HF Hub](https://huggingface.co/neuralmagic). Especially useful for models with quantization like [`zoo:llama2-7b-gsm8k_llama2_pretrain-pruned60_quantized`](https://sparsezoo.neuralmagic.com/models/llama2-7b-gsm8k_llama2_pretrain-pruned60_quantized) | `generate_until`, `loglikelihood`, `loglikelihood_rolling` | ... |
| Your local inference server! | :heavy_check_mark: | `local-completions` or `local-chat-completions` (using `openai-chat-completions` model type) | Any server address that accepts GET requests using HF models and mirror's OpenAI's Completions or ChatCompletions interface | `generate_until` | | ... | | Your local inference server! | :heavy_check_mark: | `local-completions` or `local-chat-completions` | Support for OpenAI API-compatible servers, with easy customization for other APIs. | `generate_until`, `loglikelihood`, `loglikelihood_rolling` | | ... |
Models which do not supply logits or logprobs can be used with tasks of type `generate_until` only, while local models, or APIs that supply logprobs/logits of their prompts, can be run on all task types: `generate_until`, `loglikelihood`, `loglikelihood_rolling`, and `multiple_choice`. Models which do not supply logits or logprobs can be used with tasks of type `generate_until` only, while local models, or APIs that supply logprobs/logits of their prompts, can be run on all task types: `generate_until`, `loglikelihood`, `loglikelihood_rolling`, and `multiple_choice`.
...@@ -437,8 +448,8 @@ The best way to get support is to open an issue on this repo or join the [Eleuth ...@@ -437,8 +448,8 @@ The best way to get support is to open an issue on this repo or join the [Eleuth
Extras dependencies can be installed via `pip install -e ".[NAME]"` Extras dependencies can be installed via `pip install -e ".[NAME]"`
| Name | Use | | Name | Use |
|---------------|---------------------------------------| |-----------------|----------------------------------------------|
| anthropic | For using Anthropic's models | | api | For using api models (Anthropic, OpenAI API) |
| deepsparse | For running NM's DeepSparse models | | deepsparse | For running NM's DeepSparse models |
| dev | For linting PRs and contributions | | dev | For linting PRs and contributions |
| gptq | For loading models with GPTQ | | gptq | For loading models with GPTQ |
...@@ -448,7 +459,6 @@ Extras dependencies can be installed via `pip install -e ".[NAME]"` ...@@ -448,7 +459,6 @@ Extras dependencies can be installed via `pip install -e ".[NAME]"`
| mamba | For loading Mamba SSM models | | mamba | For loading Mamba SSM models |
| math | For running math task answer checking | | math | For running math task answer checking |
| multilingual | For multilingual tokenizers | | multilingual | For multilingual tokenizers |
| openai | For using OpenAI's models |
| optimum | For running Intel OpenVINO models | | optimum | For running Intel OpenVINO models |
| promptsource | For using PromptSource prompts | | promptsource | For using PromptSource prompts |
| sentencepiece | For using the sentencepiece tokenizer | | sentencepiece | For using the sentencepiece tokenizer |
...@@ -456,7 +466,7 @@ Extras dependencies can be installed via `pip install -e ".[NAME]"` ...@@ -456,7 +466,7 @@ Extras dependencies can be installed via `pip install -e ".[NAME]"`
| testing | For running library test suite | | testing | For running library test suite |
| vllm | For loading models with vLLM | | vllm | For loading models with vLLM |
| zeno | For visualizing results with Zeno | | zeno | For visualizing results with Zeno |
|---------------|---------------------------------------| | --------------- | --------------------------------------- |
| all | Loads all extras (not recommended) | | all | Loads all extras (not recommended) |
## Cite as ## Cite as
......
...@@ -55,7 +55,7 @@ class LM(abc.ABC): ...@@ -55,7 +55,7 @@ class LM(abc.ABC):
pass pass
@abc.abstractmethod @abc.abstractmethod
def loglikelihood_rolling(self, requests) -> List[Tuple[float]]: def loglikelihood_rolling(self, requests) -> List[float]:
"""Compute full log-likelihood of a string, with no truncation, for perplexity computation """Compute full log-likelihood of a string, with no truncation, for perplexity computation
- We will use the full max context length of the model. - We will use the full max context length of the model.
- For inputs that exceed the max context length, we divide the tokenized string into chunks of up to - For inputs that exceed the max context length, we divide the tokenized string into chunks of up to
...@@ -101,14 +101,13 @@ class LM(abc.ABC): ...@@ -101,14 +101,13 @@ class LM(abc.ABC):
"""Generate greedily until a stopping sequence """Generate greedily until a stopping sequence
:param requests: list[Instance] :param requests: list[Instance]
A list of Instance objects with property `args` which returns a tuple (context, until). A list of Instance objects with property `args` which returns a tuple (context, gen_kwargs).
context: str context: str
Context string Context string
until: [str] gen_kwargs: dict
The string sequences to generate until. These string sequences A dictionary of keyword arguments to pass to the generation function e.g. top_k, until, etc.
may each span across multiple tokens, or may be part of one token.
:return: list[str] :return: list[str]
A list of strings continuation A list of model generated continuations.
continuation: str continuation: str
The generated continuation. The generated continuation.
""" """
...@@ -325,14 +324,19 @@ class TemplateLM(LM): ...@@ -325,14 +324,19 @@ class TemplateLM(LM):
return self.eot_token_id return self.eot_token_id
@abc.abstractmethod @abc.abstractmethod
def tok_encode(self, string: str, **kwargs): def tok_encode(self, string: str, **kwargs) -> List[int]:
"""
Tokenize a string using the model's tokenizer and return a list of token IDs.
"""
pass pass
@abc.abstractmethod @abc.abstractmethod
def _loglikelihood_tokens(self, requests, **kwargs): def _loglikelihood_tokens(self, requests, **kwargs) -> List[Tuple[float, bool]]:
pass pass
def _encode_pair(self, context, continuation): def _encode_pair(
self, context: str, continuation: str
) -> Tuple[List[int], List[int]]:
n_spaces = len(context) - len(context.rstrip()) n_spaces = len(context) - len(context.rstrip())
if n_spaces > 0: if n_spaces > 0:
continuation = context[-n_spaces:] + continuation continuation = context[-n_spaces:] + continuation
...@@ -373,7 +377,7 @@ class TemplateLM(LM): ...@@ -373,7 +377,7 @@ class TemplateLM(LM):
@abc.abstractmethod @abc.abstractmethod
def loglikelihood_rolling( def loglikelihood_rolling(
self, requests, disable_tqdm: bool = False self, requests, disable_tqdm: bool = False
) -> List[Tuple[float, bool]]: ) -> List[float]:
pass pass
@abc.abstractmethod @abc.abstractmethod
......
from . import ( from . import (
anthropic_llms, anthropic_llms,
api_models,
dummy, dummy,
gguf, gguf,
huggingface, huggingface,
......
from typing import Any, List, Tuple import os
from functools import cached_property
from typing import Any, Dict, List, Tuple, Union
from tqdm import tqdm from tqdm import tqdm
from lm_eval import utils from lm_eval import utils
from lm_eval.api.model import LM from lm_eval.api.model import LM
from lm_eval.api.registry import register_model from lm_eval.api.registry import register_model
from lm_eval.models.openai_completions import LocalCompletionsAPI
from lm_eval.models.utils import retry_on_specific_exceptions from lm_eval.models.utils import retry_on_specific_exceptions
...@@ -138,7 +141,7 @@ please install anthropic via `pip install 'lm-eval[anthropic]'` or `pip install ...@@ -138,7 +141,7 @@ please install anthropic via `pip install 'lm-eval[anthropic]'` or `pip install
return messages() return messages()
@register_model("anthropic") @register_model("anthropic-completions")
class AnthropicLM(LM): class AnthropicLM(LM):
REQ_CHUNK_SIZE = 20 # TODO: not used REQ_CHUNK_SIZE = 20 # TODO: not used
...@@ -271,90 +274,89 @@ please install anthropic via `pip install 'lm-eval[anthropic]'` or `pip install ...@@ -271,90 +274,89 @@ please install anthropic via `pip install 'lm-eval[anthropic]'` or `pip install
@register_model("anthropic-chat", "anthropic-chat-completions") @register_model("anthropic-chat", "anthropic-chat-completions")
class AnthropicChatLM(AnthropicLM): class AnthropicChat(LocalCompletionsAPI):
REQ_CHUNK_SIZE = 20 # TODO: not used
def __init__( def __init__(
self, self,
model: str, base_url="https://api.anthropic.com/v1/messages",
batch_size: int = 1, tokenizer_backend=None,
max_tokens: int = 256, **kwargs,
temperature: float = 0, # defaults to 1 ):
**kwargs, # top_p, top_k, etc. super().__init__(
) -> None: base_url=base_url, tokenizer_backend=tokenizer_backend, **kwargs
"""Anthropic API wrapper.
:param model: str
Anthropic model e.g. 'claude-3-opus-20240229', 'claude-3-sonnet-20240229'
:param max_tokens: int
Maximum number of tokens to sample from the model
:param temperature: float
Sampling temperature
:param kwargs: Any
Additional model_args to pass to the API client
"""
super().__init__()
try:
import anthropic
except ModuleNotFoundError:
raise Exception(
"attempted to use 'anthropic' LM type, but package `anthropic` is not installed. \
please install anthropic via `pip install 'lm-eval[anthropic]'` or `pip install -e '.[anthropic]'`",
) )
eval_logger.warning(
self.model = model "Chat completions does not support batching. Defaulting to batch size 1."
# defaults to os.environ.get("ANTHROPIC_API_KEY") )
self.client = anthropic.Anthropic() self._batch_size = 1
self.temperature = temperature self.anthropic_version = "2023-06-01"
self.max_tokens = max_tokens eval_logger.warning(
self.tokenizer = self.client.get_tokenizer() f"Using Anthropic Version: {self.anthropic_version}. Confirm the current version here: https://docs.anthropic.com/en/api/versioning"
self.kwargs = kwargs
@property
def max_gen_toks(self) -> int:
return self.max_tokens
def generate_until(self, requests) -> List[str]:
try:
import anthropic
except ModuleNotFoundError:
raise Exception(
"attempted to use 'anthropic' LM type, but package `anthropic` is not installed. \
please install anthropic via `pip install 'lm-eval[anthropic]'` or `pip install -e '.[anthropic]'`",
) )
if not requests: @cached_property
return [] def api_key(self):
"""Override this property to return the API key for the API request."""
_requests: List[Tuple[str, dict]] = [req.args for req in requests] key = os.environ.get("ANTHROPIC_API_KEY", None)
if key is None:
res = [] raise ValueError(
for request in tqdm(_requests): "API key not found. Please set the ANTHROPIC_API_KEY environment variable."
try:
inp = request[0]
request_args = request[1]
# generation_kwargs
until = request_args.get("until")
max_tokens = request_args.get("max_gen_toks", self.max_length)
temperature = request_args.get("temperature", self.temperature)
response = anthropic_chat(
client=self.client,
model=self.model,
prompt=inp,
max_tokens=max_tokens,
temperature=temperature, # TODO: implement non-greedy sampling for Anthropic
stop=until, # type: ignore
**self.kwargs,
) )
res.append(response) return key
@cached_property
def header(self):
return {
"x-api-key": f"{self.api_key}",
"anthropic-version": self.anthropic_version,
}
def _create_payload(
self, messages: List[Dict], generate=True, gen_kwargs: dict = None, **kwargs
) -> dict:
system = (
messages[0].get("content") if messages[0].get("role") == "system" else None
)
if system:
messages = messages[1:]
gen_kwargs.pop("do_sample", False)
max_tokens = gen_kwargs.pop("max_gen_toks", self._max_gen_toks)
temperature = gen_kwargs.pop("temperature", 0)
stop = gen_kwargs.pop("until", ["\n\nHuman:"])
if not isinstance(stop, list):
stop = [stop]
out = {
"messages": messages,
"model": self.model,
"max_tokens": max_tokens,
"temperature": temperature,
"stop_sequences": stop,
**gen_kwargs,
}
if system:
out["system"] = system
return out
def parse_generations(
self, outputs: Union[Dict, List[Dict]], **kwargs
) -> List[str]:
res = []
if not isinstance(outputs, list):
outputs = [outputs]
for out in outputs:
for choices in out["content"]:
res.append(choices["text"])
return res
self.cache_hook.add_partial("generate_until", request, response) def tok_encode(
except anthropic.APIConnectionError as e: # type: ignore # noqa: F821 self,
eval_logger.critical(f"Server unreachable: {e.__cause__}") string: str,
break left_truncate_len=None,
except anthropic.APIStatusError as e: # type: ignore # noqa: F821 add_special_tokens=None,
eval_logger.critical(f"API error {e.status_code}: {e.message}") **kwargs,
break ) -> List[str]:
return [string]
return res def _loglikelihood_tokens(self, requests, **kwargs):
raise NotImplementedError(
"Anthropic Chat Completions API does not support the return of log"
)
This diff is collapsed.
This diff is collapsed.
...@@ -57,7 +57,7 @@ Homepage = "https://github.com/EleutherAI/lm-evaluation-harness" ...@@ -57,7 +57,7 @@ Homepage = "https://github.com/EleutherAI/lm-evaluation-harness"
Repository = "https://github.com/EleutherAI/lm-evaluation-harness" Repository = "https://github.com/EleutherAI/lm-evaluation-harness"
[project.optional-dependencies] [project.optional-dependencies]
anthropic = ["anthropic"] api = ["requests", "aiohttp", "tenacity", "tqdm", "tiktoken"]
dev = ["pytest", "pytest-cov", "pytest-xdist", "pre-commit", "mypy"] dev = ["pytest", "pytest-cov", "pytest-xdist", "pre-commit", "mypy"]
deepsparse = ["deepsparse-nightly[llm]>=1.8.0.20240404"] deepsparse = ["deepsparse-nightly[llm]>=1.8.0.20240404"]
gptq = ["auto-gptq[triton]>=0.6.0"] gptq = ["auto-gptq[triton]>=0.6.0"]
...@@ -67,7 +67,6 @@ neuronx = ["optimum[neuronx]"] ...@@ -67,7 +67,6 @@ neuronx = ["optimum[neuronx]"]
mamba = ["mamba_ssm", "causal-conv1d==1.0.2"] mamba = ["mamba_ssm", "causal-conv1d==1.0.2"]
math = ["sympy>=1.12", "antlr4-python3-runtime==4.11"] math = ["sympy>=1.12", "antlr4-python3-runtime==4.11"]
multilingual = ["nagisa>=0.2.7", "jieba>=0.42.1", "pycountry"] multilingual = ["nagisa>=0.2.7", "jieba>=0.42.1", "pycountry"]
openai = ["openai==1.3.9", "tiktoken"]
optimum = ["optimum[openvino]"] optimum = ["optimum[openvino]"]
promptsource = ["promptsource>=0.2.3"] promptsource = ["promptsource>=0.2.3"]
sentencepiece = ["sentencepiece>=0.1.98"] sentencepiece = ["sentencepiece>=0.1.98"]
......
from unittest.mock import MagicMock, patch
import pytest
from lm_eval.models.openai_completions import LocalCompletionsAPI
@pytest.fixture
def api():
return LocalCompletionsAPI(
base_url="http://test-url.com", tokenizer_backend=None, model="gpt-3.5-turbo"
)
@pytest.fixture
def api_tokenized():
return LocalCompletionsAPI(
base_url="http://test-url.com",
model="EleutherAI/pythia-1b",
tokenizer_backend="huggingface",
)
def test_create_payload_generate(api):
messages = ["Generate a story"]
gen_kwargs = {
"max_tokens": 100,
"temperature": 0.7,
"until": ["The End"],
"do_sample": True,
}
payload = api._create_payload(messages, generate=True, gen_kwargs=gen_kwargs)
assert payload == {
"prompt": ["Generate a story"],
"model": "gpt-3.5-turbo",
"max_tokens": 100,
"temperature": 0.7,
"stop": ["The End"],
}
def test_create_payload_loglikelihood(api):
messages = ["The capital of France is"]
payload = api._create_payload(messages, generate=False, gen_kwargs=None)
assert payload == {
"model": "gpt-3.5-turbo",
"prompt": ["The capital of France is"],
"max_tokens": 1,
"logprobs": 1,
"echo": True,
}
@pytest.mark.parametrize(
"input_messages, generate, gen_kwargs, expected_payload",
[
(
["Hello, how are"],
True,
{"max_gen_toks": 100, "temperature": 0.7},
{
"prompt": "Hello, how are",
"model": "gpt-3.5-turbo",
"max_tokens": 100,
"temperature": 0.7,
"stop": ["<|endoftext|>"],
},
),
(
["Hello, how are", "you"],
True,
{},
{
"prompt": "Hello, how are",
"model": "gpt-3.5-turbo",
"max_tokens": 256,
"temperature": 0,
"stop": ["<|endoftext|>"],
},
),
],
)
def test_model_generate_call_usage(
api, input_messages, generate, gen_kwargs, expected_payload
):
with patch("requests.post") as mock_post:
mock_response = MagicMock()
mock_response.json.return_value = {"result": "success"}
mock_post.return_value = mock_response
# Act
result = api.model_call(
input_messages, generate=generate, gen_kwargs=gen_kwargs
)
# Assert
mock_post.assert_called_once()
_, kwargs = mock_post.call_args
assert "json" in kwargs
assert kwargs["json"] == expected_payload
assert result == {"result": "success"}
@pytest.mark.parametrize(
"input_messages, generate, gen_kwargs, expected_payload",
[
(
[[1, 2, 3, 4, 5]],
False,
None,
{
"model": "EleutherAI/pythia-1b",
"prompt": [[1, 2, 3, 4, 5]],
"max_tokens": 1,
"logprobs": 1,
"echo": True,
},
),
],
)
def test_model_tokenized_call_usage(
api_tokenized, input_messages, generate, gen_kwargs, expected_payload
):
with patch("requests.post") as mock_post:
mock_response = MagicMock()
mock_response.json.return_value = {"result": "success"}
mock_post.return_value = mock_response
# Act
result = api_tokenized.model_call(
input_messages, generate=generate, gen_kwargs=gen_kwargs
)
# Assert
mock_post.assert_called_once()
_, kwargs = mock_post.call_args
assert "json" in kwargs
assert kwargs["json"] == expected_payload
assert result == {"result": "success"}
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment