Unverified Commit b70af4f5 authored by Baber Abbasi, committed by GitHub

bugfix and docs for API (#2139)



* encoding bugfix

* encoding bugfix

* overload loglikelihood rather than _loglikelihood_tokens

* add custom tokenizer

* add docs

* Update API_guide.md

fix link; add note

* Update API_guide.md

typo

* pre-commit

* add link in readme

* nit

* nit

* nit

* Update API_guide.md

nits

* Update API_guide.md

* Update API_guide.md

* Update API_guide.md

* Update API_guide.md

* Update README.md

* Update docs/API_guide.md

* Update docs/API_guide.md

* Update API_guide.md

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
parent 42dc2448
@@ -6,7 +6,7 @@
*Latest News 📣*
- [2024/07] API model support has been updated and refactored, introducing support for batched and async requests, and making it significantly easier to customize and use for your own purposes. **To run Llama 405B, we recommend using VLLM's OpenAI-compliant API to host the model, and use the `local-completions` model type to evaluate the model.**
- [2024/07] [API model](docs/API_guide.md) support has been updated and refactored, introducing support for batched and async requests, and making it significantly easier to customize and use for your own purposes. **To run Llama 405B, we recommend using VLLM's OpenAI-compliant API to host the model, and use the `local-completions` model type to evaluate the model.**
- [2024/07] New Open LLM Leaderboard tasks have been added! You can find them under the [leaderboard](lm_eval/tasks/leaderboard/README.md) task group.
---
# TemplateAPI Usage Guide
The `TemplateAPI` class is a versatile superclass designed to facilitate the integration of various API-based language models into the lm-evaluation-harness framework. This guide explains how to use and extend the `TemplateAPI` class to implement your own API models. If your API is OpenAI-compatible, you can use the `local-completions` or `local-chat-completions` model types (defined [here](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/models/openai_completions.py)), which also serve as examples of how to effectively subclass this template.
## Overview
The `TemplateAPI` class provides a template for creating API-based model implementations. It handles common functionalities such as:
- Tokenization (optional)
- Batch processing
- Caching
- Retrying failed requests
- Parsing API responses
To use this class, you typically need to subclass it and implement specific methods for your API.
## Key Methods to Implement
When subclassing `TemplateAPI`, you need to implement the following methods:
1. `_create_payload`: Creates the JSON payload for API requests.
2. `parse_logprobs`: Parses log probabilities from API responses.
3. `parse_generations`: Parses generated text from API responses.
4. `header`: Returns the headers for the API request.
You may also need to override other methods or properties depending on your API's specific requirements.
> [!NOTE]
> Loglikelihood and MCQ-based tasks (such as MMLU) are currently supported only for completion endpoints, not for chat-completion endpoints (those that expect a list of dicts). Completion APIs serving instruct-tuned models can be evaluated with the `--apply_chat_template` option, which formats requests with the model's chat template while still exposing the model logits needed for loglikelihood-based tasks.
## TemplateAPI Arguments
When initializing a `TemplateAPI` instance or a subclass, you can provide several arguments to customize its behavior. Here's a detailed explanation of some important arguments:
- `model` or `pretrained` (str):
- The name or identifier of the model to use.
- `model` takes precedence over `pretrained` when both are provided.
- `base_url` (str):
- The base URL for the API endpoint.
- `tokenizer` (str, optional):
- The name or path of the tokenizer to use.
- If not provided, it defaults to using the same tokenizer name as the model.
- `num_concurrent` (int):
- Number of concurrent requests to make to the API.
- Useful for APIs that support parallel processing.
- Default is 1 (sequential processing).
- `tokenized_requests` (bool):
- Determines whether the input is pre-tokenized. Defaults to `True`.
- Requests can be sent in either tokenized form (`list[list[int]]`) or as text (`list[str]`, or `str` for batch_size=1).
- For loglikelihood-based tasks, prompts must still be tokenized to calculate the context length; if `False`, they are decoded back to text before being sent to the API.
- Not as important for `generate_until` tasks.
- Ignored for chat-formatted inputs (`list[dict]`) or if `tokenizer_backend` is None.
- `tokenizer_backend` (str, optional):
- Required for loglikelihood-based or MCQ tasks.
- Specifies the tokenizer library to use. Options are `"tiktoken"`, `"huggingface"`, or `None`.
- Default is "huggingface".
- `max_length` (int, optional):
- Maximum length of input + output.
- Default is 2048.
- `max_retries` (int, optional):
- Maximum number of retries for failed API requests.
- Default is 3.
- `max_gen_toks` (int, optional):
- Maximum number of tokens to generate in completion tasks.
- Default is 256, unless overridden in the task YAML.
- `batch_size` (int or str, optional):
- Number of requests to batch together (if the API supports batching).
- Can be an integer or "auto" (which defaults to 1 for API models).
- Default is 1.
- `seed` (int, optional):
- Random seed for reproducibility.
- Default is 1234.
- `add_bos_token` (bool, optional):
- Whether to add the beginning-of-sequence token to inputs (when tokenizing).
- Default is False.
- `custom_prefix_token_id` (int, optional):
- Custom token ID to use as a prefix for inputs.
- If not provided, falls back to the model's BOS token (when `add_bos_token` is True) or its EOS token.
Example usage:
```python
class MyAPIModel(TemplateAPI):
def __init__(self, **kwargs):
super().__init__(
model="my-model",
base_url="https://api.mymodel.com/v1/completions",
tokenizer_backend="huggingface",
num_concurrent=5,
max_retries=5,
batch_size=10,
**kwargs
)
# Implement other required methods...
```
When subclassing `TemplateAPI`, you can override these arguments in your `__init__` method to set default values specific to your API. You can also add additional (potentially user-specified) arguments as needed for your specific implementation.
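For instance, a subclass might accept an extra, user-specified argument and forward everything else to the superclass. This is a minimal sketch; the `api_version` name below is purely illustrative and not part of `TemplateAPI`:
```python
class MyVersionedAPIModel(TemplateAPI):
    def __init__(self, api_version: str = "2024-07-01", **kwargs):
        # Custom argument consumed here; remaining kwargs keep TemplateAPI's defaults.
        self.api_version = api_version
        super().__init__(**kwargs)
```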
## Example Implementation: OpenAI API
The `OpenAICompletionsAPI` and `OpenAIChatCompletion` classes (defined [here](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/models/openai_completions.py)) demonstrate how to implement API models using the `TemplateAPI` class. Here's a breakdown of the key components:
### 1. Subclassing and Initialization
```python
@register_model("openai-completions")
class OpenAICompletionsAPI(LocalCompletionsAPI):
def __init__(
self,
base_url="https://api.openai.com/v1/completions",
tokenizer_backend="tiktoken",
**kwargs,
):
super().__init__(
base_url=base_url, tokenizer_backend=tokenizer_backend, **kwargs
)
```
### 2. Implementing API Key Retrieval
```python
@cached_property
def api_key(self):
key = os.environ.get("OPENAI_API_KEY", None)
if key is None:
raise ValueError(
"API key not found. Please set the OPENAI_API_KEY environment variable."
)
return key
```
### 3. Creating the Payload
```python
def _create_payload(
self,
messages: Union[List[List[int]], List[dict], List[str], str],
generate=False,
gen_kwargs: Optional[dict] = None,
**kwargs,
) -> dict:
    if generate:
        # ... (implementation for generation)
        ...
    else:
        # ... (implementation for log likelihood)
        ...
```
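To make the two branches concrete, here is a rough sketch of a payload for an OpenAI-style completions endpoint. The `echo`/`logprobs` fields assume that API shape, and `self._max_gen_toks` and the `max_gen_toks`/`until` keys follow the harness's conventions; treat the details as assumptions to verify against `openai_completions.py`:
```python
def _create_payload(
    self,
    messages: Union[List[List[int]], List[dict], List[str], str],
    generate=False,
    gen_kwargs: Optional[dict] = None,
    **kwargs,
) -> dict:
    if generate:
        # Generation: request new tokens, honoring the task's stop sequences.
        gen_kwargs = gen_kwargs or {}
        return {
            "model": self.model,
            "prompt": messages,
            "max_tokens": gen_kwargs.get("max_gen_toks", self._max_gen_toks),
            "temperature": gen_kwargs.get("temperature", 0),
            "stop": gen_kwargs.get("until", None),
        }
    else:
        # Loglikelihood: generate (almost) nothing and ask the API to echo the
        # prompt back with per-token logprobs so the continuation can be scored.
        return {
            "model": self.model,
            "prompt": messages,
            "max_tokens": 1,
            "temperature": 0,
            "echo": True,
            "logprobs": 1,
        }
```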
### 4. Parsing API Responses
```python
@staticmethod
def parse_logprobs(
outputs: Union[Dict, List[Dict]],
tokens: List[List[int]] = None,
ctxlens: List[int] = None,
**kwargs,
) -> List[Tuple[float, bool]]:
    # ... (implementation)
    ...

@staticmethod
def parse_generations(outputs: Union[Dict, List[Dict]], **kwargs) -> List[str]:
    # ... (implementation)
    ...
```
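As a rough guide, parsing an OpenAI-style completions response might look like the sketch below. It assumes the `choices[i]["logprobs"]` layout with `token_logprobs`, `tokens`, and `top_logprobs` lists; other APIs will differ, and the real implementations handle more edge cases:
```python
@staticmethod
def parse_logprobs(
    outputs: Union[Dict, List[Dict]],
    tokens: List[List[int]] = None,
    ctxlens: List[int] = None,
    **kwargs,
) -> List[Tuple[float, bool]]:
    res = []
    if not isinstance(outputs, list):
        outputs = [outputs]
    for out in outputs:
        for choice, ctxlen in zip(out["choices"], ctxlens):
            logprobs = choice["logprobs"]
            # Score only the continuation: skip the first `ctxlen` (context) tokens.
            continuation_logprob = sum(logprobs["token_logprobs"][ctxlen:])
            # is_greedy: was every continuation token the model's argmax?
            is_greedy = all(
                tok == max(top, key=top.get)
                for tok, top in zip(
                    logprobs["tokens"][ctxlen:], logprobs["top_logprobs"][ctxlen:]
                )
            )
            res.append((continuation_logprob, is_greedy))
    return res

@staticmethod
def parse_generations(outputs: Union[Dict, List[Dict]], **kwargs) -> List[str]:
    # Flatten the generated text out of each response's choices.
    if not isinstance(outputs, list):
        outputs = [outputs]
    return [choice["text"] for out in outputs for choice in out["choices"]]
```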
Requests themselves are dispatched by the `model_call` (synchronous) or `amodel_call` (asynchronous) methods.
## Implementing Your Own API Model
To implement your own API model (a minimal end-to-end skeleton follows this list):
1. Subclass `TemplateAPI` or one of its subclasses (e.g., `LocalCompletionsAPI`).
2. Override the `__init__` method if you need to set specific parameters.
3. Implement the `_create_payload` method to build the appropriate request payload for your API, and the `header` property to supply the request headers.
4. Implement the `parse_logprobs` and `parse_generations` methods to parse your API's responses.
5. Override the `api_key` property if your API requires authentication.
6. Override any other methods as necessary to match your API's behavior.
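Putting these steps together, a minimal skeleton might look like the sketch below. The registry name, URL, and header shape are illustrative assumptions, not fixed by the framework; `api_key` exists on the base class and can be overridden for your auth scheme. See `openai_completions.py` for a complete, working subclass.
```python
from functools import cached_property
from typing import Dict

from lm_eval.api.registry import register_model
from lm_eval.models.api_models import TemplateAPI


@register_model("my-api")  # hypothetical name; selected via `--model my-api`
class MyAPI(TemplateAPI):
    def __init__(self, base_url="https://api.example.com/v1/completions", **kwargs):
        super().__init__(base_url=base_url, **kwargs)

    @cached_property
    def header(self) -> Dict[str, str]:
        # Typical bearer-token auth; adjust to whatever your API expects.
        return {"Authorization": f"Bearer {self.api_key}"}

    def _create_payload(self, messages, generate=False, gen_kwargs=None, **kwargs) -> dict:
        ...  # build the request body (see the payload sketch above)

    @staticmethod
    def parse_logprobs(outputs, tokens=None, ctxlens=None, **kwargs):
        ...  # return a list of (logprob, is_greedy) tuples

    @staticmethod
    def parse_generations(outputs, **kwargs):
        ...  # return a list of generated strings
```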
## Best Practices
1. Use the `@register_model` decorator to register your model with the framework (and import it in `lm_eval/models/__init__.py`!).
2. Use environment variables for sensitive information like API keys.
3. Properly handle batching and concurrent requests if supported by your API.
@@ -4,7 +4,8 @@ Welcome to the docs for the LM Evaluation Harness!
## Table of Contents
* To learn about the public interface of the library, as well as how to evaluate via the commandline or as integrated into an external library, see the [Interface](./interface.md)
* To learn about the public interface of the library, as well as how to evaluate via the command line or as integrated into an external library, see the [Interface](./interface.md).
* To learn how to add a new library, API, or model type to the library, as well as a quick explainer on the types of ways to evaluate an LM, see the [Model Guide](./model_guide.md).
* For an extended description of how to extend the library to new model classes served over an API, see the [API Guide](./API_guide.md).
* For a crash course on adding new tasks to the library, see our [New Task Guide](./new_task_guide.md).
* To learn more about pushing the limits of task configuration that the Eval Harness supports, see the [Task Configuration Guide](./task_guide.md).
@@ -356,7 +356,7 @@ class AnthropicChat(LocalCompletionsAPI):
) -> List[str]:
return [string]
def _loglikelihood_tokens(self, requests, **kwargs):
def loglikelihood(self, requests, **kwargs):
raise NotImplementedError(
"Anthropic Chat Completions API does not support the return of log"
"Anthropic Chat Completions API does not support the return of loglikelihood"
)
@@ -3,7 +3,6 @@ import asyncio
import copy
import itertools
import json
from collections import namedtuple
from functools import cached_property
from typing import (
Any,
@@ -13,6 +12,7 @@ from typing import (
Iterable,
List,
Literal,
NamedTuple,
Optional,
Tuple,
Union,
@@ -38,7 +38,15 @@ from lm_eval.models.utils import Collator, chunks, configure_pad_token
LogLikelihoodInputs = Tuple[Tuple[str, str], List[int], List[int]]
JsonChatStr = namedtuple("JsonChatStr", ["prompt"])
# utility class to keep track of json encoded chats
class JsonChatStr(NamedTuple):
prompt: str
def encode(self, encoding):
return self.prompt.encode(encoding)
eval_logger = utils.eval_logger
@@ -113,29 +121,37 @@ class TemplateAPI(TemplateLM):
self.tokenizer = None
self.tokenized_requests = False
else:
if self.tokenizer_backend == "huggingface":
if self.tokenizer is None:
if self.tokenizer_backend == "huggingface":
import transformers
self.tokenizer = transformers.AutoTokenizer.from_pretrained(
self.tokenizer if self.tokenizer else self.model
)
# Not used as the API will handle padding but to mirror the behavior of the HFLM
self.tokenizer = configure_pad_token(self.tokenizer)
elif self.tokenizer_backend == "tiktoken":
try:
import tiktoken
self.tokenizer = tiktoken.encoding_for_model(self.model)
except ModuleNotFoundError as e:
raise Exception(
"Attempted to use 'openai' LM type, but the package `tiktoken` is not installed. "
"Please install it via `pip install lm-eval[api]` or `pip install -e .[api]`."
) from e
if "openai" not in self.base_url:
eval_logger.warning(
f"Passed `base_url={self.base_url}` but using (OpenAI) Tiktoken tokenizer backend. "
"Pass `tokenizer_backend=huggingface` and provide the HF tokenizer name if your model does not use Tiktoken."
)
else:
import transformers
assert isinstance(tokenizer, str), "tokenizer must be a string"
self.tokenizer = transformers.AutoTokenizer.from_pretrained(
self.tokenizer if self.tokenizer else self.model
tokenizer,
)
# Not used as the API will handle padding but to mirror the behavior of the HFLM
self.tokenizer = configure_pad_token(self.tokenizer)
elif self.tokenizer_backend == "tiktoken":
try:
import tiktoken
self.tokenizer = tiktoken.encoding_for_model(self.model)
except ModuleNotFoundError as e:
raise Exception(
"Attempted to use 'openai' LM type, but the package `tiktoken` is not installed. "
"Please install it via `pip install lm-eval[api]` or `pip install -e .[api]`."
) from e
if "openai" not in self.base_url:
eval_logger.warning(
f"Passed `base_url={self.base_url}` but using (OpenAI) Tiktoken tokenizer backend. "
"Pass `tokenizer_backend=huggingface` and provide the HF tokenizer name if your model does not use Tiktoken."
)
@abc.abstractmethod
def _create_payload(
@@ -145,7 +145,7 @@ class LocalChatCompletion(LocalCompletionsAPI):
) -> Union[List[str], List[int], Any]:
return string
def _loglikelihood_tokens(self, requests, **kwargs):
def loglikelihood(self, requests, **kwargs):
raise NotImplementedError(
"Loglikelihood is not supported for chat completions. Consider using the completions API instead."
)
@@ -175,14 +175,14 @@ class OpenAICompletionsAPI(LocalCompletionsAPI):
)
return key
def _loglikelihood_tokens(self, requests, **kwargs):
def loglikelihood(self, requests, **kwargs):
assert (
self.model != "gpt-3.5-turbo"
), "Loglikelihood is not supported for gpt-3.5-turbo"
return super()._loglikelihood_tokens(requests, **kwargs)
return super().loglikelihood(requests, **kwargs)
@register_model("openai-chatcompletions")
@register_model("openai-chat-completions")
class OpenAIChatCompletion(LocalChatCompletion):
def __init__(
self,