Unverified commit 0ba15ced, authored by Aymeric Roucher, committed by GitHub

Reboot Agents (#30387)



* Create CodeAgent and ReactAgent

* Fix formatting errors

* Update documentation for agents

* Add custom errors, improve logging

* Support variable usage in ReactAgent

* add messages

* Add message passing format

* Create React Code Agent

* Update

* Refactoring

* Fix errors

* Improve python interpreter

* Only non-tensor inputs should be sent to device

* Calculator tool slight refactor

* Improve docstrings

* Refactor

* Fix tests

* Fix more tests

* Fix even more tests

* Fix tests by replacing output and input types

* Fix operand type issue

* two small fixes

* EM TTS

* Fix agent running type errors

* Change text to speech tests to allow changed outputs

* Update doc with new agent types

* Improve code interpreter

* If max iterations reached, provide a real answer instead of an error

* Add edge case in interpreter

* Add safe imports to the interpreter

* Interpreter tweaks: tuples and listcomp

* Make style

* Make quality

* Add dictcomp to interpreter

* Rename ReactJSONAgent to ReactJsonAgent

* Misc changes

* ToolCollection

* Rename agent's logger to self.logger

* Add while loops to interpreter

* Update doc with new tools. still need to mention collections

* Add collections to the doc

* Small fixes on logs and interpreter

* Fix toolbox return type

* Docs + fixup

* Skip doctests

* Correct prompts with improved examples and formatting

* Update prompt

* Remove outdated docs

* Change agent to accept Toolbox object for tools

* Remove calculator tool

* Propagate removal of calculator in doc

* Fix 2 failing workflows

* Simplify additional argument passing

* AgentType audio

* Minor changes: function name, types

* Remove calculator tests

* Fix test

* Fix torch requirement

* Fix final answer tests

* Style fixes

* Fix tests

* Update docstrings with calculator removal

* Small type hint fixes

* Update tests/agents/test_translation.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/agents/test_python_interpreter.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/agents/default_tools.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/agents/tools.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/agents/test_agents.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bert/configuration_bert.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/agents/tools.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/agents/speech_to_text.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/agents/test_speech_to_text.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/agents/test_tools_common.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* pygments

* Answer comments

* Cleaning up

* Simplifying init for all agents

* Improving prompts and making code nicer

* Style fixes

* Add multiple comparator test in interpreter

* Style fixes

* Improve BERT example in documentation

* Add examples to doc

* Fix python interpreter quality

* Logging improvements

* Change test flag to agents

* Quality fix

* Add example for HfEngine

* Improve conversation example for HfEngine

* typo fix

* Verify doc

* Update docs/source/en/agents.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/agents/agents.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/agents/prompts.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/agents/python_interpreter.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/agents.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Fix style issues

* local s2t tool

---------
Co-authored-by: Cyril Kondratenko <kkn1993@gmail.com>
Co-authored-by: Lysandre <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
parent 3733391c
@@ -71,7 +71,7 @@ NOT_DEVICE_TESTS = {
    "ModelTester::test_pipeline_",
    "/repo_utils/",
    "/utils/",
-    "/tools/",
+    "/agents/",
}
# allow having multiple repository checkouts and not needing to remember to rerun
@@ -94,7 +94,7 @@ def pytest_configure(config):
    config.addinivalue_line("markers", "is_pipeline_test: mark test to run only when pipelines are tested")
    config.addinivalue_line("markers", "is_staging_test: mark test to run only in the staging environment")
    config.addinivalue_line("markers", "accelerate_tests: mark test that require accelerate")
-    config.addinivalue_line("markers", "tool_tests: mark the tool tests that are run on their specific schedule")
+    config.addinivalue_line("markers", "agent_tests: mark the agent tests that are run on their specific schedule")
    config.addinivalue_line("markers", "not_device_test: mark the tests always running on cpu")
...
@@ -23,7 +23,7 @@
      title: Load and train adapters with 🤗 PEFT
    - local: model_sharing
      title: Share your model
-    - local: transformers_agents
+    - local: agents
      title: Agents
    - local: llm_tutorial
      title: Generation with LLMs
@@ -133,8 +133,6 @@
      title: Notebooks with examples
    - local: community
      title: Community resources
-    - local: custom_tools
-      title: Custom Tools and Prompts
    - local: troubleshooting
      title: Troubleshoot
    - local: hf_quantizer
...
This diff is collapsed.
This diff is collapsed.
@@ -28,30 +28,27 @@ contains the API docs for the underlying classes.
## Agents
-We provide three types of agents: [`HfAgent`] uses inference endpoints for opensource models, [`LocalAgent`] uses a model of your choice locally and [`OpenAiAgent`] uses OpenAI closed models.
+We provide two types of agents, based on the main [`Agent`] class:
+- [`CodeAgent`] acts in one shot, generating code to solve the task, then executes it at once.
+- [`ReactAgent`] acts step by step, each step consisting of one thought, then one tool call and execution. It has two classes:
+  - [`ReactJsonAgent`] writes its tool calls in JSON.
+  - [`ReactCodeAgent`] writes its tool calls in Python code.
-### HfAgent
-[[autodoc]] HfAgent
-### LocalAgent
-[[autodoc]] LocalAgent
-### OpenAiAgent
-[[autodoc]] OpenAiAgent
-### AzureOpenAiAgent
-[[autodoc]] AzureOpenAiAgent
-### Agent
-[[autodoc]] Agent
-   - chat
-   - run
-   - prepare_for_new_chat
+### Agent
+[[autodoc]] Agent
+### CodeAgent
+[[autodoc]] CodeAgent
+### React agents
+[[autodoc]] ReactAgent
+[[autodoc]] ReactJsonAgent
+[[autodoc]] ReactCodeAgent
## Tools
@@ -63,18 +60,50 @@ We provide three types of agents: [`HfAgent`] uses inference endpoints for opens
[[autodoc]] Tool
-### PipelineTool
-[[autodoc]] PipelineTool
-### RemoteTool
-[[autodoc]] RemoteTool
+### Toolbox
+[[autodoc]] Toolbox
+### PipelineTool
+[[autodoc]] PipelineTool
### launch_gradio_demo
[[autodoc]] launch_gradio_demo
### ToolCollection
[[autodoc]] ToolCollection
## Engines
You're free to create and use your own engines with the Agents framework.
These engines have to follow this specification:
1. Follow the [messages format](../chat_templating.md) for their input (`List[Dict[str, str]]`) and return a string.
2. Stop generating outputs *before* the sequences passed in the `stop_sequences` argument.
A minimal sketch of an engine that satisfies both points is shown below.
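As an illustration, here is a hedged, minimal sketch of such an engine; the class below is purely illustrative and is not shipped with the library:

```python
from typing import Dict, List, Sequence


class ConstantEngine:
    """Toy engine: always returns the same canned text, truncated before any stop sequence."""

    def __init__(self, canned_reply: str = "Thought: nothing left to do.\nFinal answer: 42"):
        self.canned_reply = canned_reply

    def __call__(self, messages: List[Dict[str, str]], stop_sequences: Sequence[str] = ()) -> str:
        # 1. The input follows the chat messages format: a list of {"role": ..., "content": ...} dicts.
        for message in messages:
            assert set(message.keys()) == {"role", "content"}, "malformed chat message"
        # 2. Generation must stop *before* any of the requested stop sequences.
        reply = self.canned_reply
        for stop in stop_sequences:
            if stop in reply:
                reply = reply[: reply.index(stop)]
        return reply
```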
### HfEngine
For convenience, we have added an `HfEngine` that implements the points above and uses an inference endpoint to run the LLM.
```python
>>> from transformers import HfEngine
>>> messages = [
... {"role": "user", "content": "Hello, how are you?"},
... {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
... {"role": "user", "content": "No need to help, take it easy."},
... ]
>>> HfEngine()(messages, stop_sequences=["conversation"])
"That's very kind of you to say! It's always nice to have a relaxed "
```
[[autodoc]] HfEngine
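To drive an agent with an engine, pass the engine instance when building the agent. The snippet below is a rough sketch: `HfEngine` and `ReactCodeAgent` are part of this release, but the exact constructor arguments used here (`tools`, `llm_engine`) are assumptions and should be checked against the class documentation above.

```python
>>> from transformers import HfEngine, ReactCodeAgent

>>> llm_engine = HfEngine(model="meta-llama/Meta-Llama-3-8B-Instruct")
>>> agent = ReactCodeAgent(tools=[], llm_engine=llm_engine)  # tools=[] is purely illustrative
>>> agent.run("What is the result of 2 to the power of 3.7384?")
```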
## Agent Types
Agents can handle any type of object in-between tools; tools, being completely multimodal, can accept and return
@@ -94,12 +123,12 @@ These types have three specific purposes:
### AgentText
-[[autodoc]] transformers.tools.agent_types.AgentText
+[[autodoc]] transformers.agents.agent_types.AgentText
### AgentImage
-[[autodoc]] transformers.tools.agent_types.AgentImage
+[[autodoc]] transformers.agents.agent_types.AgentImage
### AgentAudio
-[[autodoc]] transformers.tools.agent_types.AgentAudio
+[[autodoc]] transformers.agents.agent_types.AgentAudio
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Transformers Agents
<Tip warning={true}>
Transformers Agents is an experimental API which is subject to change at any time. Results returned by the agents
can vary as the APIs or underlying models are prone to change.
</Tip>
Transformers Agents was introduced in Transformers version v4.29.0, building on the concept of *tools* and *agents*. You can play with it in
[this colab](https://colab.research.google.com/drive/1c7MHD-T1forUPGcC_jlwsIptOzpG3hSj).
In short, it provides a natural language API on top of transformers: we define a set of curated tools and design an
agent to interpret natural language and to use these tools. It is extensible by design; we curated some relevant tools,
but we'll show you how the system can be extended easily to use any tool developed by the community.
Let's start with a few examples of what can be achieved with this new API. It is particularly powerful when it comes
to multimodal tasks, so let's take it for a spin to generate images and read text out loud.
```py
agent.run("Caption the following image", image=image)
```
| **Input** | **Output** |
|-----------------------------------------------------------------------------------------------------------------------------|-----------------------------------|
| <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/beaver.png" width=200> | A beaver is swimming in the water |
---
```py
agent.run("Read the following text out loud", text=text)
```
| **Input** | **Output** |
|-------------------------------------------------------------------------------------------------------------------------|----------------------------------------------|
| A beaver is swimming in the water | <audio controls><source src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tts_example.wav" type="audio/wav"> your browser does not support the audio element. </audio>
---
```py
agent.run(
"In the following `document`, where will the TRRF Scientific Advisory Council Meeting take place?",
document=document,
)
```
| **Input** | **Output** |
|-----------------------------------------------------------------------------------------------------------------------------|----------------|
| <img src="https://datasets-server.huggingface.co/assets/hf-internal-testing/example-documents/--/hf-internal-testing--example-documents/test/0/image/image.jpg" width=200> | ballroom foyer |
## Quickstart
Before being able to use `agent.run`, you will need to instantiate an agent, which is backed by a large language model (LLM).
We provide support for OpenAI models as well as open-source alternatives from BigCode and OpenAssistant. The OpenAI
models perform better (but require you to have an OpenAI API key, so they cannot be used for free); Hugging Face
provides free access to endpoints for the BigCode and OpenAssistant models.
To start with, please install the `agents` extras in order to install all default dependencies.
```bash
pip install transformers[agents]
```
To use OpenAI models, you instantiate an [`OpenAiAgent`] after installing the `openai` dependency:
```bash
pip install openai
```
```py
from transformers import OpenAiAgent
agent = OpenAiAgent(model="text-davinci-003", api_key="<your_api_key>")
```
To use BigCode or OpenAssistant, start by logging in to have access to the Inference API:
```py
from huggingface_hub import login
login("<YOUR_TOKEN>")
```
Then, instantiate the agent
```py
from transformers import HfAgent
# Starcoder
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
# StarcoderBase
# agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoderbase")
# OpenAssistant
# agent = HfAgent(url_endpoint="https://api-inference.huggingface.co/models/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5")
```
This is using the inference API that Hugging Face provides for free at the moment. If you have your own inference
endpoint for this model (or another one) you can replace the URL above with your URL endpoint.
<Tip>
StarCoder and OpenAssistant are free to use and perform admirably well on simple tasks. However, the checkpoints
don't hold up when handling more complex prompts. If you're facing such an issue, we recommend trying out the OpenAI
model which, while sadly not open-source, performs better at this point in time.
</Tip>
You're now good to go! Let's dive into the two APIs that you now have at your disposal.
### Single execution (run)
The single execution method is when using the [`~Agent.run`] method of the agent:
```py
agent.run("Draw me a picture of rivers and lakes.")
```
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes.png" width=200>
It automatically selects the tool (or tools) appropriate for the task you want to perform and runs them accordingly. It
can perform one or several tasks in the same instruction (though the more complex your instruction, the more likely
the agent is to fail).
```py
agent.run("Draw me a picture of the sea then transform the picture to add an island")
```
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/sea_and_island.png" width=200>
<br/>
Every [`~Agent.run`] operation is independent, so you can run it several times in a row with different tasks.
Note that your `agent` is just a large language model, so small variations in your prompt might yield completely
different results. It's important to explain as clearly as possible the task you want to perform. We go more in-depth
on how to write good prompts [here](custom_tools#writing-good-user-inputs).
If you'd like to keep a state across executions or to pass non-text objects to the agent, you can do so by specifying
variables that you would like the agent to use. For example, you could generate the first image of rivers and lakes,
and ask the model to update that picture to add an island by doing the following:
```python
picture = agent.run("Generate a picture of rivers and lakes.")
updated_picture = agent.run("Transform the image in `picture` to add an island to it.", picture=picture)
```
<Tip>
This can be helpful when the model is unable to understand your request and mixes tools. An example would be:
```py
agent.run("Draw me the picture of a capybara swimming in the sea")
```
Here, the model could interpret the request in two ways:
- Have the `text-to-image` tool generate a capybara swimming in the sea
- Or, have the `text-to-image` tool generate a capybara, then use the `image-transformation` tool to have it swim in the sea
In case you would like to force the first scenario, you could do so by passing the prompt as an argument:
```py
agent.run("Draw me a picture of the `prompt`", prompt="a capybara swimming in the sea")
```
</Tip>
### Chat-based execution (chat)
The agent also has a chat-based approach, using the [`~Agent.chat`] method:
```py
agent.chat("Generate a picture of rivers and lakes")
```
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes.png" width=200>
```py
agent.chat("Transform the picture so that there is a rock in there")
```
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes_and_beaver.png" width=200>
<br/>
This is an interesting approach when you want to keep state across instructions. It's better suited to experimentation,
and it tends to work much better on single instructions than on complex ones (which the [`~Agent.run`]
method handles better).
This method can also take arguments if you would like to pass non-text types or specific prompts.
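For instance, reusing the keyword-argument pattern from the `run` examples above (an illustrative sketch; the tools the agent selects may differ):

```py
picture = agent.chat("Generate a picture of rivers and lakes")
agent.chat("Transform the image in `picture` so that there is a rock in there", picture=picture)
```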
### ⚠️ Remote execution
For demonstration purposes and so that it could be used with all setups, we created remote executors for several
of the default tools the agent has access to for the release. These are created using
[inference endpoints](https://huggingface.co/inference-endpoints).
We have turned these off for now, but in order to see how to set up remote executor tools yourself,
we recommend reading the [custom tool guide](./custom_tools).
### What's happening here? What are tools, and what are agents?
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/diagram.png">
#### Agents
The "agent" here is a large language model, and we're prompting it so that it has access to a specific set of tools.
LLMs are pretty good at generating small samples of code, so this API takes advantage of that by prompting the
LLM to give a small sample of code performing a task with a set of tools. This prompt is then completed by the
task you give your agent and the description of the tools you give it. This way it gets access to the doc of the
tools you are using, especially their expected inputs and outputs, and can generate the relevant code.
#### Tools
Tools are very simple: they're a single function, with a name, and a description. We then use these tools' descriptions
to prompt the agent. Through the prompt, we show the agent how it would leverage tools to perform what was
requested in the query.
This is using brand-new tools and not pipelines, because the agent writes better code with very atomic tools.
Pipelines are more refactored and often combine several tasks in one. Tools are meant to be focused on
one very simple task only.
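To make this concrete, here is a hedged sketch of what such an atomic tool can look like. It mirrors the attribute layout of the default tools added in this release (`name`, `description`, `inputs`, `output_type`, and a `forward` method); the class itself is illustrative and not shipped with the library:

```py
from transformers import Tool


class ReverseTextTool(Tool):
    # These attributes end up in the prompt that describes the tool to the agent.
    name = "reverse_text"
    description = "Reverses the characters of the input text and returns the result."
    inputs = {"text": {"type": "text", "description": "The text to reverse"}}
    output_type = "text"

    def forward(self, text):
        return text[::-1]
```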
#### Code-execution?!
This code is then executed with our small Python interpreter on the set of inputs passed along with your tools.
We hear you screaming "Arbitrary code execution!" in the back, but let us explain why that is not the case.
The only functions that can be called are the tools you provided and the print function, so you're already
limited in what can be executed. You should be safe if it's limited to Hugging Face tools.
Then, we don't allow any attribute lookup or imports (which shouldn't be needed anyway for passing along
inputs/outputs to a small set of functions) so all the most obvious attacks (and you'd need to prompt the LLM
to output them anyway) shouldn't be an issue. If you want to be on the super safe side, you can execute the
run() method with the additional argument return_code=True, in which case the agent will just return the code
to execute and you can decide whether to do it or not.
The execution will stop at any line trying to perform an illegal operation or if there is a regular Python error
with the code generated by the agent.
### A curated set of tools
We identify a set of tools that can empower such agents. Here is an updated list of the tools we have integrated
in `transformers`:
- **Document question answering**: given a document (such as a PDF) in image format, answer a question on this document ([Donut](./model_doc/donut))
- **Text question answering**: given a long text and a question, answer the question in the text ([Flan-T5](./model_doc/flan-t5))
- **Unconditional image captioning**: Caption the image! ([BLIP](./model_doc/blip))
- **Image question answering**: given an image, answer a question on this image ([VILT](./model_doc/vilt))
- **Image segmentation**: given an image and a prompt, output the segmentation mask of that prompt ([CLIPSeg](./model_doc/clipseg))
- **Speech to text**: given an audio recording of a person talking, transcribe the speech into text ([Whisper](./model_doc/whisper))
- **Text to speech**: convert text to speech ([SpeechT5](./model_doc/speecht5))
- **Zero-shot text classification**: given a text and a list of labels, identify to which label the text corresponds the most ([BART](./model_doc/bart))
- **Text summarization**: summarize a long text in one or a few sentences ([BART](./model_doc/bart))
- **Translation**: translate the text into a given language ([NLLB](./model_doc/nllb))
These tools have an integration in transformers, and can be used manually as well, for example:
```py
from transformers import load_tool
tool = load_tool("text-to-speech")
audio = tool("This is a text to speech tool")
```
### Custom tools
While we identify a curated set of tools, we strongly believe that the main value provided by this implementation is
the ability to quickly create and share custom tools.
By pushing the code of a tool to a Hugging Face Space or a model repository, you're then able to leverage the tool
directly with the agent. We've added a few
**transformers-agnostic** tools to the [`huggingface-tools` organization](https://huggingface.co/huggingface-tools):
- **Text downloader**: to download a text from a web URL
- **Text to image**: generate an image according to a prompt, leveraging stable diffusion
- **Image transformation**: modify an image given an initial image and a prompt, leveraging instruct pix2pix stable diffusion
- **Text to video**: generate a small video according to a prompt, leveraging damo-vilab
The text-to-image tool we have been using since the beginning is a remote tool that lives in
[*huggingface-tools/text-to-image*](https://huggingface.co/spaces/huggingface-tools/text-to-image)! We will
continue releasing such tools on this and other organizations, to further supercharge this implementation.
The agents have by default access to tools that reside on [`huggingface-tools`](https://huggingface.co/huggingface-tools).
We explain how you can write and share your tools, as well as leverage any custom tool that resides on the Hub, in the [following guide](custom_tools).
### Code generation
So far we have shown how to use the agents to perform actions for you. However, the agent only generates code
that we then execute using a very restricted Python interpreter. In case you would like to use the code generated in
a different setting, the agent can be prompted to return the code, along with the tool definitions and the necessary imports.
For example, the following instruction
```python
agent.run("Draw me a picture of rivers and lakes", return_code=True)
```
returns the following code
```python
from transformers import load_tool
image_generator = load_tool("huggingface-tools/text-to-image")
image = image_generator(prompt="rivers and lakes")
```
that you can then modify and execute yourself.
This diff is collapsed.
@@ -18,88 +18,9 @@ rendered properly in your Markdown viewer.
<Tip warning={true}>
-Transformers Agents is an experimental API which is subject to change at any time. Results returned by the agents
-can vary as the API or the underlying models are prone to change.
-</Tip>
-To learn more about agents and tools, make sure to read the [introductory guide](../transformers_agents). This page
-contains the API docs for the underlying classes.
-## Agents
-We provide three types of agents: [`HfAgent`] uses inference endpoints for open-source models, [`LocalAgent`] uses a model of your choice locally, and [`OpenAiAgent`] uses OpenAI closed models.
-### HfAgent
-[[autodoc]] HfAgent
-### LocalAgent
-[[autodoc]] LocalAgent
-### OpenAiAgent
-[[autodoc]] OpenAiAgent
-### AzureOpenAiAgent
-[[autodoc]] AzureOpenAiAgent
-### Agent
-[[autodoc]] Agent
-   - chat
-   - run
-   - prepare_for_new_chat
-## Tools
-### load_tool
-[[autodoc]] load_tool
-### Tool
-[[autodoc]] Tool
-### PipelineTool
-[[autodoc]] PipelineTool
-### RemoteTool
-[[autodoc]] RemoteTool
-### launch_gradio_demo
-[[autodoc]] launch_gradio_demo
-## Agent Types
-Agents can handle any type of object in-between tools; tools, being completely multimodal, can accept and return
-types such as text, image, audio, and video. In order to increase compatibility between tools, as well as to
-correctly render these return values in ipython (jupyter, colab, ipython notebooks, ...), we implement wrapper classes
-around these types.
-The wrapped objects should continue to behave as they did initially; a text object should still behave as a string, and an image
-object should still behave as a `PIL.Image`.
-These types have three specific purposes:
-- Calling `to_raw` on the type should return the underlying object
-- Calling `to_string` on the type should return the object as a string: that can be the string in the case of an `AgentText`,
-but it will be the path of a serialized version of the object in other instances
-- Displaying it in an ipython kernel should display the object correctly
-### AgentText
-[[autodoc]] transformers.tools.agent_types.AgentText
-### AgentImage
-[[autodoc]] transformers.tools.agent_types.AgentImage
-### AgentAudio
-[[autodoc]] transformers.tools.agent_types.AgentAudio
+The Agents framework has significantly changed in version v4.41.0.
+This document has been removed as it was referencing an older API.
+We eagerly welcome new contributions for the updated API.
+</Tip>
\ No newline at end of file
This diff is collapsed.
@@ -18,84 +18,9 @@ rendered properly in your Markdown viewer.
<Tip warning={true}>
-Transformers Agents is an experimental API which may change at any time. Since the API or the underlying models are prone to change, the results returned by the agents may vary.
+The Agents framework has significantly changed in version v4.41.0.
+This document has been removed as it was referencing an older API.
+We eagerly welcome new contributions for the updated API.
</Tip>
-To learn more about agents and tools, make sure to read the [introductory guide](../transformers_agents). This page contains the API docs for the underlying classes.
-## Agents
-We provide three types of agents: [`HfAgent`] uses inference endpoints for open-source models, [`LocalAgent`] uses a model of your choice locally, and [`OpenAiAgent`] uses OpenAI closed models.
-### HfAgent
-[[autodoc]] HfAgent
-### LocalAgent
-[[autodoc]] LocalAgent
-### OpenAiAgent
-[[autodoc]] OpenAiAgent
-### AzureOpenAiAgent
-[[autodoc]] AzureOpenAiAgent
-### Agent
-[[autodoc]] Agent
-   - chat
-   - run
-   - prepare_for_new_chat
-## Tools
-### load_tool
-[[autodoc]] load_tool
-### Tool
-[[autodoc]] Tool
-### PipelineTool
-[[autodoc]] PipelineTool
-### RemoteTool
-[[autodoc]] RemoteTool
-### launch_gradio_demo
-[[autodoc]] launch_gradio_demo
-## Agent Types
-Agents can handle any type of object between tools; tools are multimodal and can accept and return text, image, audio, video, and other types. To increase compatibility between tools, and to correctly render these return values in ipython (jupyter, colab, ipython notebooks, etc.), we implement wrapper classes around these types.
-The wrapped objects should keep behaving as they did initially; a text object should still behave like a string, and an image object should still behave like a `PIL.Image`.
-These types have three specific purposes:
-- Calling `to_raw` on the type should return the underlying object
-- Calling `to_string` on the type should return the object as a string: that can be the string itself in the case of `AgentText`, but in other cases it will be the path of a serialized version of the object
-- Displaying it in an ipython kernel should display the object correctly
-### AgentText
-[[autodoc]] transformers.tools.agent_types.AgentText
-### AgentImage
-[[autodoc]] transformers.tools.agent_types.AgentImage
-### AgentAudio
-[[autodoc]] transformers.tools.agent_types.AgentAudio
@@ -54,6 +54,20 @@ logger = logging.get_logger(__name__)  # pylint: disable=invalid-name
# Base objects, independent of any specific backend
_import_structure = {
+    "agents": [
+        "Agent",
+        "CodeAgent",
+        "HfEngine",
+        "PipelineTool",
+        "ReactAgent",
+        "ReactCodeAgent",
+        "ReactJsonAgent",
+        "Tool",
+        "Toolbox",
+        "ToolCollection",
+        "launch_gradio_demo",
+        "load_tool",
+    ],
    "audio_utils": [],
    "benchmark": [],
    "commands": [],
@@ -129,8 +143,8 @@ _import_structure = {
        "load_tf2_model_in_pytorch_model",
        "load_tf2_weights_in_pytorch_model",
    ],
-    "models": [],
    # Models
+    "models": [],
    "models.albert": ["ALBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "AlbertConfig"],
    "models.align": [
        "ALIGN_PRETRAINED_CONFIG_ARCHIVE_MAP",
@@ -1050,18 +1064,6 @@ _import_structure = {
        "SpecialTokensMixin",
        "TokenSpan",
    ],
-    "tools": [
-        "Agent",
-        "AzureOpenAiAgent",
-        "HfAgent",
-        "LocalAgent",
-        "OpenAiAgent",
-        "PipelineTool",
-        "RemoteTool",
-        "Tool",
-        "launch_gradio_demo",
-        "load_tool",
-    ],
    "trainer_callback": [
        "DefaultFlowCallback",
        "EarlyStoppingCallback",
@@ -5039,6 +5041,21 @@ else:
# Direct imports for type-checking
if TYPE_CHECKING:
    # Configuration
+    # Agents
+    from .agents import (
+        Agent,
+        CodeAgent,
+        HfEngine,
+        PipelineTool,
+        ReactAgent,
+        ReactCodeAgent,
+        ReactJsonAgent,
+        Tool,
+        Toolbox,
+        ToolCollection,
+        launch_gradio_demo,
+        load_tool,
+    )
    from .configuration_utils import PretrainedConfig
    # Data
@@ -6010,20 +6027,6 @@ if TYPE_CHECKING:
        TokenSpan,
    )
-    # Tools
-    from .tools import (
-        Agent,
-        AzureOpenAiAgent,
-        HfAgent,
-        LocalAgent,
-        OpenAiAgent,
-        PipelineTool,
-        RemoteTool,
-        Tool,
-        launch_gradio_demo,
-        load_tool,
-    )
    # Trainer
    from .trainer_callback import (
        DefaultFlowCallback,
...
@@ -24,8 +24,9 @@ from ..utils import (
_import_structure = {
-    "agents": ["Agent", "AzureOpenAiAgent", "HfAgent", "LocalAgent", "OpenAiAgent"],
-    "base": ["PipelineTool", "RemoteTool", "Tool", "launch_gradio_demo", "load_tool"],
+    "agents": ["Agent", "CodeAgent", "ReactAgent", "ReactCodeAgent", "ReactJsonAgent", "Toolbox"],
+    "llm_engine": ["HfEngine"],
+    "tools": ["PipelineTool", "Tool", "ToolCollection", "launch_gradio_demo", "load_tool"],
}
try:
@@ -34,20 +35,17 @@ try:
except OptionalDependencyNotAvailable:
    pass
else:
+    _import_structure["default_tools"] = ["FinalAnswerTool", "PythonInterpreterTool"]
    _import_structure["document_question_answering"] = ["DocumentQuestionAnsweringTool"]
-    _import_structure["image_captioning"] = ["ImageCaptioningTool"]
    _import_structure["image_question_answering"] = ["ImageQuestionAnsweringTool"]
-    _import_structure["image_segmentation"] = ["ImageSegmentationTool"]
    _import_structure["speech_to_text"] = ["SpeechToTextTool"]
-    _import_structure["text_classification"] = ["TextClassificationTool"]
-    _import_structure["text_question_answering"] = ["TextQuestionAnsweringTool"]
-    _import_structure["text_summarization"] = ["TextSummarizationTool"]
    _import_structure["text_to_speech"] = ["TextToSpeechTool"]
    _import_structure["translation"] = ["TranslationTool"]
if TYPE_CHECKING:
-    from .agents import Agent, AzureOpenAiAgent, HfAgent, LocalAgent, OpenAiAgent
-    from .base import PipelineTool, RemoteTool, Tool, launch_gradio_demo, load_tool
+    from .agents import Agent, CodeAgent, ReactAgent, ReactCodeAgent, ReactJsonAgent, Toolbox
+    from .llm_engine import HfEngine
+    from .tools import PipelineTool, Tool, ToolCollection, launch_gradio_demo, load_tool
    try:
        if not is_torch_available():
@@ -55,14 +53,10 @@ if TYPE_CHECKING:
    except OptionalDependencyNotAvailable:
        pass
    else:
+        from .default_tools import FinalAnswerTool, PythonInterpreterTool
        from .document_question_answering import DocumentQuestionAnsweringTool
-        from .image_captioning import ImageCaptioningTool
        from .image_question_answering import ImageQuestionAnsweringTool
-        from .image_segmentation import ImageSegmentationTool
        from .speech_to_text import SpeechToTextTool
-        from .text_classification import TextClassificationTool
-        from .text_question_answering import TextQuestionAnsweringTool
-        from .text_summarization import TextSummarizationTool
        from .text_to_speech import TextToSpeechTool
        from .translation import TranslationTool
else:
...
# coding=utf-8
-# Copyright 2023 HuggingFace Inc.
+# Copyright 2024 HuggingFace Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -25,7 +25,6 @@ from ..utils import is_soundfile_availble, is_torch_available, is_vision_availab
logger = logging.get_logger(__name__)
if is_vision_available():
-    import PIL.Image
    from PIL import Image
    from PIL.Image import Image as ImageType
else:
@@ -33,6 +32,9 @@ else:
if is_torch_available():
    import torch
+    from torch import Tensor
+else:
+    Tensor = object
if is_soundfile_availble():
    import soundfile as sf
@@ -77,7 +79,7 @@ class AgentText(AgentType, str):
        return self._value
    def to_string(self):
-        return self._value
+        return str(self._value)
class AgentImage(AgentType, ImageType):
@@ -211,10 +213,7 @@ class AgentAudio(AgentType):
AGENT_TYPE_MAPPING = {"text": AgentText, "image": AgentImage, "audio": AgentAudio}
-INSTANCE_TYPE_MAPPING = {str: AgentText}
-if is_vision_available():
-    INSTANCE_TYPE_MAPPING[PIL.Image] = AgentImage
+INSTANCE_TYPE_MAPPING = {str: AgentText, float: AgentText, int: AgentText, Tensor: AgentAudio, ImageType: AgentImage}
def handle_agent_inputs(*args, **kwargs):
@@ -223,55 +222,14 @@ def handle_agent_inputs(*args, **kwargs):
    return args, kwargs
-def handle_agent_outputs(outputs, output_types=None):
-    if isinstance(outputs, dict):
-        decoded_outputs = {}
-        for i, (k, v) in enumerate(outputs.items()):
-            if output_types is not None:
-                # If the class has defined outputs, we can map directly according to the class definition
-                if output_types[i] in AGENT_TYPE_MAPPING:
-                    decoded_outputs[k] = AGENT_TYPE_MAPPING[output_types[i]](v)
-                else:
-                    decoded_outputs[k] = AgentType(v)
-            else:
-                # If the class does not have defined output, then we map according to the type
-                for _k, _v in INSTANCE_TYPE_MAPPING.items():
-                    if isinstance(v, _k):
-                        decoded_outputs[k] = _v(v)
-                if k not in decoded_outputs:
-                    decoded_outputs[k] = AgentType[v]
-    elif isinstance(outputs, (list, tuple)):
-        decoded_outputs = type(outputs)()
-        for i, v in enumerate(outputs):
-            if output_types is not None:
-                # If the class has defined outputs, we can map directly according to the class definition
-                if output_types[i] in AGENT_TYPE_MAPPING:
-                    decoded_outputs.append(AGENT_TYPE_MAPPING[output_types[i]](v))
-                else:
-                    decoded_outputs.append(AgentType(v))
-            else:
-                # If the class does not have defined output, then we map according to the type
-                found = False
-                for _k, _v in INSTANCE_TYPE_MAPPING.items():
-                    if isinstance(v, _k):
-                        decoded_outputs.append(_v(v))
-                        found = True
-                if not found:
-                    decoded_outputs.append(AgentType(v))
-    else:
-        if output_types[0] in AGENT_TYPE_MAPPING:
-            # If the class has defined outputs, we can map directly according to the class definition
-            decoded_outputs = AGENT_TYPE_MAPPING[output_types[0]](outputs)
-        else:
-            # If the class does not have defined output, then we map according to the type
-            for _k, _v in INSTANCE_TYPE_MAPPING.items():
-                if isinstance(outputs, _k):
-                    return _v(outputs)
-            return AgentType(outputs)
-    return decoded_outputs
+def handle_agent_outputs(output, output_type=None):
+    if output_type in AGENT_TYPE_MAPPING:
+        # If the class has defined outputs, we can map directly according to the class definition
+        decoded_outputs = AGENT_TYPE_MAPPING[output_type](output)
+        return decoded_outputs
+    else:
+        # If the class does not have defined output, then we map according to the type
+        for _k, _v in INSTANCE_TYPE_MAPPING.items():
+            if isinstance(output, _k):
+                return _v(output)
+        return AgentType(output)
This diff is collapsed.
#!/usr/bin/env python
# coding=utf-8
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import importlib.util
import json
import math
from dataclasses import dataclass
from math import sqrt
from typing import Dict
from huggingface_hub import hf_hub_download, list_spaces
from ..utils import is_offline_mode
from .python_interpreter import LIST_SAFE_MODULES, evaluate_python_code
from .tools import TASK_MAPPING, TOOL_CONFIG_FILE, Tool
def custom_print(*args):
return " ".join(map(str, args))
BASE_PYTHON_TOOLS = {
"print": custom_print,
"range": range,
"float": float,
"int": int,
"bool": bool,
"str": str,
"round": round,
"ceil": math.ceil,
"floor": math.floor,
"log": math.log,
"exp": math.exp,
"sin": math.sin,
"cos": math.cos,
"tan": math.tan,
"asin": math.asin,
"acos": math.acos,
"atan": math.atan,
"atan2": math.atan2,
"degrees": math.degrees,
"radians": math.radians,
"pow": math.pow,
"sqrt": sqrt,
"len": len,
"sum": sum,
"max": max,
"min": min,
"abs": abs,
"list": list,
"dict": dict,
"tuple": tuple,
"set": set,
"enumerate": enumerate,
"zip": zip,
"reversed": reversed,
"sorted": sorted,
"all": all,
"any": any,
"map": map,
"filter": filter,
"ord": ord,
"chr": chr,
}
@dataclass
class PreTool:
name: str
inputs: Dict[str, str]
output_type: type
task: str
description: str
repo_id: str
HUGGINGFACE_DEFAULT_TOOLS_FROM_HUB = [
"image-transformation",
"text-to-image",
]
def get_remote_tools(logger, organization="huggingface-tools"):
if is_offline_mode():
logger.info("You are in offline mode, so remote tools are not available.")
return {}
spaces = list_spaces(author=organization)
tools = {}
for space_info in spaces:
repo_id = space_info.id
resolved_config_file = hf_hub_download(repo_id, TOOL_CONFIG_FILE, repo_type="space")
with open(resolved_config_file, encoding="utf-8") as reader:
config = json.load(reader)
task = repo_id.split("/")[-1]
tools[config["name"]] = PreTool(
task=task,
description=config["description"],
repo_id=repo_id,
name=task,
inputs=config["inputs"],
output_type=config["output_type"],
)
return tools
def setup_default_tools(logger):
default_tools = {}
main_module = importlib.import_module("transformers")
tools_module = main_module.agents
for task_name, tool_class_name in TASK_MAPPING.items():
tool_class = getattr(tools_module, tool_class_name)
default_tools[tool_class.name] = PreTool(
name=tool_class.name,
inputs=tool_class.inputs,
output_type=tool_class.output_type,
task=task_name,
description=tool_class.description,
repo_id=None,
)
return default_tools
class PythonInterpreterTool(Tool):
name = "python_interpreter"
description = "This is a tool that evaluates python code. It can be used to perform calculations."
inputs = {
"code": {
"type": "text",
"description": (
"The code snippet to evaluate. All variables used in this snippet must be defined in this same snippet, "
f"else you will get an error. This code can only import the following python libraries: {LIST_SAFE_MODULES}."
),
}
}
output_type = "text"
available_tools = BASE_PYTHON_TOOLS.copy()
def forward(self, code):
output = str(evaluate_python_code(code, tools=self.available_tools))
return output
class FinalAnswerTool(Tool):
name = "final_answer"
description = "Provides a final answer to the given problem"
inputs = {"answer": {"type": "text", "description": "The final answer to the problem"}}
output_type = "any"
def forward(self, answer):
return answer
@@ -16,10 +16,13 @@
# limitations under the License.
import re
+import numpy as np
+import torch
from ..models.auto import AutoProcessor
from ..models.vision_encoder_decoder import VisionEncoderDecoderModel
from ..utils import is_vision_available
-from .base import PipelineTool
+from .tools import PipelineTool
if is_vision_available():
@@ -28,17 +31,19 @@ if is_vision_available():
class DocumentQuestionAnsweringTool(PipelineTool):
    default_checkpoint = "naver-clova-ix/donut-base-finetuned-docvqa"
-    description = (
-        "This is a tool that answers a question about an document (pdf). It takes an input named `document` which "
-        "should be the document containing the information, as well as a `question` that is the question about the "
-        "document. It returns a text that contains the answer to the question."
-    )
+    description = "This is a tool that answers a question about an document (pdf). It returns a text that contains the answer to the question."
    name = "document_qa"
    pre_processor_class = AutoProcessor
    model_class = VisionEncoderDecoderModel
-    inputs = ["image", "text"]
-    outputs = ["text"]
+    inputs = {
+        "document": {
+            "type": "image",
+            "description": "The image containing the information. Can be a PIL Image or a string path to the image.",
+        },
+        "question": {"type": "text", "description": "The question in English"},
+    }
+    output_type = "text"
    def __init__(self, *args, **kwargs):
        if not is_vision_available():
@@ -52,6 +57,10 @@ class DocumentQuestionAnsweringTool(PipelineTool):
        decoder_input_ids = self.pre_processor.tokenizer(
            prompt, add_special_tokens=False, return_tensors="pt"
        ).input_ids
+        if isinstance(document, str):
+            img = Image.open(document).convert("RGB")
+            img_array = np.array(img).transpose(2, 0, 1)
+            document = torch.tensor(img_array)
        pixel_values = self.pre_processor(document, return_tensors="pt").pixel_values
        return {"decoder_input_ids": decoder_input_ids, "pixel_values": pixel_values}
...
@@ -14,32 +14,33 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-from typing import TYPE_CHECKING
import torch
+from PIL import Image
from ..models.auto import AutoModelForVisualQuestionAnswering, AutoProcessor
from ..utils import requires_backends
-from .base import PipelineTool
+from .tools import PipelineTool
-if TYPE_CHECKING:
-    from PIL import Image
class ImageQuestionAnsweringTool(PipelineTool):
    default_checkpoint = "dandelin/vilt-b32-finetuned-vqa"
    description = (
-        "This is a tool that answers a question about an image. It takes an input named `image` which should be the "
-        "image containing the information, as well as a `question` which should be the question in English. It "
+        "This is a tool that answers a question about an image. It "
        "returns a text that is the answer to the question."
    )
    name = "image_qa"
    pre_processor_class = AutoProcessor
    model_class = AutoModelForVisualQuestionAnswering
-    inputs = ["image", "text"]
-    outputs = ["text"]
+    inputs = {
+        "image": {
+            "type": "image",
+            "description": "The image containing the information. Can be a PIL Image or a string path to the image.",
+        },
+        "question": {"type": "text", "description": "The question in English"},
+    }
+    output_type = "text"
    def __init__(self, *args, **kwargs):
        requires_backends(self, ["vision"])
...
#!/usr/bin/env python
# coding=utf-8
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from copy import deepcopy
from enum import Enum
from typing import Dict, List
from huggingface_hub import InferenceClient
class MessageRole(str, Enum):
USER = "user"
ASSISTANT = "assistant"
SYSTEM = "system"
TOOL_CALL = "tool-call"
TOOL_RESPONSE = "tool-response"
@classmethod
def roles(cls):
return [r.value for r in cls]
def get_clean_message_list(message_list: List[Dict[str, str]], role_conversions: Dict[str, str] = {}):
"""
    Creates a sanitized chat message list: checks that each message only has "role" and "content" keys, applies the
    requested role conversions, and concatenates subsequent messages with the same role into a single message.
    Args:
        message_list (`List[Dict[str, str]]`): List of chat messages.
        role_conversions (`Dict[str, str]`, *optional*): Mapping used to rename roles before merging.
"""
final_message_list = []
message_list = deepcopy(message_list) # Avoid modifying the original list
for message in message_list:
if not set(message.keys()) == {"role", "content"}:
raise ValueError("Message should contain only 'role' and 'content' keys!")
role = message["role"]
if role not in MessageRole.roles():
raise ValueError(f"Incorrect role {role}, only {MessageRole.roles()} are supported for now.")
if role in role_conversions:
message["role"] = role_conversions[role]
if len(final_message_list) > 0 and message["role"] == final_message_list[-1]["role"]:
final_message_list[-1]["content"] += "\n===\n" + message["content"]
else:
final_message_list.append(message)
return final_message_list
llama_role_conversions = {
MessageRole.SYSTEM: MessageRole.USER,
MessageRole.TOOL_RESPONSE: MessageRole.USER,
}
class HfEngine:
def __init__(self, model: str = "meta-llama/Meta-Llama-3-8B-Instruct"):
self.model = model
self.client = InferenceClient(model=self.model, timeout=120)
def __call__(self, messages: List[Dict[str, str]], stop_sequences=[]) -> str:
if "Meta-Llama-3" in self.model:
if "<|eot_id|>" not in stop_sequences:
stop_sequences.append("<|eot_id|>")
if "!!!!!" not in stop_sequences:
stop_sequences.append("!!!!!")
# Get clean message list
messages = get_clean_message_list(messages, role_conversions=llama_role_conversions)
# Get answer
response = self.client.chat_completion(messages, stop=stop_sequences, max_tokens=1500)
response = response.choices[0].message.content
# Remove stop sequences from the answer
for stop_seq in stop_sequences:
if response[-len(stop_seq) :] == stop_seq:
response = response[: -len(stop_seq)]
return response
This diff is collapsed.