Unverified Commit 33357243 authored by Sylvain Gugger, committed by GitHub

Test composition (#23214)



* Remove nestedness in tool config

* Really do it

* Use remote tools descriptions

* Work

* Clean up eval

* Changes

* Tools

* Tools

* tool

* Fix everything

* Use last result/assign for evaluation

* Prompt

* Remove hardcoded selection

* Evaluation for chat agents

* correct some spelling

* Small fixes

* Change summarization model (#23172)

* Fix link displayed

* Update description of the tool

* Fixes in chat prompt

* Custom tools, custom prompt

* Tool clean up

* save_pretrained and push_to_hub for tool

* Fix init

* Tests

* Fix tests

* Tool save/from_hub/push_to_hub and tool->load_tool

* Clean push_to_hub and add app file

* Custom inference API for endpoints too

* Clean up

* old remote tool and new remote tool

* Make a requirements

* return_code adds tool creation

* Avoid redundancy between global variables

* Remote tools can be loaded

* Tests

* Text summarization tests

* Quality

* Properly mark tests

* Test the python interpreter

* And the CI shall be green.

* fix loading of additional tools

* Work on RemoteTool and fix tests

* General clean up

* Guard imports

* Fix tools

* docs: Fix broken link in 'How to add a model...'  (#23216)

fix link

* Get default endpoint from the Hub

* Add guide

* Simplify tool config

* Docs

* Some fixes

* Docs

* Docs

* Docs

* Fix code returned by agent

* Try this

* Match args with signature in remote tool

* Should fix python interpreter for Python 3.8

* Fix push_to_hub for tools

* Other fixes to push_to_hub

* Add API doc page

* Docs

* Docs

* Custom tools

* Pin tensorflow-probability (#23220)

* Pin tensorflow-probability

* [all-test]

* [all-test] Fix syntax for bash

* PoC for some chaining API

* Text to speech

* I took some liberties

* Rename

* Basic python interpreter

* Add agents

* Quality

* Add translation tool

* temp

* GenQA + LID + S2T

* Quality + word missing in translation

* Add open assistance, support f-strings in evaluate

* captioning + s2t fixes

* Style

* Refactor descriptions and remove chain

* Support errors and rename OpenAssistantAgent

* Add setup

* Deal with typos + example of inference API

* Some rename + README

* Fixes

* Update prompt

* Unwanted change

* Make sure everyone has a default

* One prompt to rule them all.

* SD

* Description

* Clean up remote tools

* More remote tools

* Add option to return code and update doc

* Image segmentation

* ControlNet

* Gradio demo

* Diffusers protection

* Lib protection

* ControlNet description

* Cleanup

* Style

* Remove accelerate and try to be reproducible

* No randomness

* Make Basic optional in token

* Clean description

* Better prompts

* Fix args eval in interpreter

* Add tool wrapper

* Tool on the Hub

* Style post-rebase

* Big refactor of descriptions, batch generation and evaluation for agents

* Make problems easier - interface to debug

* More problems, add python primitives

* Back to one prompt

* Remove dict for translation

* Be consistent

* Add prompts

* New version of the agent

* Evaluate new agents

* New endpoints agents

* Make all tools a dict variable

* Typo

* Add problems

* Add to big prompt

* Harmonize

* Add tools

* New evaluation

* Add more tools

* Build prompt with tools descriptions

* Tools on the Hub

* Let's chat!

* Cleanup

* Temporary bs4 safeguard

* Cache agents and clean up

* Blank init

* Fix evaluation for agents

* New format for tools on the Hub

* Add method to reset state

* Remove nestedness in tool config

* Really do it

* Use remote tools descriptions

* Work

* Clean up eval

* Changes

* Tools

* Tools

* tool

* Fix everything

* Use last result/assign for evaluation

* Prompt

* Remove hardcoded selection

* Evaluation for chat agents

* correct some spelling

* Small fixes

* Change summarization model (#23172)

* Fix link displayed

* Update description of the tool

* Fixes in chat prompt

* Custom tools, custom prompt

* Tool clean up

* save_pretrained and push_to_hub for tool

* Fix init

* Tests

* Fix tests

* Tool save/from_hub/push_to_hub and tool->load_tool

* Clean push_to_hub and add app file

* Custom inference API for endpoints too

* Clean up

* old remote tool and new remote tool

* Make a requirements

* return_code adds tool creation

* Avoid redundancy between global variables

* Remote tools can be loaded

* Tests

* Text summarization tests

* Quality

* Properly mark tests

* Test the python interpreter

* And the CI shall be green.

* Work on RemoteTool and fix tests

* fix loading of additional tools

* General clean up

* Guard imports

* Fix tools

* Get default endpoint from the Hub

* Simplify tool config

* Add guide

* Docs

* Some fixes

* Docs

* Docs

* Fix code returned by agent

* Try this

* Docs

* Match args with signature in remote tool

* Should fix python interpreter for Python 3.8

* Fix push_to_hub for tools

* Other fixes to push_to_hub

* Add API doc page

* Fixes

* Doc fixes

* Docs

* Fix audio

* Custom tools

* Audio fix

* Improve custom tools docstring

* Docstrings

* Trigger CI

* Mode docstrings

* More docstrings

* Improve custom tools

* Fix for remote tools

* Style

* Fix repo consistency

* Quality

* Tip

* Cleanup on doc

* Cleanup toc

* Add disclaimer for starcoder vs openai

* Remove disclaimer

* Small fixed in the prompts

* 4.29

* Update src/transformers/tools/agents.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Complete documentation

* Small fixes

* Agent evaluation

* Note about gradio-tools & LC

* Clean up agents and prompt

* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Note about gradio-tools & LC

* Add copyrights and address review comments

* Quality

* Add all language codes

* Add remote tool tests

* Move custom prompts to other docs

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* TTS tests

* Quality

---------
Co-authored-by: Lysandre <hi@lyand.re>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
Co-authored-by: Connor Henderson <connor.henderson@talkiatry.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre <lysandre@huggingface.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
parent 366a8ca0
@@ -43,6 +43,7 @@ def pytest_configure(config):
)
config.addinivalue_line("markers", "is_staging_test: mark test to run only in the staging environment")
config.addinivalue_line("markers", "accelerate_tests: mark test that require accelerate")
config.addinivalue_line("markers", "tool_tests: mark the tool tests that are run on their specific schedule")
def pytest_addoption(parser):
...
@@ -21,6 +21,8 @@
title: Set up distributed training with 🤗 Accelerate
- local: model_sharing
title: Share your model
- local: transformers_agents
title: Agents
title: Tutorials
- sections:
- sections:
@@ -99,6 +101,8 @@
title: Notebooks with examples
- local: community
title: Community resources
- local: custom_tools
title: Custom Tools
- local: troubleshooting
title: Troubleshoot
title: Developer guides
@@ -179,6 +183,8 @@
title: Conceptual guides
- sections:
- sections:
- local: main_classes/agent
title: Agents and Tools
- local: model_doc/auto
title: Auto Classes
- local: main_classes/callback
...
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Custom Tools and Prompts
<Tip>
If you are not aware of what tools and agents are in the context of transformers, we recommend you read the
[Transformers Agents](transformers_agents) page first.
</Tip>
<Tip warning={true}>
Transformers Agent is an experimental API which is subject to change at any time. Results returned by the agents
can vary as the APIs or underlying models are prone to change.
</Tip>
Creating and using custom tools and prompts is paramount to empowering the agent and having it perform new tasks.
In this guide we'll take a look at:
- How to customize the prompt
- How to use custom tools
- How to create custom tools
## Customizing the prompt
As explained in [Transformers Agents](transformers_agents), agents can run in [`~Agent.run`] and [`~Agent.chat`] mode.
Both modes rely on the same underlying logic: the language model powering the agent is conditioned on a long prompt
and asked to complete it by generating the next tokens until the stop token is reached.
The only difference between the two is that in `chat` mode the prompt is extended with
previous user inputs and model generations, which effectively gives the agent a memory and allows it to refer to
past interactions.
Let's take a closer look at how the prompt is structured to understand how it can best be customized.
The prompt is broadly structured in four parts:
1. Introduction: how the agent should behave, and an explanation of the concept of tools.
2. Description of all the tools. This is defined by a `<<all_tools>>` token that is dynamically replaced at runtime with the tools defined/chosen by the user.
3. A set of examples of tasks and their solutions.
4. The current task and a request for its solution.
To better understand each part, let's look at a shortened version of what such a prompt can look like in practice.
```
I will ask you to perform a task, your job is to come up with a series of simple commands in Python that will perform the task.
[...]
You can print intermediate results if it makes sense to do so.
Tools:
- document_qa: This is a tool that answers a question about a document (pdf). It takes an input named `document` which should be the document containing the information, as well as a `question` that is the question about the document. It returns a text that contains the answer to the question.
- image_captioner: This is a tool that generates a description of an image. It takes an input named `image` which should be the image to caption, and returns a text that contains the description in English.
[...]
Task: "Answer the question in the variable `question` about the image stored in the variable `image`. The question is in French."
I will use the following tools: `translator` to translate the question into English and then `image_qa` to answer the question on the input image.
Answer:
```py
translated_question = translator(question=question, src_lang="French", tgt_lang="English")
print(f"The translated question is {translated_question}.")
answer = image_qa(image=image, question=translated_question)
print(f"The answer is {answer}")
```
Task: "Identify the oldest person in the `document` and create an image showcasing the result as a banner."
I will use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.
Answer:
```py
answer = document_qa(document, question="What is the oldest person?")
print(f"The answer is {answer}.")
image = image_generator("A banner showing " + answer)
```
[...]
Task: "Draw me a picture of rivers and lakes"
I will use the following
```
The first part explains precisely how the model should behave and what it should do. This part
most likely does not need to be customized.
TODO(PVP) - explain better how the .description and .name influence the prompt
### Customizing the tool descriptions
The performance of the agent is directly linked to the prompt itself. We structure the prompt so that it works well
for what we intend the agent to do, but for maximum customization we also offer the ability to specify a different prompt when instantiating the agent.
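As a quick illustration of how a tool's `name` and `description` surface in the prompt, you can inspect them on any loaded tool. The `- name: description` rendering shown below mirrors the shortened prompt above; treat it as a sketch of the mechanism rather than a guarantee of the exact formatting:
```py
from transformers import load_tool

tool = load_tool("text-to-speech")
# These two attributes are what gets injected in place of `<<all_tools>>`
print(f"- {tool.name}: {tool.description}")

# Tweaking them changes what the agent "knows" about the tool
tool.description = tool.description + " The input text has to be in English."
```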
### Customizing the single-execution prompt
In order to specify a custom single-execution prompt, one would do the following:
```py
template = """ [...] """
agent = HfAgent(your_endpoint, run_prompt_template=template)
```
<Tip>
Please make sure to have the `<<all_tools>>` string defined somewhere in the `template` so that the agent can be aware
of the tools it has available to it.
</Tip>
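A good starting point is to copy the default template and only edit the parts you need. Below is a minimal sketch; the `<<prompt>>` task placeholder is an assumption, so inspect `RUN_PROMPT_TEMPLATE` to confirm the exact tokens used by your version:
```py
from transformers import HfAgent
from transformers.tools.prompts import RUN_PROMPT_TEMPLATE

print(RUN_PROMPT_TEMPLATE)  # inspect the default template and the placeholders it uses

template = """I will ask you to perform a task, your job is to come up with a series of simple commands in Python that will perform the task.
[...]
Tools:
<<all_tools>>

Task: "<<prompt>>"
"""

agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder", run_prompt_template=template)
```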
### Customizing the chat-execution prompt
In order to specify a custom chat prompt, one would do the following:
```py
template = """ [...] """
agent = HfAgent(
url_endpoint=your_endpoint,
token=your_hf_token,
chat_prompt_template=template
)
```
<Tip>
Please make sure to have the `<<all_tools>>` string defined somewhere in the `template` so that the agent can be
aware of the tools it has available to it.
</Tip>
## Using custom tools
In this section, we'll be leveraging two existing custom tools that are specific to image generation:
- We replace [huggingface-tools/image-transformation](https://huggingface.co/spaces/huggingface-tools/image-transformation)
with [diffusers/controlnet-canny-tool](https://huggingface.co/spaces/diffusers/controlnet-canny-tool)
to allow for more image modifications.
- We add a new tool for image upscaling to the default toolbox:
[diffusers/latent-upscaler-tool](https://huggingface.co/spaces/diffusers/latent-upscaler-tool).
We'll start by loading the custom tools with the convenient [`load_tool`] function:
```py
from transformers import load_tool
controlnet_transformer = load_tool("diffusers/controlnet-canny-tool")
upscaler = load_tool("diffusers/latent-upscaler-tool")
```
Upon adding custom tools to an agent, the tools' descriptions and names are automatically
included in the agent's prompt. Thus, it is imperative that custom tools have
a well-written description and name in order for the agent to understand how to use them.
Let's take a look at the description and name of `controlnet_transformer`:
```py
print(f"Description: '{controlnet_transformer.description}'")
print(f"Name: '{controlnet_transformer.name}'")
```
gives
```
Description: 'This is a tool that transforms an image with ControlNet according to a prompt.
It takes two inputs: `image`, which should be the image to transform, and `prompt`, which should be the prompt to use to change it. It returns the modified image.'
Name: 'image_transformer'
```
The name and description are accurate and fit the style of the [curated set of tools](./transformers_agents#a-curated-set-of-tools).
Next, let's instantiate an agent with `controlnet_transformer` and `upscaler`:
```py
tools = [controlnet_transformer, upscaler]
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder", additional_tools=tools)
```
This command should give you the following info:
```
image_transformer has been replaced by <transformers_modules.diffusers.controlnet-canny-tool.bd76182c7777eba9612fc03c0
8718a60c0aa6312.image_transformation.ControlNetTransformationTool object at 0x7f1d3bfa3a00> as provided in `additional_tools`
```
The set of curated tools already has an `image_transformer` tool, which is hereby replaced by our custom tool.
<Tip>
Overwriting an existing tool can be beneficial if we want to use a custom tool for exactly the same task as the existing one,
because the agent is already well-versed in using that tool for the task. Just beware that, in this case, the custom tool should
follow the exact same API as the overwritten tool.
</Tip>
The upscaler tool was given the name `image_upscaler`, which is not yet present in the default toolbox and is therefore simply added to the list of tools.
You can always have a look at the toolbox that is currently available to the agent via the `agent.toolbox` attribute:
```py
print("\n".join([f"- {a}" for a in agent.toolbox.keys()]))
```
```
- document_qa
- image_captioner
- image_qa
- image_segmenter
- transcriber
- summarizer
- text_classifier
- text_qa
- text_reader
- translator
- image_transformer
- text_downloader
- image_generator
- video_generator
- image_upscaler
```
Note how `image_upscaler` is now part of the agent's toolbox.
Let's now try out the new tools! We will reuse the image we generated in the [Transformers Agents Quickstart](./transformers_agents#single-execution-run).
```py
from diffusers.utils import load_image
image = load_image(
"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes.png"
)
```
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes.png" width=200>
Let's transform the image into a beautiful winter landscape:
```py
image = agent.run("Transform the image: 'A frozen lake and snowy forest'", image=image)
```
```
==Explanation from the agent==
I will use the following tool: `image_transformer` to transform the image.
==Code generated by the agent==
image = image_transformer(image, prompt="A frozen lake and snowy forest")
```
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes_winter.png" width=200>
The new image processing tool is based on ControlNet, which can make very strong modifications to the image.
By default the image processing tool returns an image of size 512x512 pixels. Let's see if we can upscale it.
```py
image = agent.run("Upscale the image", image)
```
```
==Explanation from the agent==
I will use the following tool: `image_upscaler` to upscale the image.
==Code generated by the agent==
upscaled_image = image_upscaler(image)
```
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes_winter_upscale.png" width=400>
The agent automatically mapped our prompt "Upscale the image" to the newly added upscaler tool, purely based on the tool's
name and description, and was able to run it correctly.
Next, let's have a look into how you can create a new custom tool.
### Adding new tools
In this section we show how to create a new tool that can be added to the agent.
#### Creating a new tool
We'll first start by creating a tool. We'll add the not-so-useful yet fun task of fetching the model on the Hugging Face
Hub with the most downloads for a given task.
We can do that with the following code:
```python
from huggingface_hub import list_models
task = "text-classification"
model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
print(model.id)
```
For the task `text-classification`, this returns `'facebook/bart-large-mnli'`; for `translation`, it returns `'t5-base'`.
How do we convert this to a tool that the agent can leverage? All tools depend on the superclass `Tool` that holds the
main attributes necessary. We'll create a class that inherits from it:
```python
from transformers import Tool
class HFModelDownloadsTool(Tool):
pass
```
This class has a few needs:
- An attribute `name`, which corresponds to the name of the tool itself. To be in tune with other tools which have a
performative name, we'll name it `model_download_counter`.
- An attribute `description`, which will be used to populate the prompt of the agent.
- `inputs` and `outputs` attributes. Defining this will help the python interpreter make educated choices about types,
and will allow for a gradio-demo to be spawned when we push our tool to the Hub. They're both a list of expected
values, which can be `text`, `image`, or `audio`.
- A `__call__` method which contains the inference code. This is the code we've played with above!
Here's what our class looks like now:
```python
from transformers import Tool
from huggingface_hub import list_models
class HFModelDownloadsTool(Tool):
name = "model_download_counter"
description = (
"This is a tool that returns the most downloaded model of a given task on the Hugging Face Hub. "
"It takes the name of the category (such as text-classification, depth-estimation, etc), and "
"returns the name of the checkpoint."
)
inputs = ["text"]
outputs = ["text"]
def __call__(self, task: str):
model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
return model.id
```
We now have our tool handy. Save it in a file and import it from your main script. Let's name this file
`model_downloads.py`, so the resulting import code looks like this:
```python
from model_downloads import HFModelDownloadsTool
tool = HFModelDownloadsTool()
```
In order to let others benefit from it and for simpler initialization, we recommend pushing it to the Hub under your
namespace. To do so, just call `push_to_hub` on the `tool` variable:
```python
tool.push_to_hub("lysandre/hf-model-downloads")
```
You now have your code on the Hub! Let's take a look at the final step, which is to have the agent use it.
#### Having the agent use the tool
We now have our tool that lives on the Hub which can be instantiated as such:
```python
from transformers import load_tool
tool = load_tool("lysandre/hf-model-downloads")
```
In order to use it in the agent, simply pass it to the `additional_tools` parameter of the agent initialization method:
```python
from transformers import HfAgent
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder", additional_tools=[tool])
agent.run(
"Can you read out loud the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub?"
)
```
which outputs the following:
```
==Code generated by the agent==
model = model_download_counter(task="text-to-video")
print(f"The model with the most downloads is {model}.")
audio_model = text_reader(model)
==Result==
The model with the most downloads is damo-vilab/text-to-video-ms-1.7b.
```
and generates the following audio.
| **Audio** |
|------------------------------------------------------------------------------------------------------------------------------------------------------|
| <audio controls><source src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/damo.wav" type="audio/wav"/> |
<Tip>
Depending on the LLM, some are quite brittle and require very exact prompts in order to work well. A well-defined
name and description of the tool are paramount for it to be leveraged by the agent.
</Tip>
### Replacing existing tools
Replacing existing tools can be done simply by assigning a new item to the agent's toolbox. Here's how one would do so:
```python
from transformers import HfAgent, load_tool
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
agent.toolbox["image-transformation"] = load_tool("diffusers/controlnet-canny-tool")
```
<Tip>
Beware when replacing tools with others! This will also adjust the agent's prompt. This can be good if you have a better
prompt suited for the task, but it can also result in your tool being selected way more often than others, or in other
tools being selected instead of the one you have defined.
</Tip>
## Leveraging gradio-tools
[gradio-tools](https://github.com/freddyaboulton/gradio-tools) is a powerful library that allows using Hugging
Face Spaces as tools. It supports many existing Spaces, as well as custom Spaces designed with it.
We offer support for `gradio_tools` through the `Tool.from_gradio` method. For example, we want to take
advantage of the `StableDiffusionPromptGeneratorTool` tool offered in the `gradio-tools` toolkit in order to
improve our prompts and generate better images.
We first import the tool from `gradio_tools` and instantiate it:
```python
from gradio_tools import StableDiffusionPromptGeneratorTool
gradio_tool = StableDiffusionPromptGeneratorTool()
```
We pass that instance to the `Tool.from_gradio` method:
```python
from transformers import Tool
tool = Tool.from_gradio(gradio_tool)
```
Now we can manage it exactly as we would a usual custom tool. We leverage it to improve our prompt
`a rabbit wearing a space suit`:
```python
from transformers import HfAgent
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder", additional_tools=[tool])
agent.run("Generate an image of the `prompt` after improving it.", prompt="A rabbit wearing a space suit")
```
The model adequately leverages the tool:
```
==Explanation from the agent==
I will use the following tools: `StableDiffusionPromptGenerator` to improve the prompt, then `image_generator` to generate an image according to the improved prompt.
==Code generated by the agent==
improved_prompt = StableDiffusionPromptGenerator(prompt)
print(f"The improved prompt is {improved_prompt}.")
image = image_generator(improved_prompt)
```
Before finally generating the image:
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png">
<Tip warning={true}>
gradio-tools requires *textual* inputs and outputs, even when working with different modalities, while this implementation
works directly with image and audio objects. The two are therefore currently incompatible, but will rapidly become compatible as we
work to improve the support.
</Tip>
## Future compatibility with Langchain
We love Langchain and think it has a very compelling suite of tools. In order to handle these tools,
Langchain requires *textual* inputs and outputs, even when working with different modalities.
This is often the serialized version (i.e., saved to disk) of the objects.
This difference means that multi-modality isn't handled between transformers-agents and Langchain.
We aim for this limitation to be resolved in future versions and would welcome any help from avid Langchain
users to achieve this compatibility. If you would like to help, please
[open an issue](https://github.com/huggingface/transformers/issues/new) and share what you have in mind.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Agents & Tools
<Tip warning={true}>
Transformers Agent is an experimental API which is subject to change at any time. Results returned by the agents
can vary as the APIs or underlying models are prone to change.
</Tip>
To learn more about agents and tools make sure to read the [introductory guide](../agents_and_tools). This page
contains the API docs for the underlying classes.
## Agents
We provide two types of agents: [`HfAgent`] uses inference endpoints for open-source models, while [`OpenAiAgent`] uses OpenAI's closed models.
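For example, both can be instantiated in a couple of lines (the endpoint and model names below are the ones used throughout the tutorials; substitute your own as needed):
```py
from transformers import HfAgent, OpenAiAgent

# Agent backed by an open-source model served through an inference endpoint
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")

# Agent backed by an OpenAI model (requires an OpenAI API key)
# agent = OpenAiAgent(model="text-davinci-003", api_key="<API_KEY>")
```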
### HfAgent
[[autodoc]] HfAgent
### OpenAiAgent
[[autodoc]] OpenAiAgent
### Agent
[[autodoc]] Agent
- chat
- run
- prepare_for_new_chat
## Tools
### load_tool
[[autodoc]] load_tool
### Tool
[[autodoc]] Tool
### PipelineTool
[[autodoc]] PipelineTool
### RemoteTool
[[autodoc]] RemoteTool
### launch_gradio_demo
[[autodoc]] launch_gradio_demo
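As a usage sketch (assuming, as the name suggests, that the function takes a tool class and builds a Gradio interface from its `inputs` and `outputs`; check the signature above for the exact arguments):
```py
from transformers import launch_gradio_demo
from transformers.tools import TextToSpeechTool

launch_gradio_demo(TextToSpeechTool)
```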
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Transformers Agent
<Tip warning={true}>
Transformers Agent is an experimental API which is subject to change at any time. Results returned by the agents
can vary as the APIs or underlying models are prone to change.
</Tip>
Transformers Agents is an experimental API introduced in Transformers version v4.29.0, building on the concept of *tools* and *agents*.
In short, it provides a natural language API on top of transformers: we define a set of curated tools and design an
agent to interpret natural language and use these tools. It is extensible by design; we curated some relevant tools,
but we'll show you how the system can be extended easily to use any tool developed by the community.
Let's start with a few examples of what can be achieved with this new API. It is particularly powerful when it comes
to multimodal tasks, so let's take it for a spin to generate images and read text out loud.
```py
agent.run("Caption the following image", image=image)
```
| **Input** | **Output** |
|-----------------------------------------------------------------------------------------------------------------------------|-----------------------------------|
| <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/beaver.png" width=200> | A beaver is swimming in the water |
---
```py
agent.run("Read the following text out loud", text=text)
```
| **Input** | **Output** |
|-------------------------------------------------------------------------------------------------------------------------|----------------------------------------------|
| A beaver is swimming in the water | <audio controls><source src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tts_example.wav" type="audio/wav"> your browser does not support the audio element. </audio>
---
```py
agent.run(
"In the following `document`, where will the TRRF Scientific Advisory Council Meeting take place?",
document=document,
)
```
| **Input** | **Output** |
|-----------------------------------------------------------------------------------------------------------------------------|----------------|
| <img src="https://datasets-server.huggingface.co/assets/hf-internal-testing/example-documents/--/hf-internal-testing--example-documents/test/0/image/image.jpg" width=200> | ballroom foyer |
## Quickstart
Before being able to use `agent.run`, you will need to instantiate an agent, which is a large language model (LLM).
We recommend using the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) checkpoint as it works very well
for the task at hand and is open-source, but please find other examples below.
Start by logging in to have access to the Inference API:
```py
from huggingface_hub import login
login("<YOUR_TOKEN>")
```
Then, instantiate the agent
```py
from transformers import HfAgent
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
```
This uses the Inference API that Hugging Face provides for free at the moment. If you have your own inference
endpoint for this model (or another one), you can replace the URL above with your endpoint URL.
<Tip>
We're showcasing StarCoder as the default in the documentation as the model is free to use and performs admirably well
on simple tasks. However, the checkpoint doesn't hold up when handling more complex prompts. If you're facing such an
issue, we recommend trying out the OpenAI model which, while sadly not open-source, performs better at the moment.
</Tip>
You're now good to go! Let's dive into the two APIs that you now have at your disposal.
### Single execution (run)
The single execution method is when using the [`~Agent.run`] method of the agent:
```py
agent.run("Draw me a picture of rivers and lakes")
```
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes.png" width=200>
It automatically selects the tool (or tools) appropriate for the task you want to perform and runs them accordingly. It
can perform one or several tasks in the same instruction (though the more complex your instruction, the more likely
the agent is to fail).
```py
agent.chat("Draw me a picture of the sea then transform the picture to add an island.")
```
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/sea_and_island.png" width=200>
<br/>
Every [`~Agent.run`] operation is independent, so you can run it several times in a row with different tasks.
Note that your `agent` is just a large-language model, so small variations in your prompt might yield completely
different results. It's important to explain as clearly as possible the task you want to perform.
If you'd like to keep a state across executions or to pass non-text objects to the agent, you can do so by specifying
variables that you would like the agent to use. For example you could generate the first image of rivers and lakes,
and ask the model to update that picture to add an island by doing the following:
```python
picture = agent.run("Draw me a picture of rivers and lakes")
updated_picture = agent.run("Take that `picture` and add an island to it", picture=picture)
```
<Tip>
This can be helpful when the model is unable to understand your request and mixes tools. An example would be:
```python
agent.run("Draw me the picture of a capybara swimming in the sea")
```
Here, the model could interpret it in two ways:
- Have the `text-to-image` tool generate a capybara swimming in the sea
- Or, have the `text-to-image` tool generate a capybara, then use the `image-transformation` tool to have it swim in the sea
In case you would like to force the first scenario, you could do so by passing the prompt as an argument:
```python
agent.run("Draw me a picture of the `prompt`", prompt="a capybara swimming in the sea")
```
</Tip>
### Chat-based execution (chat)
The agent also has a chat-based approach, using the [`~Agent.chat`] method:
```py
agent.chat("Draw me a picture of rivers and lakes")
```
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes.png" width=200>
```py
agent.chat("Transform the picture so that there is a rock in there")
```
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes_and_beaver.png" width=200>
<br/>
This is an interesting approach when you want to keep state across instructions. It's better for experimentation,
but tends to work much better with single instructions than with complex ones (which the [`~Agent.run`]
method handles better).
This method can also take arguments if you would like to pass non-text types or specific prompts.
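For instance, reusing the pattern from the `run` examples above, you can pass an image to the chat in the same way:
```py
picture = agent.run("Draw me a picture of rivers and lakes")
agent.chat("Transform the image in `picture` so that there is a boat on the water", picture=picture)
```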
### ⚠️ Remote execution
For demonstration purposes and so that this can be used with all setups, we have created remote executors for several
of the default tools the agent has access to. These are created using
[inference endpoints](https://huggingface.co/inference-endpoints). To see how to set up remote executor tools yourself,
we recommend reading the custom tool guide [TODO LINK].
In order to run with remote tools, specifying `remote=True` to either [`~Agent.run`] or [`~Agent.chat`] is sufficient.
For example, the following command could be run on any device efficiently, without needing significant RAM or GPU:
```python
agent.run("Draw me a picture of rivers and lakes", remote=True)
```
The same can be said for [`~Agent.chat`]:
```py
agent.chat("Draw me a picture of rivers and lakes", remote=True)
```
### What's happening here? What are tools, and what are agents?
#### Agents
The "agent" here is a large language model, and we're prompting it so that it has access to a specific set of tools.
LLMs are pretty good at generating small samples of code, so this API takes advantage of that by prompting the
LLM to give a small sample of code performing a task with a set of tools. This prompt is then completed by the
task you give your agent and the description of the tools you give it. This way it gets access to the documentation of the
tools you are using, especially their expected inputs and outputs, and can generate the relevant code.
#### Tools
Tools are very simple: each is a single function with a name and a description. We then use these tools' descriptions
to prompt the agent. Through the prompt, we show the agent how it would leverage tools in order to perform what is
requested in the query.
This uses brand-new tools and not pipelines, because the agent writes better code with very atomic tools.
Pipelines are more refactored and often combine several tasks in one, whereas tools are meant to focus on
one very simple task only.
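As a minimal sketch of what that means in code (this `word_counter` tool is purely hypothetical and is not part of the curated set; the [custom tools guide](custom_tools) walks through a complete, real example):
```py
from transformers import Tool


class WordCounterTool(Tool):
    # Hypothetical illustration of the Tool API: a name, a description, and a __call__
    name = "word_counter"
    description = (
        "This is a tool that counts the number of words in a text. It takes the text as input and "
        "returns the number of words."
    )

    inputs = ["text"]
    outputs = ["text"]

    def __call__(self, text: str):
        return str(len(text.split()))
```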
#### Code-execution?!
This code is then executed with our small Python interpreter on the set of inputs passed along with your tools.
We hear you screaming "Arbitrary code execution!" in the back, but let us explain why that is not the case.
The only functions that can be called are the tools you provided and the print function, so you're already
limited in what can be executed. You should be safe if it's limited to Hugging Face tools.
Then, we don't allow any attribute lookup or imports (which shouldn't be needed anyway for passing along
inputs/outputs to a small set of functions) so all the most obvious attacks (and you'd need to prompt the LLM
to output them anyway) shouldn't be an issue. If you want to be on the super safe side, you can execute the
[`~Agent.run`] method with the additional argument `return_code=True`, in which case the agent will just return the code
to execute and you can decide whether to run it or not.
The execution will stop at any line trying to perform an illegal operation or if there is a regular Python error
with the code generated by the agent.
### A curated set of tools
We identify a set of tools that can empower such agents. Here is an updated list of the tools we have integrated
in `transformers`:
- **Document question answering**: given a document (such as a PDF) in image format, answer a question on this document ([Donut](../model_doc/donut))
- **Text question answering**: given a long text and a question, answer the question in the text ([Flan-T5](../model_doc/flan-t5))
- **Unconditional image captioning**: Caption the image! ([BLIP](../model_doc/blip))
- **Image question answering**: given an image, answer a question on this image ([VILT](../model_doc/vilt))
- **Image segmentation**: given an image and a prompt, output the segmentation mask of that prompt ([CLIPSeg](../model_doc/clipseg))
- **Speech to text**: given an audio recording of a person talking, transcribe the speech into text ([Whisper](../model_doc/whisper))
- **Text to speech**: convert text to speech ([SpeechT5](../model_doc/speecht5))
- **Zero-shot text classification**: given a text and a list of labels, identify to which label the text corresponds the most ([BART](../model_doc/bart))
- **Text summarization**: summarize a long text in one or a few sentences ([BART](../model_doc/bart))
- **Translation**: translate the text into a given language ([NLLB](../model_doc/nllb))
These tools have an integration in transformers, and can be used manually as well, for example:
```py
from transformers import load_tool
tool = load_tool("text-to-speech")
audio = tool("This is a text to speech tool")
```
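If you want to save the resulting audio to disk, here is a minimal sketch; it assumes the tool returns a waveform tensor and assumes a 16 kHz sampling rate, both of which you should verify for your version:
```py
import soundfile as sf

# `audio` is the output of the text-to-speech tool above (assumed here to be a torch tensor)
sf.write("speech.wav", audio.squeeze().cpu().numpy(), samplerate=16000)
```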
### Custom tools
While we identify a curated set of tools, we strongly believe that the main value provided by this implementation is
the ability to quickly create and share custom tools.
By pushing the code of a tool to a Hugging Face Space or a model repository, you're then able to leverage the tool
directly with the agent. We've added a few
**transformers-agnostic** tools to the `huggingface-tools` organization:
- **Text downloader**: to download a text from a web URL
- **Text to image**: generate an image according to a prompt, leveraging stable diffusion
- **Image transformation**: modify an image given an initial image and a prompt, leveraging instruct pix2pix stable diffusion
The text-to-image tool we have been using since the beginning is actually a remote tool that lives in
[*huggingface-tools/text-to-image*](https://huggingface.co/spaces/huggingface-tools/text-to-image)! We will
continue releasing such tools on this and other organizations, to further supercharge this implementation.
By default, the agents have access to tools that reside on `huggingface-tools`.
We explain how you can write and share your own tools, as well as leverage any custom tool that resides on the Hub, in the
[following guide](custom_tools).
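For instance, the remote text-to-image tool mentioned above can also be loaded and used directly, without an agent (a short sketch using [`load_tool`]; `remote=True` runs the tool on an inference endpoint instead of locally):
```py
from transformers import load_tool

image_generator = load_tool("huggingface-tools/text-to-image", remote=True)
image = image_generator(prompt="rivers and lakes")
```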
### Leveraging different agents
We showcase here how to use the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) model as an LLM, but
it isn't the only model available. We also support the OpenAssistant model and OpenAI's davinci models (3.5 and 4).
We're planning on supporting local language models in a future version.
The tools defined in this implementation are agnostic to the agent used; we are showcasing the agents that work with
our prompts below, but the tools can also be used with Langchain, Minichain, or any other Agent-based library.
#### Example code for the OpenAssistant model
```py
from transformers import HfAgent
agent = HfAgent(url_endpoint="https://api-inference.huggingface.co/models/OpenAssistant/oasst-sft-1-pythia-12b", token="<HF_TOKEN>")
```
#### Example code for OpenAI models
```py
from transformers import OpenAiAgent
agent = OpenAiAgent(model="text-davinci-003", api_key="<API_KEY>")
```
### Code generation
So far we have shown how to use the agents to perform actions for you. However, the agent is really only generating code
that we then execute using a very restricted Python interpreter. In case you would like to use the code generated in
a different setting, the agent can be prompted to return the code, along with the tool definitions and the necessary imports.
For example, the following instruction
```python
agent.run("Draw me a picture of rivers and lakes", return_code=True)
```
returns the following code
```python
from transformers import load_tool
image_generator = load_tool("huggingface-tools/text-to-image")
image = image_generator(prompt="rivers and lakes")
```
that you can then modify and execute yourself.
@@ -610,6 +610,16 @@ _import_structure = {
"SpecialTokensMixin",
"TokenSpan",
],
"tools": [
"Agent",
"HfAgent",
"OpenAiAgent",
"PipelineTool",
"RemoteTool",
"Tool",
"launch_gradio_demo",
"load_tool",
],
"trainer_callback": [ "trainer_callback": [
"DefaultFlowCallback", "DefaultFlowCallback",
"EarlyStoppingCallback", "EarlyStoppingCallback",
...@@ -4340,6 +4350,9 @@ if TYPE_CHECKING: ...@@ -4340,6 +4350,9 @@ if TYPE_CHECKING:
TokenSpan, TokenSpan,
) )
# Tools
from .tools import Agent, HfAgent, OpenAiAgent, PipelineTool, RemoteTool, Tool, launch_gradio_demo, load_tool
# Trainer
from .trainer_callback import (
DefaultFlowCallback,
...
@@ -115,9 +115,9 @@ def get_relative_import_files(module_file):
return all_relative_imports
def check_imports(filename): def get_imports(filename):
"""
Check if the current Python environment contains all the libraries that are imported in a file. Extracts all the libraries that are imported in a file.
"""
with open(filename, "r", encoding="utf-8") as f:
content = f.read()
@@ -131,9 +131,14 @@ def check_imports(filename):
imports += re.findall(r"^\s*from\s+(\S+)\s+import", content, flags=re.MULTILINE)
# Only keep the top-level module
imports = [imp.split(".")[0] for imp in imports if not imp.startswith(".")]
return list(set(imports))
# Unique-ify and test we got them all def check_imports(filename):
imports = list(set(imports)) """
Check if the current Python environment contains all the libraries that are imported in a file.
"""
imports = get_imports(filename)
missing_packages = []
for imp in imports:
try:
@@ -169,6 +174,7 @@ def get_cached_module_file(
use_auth_token: Optional[Union[bool, str]] = None,
revision: Optional[str] = None,
local_files_only: bool = False,
repo_type: Optional[str] = None,
_commit_hash: Optional[str] = None,
):
"""
@@ -207,6 +213,8 @@ def get_cached_module_file(
identifier allowed by git.
local_files_only (`bool`, *optional*, defaults to `False`):
If `True`, will only try to load the tokenizer configuration from local files.
repo_type (`str`, *optional*):
Specify the repo type (useful when downloading from a space for instance).
<Tip>
@@ -229,7 +237,7 @@ def get_cached_module_file(
else:
submodule = pretrained_model_name_or_path.replace("/", os.path.sep)
cached_module = try_to_load_from_cache(
pretrained_model_name_or_path, module_file, cache_dir=cache_dir, revision=_commit_hash pretrained_model_name_or_path, module_file, cache_dir=cache_dir, revision=_commit_hash, repo_type=repo_type
)
new_files = []
@@ -245,6 +253,7 @@ def get_cached_module_file(
local_files_only=local_files_only,
use_auth_token=use_auth_token,
revision=revision,
repo_type=repo_type,
_commit_hash=_commit_hash,
)
if not is_local and cached_module != resolved_module_file:
@@ -309,8 +318,10 @@ def get_cached_module_file(
if len(new_files) > 0:
new_files = "\n".join([f"- {f}" for f in new_files])
repo_type_str = "" if repo_type is None else f"{repo_type}/"
url = f"https://huggingface.co/{repo_type_str}{pretrained_model_name_or_path}"
logger.warning(
f"A new version of the following files was downloaded from {pretrained_model_name_or_path}:\n{new_files}" f"A new version of the following files was downloaded from {url}:\n{new_files}"
"\n. Make sure to double-check they do not contain any added malicious code. To avoid downloading new " "\n. Make sure to double-check they do not contain any added malicious code. To avoid downloading new "
"versions of the code file, you can pin a revision." "versions of the code file, you can pin a revision."
) )
...@@ -328,6 +339,7 @@ def get_class_from_dynamic_module( ...@@ -328,6 +339,7 @@ def get_class_from_dynamic_module(
use_auth_token: Optional[Union[bool, str]] = None, use_auth_token: Optional[Union[bool, str]] = None,
revision: Optional[str] = None, revision: Optional[str] = None,
local_files_only: bool = False, local_files_only: bool = False,
repo_type: Optional[str] = None,
**kwargs,
):
"""
@@ -377,6 +389,8 @@ def get_class_from_dynamic_module(
identifier allowed by git.
local_files_only (`bool`, *optional*, defaults to `False`):
If `True`, will only try to load the tokenizer configuration from local files.
repo_type (`str`, *optional*):
Specify the repo type (useful when downloading from a space for instance).
<Tip>
@@ -418,6 +432,7 @@ def get_class_from_dynamic_module(
use_auth_token=use_auth_token,
revision=revision,
local_files_only=local_files_only,
repo_type=repo_type,
)
return get_class_in_module(class_name, final_module.replace(".py", ""))
@@ -439,6 +454,7 @@ def custom_object_save(obj, folder, config=None):
"this code in a separate module so we can include it in the saved folder and make it easier to share via "
"the Hub."
)
return
def _set_auto_map_in_config(_config):
module_name = obj.__class__.__module__
@@ -478,12 +494,17 @@ def custom_object_save(obj, folder, config=None):
elif config is not None:
_set_auto_map_in_config(config)
result = []
# Copy module file to the output folder.
object_file = sys.modules[obj.__module__].__file__
dest_file = Path(folder) / (Path(object_file).name)
shutil.copy(object_file, dest_file)
result.append(dest_file)
# Gather all relative imports recursively and make sure they are copied as well.
for needed_file in get_relative_import_files(object_file):
dest_file = Path(folder) / (Path(needed_file).name)
shutil.copy(needed_file, dest_file)
result.append(dest_file)
return result
@@ -64,6 +64,10 @@ class ChannelDimension(ExplicitEnum):
LAST = "channels_last"
def is_pil_image(img):
return is_vision_available() and isinstance(img, PIL.Image.Image)
def is_valid_image(img):
return (
(is_vision_available() and isinstance(img, PIL.Image.Image))
...
@@ -148,6 +148,7 @@ _run_custom_tokenizers = parse_flag_from_env("RUN_CUSTOM_TOKENIZERS", default=Fa
_run_staging = parse_flag_from_env("HUGGINGFACE_CO_STAGING", default=False)
_tf_gpu_memory_limit = parse_int_from_env("TF_GPU_MEMORY_LIMIT", default=None)
_run_pipeline_tests = parse_flag_from_env("RUN_PIPELINE_TESTS", default=True)
_run_tool_tests = parse_flag_from_env("RUN_TOOL_TESTS", default=False)
def is_pt_tf_cross_test(test_case):
@@ -221,6 +222,21 @@ def is_pipeline_test(test_case):
return pytest.mark.is_pipeline_test()(test_case)
def is_tool_test(test_case):
"""
Decorator marking a test as a tool test. If RUN_TOOL_TESTS is set to a falsy value, those tests will be skipped.
"""
if not _run_tool_tests:
return unittest.skip("test is a tool test")(test_case)
else:
try:
import pytest # We don't need a hard dependency on pytest in the main library
except ImportError:
return test_case
else:
return pytest.mark.is_tool_test()(test_case)
def slow(test_case):
"""
Decorator marking a test as slow.
...
#!/usr/bin/env python
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING
from ..utils import (
OptionalDependencyNotAvailable,
_LazyModule,
is_torch_available,
)
_import_structure = {
"agents": ["Agent", "HfAgent", "OpenAiAgent"],
"base": ["PipelineTool", "RemoteTool", "Tool", "launch_gradio_demo", "load_tool"],
}
try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["document_question_answering"] = ["DocumentQuestionAnsweringTool"]
_import_structure["image_captioning"] = ["ImageCaptioningTool"]
_import_structure["image_question_answering"] = ["ImageQuestionAnsweringTool"]
_import_structure["image_segmentation"] = ["ImageSegmentationTool"]
_import_structure["language_identifier"] = ["LanguageIdentificationTool"]
_import_structure["speech_to_text"] = ["SpeechToTextTool"]
_import_structure["text_classification"] = ["TextClassificationTool"]
_import_structure["text_question_answering"] = ["TextQuestionAnsweringTool"]
_import_structure["text_summarization"] = ["TextSummarizationTool"]
_import_structure["text_to_speech"] = ["TextToSpeechTool"]
_import_structure["translation"] = ["TranslationTool"]
if TYPE_CHECKING:
from .agents import Agent, HfAgent, OpenAiAgent
from .base import PipelineTool, RemoteTool, Tool, launch_gradio_demo, load_tool
try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
from .document_question_answering import DocumentQuestionAnsweringTool
from .image_captioning import ImageCaptioningTool
from .image_question_answering import ImageQuestionAnsweringTool
from .image_segmentation import ImageSegmentationTool
from .language_identifier import LanguageIdentificationTool
from .speech_to_text import SpeechToTextTool
from .text_classification import TextClassificationTool
from .text_question_answering import TextQuestionAnsweringTool
from .text_summarization import TextSummarizationTool
from .text_to_speech import TextToSpeechTool
from .translation import TranslationTool
else:
import sys
sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
#!/usr/bin/env python
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import importlib.util
import json
import os
import time
from dataclasses import dataclass
import requests
from huggingface_hub import HfFolder, hf_hub_download, list_spaces
from ..utils import is_openai_available, logging
from .base import TASK_MAPPING, TOOL_CONFIG_FILE, Tool, load_tool, supports_remote
from .prompts import CHAT_MESSAGE_PROMPT, CHAT_PROMPT_TEMPLATE, RUN_PROMPT_TEMPLATE
from .python_interpreter import evaluate
logger = logging.get_logger(__name__)
if is_openai_available():
import openai
_tools_are_initialized = False
BASE_PYTHON_TOOLS = {
"print": print,
"float": float,
"int": int,
"bool": bool,
"str": str,
}
@dataclass
class PreTool:
task: str
description: str
repo_id: str
HUGGINGFACE_DEFAULT_TOOLS = {}
HUGGINGFACE_DEFAULT_TOOLS_FROM_HUB = [
"image-transformation",
"text-download",
"text-to-image",
"text-to-video",
]
def get_remote_tools(organization="huggingface-tools"):
spaces = list_spaces(author=organization)
tools = {}
for space_info in spaces:
repo_id = space_info.id
resolved_config_file = hf_hub_download(repo_id, TOOL_CONFIG_FILE, repo_type="space")
with open(resolved_config_file, encoding="utf-8") as reader:
config = json.load(reader)
task = repo_id.split("/")[-1]
tools[config["name"]] = PreTool(task=task, description=config["description"], repo_id=repo_id)
return tools
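# Editorial sketch: `get_remote_tools` queries the Hub, downloads each Space's tool configuration
# and keys the result by the tool name stored in that config. The exact names returned depend on
# what is published under "huggingface-tools" at call time, so the loop below only prints them.
def _example_list_remote_tools():
    for name, pre_tool in get_remote_tools().items():
        print(f"{name}: task={pre_tool.task}, repo_id={pre_tool.repo_id}")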
def _setup_default_tools():
global HUGGINGFACE_DEFAULT_TOOLS
global _tools_are_initialized
if _tools_are_initialized:
return
main_module = importlib.import_module("transformers")
tools_module = main_module.tools
remote_tools = get_remote_tools()
for task_name in TASK_MAPPING:
tool_class_name = TASK_MAPPING.get(task_name)
tool_class = getattr(tools_module, tool_class_name)
description = tool_class.description
HUGGINGFACE_DEFAULT_TOOLS[tool_class.name] = PreTool(task=task_name, description=description, repo_id=None)
for task_name in HUGGINGFACE_DEFAULT_TOOLS_FROM_HUB:
found = False
for tool_name, tool in remote_tools.items():
if tool.task == task_name:
HUGGINGFACE_DEFAULT_TOOLS[tool_name] = tool
found = True
break
if not found:
raise ValueError(f"{task_name} is not implemented on the Hub.")
_tools_are_initialized = True
def resolve_tools(code, toolbox, remote=False, cached_tools=None):
if cached_tools is None:
resolved_tools = BASE_PYTHON_TOOLS.copy()
else:
resolved_tools = cached_tools
for name, tool in toolbox.items():
if name not in code or name in resolved_tools:
continue
if isinstance(tool, Tool):
resolved_tools[name] = tool
else:
task_or_repo_id = tool.task if tool.repo_id is None else tool.repo_id
_remote = remote and supports_remote(task_or_repo_id)
resolved_tools[name] = load_tool(task_or_repo_id, remote=_remote)
return resolved_tools
def get_tool_creation_code(code, toolbox, remote=False):
code_lines = ["from transformers import load_tool", ""]
for name, tool in toolbox.items():
if name not in code or isinstance(tool, Tool):
continue
task_or_repo_id = tool.task if tool.repo_id is None else tool.repo_id
line = f'{name} = load_tool("{task_or_repo_id}"'
if remote:
line += ", remote=True"
line += ")"
code_lines.append(line)
return "\n".join(code_lines) + "\n"
def clean_code_for_chat(result):
lines = result.split("\n")
idx = 0
while idx < len(lines) and not lines[idx].lstrip().startswith("```"):
idx += 1
explanation = "\n".join(lines[:idx]).strip()
if idx == len(lines):
return explanation, None
idx += 1
start_idx = idx
while not lines[idx].lstrip().startswith("```"):
idx += 1
code = "\n".join(lines[start_idx:idx]).strip()
return explanation, code
def clean_code_for_run(result):
result = f"I will use the following {result}"
explanation, code = result.split("Answer:")
explanation = explanation.strip()
code = code.strip()
code_lines = code.split("\n")
if code_lines[0] in ["```", "```py", "```python"]:
code_lines = code_lines[1:]
if code_lines[-1] == "```":
code_lines = code_lines[:-1]
code = "\n".join(code_lines)
return explanation, code
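# Editorial sketch: `clean_code_for_run` assumes the completion continues the run prompt, which ends
# with "I will use the following". It splits on "Answer:" and strips the surrounding markdown fence.
# The completion text below is made up for illustration.
def _example_clean_code_for_run():
    completion = (
        "tool: `image_captioner` to caption the image.\n"
        "Answer:\n"
        "```py\n"
        "caption = image_captioner(image)\n"
        "```"
    )
    explanation, code = clean_code_for_run(completion)
    print(explanation)  # I will use the following tool: `image_captioner` to caption the image.
    print(code)  # caption = image_captioner(image)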
class Agent:
"""
Base class for all agents which contains the main API methods.
Args:
chat_prompt_template (`str`, *optional*):
Pass along your own prompt if you want to override the default template for the `chat` method.
run_prompt_template (`str`, *optional*):
Pass along your own prompt if you want to override the default template for the `run` method.
additional_tools ([`Tool`], list of tools or dictionary with tool values, *optional*):
Any additional tools to include on top of the default ones. If you pass along a tool with the same name as
one of the default tools, that default tool will be overridden.
"""
def __init__(self, chat_prompt_template=None, run_prompt_template=None, additional_tools=None):
_setup_default_tools()
        self.chat_prompt_template = CHAT_PROMPT_TEMPLATE if chat_prompt_template is None else chat_prompt_template
self.run_prompt_template = RUN_PROMPT_TEMPLATE if run_prompt_template is None else run_prompt_template
self.toolbox = HUGGINGFACE_DEFAULT_TOOLS.copy()
if additional_tools is not None:
if isinstance(additional_tools, (list, tuple)):
additional_tools = {t.name: t for t in additional_tools}
elif not isinstance(additional_tools, dict):
additional_tools = {additional_tools.name: additional_tools}
replacements = {name: tool for name, tool in additional_tools.items() if name in HUGGINGFACE_DEFAULT_TOOLS}
self.toolbox.update(additional_tools)
if len(replacements) > 1:
names = "\n".join([f"- {n}: {t}" for n, t in replacements.items()])
                logger.warning(
f"The following tools have been replaced by the ones provided in `additional_tools`:\n{names}."
)
elif len(replacements) == 1:
name = list(replacements.keys())[0]
logger.warn(f"{name} has been replaced by {replacements[name]} as provided in `additional_tools`.")
self.prepare_for_new_chat()
def format_prompt(self, task, chat_mode=False):
description = "\n".join([f"- {name}: {tool.description}" for name, tool in self.toolbox.items()])
if chat_mode:
if self.chat_history is None:
                prompt = self.chat_prompt_template.replace("<<all_tools>>", description)
else:
prompt = self.chat_history
prompt += CHAT_MESSAGE_PROMPT.replace("<<task>>", task)
else:
prompt = self.run_prompt_template.replace("<<all_tools>>", description)
prompt = prompt.replace("<<prompt>>", task)
return prompt
def chat(self, task, *, return_code=False, remote=False, **kwargs):
"""
        Sends a new request to the agent in a chat. Will use the previous requests in its history.
Args:
task (`str`): The task to perform
return_code (`bool`, *optional*, defaults to `False`):
Whether to just return code and not evaluate it.
remote (`bool`, *optional*, defaults to `False`):
Whether or not to use remote tools (inference endpoints) instead of local ones.
kwargs:
Any keyword argument to send to the agent when evaluating the code.
Example:
```py
from transformers import HfAgent
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
agent.chat("Draw me a picture of rivers and lakes")
agent.chat("Transform the picture so that there is a rock in there")
```
"""
prompt = self.format_prompt(task, chat_mode=True)
result = self.generate_one(prompt, stop=["Human:", "====="])
self.chat_history = prompt + result + "\n"
explanation, code = clean_code_for_chat(result)
print(f"==Explanation from the agent==\n{explanation}")
if code is not None:
print(f"\n\n==Code generated by the agent==\n{code}")
if not return_code:
print("\n\n==Result==")
self.cached_tools = resolve_tools(code, self.toolbox, remote=remote, cached_tools=self.cached_tools)
self.chat_state.update(kwargs)
return evaluate(code, self.cached_tools, self.chat_state, chat_mode=True)
else:
tool_code = get_tool_creation_code(code, self.toolbox, remote=remote)
return f"{tool_code}\n{code}"
def prepare_for_new_chat(self):
"""
Clears the history of prior calls to [`~Agent.chat`].
"""
self.chat_history = None
self.chat_state = {}
self.cached_tools = None
def run(self, task, *, return_code=False, remote=False, **kwargs):
"""
Sends a request to the agent.
Args:
task (`str`): The task to perform
return_code (`bool`, *optional*, defaults to `False`):
Whether to just return code and not evaluate it.
remote (`bool`, *optional*, defaults to `False`):
Whether or not to use remote tools (inference endpoints) instead of local ones.
kwargs:
Any keyword argument to send to the agent when evaluating the code.
Example:
```py
from transformers import HfAgent
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
agent.run("Draw me a picture of rivers and lakes")
```
"""
prompt = self.format_prompt(task)
result = self.generate_one(prompt, stop=["Task:"])
explanation, code = clean_code_for_run(result)
print(f"==Explanation from the agent==\n{explanation}")
print(f"\n\n==Code generated by the agent==\n{code}")
if not return_code:
print("\n\n==Result==")
self.cached_tools = resolve_tools(code, self.toolbox, remote=remote, cached_tools=self.cached_tools)
return evaluate(code, self.cached_tools, state=kwargs.copy())
else:
tool_code = get_tool_creation_code(code, self.toolbox, remote=remote)
return f"{tool_code}\n{code}"
def generate_one(self, prompt, stop):
# This is the method to implement in your custom agent.
raise NotImplementedError
def generate_many(self, prompts, stop):
# Override if you have a way to do batch generation faster than one by one
return [self.generate_one(prompt, stop) for prompt in prompts]
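# Editorial sketch: the only method a subclass must provide is `generate_one`. The hypothetical agent
# below returns a canned completion, which is enough to exercise the prompt/parse/evaluate pipeline
# without calling a real model (instantiating it still builds the default toolbox from the Hub).
class _CannedAgent(Agent):
    def generate_one(self, prompt, stop):
        # A real implementation would query an LLM with `prompt` and truncate at the first `stop` sequence.
        return (
            "tool: `image_captioner` to caption the image.\n"
            "Answer:\n"
            "```py\n"
            "caption = image_captioner(image)\n"
            "```"
        )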
class OpenAiAgent(Agent):
"""
Agent that uses the openai API to generate code.
<Tip warning={true}>
    The OpenAI models are used in generation mode, so even for the `chat()` API, it's better to use models like
    `"text-davinci-003"` over the ChatGPT variant. Proper support for ChatGPT models will come in a future version.
</Tip>
Args:
model (`str`, *optional*, defaults to `"text-davinci-003"`):
The name of the OpenAI model to use.
api_key (`str`, *optional*):
The API key to use. If unset, will look for the environment variable `"OPENAI_API_KEY"`.
chat_prompt_template (`str`, *optional*):
Pass along your own prompt if you want to override the default template for the `chat` method.
run_prompt_template (`str`, *optional*):
Pass along your own prompt if you want to override the default template for the `run` method.
additional_tools ([`Tool`], list of tools or dictionary with tool values, *optional*):
Any additional tools to include on top of the default ones. If you pass along a tool with the same name as
one of the default tools, that default tool will be overridden.
Example:
```py
from transformers import OpenAiAgent
agent = OpenAiAgent(model="text-davinci-003", api_key=xxx)
agent.run("Is the following `text` (in Spanish) positive or negative?", text="¡Este es un API muy agradable!")
```
"""
def __init__(
self,
model="text-davinci-003",
api_key=None,
chat_prompt_template=None,
run_prompt_template=None,
additional_tools=None,
):
if not is_openai_available():
raise ImportError("Using `OpenAiAgent` requires `openai`: `pip install openai`.")
if api_key is None:
api_key = os.environ.get("OPENAI_API_KEY", None)
if api_key is None:
raise ValueError(
"You need an openai key to use `OpenAIAgent`. You can get one here: Get one here "
"https://openai.com/api/`. If you have one, set it in your env with `os.environ['OPENAI_API_KEY'] = "
"xxx."
)
else:
openai.api_key = api_key
self.model = model
super().__init__(
chat_prompt_template=chat_prompt_template,
run_prompt_template=run_prompt_template,
additional_tools=additional_tools,
)
def generate_many(self, prompts, stop):
if "gpt" in self.model:
return [self._chat_generate(prompt, stop) for prompt in prompts]
else:
return self._completion_generate(prompts, stop)
def generate_one(self, prompt, stop):
if "gpt" in self.model:
return self._chat_generate(prompt, stop)
else:
return self._completion_generate([prompt], stop)[0]
def _chat_generate(self, prompt, stop):
result = openai.ChatCompletion.create(
model=self.model,
messages=[{"role": "user", "content": prompt}],
temperature=0,
stop=stop,
)
return result["choices"][0]["message"]["content"]
def _completion_generate(self, prompts, stop):
result = openai.Completion.create(
model=self.model,
prompt=prompts,
temperature=0,
stop=stop,
max_tokens=200,
)
return [answer["text"] for answer in result["choices"]]
class HfAgent(Agent):
"""
    Agent that uses an inference endpoint to generate code.
Args:
url_endpoint (`str`):
            The URL of the inference endpoint to use.
token (`str`, *optional*):
The token to use as HTTP bearer authorization for remote files. If unset, will use the token generated when
running `huggingface-cli login` (stored in `~/.huggingface`).
chat_prompt_template (`str`, *optional*):
Pass along your own prompt if you want to override the default template for the `chat` method.
run_prompt_template (`str`, *optional*):
Pass along your own prompt if you want to override the default template for the `run` method.
additional_tools ([`Tool`], list of tools or dictionary with tool values, *optional*):
Any additional tools to include on top of the default ones. If you pass along a tool with the same name as
one of the default tools, that default tool will be overridden.
Example:
```py
from transformers import HfAgent
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
agent.run("Is the following `text` (in Spanish) positive or negative?", text="¡Este es un API muy agradable!")
```
"""
def __init__(
self, url_endpoint, token=None, chat_prompt_template=None, run_prompt_template=None, additional_tools=None
):
self.url_endpoint = url_endpoint
if token is None:
self.token = f"Bearer {HfFolder().get_token()}"
elif token.startswith("Bearer") or token.startswith("Basic"):
self.token = token
else:
self.token = f"Bearer {token}"
super().__init__(
chat_prompt_template=chat_prompt_template,
run_prompt_template=run_prompt_template,
additional_tools=additional_tools,
)
def generate_one(self, prompt, stop):
headers = {"Authorization": self.token}
inputs = {
"inputs": prompt,
"parameters": {"max_new_tokens": 200, "return_full_text": False, "stop": stop},
}
response = requests.post(self.url_endpoint, json=inputs, headers=headers)
if response.status_code == 429:
print("Getting rate-limited, waiting a tiny bit before trying again.")
time.sleep(1)
            return self.generate_one(prompt, stop)
elif response.status_code != 200:
raise ValueError(f"Error {response.status_code}: {response.json()}")
result = response.json()[0]["generated_text"]
# Inference API returns the stop sequence
for stop_seq in stop:
if result.endswith(stop_seq):
result = result[: -len(stop_seq)]
return result
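# Editorial sketch: `return_code=True` short-circuits evaluation, so `run` only returns the code the
# agent produced plus the `load_tool` lines needed to reproduce it. The endpoint is the one used in
# the docstrings above; running this still requires network access and a valid Hub token.
def _example_return_code():
    agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
    code = agent.run("Caption the image stored in the variable `image`", return_code=True)
    print(code)  # "from transformers import load_tool", the tool creation lines, then the generated code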
#!/usr/bin/env python
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import re
from ..models.auto import AutoProcessor
from ..models.vision_encoder_decoder import VisionEncoderDecoderModel
from ..utils import is_vision_available
from .base import PipelineTool
if is_vision_available():
from PIL import Image
class DocumentQuestionAnsweringTool(PipelineTool):
default_checkpoint = "naver-clova-ix/donut-base-finetuned-docvqa"
description = (
"This is a tool that answers a question about an document (pdf). It takes an input named `document` which "
"should be the document containing the information, as well as a `question` that is the question about the "
"document. It returns a text that contains the answer to the question."
)
name = "document_qa"
pre_processor_class = AutoProcessor
model_class = VisionEncoderDecoderModel
inputs = ["image", "text"]
outputs = ["text"]
def __init__(self, *args, **kwargs):
if not is_vision_available():
raise ValueError("Pillow must be installed to use the DocumentQuestionAnsweringTool.")
super().__init__(*args, **kwargs)
def encode(self, image: "Image", question: str):
task_prompt = "<s_docvqa><s_question>{user_input}</s_question><s_answer>"
prompt = task_prompt.replace("{user_input}", question)
decoder_input_ids = self.pre_processor.tokenizer(
prompt, add_special_tokens=False, return_tensors="pt"
).input_ids
pixel_values = self.pre_processor(image, return_tensors="pt").pixel_values
return {"decoder_input_ids": decoder_input_ids, "pixel_values": pixel_values}
def forward(self, inputs):
return self.model.generate(
inputs["pixel_values"].to(self.device),
decoder_input_ids=inputs["decoder_input_ids"].to(self.device),
max_length=self.model.decoder.config.max_position_embeddings,
early_stopping=True,
pad_token_id=self.pre_processor.tokenizer.pad_token_id,
eos_token_id=self.pre_processor.tokenizer.eos_token_id,
use_cache=True,
num_beams=1,
bad_words_ids=[[self.pre_processor.tokenizer.unk_token_id]],
return_dict_in_generate=True,
).sequences
def decode(self, outputs):
sequence = self.pre_processor.batch_decode(outputs)[0]
sequence = sequence.replace(self.pre_processor.tokenizer.eos_token, "")
sequence = sequence.replace(self.pre_processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip() # remove first task start token
sequence = self.pre_processor.token2json(sequence)
return sequence["answer"]
#!/usr/bin/env python
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING
from ..models.auto import AutoModelForVision2Seq
from ..utils import requires_backends
from .base import PipelineTool
if TYPE_CHECKING:
from PIL import Image
class ImageCaptioningTool(PipelineTool):
default_checkpoint = "Salesforce/blip-image-captioning-base"
description = (
"This is a tool that generates a description of an image. It takes an input named `image` which should be the "
"image to caption, and returns a text that contains the description in English."
)
name = "image_captioner"
model_class = AutoModelForVision2Seq
inputs = ["image"]
outputs = ["text"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["vision"])
super().__init__(*args, **kwargs)
def encode(self, image: "Image"):
return self.pre_processor(images=image, return_tensors="pt")
def forward(self, inputs):
return self.model.generate(**inputs)
def decode(self, outputs):
return self.pre_processor.batch_decode(outputs, skip_special_tokens=True)[0].strip()
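# Editorial sketch, under the same assumption that `PipelineTool.__call__` routes the image through
# encode/forward/decode. The file name is hypothetical; the result is an English caption string.
def _example_caption():
    from PIL import Image
    captioner = ImageCaptioningTool()
    print(captioner(Image.open("photo.jpg")))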
#!/usr/bin/env python
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING
import torch
from ..models.auto import AutoModelForVisualQuestionAnswering, AutoProcessor
from ..utils import requires_backends
from .base import PipelineTool
if TYPE_CHECKING:
from PIL import Image
class ImageQuestionAnsweringTool(PipelineTool):
default_checkpoint = "dandelin/vilt-b32-finetuned-vqa"
description = (
"This is a tool that answers a question about an image. It takes an input named `image` which should be the "
"image containing the information, as well as a `question` which should be the question in English. It "
"returns a text that is the answer to the question."
)
name = "image_qa"
pre_processor_class = AutoProcessor
model_class = AutoModelForVisualQuestionAnswering
inputs = ["image", "text"]
outputs = ["text"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["vision"])
super().__init__(*args, **kwargs)
def encode(self, image: "Image", question: str):
return self.pre_processor(image, question, return_tensors="pt")
def forward(self, inputs):
with torch.no_grad():
return self.model(**inputs).logits
def decode(self, outputs):
idx = outputs.argmax(-1).item()
return self.model.config.id2label[idx]
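# Editorial sketch, same calling assumption as above; ViLT is a classification model here, so
# `decode` returns the label with the highest logit rather than generated text.
def _example_image_qa():
    from PIL import Image
    image_qa = ImageQuestionAnsweringTool()
    print(image_qa(Image.open("photo.jpg"), question="How many dogs are there?"))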
#!/usr/bin/env python
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import torch
from ..models.clipseg import CLIPSegForImageSegmentation
from ..utils import is_vision_available, requires_backends
from .base import PipelineTool
if is_vision_available():
from PIL import Image
class ImageSegmentationTool(PipelineTool):
description = (
"This is a tool that creates a segmentation mask identifiying elements inside an image according to a prompt. "
"It takes two arguments named `image` which should be the original image, and `prompt` which should be a text "
"describing the elements what should be identified in the segmentation mask. The tool returns the mask as a "
"black-and-white image."
)
default_checkpoint = "CIDAS/clipseg-rd64-refined"
name = "image_segmenter"
model_class = CLIPSegForImageSegmentation
inputs = ["image", "text"]
outputs = ["image"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["vision"])
super().__init__(*args, **kwargs)
def encode(self, image: "Image", prompt: str):
self.pre_processor.image_processor.size = {"width": image.size[0], "height": image.size[1]}
return self.pre_processor(text=[prompt], images=[image], padding=True, return_tensors="pt")
def forward(self, inputs):
with torch.no_grad():
logits = self.model(**inputs).logits
return logits
def decode(self, outputs):
array = outputs.cpu().detach().numpy()
array[array <= 0] = 0
array[array > 0] = 1
return Image.fromarray((array * 255).astype(np.uint8))
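# Editorial sketch: the segmenter resizes the CLIPSeg processor to the input image size, then
# thresholds the logits at 0, so the returned PIL image is a binary black-and-white mask. The file
# name and prompt are hypothetical.
def _example_segmentation():
    from PIL import Image
    segmenter = ImageSegmentationTool()
    mask = segmenter(Image.open("street.jpg"), prompt="cars")
    mask.save("cars_mask.png")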
#!/usr/bin/env python
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# docstyle-ignore
RUN_PROMPT_TEMPLATE = """I will ask you to perform a task, your job is to come up with a series of simple commands in Python that will perform the task.
To help you, I will give you access to a set of tools that you can use. Each tool is a Python function and has a description explaining the task it performs, the inputs it expects and the outputs it returns.
You should first explain which tool you will use to perform the task and for what reason, then write the code in Python.
Each instruction in Python should be a simple assignment. You can print intermediate results if it makes sense to do so.
Tools:
<<all_tools>>
Task: "Answer the question in the variable `question` about the image stored in the variable `image`. The question is in French."
I will use the following tools: `translator` to translate the question into English and then `image_qa` to answer the question on the input image.
Answer:
```py
translated_question = translator(question=question, src_lang="French", tgt_lang="English")
print(f"The translated question is {translated_question}.")
answer = image_qa(image=image, question=translated_question)
print(f"The answer is {answer}")
```
Task: "Identify the oldest person in the `document` and create an image showcasing the result."
I will use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.
Answer:
```py
answer = document_qa(document, question="What is the oldest person?")
print(f"The answer is {answer}.")
image = image_generator(answer)
```
Task: "Generate an image using the text given in the variable `caption`."
I will use the following tool: `image_generator` to generate an image.
Answer:
```py
image = image_generator(prompt=caption)
```
Task: "Summarize the text given in the variable `text` and read it out loud."
I will use the following tools: `summarizer` to create a summary of the input text, then `text_reader` to read it out loud.
Answer:
```py
summarized_text = summarizer(text)
print(f"Summary: {summarized_text}")
audio_summary = text_reader(summarized_text)
```
Task: "Answer the question in the variable `question` about the text in the variable `text`. Use the answer to generate an image."
I will use the following tools: `text_qa` to create the answer, then `image_generator` to generate an image according to the answer.
Answer:
```py
answer = text_qa(text=text, question=question)
print(f"The answer is {answer}.")
image = image_generator(answer)
```
Task: "Caption the following `image`."
I will use the following tool: `image_captioner` to generate a caption for the image.
Answer:
```py
caption = image_captioner(image)
```
Task: "<<prompt>>"
I will use the following"""
# docstyle-ignore
CHAT_PROMPT_TEMPLATE = """Below are a series of dialogues between various people and an AI assistant specialized in coding. The AI assistant tries to be helpful, polite, honest, and humble-but-knowledgeable.
The job of the AI assistant is to come up with a series of simple commands in Python that will perform the task the human wants to perform.
To help with that, the AI assistant has access to a set of tools. Each tool is a Python function and has a description explaining the task it performs, the inputs it expects and the outputs it returns.
The AI assistant should first explain the tools it will use to perform the task and for what reason, then write the code in Python.
Each instruction in Python should be a simple assignment. The AI assistant can print intermediate results if it makes sense to do so.
Tools:
<<all_tools>>
=====
Human: Answer the question in the variable `question` about the image stored in the variable `image`.
Assistant: I will use the tool `image_qa` to answer the question on the input image.
```py
answer = image_qa(text=question, image=image)
print(f"The answer is {answer}")
```
Human: I tried this code, it worked but didn't give me a good result. The question is in French
Assistant: In this case, the question needs to be translated first. I will use the tool `translator` to do this.
```py
translated_question = translator(question=question, src_lang="French", tgt_lang="English")
print(f"The translated question is {translated_question}.")
answer = image_qa(text=translated_question, image=image)
print(f"The answer is {answer}")
```
=====
Human: Identify the oldest person in the `document`.
Assistant: I will use the tool `document_qa` to find the oldest person in the document.
```py
answer = document_qa(document, question="What is the oldest person?")
print(f"The answer is {answer}.")
```
Human: Can you generate an image with the result?
Assistant: I will use the tool `image_generator` to do that.
```py
image = image_generator(answer)
```
=====
Human: Summarize the text given in the variable `text` and read it out loud.
Assistant: I will use the tool `summarizer` to create a summary of the input text, then the tool `text_reader` to read it out loud.
```py
summarized_text = summarizer(text)
print(f"Summary: {summarized_text}")
audio_summary = text_reader(text=summary)
```
Human: I got the following error: "The variable `summary` is not defined."
Assistant: My bad! Let's try this code instead.
```py
summarized_text = summarizer(text)
print(f"Summary: {summarized_text}")
audio_summary = text_reader(text=summarized_text)
```
Human: It worked! Can you translate the summary in German?
Assistant: I will use the tool `translator` to translate the text in German.
```py
translated_summary = translator(summarized_text, src_lang="English", tgt_lang="German")
```
=====
"""
# docstyle-ignore
CHAT_MESSAGE_PROMPT = """
Human: <<task>>
Assistant: """
#!/usr/bin/env python
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import ast
import difflib
from collections.abc import Mapping
from typing import Any, Callable, Dict
class InterpretorError(ValueError):
"""
    An error raised when the interpreter cannot evaluate a Python expression, due to syntax error or unsupported
operations.
"""
pass
def evaluate(code: str, tools: Dict[str, Callable], state=None, chat_mode=False):
"""
Evaluate a python expression using the content of the variables stored in a state and only evaluating a given set
of functions.
This function will recurse through the nodes of the tree provided.
Args:
code (`str`):
The code to evaluate.
tools (`Dict[str, Callable]`):
The functions that may be called during the evaluation. Any call to another function will fail with an
`InterpretorError`.
state (`Dict[str, Any]`):
A dictionary mapping variable names to values. The `state` should contain the initial inputs but will be
updated by this function to contain all variables as they are evaluated.
chat_mode (`bool`, *optional*, defaults to `False`):
Whether or not the function is called from `Agent.chat`.
"""
try:
expression = ast.parse(code)
except SyntaxError as e:
print("The code generated by the agent is not valid.\n", e)
return
if state is None:
state = {}
result = None
for idx, node in enumerate(expression.body):
try:
line_result = evaluate_ast(node, state, tools)
except InterpretorError as e:
msg = f"Evaluation of the code stopped at line {idx} before the end because of the following error"
if chat_mode:
msg += (
f". Copy paste the following error message and send it back to the agent:\nI get an error: '{e}'"
)
else:
msg += f":\n{e}"
print(msg)
break
if line_result is not None:
result = line_result
return result
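# Editorial sketch: the interpreter only knows the callables passed in `tools` and the variables in
# `state`; this snippet relies on builtins only, so it is self-contained.
def _example_evaluate():
    tools = {"len": len, "print": print}
    state = {"names": ["Ada", "Grace"]}
    result = evaluate('count = len(names)\nprint(f"{count} names")', tools, state=state)
    # Prints "2 names"; `result` is 2 (the last non-None line result) and `state` now contains "count".
    print(result, state["count"])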
def evaluate_ast(expression: ast.AST, state: Dict[str, Any], tools: Dict[str, Callable]):
"""
    Evaluate an abstract syntax tree using the content of the variables stored in a state and only evaluating a given
    set of functions.
    This function will recurse through the nodes of the tree provided.
    Args:
        expression (`ast.AST`):
            The code to evaluate, as an abstract syntax tree.
        state (`Dict[str, Any]`):
            A dictionary mapping variable names to values. The `state` is updated if need be when the evaluation
            encounters assignments.
tools (`Dict[str, Callable]`):
The functions that may be called during the evaluation. Any call to another function will fail with an
`InterpretorError`.
"""
if isinstance(expression, ast.Assign):
        # Assignment -> we evaluate the assignment which should update the state
# We return the variable assigned as it may be used to determine the final result.
return evaluate_assign(expression, state, tools)
elif isinstance(expression, ast.Call):
# Function call -> we return the value of the function call
return evaluate_call(expression, state, tools)
elif isinstance(expression, ast.Constant):
# Constant -> just return the value
return expression.value
elif isinstance(expression, ast.Dict):
# Dict -> evaluate all keys and values
keys = [evaluate_ast(k, state, tools) for k in expression.keys]
values = [evaluate_ast(v, state, tools) for v in expression.values]
return dict(zip(keys, values))
elif isinstance(expression, ast.Expr):
# Expression -> evaluate the content
return evaluate_ast(expression.value, state, tools)
elif isinstance(expression, ast.FormattedValue):
# Formatted value (part of f-string) -> evaluate the content and return
return evaluate_ast(expression.value, state, tools)
elif isinstance(expression, ast.If):
# If -> execute the right branch
return evaluate_if(expression, state, tools)
elif hasattr(ast, "Index") and isinstance(expression, ast.Index):
return evaluate_ast(expression.value, state, tools)
elif isinstance(expression, ast.JoinedStr):
return "".join([str(evaluate_ast(v, state, tools)) for v in expression.values])
elif isinstance(expression, ast.List):
# List -> evaluate all elements
return [evaluate_ast(elt, state, tools) for elt in expression.elts]
elif isinstance(expression, ast.Name):
# Name -> pick up the value in the state
return evaluate_name(expression, state, tools)
elif isinstance(expression, ast.Subscript):
# Subscript -> return the value of the indexing
return evaluate_subscript(expression, state, tools)
else:
# For now we refuse anything else. Let's add things as we need them.
raise InterpretorError(f"{expression.__class__.__name__} is not supported.")
def evaluate_assign(assign, state, tools):
var_names = assign.targets
result = evaluate_ast(assign.value, state, tools)
if len(var_names) == 1:
state[var_names[0].id] = result
else:
if len(result) != len(var_names):
raise InterpretorError(f"Expected {len(var_names)} values but got {len(result)}.")
for var_name, r in zip(var_names, result):
state[var_name.id] = r
return result
def evaluate_call(call, state, tools):
if not isinstance(call.func, ast.Name):
raise InterpretorError(
f"It is not permitted to evaluate other functions than the provided tools (tried to execute {call.func} of "
f"type {type(call.func)}."
)
func_name = call.func.id
if func_name not in tools:
raise InterpretorError(
f"It is not permitted to evaluate other functions than the provided tools (tried to execute {call.func.id})."
)
func = tools[func_name]
# Todo deal with args
args = [evaluate_ast(arg, state, tools) for arg in call.args]
kwargs = {keyword.arg: evaluate_ast(keyword.value, state, tools) for keyword in call.keywords}
return func(*args, **kwargs)
def evaluate_subscript(subscript, state, tools):
index = evaluate_ast(subscript.slice, state, tools)
value = evaluate_ast(subscript.value, state, tools)
if isinstance(value, (list, tuple)):
return value[int(index)]
if index in value:
return value[index]
if isinstance(index, str) and isinstance(value, Mapping):
close_matches = difflib.get_close_matches(index, list(value.keys()))
if len(close_matches) > 0:
return value[close_matches[0]]
raise InterpretorError(f"Could not index {value} with '{index}'.")
def evaluate_name(name, state, tools):
if name.id in state:
return state[name.id]
close_matches = difflib.get_close_matches(name.id, list(state.keys()))
if len(close_matches) > 0:
return state[close_matches[0]]
raise InterpretorError(f"The variable `{name.id}` is not defined.")
def evaluate_condition(condition, state, tools):
if len(condition.ops) > 1:
raise InterpretorError("Cannot evaluate conditions with multiple operators")
left = evaluate_ast(condition.left, state, tools)
comparator = condition.ops[0]
right = evaluate_ast(condition.comparators[0], state, tools)
if isinstance(comparator, ast.Eq):
return left == right
elif isinstance(comparator, ast.NotEq):
return left != right
elif isinstance(comparator, ast.Lt):
return left < right
elif isinstance(comparator, ast.LtE):
return left <= right
elif isinstance(comparator, ast.Gt):
return left > right
elif isinstance(comparator, ast.GtE):
return left >= right
elif isinstance(comparator, ast.Is):
return left is right
elif isinstance(comparator, ast.IsNot):
return left is not right
elif isinstance(comparator, ast.In):
return left in right
elif isinstance(comparator, ast.NotIn):
return left not in right
else:
raise InterpretorError(f"Operator not supported: {comparator}")
def evaluate_if(if_statement, state, tools):
result = None
if evaluate_condition(if_statement.test, state, tools):
for line in if_statement.body:
line_result = evaluate_ast(line, state, tools)
if line_result is not None:
result = line_result
else:
for line in if_statement.orelse:
line_result = evaluate_ast(line, state, tools)
if line_result is not None:
result = line_result
return result
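# Editorial sketch: `evaluate_name` and `evaluate_subscript` fall back to difflib close matches, so a
# slightly misspelled variable or key produced by the model still resolves instead of raising.
def _example_fuzzy_lookup():
    state = {"summarized_text": "a short summary"}
    # "sumarized_text" is misspelled on purpose; difflib maps it back to "summarized_text".
    print(evaluate("sumarized_text", {}, state=state))  # -> a short summary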
#!/usr/bin/env python
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ..models.whisper import WhisperForConditionalGeneration, WhisperProcessor
from .base import PipelineTool
class SpeechToTextTool(PipelineTool):
default_checkpoint = "openai/whisper-base"
description = (
"This is a tool that transcribes an audio into text. It takes an input named `audio` and returns the "
"transcribed text."
)
name = "transcriber"
pre_processor_class = WhisperProcessor
model_class = WhisperForConditionalGeneration
inputs = ["audio"]
outputs = ["text"]
def encode(self, audio):
return self.pre_processor(audio, return_tensors="pt").input_features
def forward(self, inputs):
return self.model.generate(inputs=inputs)
def decode(self, outputs):
return self.pre_processor.batch_decode(outputs, skip_special_tokens=True)[0]
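# Editorial sketch: the transcriber expects a raw waveform in the format the Whisper processor
# accepts (e.g. a 1-D float array sampled at 16 kHz). The silent array below is a stand-in for real
# audio and assumes the usual __call__ chaining from PipelineTool.
def _example_transcribe():
    import numpy as np
    transcriber = SpeechToTextTool()
    fake_audio = np.zeros(16000, dtype=np.float32)  # one second of silence, hypothetical input
    print(transcriber(fake_audio))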