"vscode:/vscode.git/clone" did not exist on "d0d5aee1dd26de108a901921b9df19b889430645"
Unverified Commit f93509b1 authored by Sylvain Gugger, committed by GitHub

Refine documentation for Tools (#23266)

* refine documentation for Tools

* + one bugfix
parent 5f26a23d
@@ -102,7 +102,7 @@
- local: community
title: Community resources
- local: custom_tools
title: Custom Tools
title: Custom Tools and Prompts
- local: troubleshooting
title: Troubleshoot
title: Developer guides
......
@@ -124,7 +124,9 @@ what the tool does and the second states what input arguments and return values
A good tool name and tool description are very important for the agent to correctly use it. Note that the only
information the agent has about the tool is its name and description, so one should make sure that both
are precisely written and match the style of the existing tools in the toolbox.
are precisely written and match the style of the existing tools in the toolbox. In particular, make sure the description
mentions all the expected arguments by name in code-style, along with their expected type and a description of what
they are.
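As a hedged illustration (the tool below is hypothetical and not part of the curated toolbox), a description that follows these guidelines could look like:

```python
from transformers import Tool


class ImageBrightnessTool(Tool):
    # Hypothetical tool, written only to illustrate the naming and description guidelines above.
    name = "image_brightener"
    description = (
        "This is a tool that brightens an image. It takes an input named `image` which should be the "
        "image to brighten (a PIL image), and an input named `factor` which should be a float between 0 "
        "and 2 describing how much to brighten the image. It returns the brightened image."
    )

    def __call__(self, image, factor: float):
        from PIL import ImageEnhance

        return ImageEnhance.Brightness(image).enhance(factor)
```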
<Tip>
@@ -137,7 +139,7 @@ The third part includes a set of curated examples that show the agent exactly wh
for what kind of user request. The large language models empowering the agent are extremely good at
recognizing patterns in a prompt and repeating the pattern with new data. Therefore, it is very important
that the examples are written in a way that maximizes the likelihood of the agent generating correct,
executable code in practice.
Let's have a look at one example:
@@ -466,7 +468,8 @@ The set of curated tools already has an `image_transformer` tool which is hereby
Overwriting existing tools can be beneficial if we want to use a custom tool for exactly the same task as an existing tool,
because the agent is already well-versed in that specific task. Beware that the custom tool should follow the exact same API
as the overwritten tool in this case.
as the overwritten tool in this case, or you should adapt the prompt template to make sure all examples using that
tool are updated.
</Tip>
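As a minimal sketch (assuming a replacement tool that keeps the same name and call signature as the curated `image_transformer`; the repo id below is only an example), overriding is done by passing the new tool through `additional_tools` when instantiating the agent:

```python
from transformers import HfAgent, load_tool

# Load a custom image-transformation tool from the Hub.
controlnet_transformer = load_tool("diffusers/controlnet-canny-tool")

# Because the tool shares its name with the curated `image_transformer`, it overrides it for this agent.
agent = HfAgent(
    "https://api-inference.huggingface.co/models/bigcode/starcoder",
    additional_tools=[controlnet_transformer],
)
```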
@@ -627,14 +630,14 @@ In order to let others benefit from it and for simpler initialization, we recomm
namespace. To do so, just call `push_to_hub` on the `tool` variable:
```python
tool.push_to_hub("lysandre/hf-model-downloads")
tool.push_to_hub("hf-model-downloads")
```
You now have your code on the Hub! Let's take a look at the final step, which is to have the agent use it.
#### Having the agent use the tool
We now have our tool that lives on the Hub which can be instantiated as such:
We now have our tool that lives on the Hub, which can be instantiated as follows (replace the user name with your own):
```python
from transformers import load_tool
......
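As a hedged sketch of the full flow (the repo id below is a placeholder for your own namespace), the freshly pushed tool can be loaded and handed to an agent through `additional_tools`:

```python
from transformers import HfAgent, load_tool

# Placeholder repo id: use the namespace/name you pushed the tool to.
tool = load_tool("<your-username>/hf-model-downloads")

# Pass the tool to the agent on top of the curated toolbox.
agent = HfAgent(
    "https://api-inference.huggingface.co/models/bigcode/starcoder",
    additional_tools=[tool],
)
agent.run("Which checkpoint has the most downloads for text-to-video on the Hugging Face Hub?")
```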
@@ -19,7 +19,8 @@ can vary as the APIs or underlying models are prone to change.
</Tip>
Transformers version v4.29.0, building on the concept of *tools* and *agents*.
Transformers version v4.29.0 builds on the concept of *tools* and *agents*. You can play with it in
[this colab](https://colab.research.google.com/drive/1c7MHD-T1forUPGcC_jlwsIptOzpG3hSj).
In short, it provides a natural language API on top of transformers: we define a set of curated tools and design an
agent to interpret natural language and to use these tools. It is extensible by design; we curated some relevant tools,
@@ -60,10 +61,19 @@ agent.run(
## Quickstart
Before being able to use `agent.run`, you will need to instantiate an agent, which is a large language model (LLM).
We recommend using the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) checkpoint as it works very well
for the task at hand and is open-source, but please find other examples below.
We provide support for OpenAI models as well as open-source alternatives from BigCode and OpenAssistant. The OpenAI
models perform better (but require you to have an OpenAI API key, so they cannot be used for free); Hugging Face
provides free access to endpoints for BigCode and OpenAssistant models.
Start by logging in to have access to the Inference API:
To use OpenAI models, you instantiate an [`OpenAiAgent`]:
```py
from transformers import OpenAiAgent
agent = OpenAiAgent(model="text-davinci-003", api_key="<your_api_key>")
```
To use BigCode or OpenAssistant, start by logging in to have access to the Inference API:
```py
from huggingface_hub import login
@@ -76,17 +86,22 @@ Then, instantiate the agent
```py
from transformers import HfAgent
# Starcoder
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
# StarcoderBase
# agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoderbase")
# OpenAssistant
# agent = HfAgent(url_endpoint="https://api-inference.huggingface.co/models/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5")
```
This is using the inference API that Hugging Face provides for free at the moment if you have your inference
This uses the inference API that Hugging Face provides for free at the moment. If you have your own inference
endpoint for this model (or another one), you can replace the URL above with your URL endpoint.
<Tip>
We're showcasing StarCoder as the default in the documentation as the model is free to use and performs admirably well
on simple tasks. However, the checkpoint doesn't hold up when handling more complex prompts. If you're facing such an
issue, we recommend trying out the OpenAI model which, while sadly not open-source, performs better at this given time.
StarCoder and OpenAssistant are free to use and perform admirably well on simple tasks. However, the checkpoints
don't hold up when handling more complex prompts. If you're facing such an issue, we recommend trying out the OpenAI
model which, while sadly not open-source, performs better at this time.
</Tip>
@@ -97,7 +112,7 @@ You're now good to go! Let's dive into the two APIs that you now have at your di
The single execution method is when using the [`~Agent.run`] method of the agent:
```py
agent.run("Draw me a picture of rivers and lakes")
agent.run("Draw me a picture of rivers and lakes.")
```
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes.png" width=200>
@@ -107,7 +122,7 @@ can perform one or several tasks in the same instruction (though the more comple
the agent is to fail).
```py
agent.chat("Draw me a picture of the sea then transform the picture to add an island.")
agent.run("Draw me a picture of the sea then transform the picture to add an island")
```
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/sea_and_island.png" width=200>
@@ -118,15 +133,16 @@ agent.chat("Draw me a picture of the sea then transform the picture to add an is
Every [`~Agent.run`] operation is independent, so you can run it several times in a row with different tasks.
Note that your `agent` is just a large-language model, so small variations in your prompt might yield completely
different results. It's important to explain as clearly as possible the task you want to perform.
different results. It's important to explain as clearly as possible the task you want to perform. We go more in-depth
on how to write good prompts [here](custom_tools#writing-good-user-inputs).
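As a quick illustration (both prompts below are just made-up examples), being explicit about the action and the subject makes it much easier for the agent to pick the right tool:

```py
# Too vague: the agent has to guess both the tool and the subject.
agent.run("rivers and lakes")

# Clear: the action ("draw a picture") and the subject are stated explicitly.
agent.run("Draw me a picture of rivers and lakes with a mountain in the background.")
```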
If you'd like to keep a state across executions or to pass non-text objects to the agent, you can do so by specifying
variables that you would like the agent to use. For example, you could generate the first image of rivers and lakes,
and ask the model to update that picture to add an island by doing the following:
```python
picture = agent.run("Draw me a picture of rivers and lakes")
updated_picture = agent.chat("Take that `picture` and add an island to it", picture=picture)
picture = agent.run("Generate a picture of rivers and lakes.")
updated_picture = agent.run("Transform the image in `picture` to add an island to it.", picture=picture)
```
<Tip>
@@ -155,7 +171,7 @@ agent.run("Draw me a picture of the `prompt`", prompt="a capybara swimming in th
The agent also has a chat-based approach, using the [`~Agent.chat`] method:
```py
agent.chat("Draw me a picture of rivers and lakes")
agent.chat("Generate a picture of rivers and lakes")
```
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes.png" width=200>
@@ -197,6 +213,8 @@ agent.chat("Draw me a picture of rivers and lakes", remote=True)
### What's happening here? What are tools, and what are agents?
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/diagram.png">
#### Agents
The "agent" here is a large language model, and we're prompting it so that it has access to a specific set of tools.
@@ -270,6 +288,7 @@ directly with the agent. We've added a few
- **Text downloader**: to download a text from a web URL
- **Text to image**: generate an image according to a prompt, leveraging stable diffusion
- **Image transformation**: modify an image given an initial image and a prompt, leveraging instruct pix2pix stable diffusion
- **Text to video**: generate a small video according to a prompt, leveraging damo-vilab
The text-to-image tool we have been using since the beginning is a remote tool that lives in
[*huggingface-tools/text-to-image*](https://huggingface.co/spaces/huggingface-tools/text-to-image)! We will
@@ -278,32 +297,6 @@ continue releasing such tools on this and other organizations, to further superc
The agents have access by default to tools that reside on `huggingface-tools`.
We explain how you can write and share your own tools, as well as leverage any custom tool that resides on the Hub, in the [following guide](custom_tools).
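As a sketch (assuming `load_tool` accepts the Space repo id shown), such a remote tool can also be loaded and called on its own, outside of any agent:

```py
from transformers import load_tool

# Load the remote text-to-image tool hosted under the huggingface-tools organization.
text_to_image = load_tool("huggingface-tools/text-to-image")

# Tools are callable objects, so they can be used directly.
image = text_to_image("A river flowing between snowy mountains")
```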
### Leveraging different agents
We showcase here how to use the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) model as an LLM, but
it isn't the only model available. We also support the OpenAssistant model and OpenAI's davinci models (3.5 and 4).
We're planning on supporting local language models in an ulterior version.
The tools defined in this implementation are agnostic to the agent used; we are showcasing the agents that work with
our prompts below, but the tools can also be used with Langchain, Minichain, or any other Agent-based library.
#### Example code for the OpenAssistant model
```py
from transformers import HfAgent
agent = HfAgent(url_endpoint="https://OpenAssistant/oasst-sft-1-pythia-12b", token="<HF_TOKEN>")
```
#### Example code for OpenAI models
```py
from transformers import OpenAiAgent
agent = OpenAiAgent(model="text-davinci-003", api_key="<API_KEY>")
```
### Code generation
So far we have shown how to use the agents to perform actions for you. However, the agent is only generating code
......
@@ -264,7 +264,9 @@ class Agent:
"""
prompt = self.format_prompt(task, chat_mode=True)
result = self.generate_one(prompt, stop=["Human:", "====="])
self.chat_history = prompt + result + "\n"
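# Store the prompt plus the generated answer; ensure the history ends with a newline
# (without doubling it) so the next chat turn starts on a fresh line.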
self.chat_history = prompt + result
if not self.chat_history.endswith("\n"):
self.chat_history += "\n"
explanation, code = clean_code_for_chat(result)
print(f"==Explanation from the agent==\n{explanation}")
......