# 1.1 Hello World
## Preparation
In this tutorial, we'll use `poetry` to manage our project dependencies. If you don't have `poetry` installed, you can install it by following the instructions [here](https://python-poetry.org/docs/).
## Creating A Project
You'll start by creating a new python project. You can name it whatever you like; for this tutorial, we'll call it `awel-tutorial`.
We suggest making a project directory in your home directory, but you can put it wherever you like.
Open a terminal and run the following commands to make a project directory and an AWEL tutorial directory:
For Linux, macOS, or PowerShell, enter this:
```bash
mkdir -p ~/projects
cd ~/projects
```
Then, run the following commands to create a new project and change to the new directory:
```bash
poetry new awel-tutorial
cd awel-tutorial
```
The tree of the project should look like this:
```plaintext
awel-tutorial
├── README.md
├── awel_tutorial
│   └── __init__.py
├── pyproject.toml
└── tests
    └── __init__.py
```
## Adding DB-GPT Dependency
Next, add the `dbgpt` dependency to the project:
```bash
poetry add "dbgpt>=0.5.1"
```
## First Hello World
Next, you'll create a simple DAG that prints "Hello, world" to the console.
Now create a new file called `first_hello_world.py` in the `awel_tutorial` directory and add the following code:
```python
from dbgpt.core.awel import DAG, MapOperator

with DAG("awel_hello_world") as dag:
    task = MapOperator(map_function=lambda x: print(f"Hello, {x}!"))

task._blocking_call(call_data="world")
```
Now, the tree of the project should look like this:
```plaintext
awel-tutorial
├── README.md
├── awel_tutorial
│   ├── __init__.py
│   └── first_hello_world.py
├── poetry.lock
├── pyproject.toml
└── tests
    └── __init__.py
```
Then, run the following command to execute the code:
```bash
poetry run python awel_tutorial/first_hello_world.py
```
And you will see "Hello, world!" printed to the console.
```bash
Hello, world!
```
## Anatomy Of AWEL Code
Let's break down the code you just wrote.
```python
with DAG("awel_hello_world") as dag:
    task = MapOperator(map_function=lambda x: print(f"Hello, {x}!"))
```
This code creates a new DAG (directed acyclic graph) named `awel_hello_world`.
The `MapOperator` is a simple operator that takes a function and calls it with the data
passed to it. In this case, the function is a lambda that prints "Hello, world!" to the console.
The task is an instance of the `MapOperator` class, and we invoke it with the
`call_data` parameter set to `"world"`:
```python
task._blocking_call(call_data="world")
```
When you call the task, the lambda function is invoked with the data (`"world"`) you passed to it.
The `_blocking_call` method runs the task in a blocking way. It is used here only for
testing; we will see a better way to call the task in the next section.
## Hello World With `asyncio`
All task calls in AWEL are asynchronous. This example shows how to run the task with
`asyncio`.
Create a new file called `first_hello_world_asyncio.py` in the `awel_tutorial` directory and add the following code:
```python
import asyncio
from dbgpt.core.awel import DAG, MapOperator

with DAG("awel_hello_world") as dag:
    task = MapOperator(map_function=lambda x: print(f"Hello, {x}!"))

asyncio.run(task.call(call_data="world"))
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/first_hello_world_asyncio.py
```
And you will see "Hello, world!" printed to the console.
```bash
Hello, world!
```
## Hello World With Two Tasks
When we call a single node, we can pass data to it. This example shows how to pass data
to tasks with an `InputOperator`.
Create a new file called `first_hello_world_two_tasks.py` in the `awel_tutorial`
directory and add the following code:
```python
import asyncio
from dbgpt.core.awel import DAG, MapOperator, InputOperator, SimpleCallDataInputSource

with DAG("awel_hello_world") as dag:
    input_task = InputOperator(input_source=SimpleCallDataInputSource())
    task = MapOperator(map_function=lambda x: print(f"Hello, {x}!"))
    input_task >> task

asyncio.run(task.call(call_data="world"))
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/first_hello_world_two_tasks.py
```
And you will see "Hello, world!" printed to the console.
```bash
Hello, world!
```
In this case, we have two tasks. The first task is an `InputOperator` that takes data
from the `SimpleCallDataInputSource`. The second task is a `MapOperator` that takes the
data from the first task and prints "Hello, world!" to the console.
And we use the `>>` operator to connect the two tasks. This operator is used to define
the parent-child relationship between tasks, also known as the task dependency.
You can define the task dependency with the `set_downstream` method as well; the following is an example:
```python
input_task.set_downstream(task)
```
The one-task DAG above is a special case of the two-task DAG, where the `InputOperator` is omitted.
```python
with DAG("awel_hello_world") as dag:
    task = MapOperator(map_function=lambda x: print(f"Hello, {x}!"))

asyncio.run(task.call(call_data="world"))
```
## DAG Visualization
Install the graphviz package to visualize the DAG graph.
```bash
poetry add graphviz
```
Modify the `first_hello_world_two_tasks.py` file to add the following code:
```python
dag.visualize_dag()
```
The full code is like this:
```python
import asyncio
from dbgpt.core.awel import DAG, MapOperator, InputOperator, SimpleCallDataInputSource

with DAG("awel_hello_world") as dag:
    input_task = InputOperator(input_source=SimpleCallDataInputSource())
    task = MapOperator(map_function=lambda x: print(f"Hello, {x}!"))
    input_task >> task

dag.visualize_dag()
asyncio.run(task.call(call_data="world"))
```
Run `first_hello_world_two_tasks.py` again:
```bash
poetry run python awel_tutorial/first_hello_world_two_tasks.py
```
You will see the following output:
```bash
InputOperator(node_id=a307d921-3bd0-423d-80f0-30aa25aaa9fe)
-> MapOperator(node_id=bdb335f8-179d-4e08-b1ec-3b58a52d1e84)
Hello, world!
```
The graph of the DAG is like this:
<p align="left">
<img src={'/img/awel/awel_tutorial/first_hello_world_two_tasks.png'} width="720px" />
</p>
# 1.2 How AWEL Works
## Introduction
Let us look again at the DAG from the previous section:
```python
import asyncio
from dbgpt.core.awel import DAG, MapOperator, InputOperator, SimpleCallDataInputSource

with DAG("awel_hello_world") as dag:
    input_task = InputOperator(input_source=SimpleCallDataInputSource())
    task = MapOperator(map_function=lambda x: print(f"Hello, {x}!"))
    input_task >> task

dag.visualize_dag()
asyncio.run(task.call(call_data="world"))
```
This code contains a few new concepts: `DAG`, `Operator`, `Task`, and `Runner`.
- `DAG`: `DAG` is a class that represents a **Directed Acyclic Graph**. It is used to
define the structure of the tasks and their dependencies.
- `Operator`: `InputOperator` and `MapOperator` are examples of operators. An operator
is a node in the DAG. It can be a source of data, a transformation, or a sink of data.
In this example, `InputOperator` is a source of data, and `MapOperator` is a
transformation.
- `Task`: A task is a runtime instance of an operator; unlike the static DAG structure, it is a dynamic concept.
- `Runner`: A runner is used to execute the tasks in the DAG. When we
call `task.call(call_data="world")`, we are using a runner to execute the task. The
`DefaultWorkflowRunner` runs your task in the same process, and the
`RayWorkflowRunner` runs your task in a Ray cluster (not yet implemented in the
community version).
## DAG
### What is a DAG?
A Directed Acyclic Graph (DAG) is a graph that has a set of vertices and a set of
directed edges. The edges are directed from one vertex to another, and there are no
cycles in the graph. In the context of AWEL, the vertices are the operators, and the
edges are the dependencies between the operators.
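For intuition, here is a minimal sketch (using only the operators introduced in the previous section) of a DAG with three vertices and two directed edges:
```python
from dbgpt.core.awel import DAG, MapOperator

# Three vertices (operators) connected by two directed edges:
# double_task -> add_one_task -> print_task, with no cycles.
with DAG("tiny_dag") as dag:
    double_task = MapOperator(map_function=lambda x: x * 2)
    add_one_task = MapOperator(map_function=lambda x: x + 1)
    print_task = MapOperator(map_function=lambda x: print(x))
    double_task >> add_one_task >> print_task
```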
## Operator
### What is an Operator?
An operator is a node in the DAG. It can be a source of data, a transformation, or a
call to an LLM service. In the context of AWEL, an operator is a class that inherits
from the `dbgpt.core.awel.BaseOperator` class.
According to the type of output data, there are two types of operators:
**streaming operators** and **non-streaming operators**.
### Basic Operators
There are a few basic operators that are used to build up the more complex operators.
- `InputOperator`: This **non-streaming** operator is used to get data from an input
source.
- `MapOperator`: This **non-streaming** operator is used to apply a function to the
input data and return the transformed data.
- `BranchOperator`: This **non-streaming** operator is used to decide which path to run
based on the input data.
- `JoinOperator`: This **non-streaming** operator is used to join the data from multiple
paths into a single path.
- `StreamifyAbsOperator`: This **streaming** operator is used to convert the
non-streaming operator to a streaming operator.
- `UnstreamifyAbsOperator`: This **non-streaming** operator is used to convert the
streaming data to non-streaming data.
- `TransformStreamAbsOperator`: This **streaming** operator is used to transform the
streaming data to another streaming data.
- `ReduceStreamOperator`: This **non-streaming** operator is used to reduce the
streaming to non-streaming data.
- `TriggerOperator`: This **non-streaming** operator is used to trigger a task.
It is a special `InputOperator`.
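To make the streaming/non-streaming distinction concrete, here is a minimal sketch that bridges the two worlds. It assumes, by analogy with the `streamify` and `transform_stream` methods shown in section 1.3, that `UnstreamifyAbsOperator` exposes an `unstreamify` method:
```python
import asyncio
from typing import AsyncIterator

from dbgpt.core.awel import DAG, StreamifyAbsOperator, UnstreamifyAbsOperator

class CountStreamOperator(StreamifyAbsOperator[int, int]):
    """Streaming operator: turn one int into a stream of ints."""
    async def streamify(self, n: int) -> AsyncIterator[int]:
        for i in range(n):
            yield i

class SumStreamOperator(UnstreamifyAbsOperator[int, int]):
    """Non-streaming operator: collapse the stream back into one int."""
    async def unstreamify(self, it: AsyncIterator[int]) -> int:
        return sum([i async for i in it])

with DAG("stream_bridge_dag") as dag:
    stream_task = CountStreamOperator()
    sum_task = SumStreamOperator()
    stream_task >> sum_task

print(asyncio.run(sum_task.call(call_data=5)))  # 0 + 1 + 2 + 3 + 4 = 10
```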
### High-level Operators
- `RequestBuilderOperator`: This **non-streaming** operator is used to build a model
request from the input data.
- `LLMOperator`: This **non-streaming** operator is used to call an LLM service.
- `StreamingLLMOperator`: This **streaming** operator is used to call an LLM service and
expect a streaming response.
- `LLMBranchOperator`: This **non-streaming** operator is used to decide which path to
run based on the input data.
- `OpenAIStreamingOutputOperator`: This **streaming** operator transforms the model
output into a streaming format compatible with the OpenAI API.
- `ChatHistoryPromptComposerOperator`: This **non-streaming** operator is used to build
a high-level task to compose a chat history prompt.
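As a quick taste of how these compose, here is a condensed sketch of the chain used in the quickstart later in this document (it assumes an OpenAI API key is configured in the environment):
```python
import asyncio

from dbgpt.core.awel import DAG
from dbgpt.core.operators import PromptBuilderOperator, RequestBuilderOperator
from dbgpt.model.operators import LLMOperator
from dbgpt.model.proxy import OpenAILLMClient

with DAG("llm_chain_dag") as dag:
    # Build a prompt, wrap it into a model request, then call the LLM service.
    prompt_task = PromptBuilderOperator("Say hello to {name}.")
    req_build_task = RequestBuilderOperator(model="gpt-3.5-turbo")
    llm_task = LLMOperator(OpenAILLMClient())
    prompt_task >> req_build_task >> llm_task

print(asyncio.run(llm_task.call({"name": "AWEL"})))
```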
## Task
### What is a Task?
A task is an instance of an operator. Tasks are stateless by design, which means a task
can be executed multiple times with different input data.
Every task can receive multiple inputs from its parent tasks, and it returns a single
output to its child tasks.
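For example, a task with two parent tasks can combine their outputs with the `JoinOperator` from the basic operators above; a minimal sketch:
```python
import asyncio

from dbgpt.core.awel import DAG, InputOperator, InputSource, JoinOperator, MapOperator

with DAG("join_dag") as dag:
    input_task = InputOperator(input_source=InputSource.from_callable())
    double_task = MapOperator(map_function=lambda x: x * 2)
    triple_task = MapOperator(map_function=lambda x: x * 3)
    # join_task receives one input from each parent and returns a single output.
    join_task = JoinOperator(combine_function=lambda a, b: a + b)
    input_task >> double_task >> join_task
    input_task >> triple_task >> join_task

print(asyncio.run(join_task.call(call_data=2)))  # (2 * 2) + (2 * 3) = 10
```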
## Runner
### What is a Runner?
A runner is a class that is used to execute the tasks in the DAG. When we call a task
by `task.call(call_data="world")`, we are using a runner to execute the task. It will
trigger all the parent tasks of the task, and then execute the task.
The `DefaultWorkflowRunner` runs your task in the same process, and the
`RayWorkflowRunner` runs your task in a Ray cluster (not yet implemented in the community
version). You can also implement your own runner to run your tasks in your own
environment.
# 1.3 Custom Operator
## Your First Custom Operator
It is easy to create a custom operator in AWEL. In this section, we will create a
custom operator that prints the "Hello, world!" message.
In most cases, you just need to inherit basic operators and override the corresponding
methods.
Create a new file named `hello_world_custom_operator.py` in the `awel_tutorial`
directory and add the following code:
```python
import asyncio
from dbgpt.core.awel import DAG, MapOperator

class HelloWorldOperator(MapOperator[str, None]):
    async def map(self, x: str) -> None:
        print(f"Hello, {x}!")

with DAG("awel_hello_world") as dag:
    task = HelloWorldOperator()

asyncio.run(task.call(call_data="world"))
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/hello_world_custom_operator.py
```
And you will see "Hello, world!" printed to the console.
```bash
Hello, world!
```
## Your First Streaming Operator
Let's create a streaming operator that creates a stream of numbers from 0 to `n-1`,
then doubles each number in another streaming operator.
Create a new file named `custom_streaming_operator.py` in the `awel_tutorial` directory and add the following code:
```python
import asyncio
from typing import AsyncIterator
from dbgpt.core.awel import DAG, StreamifyAbsOperator, TransformStreamAbsOperator

class NumberProducerOperator(StreamifyAbsOperator[int, int]):
    async def streamify(self, n: int) -> AsyncIterator[int]:
        for i in range(n):
            yield i

class NumberDoubleOperator(TransformStreamAbsOperator[int, int]):
    async def transform_stream(self, it: AsyncIterator[int]) -> AsyncIterator[int]:
        async for i in it:
            # Double the number
            yield i * 2

with DAG("numbers_dag") as dag:
    task = NumberProducerOperator()
    double_task = NumberDoubleOperator()
    task >> double_task

async def helper_call_fn(t, n: int):
    # Call the streaming operator with the `call_stream` method
    async for i in await t.call_stream(call_data=n):
        print(i)

asyncio.run(helper_call_fn(double_task, 10))
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/custom_streaming_operator.py
```
And you will see the following output printed to the console.
```bash
0
2
4
6
8
10
12
14
16
18
```
In this example, we call the `call_stream` method to execute the streaming operator.
Don't forget to `await` it to get the stream before iterating over the results.
# 3.1 Http Trigger
## Introduction
In this chapter, we will focus on how to use AWEL to develop network programs.
First, we will create a simple HTTP trigger that receives a request and returns a response.
`HttpTrigger` is a special `InputOperator`.
## Installation
We have already created a project named `awel-tutorial` in the
[previous chapter](/docs/awel/awel_tutorial/getting_started/1.1_hello_world#creating-a-project)
and added the `dbgpt` dependency.
To use the `HttpTrigger` operator, we need to install the `fastapi` and `uvicorn` packages.
```bash
poetry add fastapi uvicorn
```
The output should look like this:
```plaintext
➜ awel-tutorial poetry add fastapi uvicorn
Using version ^0.110.0 for fastapi
Using version ^0.27.1 for uvicorn
Updating dependencies
Resolving dependencies... (2.7s)
Package operations: 7 installs, 0 updates, 0 removals
• Installing sniffio (1.3.1)
• Installing anyio (4.3.0)
• Installing click (8.1.7)
• Installing h11 (0.14.0)
• Installing starlette (0.36.3)
• Installing fastapi (0.110.0)
• Installing uvicorn (0.27.1)
Writing lock file
```
## First HTTP Trigger
Create a new file named `first_http_trigger_hello.py` in the `awel_tutorial` directory and add the following code:
```python
from dbgpt.core.awel import DAG, HttpTrigger, MapOperator, setup_dev_environment

with DAG("awel_hello_world") as dag:
    trigger_task = HttpTrigger(endpoint="/awel_tutorial/hello_world")
    task = MapOperator(map_function=lambda x: "Hello, world!")
    trigger_task >> task

setup_dev_environment([dag], port=5555)
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/first_http_trigger_hello.py
```
And the main output should look like this:
```plaintext
2024-03-03 16:26:57 | INFO | dbgpt.core.awel.trigger.http_trigger | Mount http trigger success, path: /api/v1/awel/trigger/awel_tutorial/hello_world
2024-03-03 16:26:57 | INFO | dbgpt.core.awel.trigger.trigger_manager | Include router <fastapi.routing.APIRouter object at 0x10ed64e50> to prefix path /api/v1/awel/trigger
INFO: Started server process [69774]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:5555 (Press CTRL+C to quit)
```
In AWEL, all HTTP endpoints are prefixed with `/api/v1/awel/trigger` by default.
Now, open a new terminal and run the following command to send a request to the server:
```bash
curl -X GET http://127.0.0.1:5555/api/v1/awel/trigger/awel_tutorial/hello_world
```
The output should look like this:
```plaintext
"Hello, world!"
```
Congratulations! You have created your first HTTP trigger.
Then you can stop the server by pressing `Ctrl+C` in the terminal.
## How It Works
In the code above, we created an `HttpTrigger` operator and a `MapOperator` operator.
`HttpTrigger` defines the endpoint of the HTTP request; the request method is "GET" by
default.
The `setup_dev_environment` function starts the server and registers the DAGs. It blocks
the main thread if any DAG contains `HttpTrigger` operators, and it listens on port 5555
by default.
When the server receives a request, it will call the `MapOperator` operator to process
the request and return the result.
In `HttpTrigger`, you can configure the endpoint, method, request body, response body,
response status code, etc.
In the next section, we will introduce more about `HttpTrigger`.
# 3.2 Handling GET Requests
In the previous section, we created a simple HTTP trigger that returns a fixed string. In
this section, we will create a new HTTP trigger that returns a string based on the
query parameters of the request.
## Say Hello To Someone
Before we start writing the code, we need to install the `pydantic` package in the
[awel-tutorial](/docs/awel/awel_tutorial/getting_started/1.1_hello_world#creating-a-project) project:
```bash
poetry add "pydantic>=2.6.0"
```
Then create a new file named `http_trigger_say_hello.py` in the `awel_tutorial` directory and add the following code:
```python
from dbgpt._private.pydantic import BaseModel, Field
from dbgpt.core.awel import DAG, HttpTrigger, MapOperator, setup_dev_environment

class TriggerReqBody(BaseModel):
    name: str = Field(..., description="User name")
    age: int = Field(18, description="User age")

with DAG("awel_say_hello") as dag:
    trigger_task = HttpTrigger(
        endpoint="/awel_tutorial/say_hello",
        methods="GET",
        request_body=TriggerReqBody,
        status_code=200,
    )
    task = MapOperator(
        map_function=lambda x: f"Hello, {x.name}! You are {x.age} years old."
    )
    trigger_task >> task

setup_dev_environment([dag], port=5555)
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/http_trigger_say_hello.py
```
Now, open a new terminal and run the following command to send a GET request to the server:
```bash
curl -X GET \
"http://127.0.0.1:5555/api/v1/awel/trigger/awel_tutorial/say_hello?name=John&age=25"
```
The output should look like this:
```plaintext
"Hello, John! You are 25 years old."
```
Then you can stop the server by pressing `Ctrl+C`.
In the code above, we created a `TriggerReqBody` class that inherits from `BaseModel` to
define the structure of the request body. The `dbgpt._private.pydantic` module wraps
pydantic for compatibility across different pydantic versions.
When it receives a request, the HTTP trigger parses the query parameters, builds a
`TriggerReqBody` object, and passes it to the next operator.
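Since `age` has a default value of 18, you can also omit it from the query string:
```bash
curl -X GET "http://127.0.0.1:5555/api/v1/awel/trigger/awel_tutorial/say_hello?name=John"
```
This returns `"Hello, John! You are 18 years old."`.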
## Return JSON Response
In the previous section, we returned a string as the response. We can also return a
JSON response.
AWEL uses `fastapi` as the default web framework, and it automatically determines the
response type from your task's output: common types such as `dict`, `list`, and pydantic
`BaseModel` objects are automatically converted to JSON responses.
So if you want to return a JSON response, you can simply return a `dict` or a `BaseModel`.
Create a new file named `http_trigger_say_hello_json.py` in the `awel_tutorial` directory and add the following code:
```python
from dbgpt._private.pydantic import BaseModel, Field
from dbgpt.core.awel import DAG, HttpTrigger, MapOperator, setup_dev_environment

class TriggerReqBody(BaseModel):
    name: str = Field(..., description="User name")
    age: int = Field(18, description="User age")

with DAG("awel_say_hello_json") as dag:
    trigger_task = HttpTrigger(
        endpoint="/awel_tutorial/say_hello_json",
        methods="GET",
        request_body=TriggerReqBody,
    )
    task = MapOperator(
        map_function=lambda x: {"message": f"Hello, {x.name}! You are {x.age} years old."}
    )
    trigger_task >> task

setup_dev_environment([dag], port=5555)
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/http_trigger_say_hello_json.py
```
Now, open a new terminal and run the following command to send a GET request to the server:
```bash
curl -X GET \
"http://127.0.0.1:5555/api/v1/awel/trigger/awel_tutorial/say_hello_json?name=John&age=25"
```
And you will get the following output:
```plaintext
{"message":"Hello, John! You are 25 years old."}
```
Then you can stop the server by pressing `Ctrl+C`.
# 3.3 Handling Post Requests
The `HttpTrigger` operator can also handle POST requests. In this section, we will
create a new HTTP trigger that returns a JSON response based on the request body of the POST request.
## Say Hello To Someone
Create a new file named `http_trigger_say_hello_post.py` in the `awel_tutorial` directory and add the following code:
```python
from dbgpt._private.pydantic import BaseModel, Field
from dbgpt.core.awel import DAG, HttpTrigger, MapOperator, setup_dev_environment

class TriggerReqBody(BaseModel):
    name: str = Field(..., description="User name")
    age: int = Field(18, description="User age")

with DAG("awel_say_hello_post") as dag:
    trigger_task = HttpTrigger(
        endpoint="/awel_tutorial/say_hello_post",
        methods="POST",
        request_body=TriggerReqBody,
        status_code=200,
    )
    task = MapOperator(
        map_function=lambda x: {"message": f"Hello, {x.name}! You are {x.age} years old."}
    )
    trigger_task >> task

setup_dev_environment([dag], port=5555)
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/http_trigger_say_hello_post.py
```
Now, open a new terminal and run the following command to send a POST request to the server:
```bash
curl -X POST \
"http://127.0.0.1:5555/api/v1/awel/trigger/awel_tutorial/say_hello_post" \
-H "Content-Type: application/json" \
-d '{"name": "John", "age": 25}'
```
The output should look like this:
```plaintext
{"message":"Hello, John! You are 20 years old."}
```
Then you can stop the server by pressing `Ctrl+C`.
# 3.4 Handling Streaming Requests
In this section, we will create a new HTTP trigger that returns a streaming response
based on the request body of the POST request.
## Stream The Numbers
Create a new file named `http_trigger_stream_numbers.py` in the `awel_tutorial` directory and add the following code:
```python
from typing import AsyncIterator

from dbgpt._private.pydantic import BaseModel, Field
from dbgpt.core.awel import DAG, HttpTrigger, StreamifyAbsOperator, setup_dev_environment

class TriggerReqBody(BaseModel):
    n: int = Field(..., description="The number of integers to be streamed")

class NumberProducerOperator(StreamifyAbsOperator[TriggerReqBody, str]):
    """Create a stream of numbers from 0 to `n-1`, one number per line."""
    async def streamify(self, req: TriggerReqBody) -> AsyncIterator[str]:
        for i in range(req.n):
            yield f"{i}\n"

with DAG("awel_stream_numbers") as dag:
    trigger_task = HttpTrigger(
        endpoint="/awel_tutorial/stream_numbers",
        methods="POST",
        request_body=TriggerReqBody,
        status_code=200,
        streaming_predict_func=lambda x: True,
    )
    task = NumberProducerOperator()
    trigger_task >> task

setup_dev_environment([dag], port=5555)
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/http_trigger_stream_numbers.py
```
Now, open a new terminal and run the following command to send a POST request to the server:
```bash
curl -X POST \
"http://127.0.0.1:5555/api/v1/awel/trigger/awel_tutorial/stream_numbers" \
-H "Content-Type: application/json" \
-d '{"n": 5}'
```
The output should look like this:
```plaintext
0
1
2
3
4
```
Then you can stop the server by pressing `Ctrl+C`.
In this example, we created an `HttpTrigger` operator with a streaming predict function,
which is used to determine whether to stream the response (it always returns `True` in this example).
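In a real service, you would typically decide whether to stream based on the request itself. A sketch (the `stream` field is hypothetical and would need to be added to `TriggerReqBody`):
```python
# Hypothetical: stream the response only when the client asks for it.
trigger_task = HttpTrigger(
    endpoint="/awel_tutorial/stream_numbers",
    methods="POST",
    request_body=TriggerReqBody,
    streaming_predict_func=lambda req: req.stream,
)
```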
# Embedding Process Workflow
# Introduction
The traditional knowledge preparation process of Naive RAG turns documents into a database: reading unstructured documents -> slicing the knowledge -> transforming the document slices -> importing them into a vector database.
# Applicable Scenarios
+ Supports simple intelligent question-and-answer scenarios, recalling context information through semantic similarity.
+ Users can tailor and extend the existing embedding processing workflow according to their own business scenarios.
# How to use
+ Enter the AWEL interface and add a workflow
![](https://intranetproxy.alipay.com/skylark/lark/0/2024/png/26456775/1734354927468-feed0ac7-e0fe-45e8-b85c-aba170084f82.png)
+ Import the Knowledge Processing Template
![](https://intranetproxy.alipay.com/skylark/lark/0/2024/png/26456775/1734358060884-672d3157-a2ee-498b-887e-ea51f1caddae.png)
+ Adjust parameters and save
![](https://intranetproxy.alipay.com/skylark/lark/0/2024/png/26456775/1734358170081-32d38282-7765-4bbf-9bf7-c068550907d1.png)
- `Document knowledge loader operator`: A knowledge loading factory; based on the specified document type, it finds the corresponding document processor to parse the document content.
- `Document chunk manager operator`: Slices the loaded document content according to the specified chunking parameters.
- `Vector storage operator`: Connects to different vector databases for vector storage; you can also connect different embedding models and services for vector extraction.
+ Trigger the workflow with an HTTP POST request:
```bash
curl --location --request POST 'http://localhost:5670/api/v1/awel/trigger/rag/knowledge/embedding/process' \
--header 'Content-Type: application/json' \
--data-raw '{}'
```
The response looks like this:
```json
[
{
"content": "\"What is AWEL?\": Agentic Workflow Expression Language(AWEL) is a set of intelligent agent workflow expression language specially designed for large model application\ndevelopment. It provides great functionality and flexibility. Through the AWEL API, you can focus on the development of business logic for LLMs applications\nwithout paying attention to cumbersome model and environment details. \nAWEL adopts a layered API design. AWEL's layered API design architecture is shown in the figure below. \n<p align=\"left\">\n<img src={'/img/awel.png'} width=\"480px\"/>\n</p>",
"metadata": {
"Header1": "What is AWEL?",
"source": "../../docs/docs/awel/awel.md"
},
"chunk_id": "c1ffa671-76d0-4c7a-b2dd-0b08dfd37712",
"chunk_name": "",
"score": 0.0,
"summary": "",
"separator": "\n",
"retriever": null
},...
]
```
# Hybrid Knowledge Process Workflow
# Introduction
At present, the DB-GPT knowledge base provides knowledge processing capabilities such as `document uploading` -> `parsing` -> `chunking` -> `Embedding` -> `Knowledge Graph triple extraction` -> `vector database storage` -> `graph database storage`, but it cannot extract complex information from documents, such as performing vector extraction and Knowledge Graph extraction on the same document chunks at the same time. The hybrid knowledge processing template defines a complex knowledge processing workflow that supports document vector extraction, keyword extraction, and Knowledge Graph extraction.
# Applicable Scenarios
+ It is not limited to the traditional single knowledge processing pipeline (only Embedding processing or only knowledge graph extraction); the workflow performs Embedding and Knowledge Graph extraction at the same time, serving as a hybrid data store for knowledge recall and retrieval.
+ Users can tailor and extend the existing knowledge processing workflow based on their own business scenarios.
# How to use
+ Enter the AWEL interface and add a workflow
![](https://intranetproxy.alipay.com/skylark/lark/0/2024/png/26456775/1734354927468-feed0ac7-e0fe-45e8-b85c-aba170084f82.png)
+ Import Knowledge Processing Template
![](https://intranetproxy.alipay.com/skylark/lark/0/2024/png/26456775/1734357236704-5a15be65-3d11-4406-98d7-efb82e5142dc.png)
+ Adjust parameters and save
![](https://intranetproxy.alipay.com/skylark/lark/0/2024/png/26456775/1734355123947-3e252e59-2b2a-4bca-adef-13a93ee6cdf3.png)
- `Document knowledge loading operator`: A knowledge loading factory; based on the specified document type, it finds the corresponding document processor to parse the document content.
- `Document chunk slicing operator`: Slices the loaded document content according to the specified chunking parameters.
- `Knowledge processing branch operator`: Connects different knowledge processing pipelines, including knowledge graph processing, vector processing, and keyword processing.
- `Vector storage operator`: Connects to different vector databases for vector storage; you can also connect different embedding models and services for vector extraction.
- `Knowledge Graph processing operator`: Connects different knowledge graph processing operators, including the native knowledge graph operator and the community-summary knowledge graph operator. You can also specify different graph databases for storage; currently, TuGraph is supported.
- `Result aggregation operator`: Summarizes the results of vector extraction and Knowledge Graph extraction.
+ Trigger the workflow with an HTTP POST request:
```bash
curl --location --request POST 'http://localhost:5670/api/v1/awel/trigger/rag/knowledge/hybrid/process' \
--header 'Content-Type: application/json' \
--data-raw '{}'
```
The response looks like this:
```json
[
"async persist vector store success 1 chunks.",
"async persist graph store success 1 chunks."
]
```
# Knowledge Graph Process Workflow
# Introduction
Unlike traditional Naive RAG, which uses vectors as the data carrier, GraphRAG requires triple extraction (entity -> relationship -> entity) to build a knowledge graph, so the entire knowledge processing can also be regarded as the process of building a knowledge graph.
![](https://intranetproxy.alipay.com/skylark/lark/0/2024/png/26456775/1734357331126-a3a96fd7-c8fb-4208-8e3b-be798d1b73b4.png)
# Applicable Scenarios
+ You need GraphRAG's ability to mine the relationships between pieces of knowledge for multi-step reasoning.
+ It makes up for the lack of completeness of Naive RAG in the recalled context.
# How to use
+ Enter the AWEL interface and add a workflow
![](https://intranetproxy.alipay.com/skylark/lark/0/2024/png/26456775/1734354927468-feed0ac7-e0fe-45e8-b85c-aba170084f82.png)
+ Import Knowledge Processing Template
![](https://intranetproxy.alipay.com/skylark/lark/0/2024/png/26456775/1734356276305-a6e03aff-ba89-40c4-be2d-f88dff29d0f5.png)
+ Adjust parameters and save
![](https://intranetproxy.alipay.com/skylark/lark/0/2024/png/26456775/1734356745373-4e449611-d0bc-4735-b142-0aebafaa34d6.png)
- `Document knowledge loading operator`: A knowledge loading factory; based on the specified document type, it finds the corresponding document processor to parse the document content.
- `Document chunk slicing operator`: Slices the loaded document content according to the specified chunking parameters.
- `Knowledge Graph processing operator`: Connects different knowledge graph processing operators, including the native knowledge graph operator and the community-summary knowledge graph operator. You can also specify different graph databases for storage; currently, TuGraph is supported.
+ Trigger the workflow with an HTTP POST request:
```bash
curl --location --request POST 'http://localhost:5670/api/v1/awel/trigger/rag/knowledge/kg/process' \
--header 'Content-Type: application/json' \
--data-raw '{}'
```
The response looks like this:
```json
[
{
"content": "\"What is AWEL?\": Agentic Workflow Expression Language(AWEL) is a set of intelligent agent workflow expression language specially designed for large model application\ndevelopment. It provides great functionality and flexibility. Through the AWEL API, you can focus on the development of business logic for LLMs applications\nwithout paying attention to cumbersome model and environment details. \nAWEL adopts a layered API design. AWEL's layered API design architecture is shown in the figure below. \n<p align=\"left\">\n<img src={'/img/awel.png'} width=\"480px\"/>\n</p>",
"metadata": {
"Header1": "What is AWEL?",
"source": "../../docs/docs/awel/awel.md"
},
"chunk_id": "c1ffa671-76d0-4c7a-b2dd-0b08dfd37712",
"chunk_name": "",
"score": 0.0,
"summary": "",
"separator": "\n",
"retriever": null
},...
]
```
# Build A Data Analysis Copilot Using AWEL
# RAG With AWEL
In this example, we will show how to use the AWEL library to create a RAG program.
Now, let us create a python file `first_rag_with_awel.py`.
In this example, we will load your knowledge from a URL and store it in a vector store.
### Install Dependencies
First, you need to install the `dbgpt` library.
```bash
pip install "dbgpt[rag]>=0.5.2"
```
### Prepare Embedding Model
To store the knowledge in a vector store, we need an embedding model. DB-GPT supports
many embedding models; here are some of them:
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
<Tabs
defaultValue="openai"
values={[
{label: 'Open AI(API)', value: 'openai'},
{label: 'text2vec(local)', value: 'text2vec'},
{label: 'Embedding API Server(cluster)', value: 'remote_embedding'},
]}>
<TabItem value="openai">
```python
from dbgpt.rag.embedding import DefaultEmbeddingFactory
embeddings = DefaultEmbeddingFactory.openai()
```
</TabItem>
<TabItem value="text2vec">
```python
from dbgpt.rag.embedding import DefaultEmbeddingFactory
embeddings = DefaultEmbeddingFactory.default("/data/models/text2vec-large-chinese")
```
</TabItem>
<TabItem value="remote_embedding">
If you have deployed [DB-GPT cluster](/docs/installation/model_service/cluster) and
[API server](/docs/installation/advanced_usage/OpenAI_SDK_call)
, you can connect to the API server to get the embeddings.
```python
from dbgpt.rag.embedding import DefaultEmbeddingFactory
embeddings = DefaultEmbeddingFactory.remote(
api_url="http://localhost:8100/api/v1/embeddings",
api_key="{your_api_key}",
model_name="text2vec"
)
```
</TabItem>
</Tabs>
### Load Knowledge And Store In Vector Store
Then we can create a DAG which loads the knowledge from a URL and stores it in a vector
store.
```python
import asyncio
import shutil

from dbgpt.core.awel import DAG
from dbgpt_ext.rag import ChunkParameters
from dbgpt.rag.knowledge import KnowledgeType
from dbgpt.rag.operators import EmbeddingAssemblerOperator, KnowledgeOperator
from dbgpt.storage.vector_store.chroma_store import ChromaStore, ChromaVectorConfig

# Delete old vector store directory(/tmp/awel_rag_test_vector_store)
shutil.rmtree("/tmp/awel_rag_test_vector_store", ignore_errors=True)

vector_store = ChromaStore(
    vector_store_config=ChromaVectorConfig(
        name="test_vstore",
        persist_path="/tmp/awel_rag_test_vector_store",
        embedding_fn=embeddings,
    )
)

with DAG("load_knowledge_dag") as knowledge_dag:
    # Load knowledge from URL
    knowledge_task = KnowledgeOperator(knowledge_type=KnowledgeType.URL.name)
    assembler_task = EmbeddingAssemblerOperator(
        index_store=vector_store,
        chunk_parameters=ChunkParameters(chunk_strategy="CHUNK_BY_SIZE"),
    )
    knowledge_task >> assembler_task

chunks = asyncio.run(assembler_task.call("https://docs.dbgpt.site/docs/awel/"))
print(f"Chunk length: {len(chunks)}")
```
### Retrieve Knowledge From Vector Store
Then you can retrieve the knowledge from the vector store.
```python
from dbgpt.core.awel import MapOperator
from dbgpt.rag.operators import EmbeddingRetrieverOperator

with DAG("retriever_dag") as retriever_dag:
    retriever_task = EmbeddingRetrieverOperator(
        top_k=3,
        index_store=vector_store,
    )
    content_task = MapOperator(lambda cks: "\n".join(c.content for c in cks))
    retriever_task >> content_task

chunks = asyncio.run(content_task.call("What is the AWEL?"))
print(chunks)
```
### Prepare LLM
To build a RAG program, we need an LLM. Here are some of the LLMs that DB-GPT supports:
<Tabs
defaultValue="openai"
values={[
{label: 'Open AI(API)', value: 'openai'},
{label: 'YI(API)', value: 'yi_proxy'},
{label: 'API Server(cluster)', value: 'model_service'},
]}>
<TabItem value="openai">
First, you should install the `openai` library.
```bash
pip install openai
```
Then set your API key in the environment `OPENAI_API_KEY`.
```python
from dbgpt.model.proxy import OpenAILLMClient
llm_client = OpenAILLMClient()
```
</TabItem>
<TabItem value="yi_proxy">
You should have a YI account and get the API key from the YI official website.
First, you should install the `openai` library.
```bash
pip install openai
```
Then set your API key in the environment variable `YI_API_KEY`.
```python
from dbgpt.model.proxy import YiLLMClient
llm_client = YiLLMClient()
```
</TabItem>
<TabItem value="model_service">
If you have deployed [DB-GPT cluster](/docs/installation/model_service/cluster) and
[API server](/docs/installation/advanced_usage/OpenAI_SDK_call)
, you can connect to the API server to get the LLM model.
The API is compatible with the OpenAI API, so you can use the OpenAILLMClient to
connect to the API server.
First you should install the `openai` library.
```bash
pip install openai
```
```python
from dbgpt.model.proxy import OpenAILLMClient
llm_client = OpenAILLMClient(api_base="http://localhost:8100/api/v1/", api_key="{your_api_key}")
```
</TabItem>
</Tabs>
### Create RAG Program
Lastly, we can create a RAG program with the retrieved knowledge.
```python
from dbgpt.core.awel import InputOperator, JoinOperator, InputSource
from dbgpt.core.operators import PromptBuilderOperator, RequestBuilderOperator
from dbgpt.model.operators import LLMOperator

prompt = """Based on the known information below, provide users with professional and concise answers to their questions.
If the answer cannot be obtained from the provided content, please say:
"The information provided in the knowledge base is not sufficient to answer this question.".
It is forbidden to make up information randomly. When answering, it is best to summarize according to points 1.2.3.
known information:
{context}
question:
{question}
"""

with DAG("llm_rag_dag") as rag_dag:
    input_task = InputOperator(input_source=InputSource.from_callable())
    retriever_task = EmbeddingRetrieverOperator(
        top_k=3,
        index_store=vector_store,
    )
    content_task = MapOperator(lambda cks: "\n".join(c.content for c in cks))
    merge_task = JoinOperator(
        lambda context, question: {"context": context, "question": question}
    )
    prompt_task = PromptBuilderOperator(prompt)
    # The model is gpt-3.5-turbo, you can replace it with other models.
    req_build_task = RequestBuilderOperator(model="gpt-3.5-turbo")
    llm_task = LLMOperator(llm_client=llm_client)
    result_task = MapOperator(lambda r: r.text)

    input_task >> retriever_task >> content_task >> merge_task
    input_task >> merge_task
    merge_task >> prompt_task >> req_build_task >> llm_task >> result_task

print(asyncio.run(result_task.call("What is the AWEL?")))
```
The output will be:
```bash
AWEL stands for Agentic Workflow Expression Language, which is a set of intelligent agent workflow expression language designed for large model application development. It simplifies the process by providing functionality and flexibility through its layered API design architecture, including the operator layer, AgentFrame layer, and DSL layer. Its goal is to allow developers to focus on business logic for LLMs applications without having to deal with intricate model and environment details.
```
Congratulations! You have created a RAG program with AWEL.
### Full Code
Let's look at the full code of `first_rag_with_awel.py`:
```python
import asyncio
import shutil

from dbgpt.core.awel import DAG, MapOperator, InputOperator, JoinOperator, InputSource
from dbgpt.core.operators import PromptBuilderOperator, RequestBuilderOperator
from dbgpt_ext.rag import ChunkParameters
from dbgpt.rag.knowledge import KnowledgeType
from dbgpt.rag.operators import (
    EmbeddingAssemblerOperator,
    KnowledgeOperator,
    EmbeddingRetrieverOperator,
)
from dbgpt.rag.embedding import DefaultEmbeddingFactory
from dbgpt.storage.vector_store.chroma_store import ChromaStore, ChromaVectorConfig
from dbgpt.model.operators import LLMOperator
from dbgpt.model.proxy import OpenAILLMClient

# Here we use the openai embedding model, if you want to use other models, you can
# replace it according to the previous example.
embeddings = DefaultEmbeddingFactory.openai()

# Here we use the openai LLM model, if you want to use other models, you can replace
# it according to the previous example.
llm_client = OpenAILLMClient()

# Delete old vector store directory(/tmp/awel_rag_test_vector_store)
shutil.rmtree("/tmp/awel_rag_test_vector_store", ignore_errors=True)

vector_store = ChromaStore(
    vector_store_config=ChromaVectorConfig(
        name="test_vstore",
        persist_path="/tmp/awel_rag_test_vector_store",
        embedding_fn=embeddings,
    ),
)

with DAG("load_knowledge_dag") as knowledge_dag:
    # Load knowledge from URL
    knowledge_task = KnowledgeOperator(knowledge_type=KnowledgeType.URL.name)
    assembler_task = EmbeddingAssemblerOperator(
        index_store=vector_store,
        chunk_parameters=ChunkParameters(chunk_strategy="CHUNK_BY_SIZE"),
    )
    knowledge_task >> assembler_task

chunks = asyncio.run(assembler_task.call("https://docs.dbgpt.site/docs/awel/"))
print(f"Chunk length: {len(chunks)}\n")

prompt = """Based on the known information below, provide users with professional and concise answers to their questions.
If the answer cannot be obtained from the provided content, please say:
"The information provided in the knowledge base is not sufficient to answer this question.".
It is forbidden to make up information randomly. When answering, it is best to summarize according to points 1.2.3.
known information:
{context}
question:
{question}
"""

with DAG("llm_rag_dag") as rag_dag:
    input_task = InputOperator(input_source=InputSource.from_callable())
    retriever_task = EmbeddingRetrieverOperator(
        top_k=3,
        index_store=vector_store,
    )
    content_task = MapOperator(lambda cks: "\n".join(c.content for c in cks))
    merge_task = JoinOperator(
        lambda context, question: {"context": context, "question": question}
    )
    prompt_task = PromptBuilderOperator(prompt)
    # The model is gpt-3.5-turbo, you can replace it with other models.
    req_build_task = RequestBuilderOperator(model="gpt-3.5-turbo")
    llm_task = LLMOperator(llm_client=llm_client)
    result_task = MapOperator(lambda r: r.text)

    input_task >> retriever_task >> content_task >> merge_task
    input_task >> merge_task
    merge_task >> prompt_task >> req_build_task >> llm_task >> result_task

print(asyncio.run(result_task.call("What is the AWEL?")))
```
### Visualize DAGs
And we can visualize the DAGs with the following code:
```python
knowledge_dag.visualize_dag()
rag_dag.visualize_dag()
```
If you execute the code in Jupyter Notebook, you can see the DAGs in the notebook.
```python
display(knowledge_dag.show())
display(rag_dag.show())
```
The graph of the `knowledge_dag` is:
<p align="left">
<img src={'/img/awel/cookbook/first_rag_knowledge_dag.png'} width="1000px"/>
</p>
And the graph of the `rag_dag` is:
<p align="left">
<img src={'/img/awel/cookbook/first_rag_rag_dag.png'} width="1000px"/>
</p>
# Multi-Round Chat with LLMs
In this example, we will show how to use the AWEL library to create a multi-round chat
with an LLM.
Create a python file `multi_round_chat_with_llm.py` and write the following content:
```python
import asyncio

from dbgpt.core.awel import DAG, MapOperator, BaseOperator
from dbgpt.core import (
    ChatPromptTemplate,
    HumanPromptTemplate,
    InMemoryStorage,
    MessagesPlaceholder,
    ModelRequestContext,
    SystemPromptTemplate,
)
from dbgpt.core.operators import (
    ChatComposerInput,
    ChatHistoryPromptComposerOperator,
)
from dbgpt.model.proxy import OpenAILLMClient
from dbgpt.model.operators import LLMOperator

with DAG("multi_round_chat_with_lll_dag") as dag:
    prompt = ChatPromptTemplate(
        messages=[
            SystemPromptTemplate.from_template("You are a helpful chatbot."),
            MessagesPlaceholder(variable_name="chat_history"),
            HumanPromptTemplate.from_template("{user_input}"),
        ]
    )
    composer_operator = ChatHistoryPromptComposerOperator(
        prompt_template=prompt,
        keep_end_rounds=5,
        storage=InMemoryStorage(),
        message_storage=InMemoryStorage(),
    )
    input_task = MapOperator(
        lambda req: ChatComposerInput(
            context=ModelRequestContext(conv_uid=req["conv_uid"]),
            prompt_dict={"user_input": req["user_input"]},
            model_dict={"model": "gpt-3.5-turbo"},
        )
    )
    # Use LLMOperator to generate response.
    llm_task = LLMOperator(task_name="llm_task", llm_client=OpenAILLMClient())
    out_parse_task = MapOperator(lambda out: out.text)
    input_task >> composer_operator >> llm_task >> out_parse_task

async def main(task: BaseOperator):
    conv_uid = "conv_1234"
    first_user_input = "Who is elon musk?"
    second_user_input = "Is he rich?"

    print(f"First round\nUser: {first_user_input}")
    first_ai_response = await task.call(
        {"conv_uid": conv_uid, "user_input": first_user_input}
    )
    print(f"AI: {first_ai_response}")

    print(f"\nSecond round\nUser: {second_user_input}")
    second_ai_response = await task.call(
        {"conv_uid": conv_uid, "user_input": second_user_input}
    )
    print(f"AI: {second_ai_response}")

asyncio.run(main(out_parse_task))
```
Then run the file with the following command:
```bash
python multi_round_chat_with_llm.py
```
And you will see the following output:
```plaintext
First round
User: Who is elon musk?
AI: Elon Musk is a well-known entrepreneur and business magnate. He is the CEO and founder of SpaceX, Tesla Inc., Neuralink, and The Boring Company. Musk is known for his work in the technology and space industries, and he is also involved in the development of electric vehicles, renewable energy, and artificial intelligence.
Second round
User: Is he rich?
AI: Yes, Elon Musk is one of the richest people in the world. As the CEO and founder of multiple successful companies, including SpaceX and Tesla, his net worth fluctuates but is consistently in the billions of dollars.
```
# QuickStart Basic AWEL Workflow
## Install
First, install `dbgpt` and the necessary dependencies:
```shell
pip install dbgpt --upgrade
pip install openai
```
Create a python file `simple_sdk_llm_example_dag.py` and write the following content:
```python
import asyncio

from dbgpt.core.awel import DAG
from dbgpt.core.operators import (
    PromptBuilderOperator,
    RequestBuilderOperator,
)
from dbgpt.model.proxy import OpenAILLMClient
from dbgpt.model.operators import LLMOperator

with DAG("simple_sdk_llm_example_dag") as dag:
    prompt_task = PromptBuilderOperator(
        "Write a SQL of {dialect} to query all data of {table_name}."
    )
    model_pre_handle_task = RequestBuilderOperator(model="gpt-3.5-turbo")
    llm_task = LLMOperator(OpenAILLMClient())
    prompt_task >> model_pre_handle_task >> llm_task

output = asyncio.run(
    llm_task.call(
        {
            "dialect": "MySQL",
            "table_name": "users",
        }
    )
)
print(output)
```
Configure the environment variables for OpenAI API:
```bash
export OPENAI_API_KEY=sk-xx
export OPENAI_API_BASE=https://xx:80/v1
```
Run the python file:
```bash
python simple_sdk_llm_example_dag.py
```
The output will look like this:
```plaintext
ModelOutput(text='SELECT * FROM users;', error_code=0, model_context=None, finish_reason=None, usage={'completion_tokens': 5, 'prompt_tokens': 19, 'total_tokens': 24}, metrics=None)
```
Congratulations! You have already mastered the basic usage of AWEL. For more examples,
please refer to the **[cookbook](/docs/awel/cookbook/)**.
We also suggest reading the book **[AWEL Tutorial](/docs/awel/tutorial/)** to learn more about AWEL.
# Write Your Own `Chat Data` With `AWEL`
In this guide, we will show you how to write your own `Chat Data` with `AWEL`, just
like the `Chat Data` scene in DB-GPT.
This guide is a little advanced and may take some time to understand. If you have any questions,
please feel free to ask in the [DB-GPT issues](https://github.com/eosphoros-ai/DB-GPT/issues).
## Introduction
`Chat Data` means **chatting with your database**. Its goal is to interact with the database
through natural language. It includes the following steps:
1. **Build knowledge base**: parse the database schema and other information to build a knowledge base.
2. **Chat with database**: chat with the database through natural language.
**Chat with database** consists of the following steps:
1. **Retrieve relevant information**: retrieve the relevant information from the
database according to the user's query.
2. **Generate response**: pass relevant information and user query to the LLM, and then
generate a response which includes some SQL and other information.
3. **Execute SQL**: execute the SQL to get the final result.
4. **Visualize result**: visualize the result and return it to the user.
In this guide, we mainly focus on steps 1, 2, and 3.
## Install Dependencies
First, you need to install the `dbgpt` library.
```bash
pip install "dbgpt[rag]>=0.5.3rc0" -U
```
## Build Knowledge Base
### Prepare Embedding Model
First, you need to prepare the embedding model; you can provide one
according to [Prepare Embedding Model](./first_rag_with_awel.md#prepare-embedding-model).
Here we use OpenAI's embedding model.
```python
from dbgpt.rag.embedding import DefaultEmbeddingFactory
embeddings = DefaultEmbeddingFactory.openai()
```
### Prepare Database
Here we create a simple SQLite database.
```python
from dbgpt.datasource.rdbms.conn_sqlite import SQLiteTempConnector

db_conn = SQLiteTempConnector.create_temporary_db()
db_conn.create_temp_tables(
    {
        "user": {
            "columns": {
                "id": "INTEGER PRIMARY KEY",
                "name": "TEXT",
                "age": "INTEGER",
            },
            "data": [
                (1, "Tom", 10),
                (2, "Jerry", 16),
                (3, "Jack", 18),
                (4, "Alice", 20),
                (5, "Bob", 22),
            ],
        }
    }
)
```
### Store Database Schema To Vector Store
```python
import asyncio
import shutil

from dbgpt.core.awel import DAG, InputOperator
from dbgpt_ext.rag import ChunkParameters
from dbgpt.rag.operators import DBSchemaAssemblerOperator
from dbgpt.storage.vector_store.chroma_store import ChromaVectorConfig, ChromaStore

# Delete old vector store directory(/tmp/awel_with_data_vector_store)
shutil.rmtree("/tmp/awel_with_data_vector_store", ignore_errors=True)

vector_store = ChromaStore(
    ChromaVectorConfig(
        persist_path="/tmp/tmp_ltm_vector_store",
        name="ltm_vector_store",
        embedding_fn=embeddings,
    )
)

with DAG("load_schema_dag") as load_schema_dag:
    input_task = InputOperator.dummy_input()
    # Load database schema to vector store
    assembler_task = DBSchemaAssemblerOperator(
        connector=db_conn,
        index_store=vector_store,
        chunk_parameters=ChunkParameters(chunk_strategy="CHUNK_BY_SIZE"),
    )
    input_task >> assembler_task

chunks = asyncio.run(assembler_task.call())
print(chunks)
```
### Retrieve Database Schema From Vector Store
```python
from dbgpt.core.awel import InputSource
from dbgpt.rag.operators import DBSchemaRetrieverOperator

with DAG("retrieve_schema_dag") as retrieve_schema_dag:
    input_task = InputOperator(input_source=InputSource.from_callable())
    # Retrieve database schema from vector store
    retriever_task = DBSchemaRetrieverOperator(
        top_k=1,
        index_store=vector_store,
    )
    input_task >> retriever_task

chunks = asyncio.run(
    retriever_task.call("Query the name and age of users younger than 18 years old")
)
print("Retrieved schema:\n", chunks)
```
## Chat With Database
### Prepare LLM
We use an LLM to generate SQL queries. Here we use OpenAI's LLM; you can replace it
with other models according to [Prepare LLM](./first_rag_with_awel.md#prepare-llm).
```python
from dbgpt.model.proxy import OpenAILLMClient
llm_client = OpenAILLMClient()
```
### Prepare Some Decisions
Sometimes, we want the LLM to make some decisions. Here we provide some decision options, which are chart types.
```python
antv_charts = [
    {"response_line_chart": "used to display comparative trend analysis data"},
    {
        "response_pie_chart": "suitable for scenarios such as proportion and distribution statistics"
    },
    {
        "response_table": "suitable for display with many display columns or non-numeric columns"
    },
    # {"response_data_text":" the default display method, suitable for single-line or simple content display"},
    {
        "response_scatter_plot": "Suitable for exploring relationships between variables, detecting outliers, etc."
    },
    {
        "response_bubble_chart": "Suitable for relationships between multiple variables, highlighting outliers or special situations, etc."
    },
    {
        "response_donut_chart": "Suitable for hierarchical structure representation, category proportion display and highlighting key categories, etc."
    },
    {
        "response_area_chart": "Suitable for visualization of time series data, comparison of multiple groups of data, analysis of data change trends, etc."
    },
    {
        "response_heatmap": "Suitable for visual analysis of time series data, large-scale data sets, distribution of classified data, etc."
    },
]

display_type = "\n".join(
    f"{key}:{value}" for dict_item in antv_charts for key, value in dict_item.items()
)
```
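The first few lines of the resulting `display_type` string look like this:
```plaintext
response_line_chart:used to display comparative trend analysis data
response_pie_chart:suitable for scenarios such as proportion and distribution statistics
response_table:suitable for display with many display columns or non-numeric columns
...
```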
### Generate SQL
Now, let's pass the user query and database schema to LLM to generate SQL.
```python
import asyncio
import json

from dbgpt.core import (
    ChatPromptTemplate,
    HumanPromptTemplate,
    SystemPromptTemplate,
    SQLOutputParser,
)
from dbgpt.core.awel import DAG, InputOperator, InputSource, MapOperator, JoinOperator
from dbgpt.core.operators import PromptBuilderOperator, RequestBuilderOperator
from dbgpt.rag.operators import DBSchemaRetrieverOperator
from dbgpt.model.operators import LLMOperator

system_prompt = """You are a database expert. Please answer the user's question based on the database selected by the user and some of the available table structure definitions of the database.
Database name:
{db_name}
Table structure definition:
{table_info}
Constraint:
1.Please understand the user's intention based on the user's question, and use the given table structure definition to create a grammatically correct {dialect} sql. If sql is not required, answer the user's question directly..
2.Always limit the query to a maximum of {top_k} results unless the user specifies in the question the specific number of rows of data he wishes to obtain.
3.You can only use the tables provided in the table structure information to generate sql. If you cannot generate sql based on the provided table structure, please say: "The table structure information provided is not enough to generate sql queries." It is prohibited to fabricate information at will.
4.Please be careful not to mistake the relationship between tables and columns when generating SQL.
5.Please check the correctness of the SQL and ensure that the query performance is optimized under correct conditions.
6.Please choose the best one from the display methods given below for data rendering, and put the type name into the name parameter value that returns the required format. If you cannot find the most suitable one, use 'Table' as the display method.
the available data display methods are as follows: {display_type}
User Question:
{user_input}
Please think step by step and respond according to the following JSON format:
{response}
Ensure the response is correct json and can be parsed by Python json.loads.
"""

RESPONSE_FORMAT_SIMPLE = {
    "thoughts": "thoughts summary to say to user",
    "sql": "SQL Query to run",
    "display_type": "Data display method",
}

prompt = ChatPromptTemplate(
    messages=[
        SystemPromptTemplate.from_template(
            system_prompt,
            response_format=json.dumps(
                RESPONSE_FORMAT_SIMPLE, ensure_ascii=False, indent=4
            ),
        ),
        HumanPromptTemplate.from_template("{user_input}"),
    ]
)

with DAG("chat_data_dag") as chat_data_dag:
    input_task = InputOperator(input_source=InputSource.from_callable())
    retriever_task = DBSchemaRetrieverOperator(
        top_k=1,
        index_store=vector_store,
    )
    content_task = MapOperator(lambda cks: [c.content for c in cks])
    merge_task = JoinOperator(
        lambda table_info, ext_dict: {"table_info": table_info, **ext_dict}
    )
    prompt_task = PromptBuilderOperator(prompt)
    req_build_task = RequestBuilderOperator(model="gpt-3.5-turbo")
    llm_task = LLMOperator(llm_client=llm_client)
    # Parse the pure json response, then transform it to a python dict
    sql_parse_task = SQLOutputParser()

    input_task >> MapOperator(lambda x: x["user_input"]) >> retriever_task >> content_task >> merge_task
    input_task >> merge_task
    merge_task >> prompt_task >> req_build_task >> llm_task >> sql_parse_task

result = asyncio.run(
    sql_parse_task.call(
        {
            "user_input": "Query the name and age of users younger than 18 years old",
            "db_name": "user_management",
            "dialect": "SQLite",
            "top_k": 1,
            "display_type": display_type,
            "response": json.dumps(RESPONSE_FORMAT_SIMPLE, ensure_ascii=False, indent=4),
        }
    )
)
print("Result:\n", result)
```
The output will be like this:
```bash
un_stream ai response: {
"thoughts": "The user wants to retrieve the name and age of users who are younger than 18 years old from the 'user_management' database.",
"sql": "SELECT name, age FROM user WHERE age < 18",
"display_type": "response_table"
}
Result:
{'thoughts': "The user wants to retrieve the name and age of users who are younger than 18 years old from the 'user_management' database.", 'sql': 'SELECT name, age FROM user WHERE age < 18', 'display_type': 'response_table'}
```
### Execute SQL
Let's add an operator to execute the previously generated SQL.
```python
from dbgpt.datasource.operators import DatasourceOperator
# previous code ...
# Execute the generated SQL against the database connection
db_query_task = DatasourceOperator(connector=db_conn)
sql_parse_task >> MapOperator(lambda x: x["sql"]) >> db_query_task
db_result = asyncio.run(db_query_task.call({
"user_input": "Query the name and age of users younger than 18 years old",
"db_name": "user_management",
"dialect": "SQLite",
"top_k": 1,
"display_type": display_type,
"response": json.dumps(RESPONSE_FORMAT_SIMPLE, ensure_ascii=False, indent=4)
}))
print("The result of the query is:")
print(db_result)
```
The output will be like this:
```bash
un_stream ai response: {
"thoughts": "The user wants to retrieve the names and ages of users who are younger than 18 years old from the 'user' table.",
"sql": "SELECT name, age FROM user WHERE age < 18",
"display_type": "response_table"
}
The result of the query is:
name age
0 Tom 10
1 Jerry 16
```
### Write Your Custom Process Logic After SQL Execution
Sometimes you may want to add custom logic after the SQL execution. Here is an example with some custom operators.
```python
import pandas as pd
from dbgpt.core.awel import MapOperator, BranchOperator, JoinOperator, is_empty_data
class TwoSumOperator(MapOperator[pd.DataFrame, int]):
    """Sum the `age` column of the input DataFrame."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    async def map(self, df: pd.DataFrame) -> int:
        # Run the blocking pandas computation in a thread pool
        return await self.blocking_func_to_async(self._two_sum, df)

    def _two_sum(self, df: pd.DataFrame) -> int:
        return df['age'].sum()


def branch_even(x: int) -> bool:
    return x % 2 == 0


def branch_odd(x: int) -> bool:
    return not branch_even(x)


class DataDecisionOperator(BranchOperator[int, int]):
    """Route the input to the odd task or the even task."""

    def __init__(self, odd_task_name: str, even_task_name: str, **kwargs):
        super().__init__(**kwargs)
        self.odd_task_name = odd_task_name
        self.even_task_name = even_task_name

    async def branches(self):
        # Map each branch-check function to the task name it activates
        return {
            branch_even: self.even_task_name,
            branch_odd: self.odd_task_name
        }


class OddOperator(MapOperator[int, str]):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    async def map(self, x: int) -> str:
        print(f"{x} is odd")
        return f"{x} is odd"


class EvenOperator(MapOperator[int, str]):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    async def map(self, x: int) -> str:
        print(f"{x} is even")
        return f"{x} is even"


class MergeOperator(JoinOperator[str]):
    """Join both branches; only the branch that actually ran has real data."""

    def __init__(self, **kwargs):
        super().__init__(combine_function=self.merge_func, **kwargs)

    async def merge_func(self, odd: str, even: str) -> str:
        # The skipped branch yields empty data, so keep the non-empty result
        return odd if not is_empty_data(odd) else even
```
Let's add these operators to the DAG. Note that `DataDecisionOperator` runs only one of its two branches at execution time; the skipped branch yields empty data, which is why `MergeOperator` uses `is_empty_data` to pick out the real result.
```python
# previous code ...
two_sum_task = TwoSumOperator()
decision_task = DataDecisionOperator(odd_task_name="odd_task", even_task_name="even_task")
# The task names must match the names returned by the decision operator's branches
odd_task = OddOperator(task_name="odd_task")
even_task = EvenOperator(task_name="even_task")
merge_task = MergeOperator()

db_query_task >> two_sum_task >> decision_task
decision_task >> odd_task >> merge_task
decision_task >> even_task >> merge_task
final_result = asyncio.run(merge_task.call({
"user_input": "Query the name and age of users younger than 18 years old",
"db_name": "user_management",
"dialect": "SQLite",
"top_k": 1,
"display_type": display_type,
"response": json.dumps(RESPONSE_FORMAT_SIMPLE, ensure_ascii=False, indent=4)
}))
print("The final result is:")
print(final_result)
```
The output will be like this:
```bash
un_stream ai response: {
"thoughts": "The user wants to retrieve the names and ages of users who are younger than 18 years old from the 'user' table.",
"sql": "SELECT name, age FROM user WHERE age < 18",
"display_type": "response_table"
}
26 is even
The final result is:
26 is even
```
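Let's trace how this result was reached: the query matched Tom (age 10) and Jerry (age 16), so `TwoSumOperator` produced 10 + 16 = 26. Since 26 is even, the decision operator routed the value to the even branch, and the merge operator returned "26 is even".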
Congratulations! You have successfully written your own `Chat Data` with `AWEL`.
### Full Code
Finally, let's look at the full code:
```python
import asyncio
import json
import shutil
import pandas as pd
from dbgpt.core import (
ChatPromptTemplate,
HumanPromptTemplate,
SQLOutputParser,
SystemPromptTemplate,
)
from dbgpt.core.awel import (
DAG,
BranchOperator,
InputOperator,
InputSource,
JoinOperator,
MapOperator,
is_empty_data,
)
from dbgpt.core.operators import PromptBuilderOperator, RequestBuilderOperator
from dbgpt.datasource.operators import DatasourceOperator
from dbgpt.datasource.rdbms.conn_sqlite import SQLiteTempConnector
from dbgpt.model.operators import LLMOperator
from dbgpt.model.proxy import OpenAILLMClient
from dbgpt_ext.rag import ChunkParameters
from dbgpt.rag.embedding import DefaultEmbeddingFactory
from dbgpt.rag.operators import DBSchemaAssemblerOperator, DBSchemaRetrieverOperator
from dbgpt.storage.vector_store.chroma_store import ChromaVectorConfig, ChromaStore
# Delete the old vector store directory (/tmp/awel_with_data_vector_store)
shutil.rmtree("/tmp/awel_with_data_vector_store", ignore_errors=True)
embeddings = DefaultEmbeddingFactory.openai()
# Here we use the openai LLM model, if you want to use other models, you can replace
# it according to the previous example.
llm_client = OpenAILLMClient()
db_conn = SQLiteTempConnector.create_temporary_db()
db_conn.create_temp_tables(
{
"user": {
"columns": {
"id": "INTEGER PRIMARY KEY",
"name": "TEXT",
"age": "INTEGER",
},
"data": [
(1, "Tom", 10),
(2, "Jerry", 16),
(3, "Jack", 18),
(4, "Alice", 20),
(5, "Bob", 22),
],
}
}
)
vector_store = ChromaStore(
ChromaVectorConfig(
embedding_fn=embeddings,
name="db_schema_vector_store",
persist_path="/tmp/awel_with_data_vector_store",
)
)
antv_charts = [
{"response_line_chart": "used to display comparative trend analysis data"},
{
"response_pie_chart": "suitable for scenarios such as proportion and distribution statistics"
},
{
"response_table": "suitable for display with many display columns or non-numeric columns"
},
# {"response_data_text":" the default display method, suitable for single-line or simple content display"},
{
"response_scatter_plot": "Suitable for exploring relationships between variables, detecting outliers, etc."
},
{
"response_bubble_chart": "Suitable for relationships between multiple variables, highlighting outliers or special situations, etc."
},
{
"response_donut_chart": "Suitable for hierarchical structure representation, category proportion display and highlighting key categories, etc."
},
{
"response_area_chart": "Suitable for visualization of time series data, comparison of multiple groups of data, analysis of data change trends, etc."
},
{
"response_heatmap": "Suitable for visual analysis of time series data, large-scale data sets, distribution of classified data, etc."
},
]
display_type = "\n".join(
f"{key}:{value}" for dict_item in antv_charts for key, value in dict_item.items()
)
system_prompt = """You are a database expert. Please answer the user's question based on the database selected by the user and some of the available table structure definitions of the database.
Database name:
{db_name}
Table structure definition:
{table_info}
Constraint:
1.Please understand the user's intention based on the user's question, and use the given table structure definition to create a grammatically correct {dialect} sql. If sql is not required, answer the user's question directly.
2.Always limit the query to a maximum of {top_k} results unless the user specifies in the question the specific number of rows of data he wishes to obtain.
3.You can only use the tables provided in the table structure information to generate sql. If you cannot generate sql based on the provided table structure, please say: "The table structure information provided is not enough to generate sql queries." It is prohibited to fabricate information at will.
4.Please be careful not to mistake the relationship between tables and columns when generating SQL.
5.Please check the correctness of the SQL and ensure that the query performance is optimized under correct conditions.
6.Please choose the best one from the display methods given below for data rendering, and put the type name into the name parameter value that returns the required format. If you cannot find the most suitable one, use 'Table' as the display method.
the available data display methods are as follows: {display_type}
User Question:
{user_input}
Please think step by step and respond according to the following JSON format:
{response}
Ensure the response is correct json and can be parsed by Python json.loads.
"""
RESPONSE_FORMAT_SIMPLE = {
"thoughts": "thoughts summary to say to user",
"sql": "SQL Query to run",
"display_type": "Data display method",
}
prompt = ChatPromptTemplate(
messages=[
SystemPromptTemplate.from_template(
system_prompt,
response_format=json.dumps(
RESPONSE_FORMAT_SIMPLE, ensure_ascii=False, indent=4
),
),
HumanPromptTemplate.from_template("{user_input}"),
]
)
class TwoSumOperator(MapOperator[pd.DataFrame, int]):
def __init__(self, **kwargs):
super().__init__(**kwargs)
async def map(self, df: pd.DataFrame) -> int:
return await self.blocking_func_to_async(self._two_sum, df)
def _two_sum(self, df: pd.DataFrame) -> int:
return df["age"].sum()
def branch_even(x: int) -> bool:
return x % 2 == 0
def branch_odd(x: int) -> bool:
return not branch_even(x)
class DataDecisionOperator(BranchOperator[int, int]):
def __init__(self, odd_task_name: str, even_task_name: str, **kwargs):
super().__init__(**kwargs)
self.odd_task_name = odd_task_name
self.even_task_name = even_task_name
async def branches(self):
return {branch_even: self.even_task_name, branch_odd: self.odd_task_name}
class OddOperator(MapOperator[int, str]):
def __init__(self, **kwargs):
super().__init__(**kwargs)
async def map(self, x: int) -> str:
print(f"{x} is odd")
return f"{x} is odd"
class EvenOperator(MapOperator[int, str]):
def __init__(self, **kwargs):
super().__init__(**kwargs)
async def map(self, x: int) -> str:
print(f"{x} is even")
return f"{x} is even"
class MergeOperator(JoinOperator[str]):
def __init__(self, **kwargs):
super().__init__(combine_function=self.merge_func, **kwargs)
async def merge_func(self, odd: str, even: str) -> str:
return odd if not is_empty_data(odd) else even
with DAG("load_schema_dag") as load_schema_dag:
input_task = InputOperator.dummy_input()
# Load database schema to vector store
assembler_task = DBSchemaAssemblerOperator(
connector=db_conn,
index_store=vector_store,
chunk_parameters=ChunkParameters(chunk_strategy="CHUNK_BY_SIZE"),
)
input_task >> assembler_task
chunks = asyncio.run(assembler_task.call())
print(chunks)
with DAG("chat_data_dag") as chat_data_dag:
input_task = InputOperator(input_source=InputSource.from_callable())
retriever_task = DBSchemaRetrieverOperator(
top_k=1,
index_store=vector_store,
)
content_task = MapOperator(lambda cks: [c.content for c in cks])
merge_task = JoinOperator(
lambda table_info, ext_dict: {"table_info": table_info, **ext_dict}
)
prompt_task = PromptBuilderOperator(prompt)
req_build_task = RequestBuilderOperator(model="gpt-3.5-turbo")
llm_task = LLMOperator(llm_client=llm_client)
sql_parse_task = SQLOutputParser()
db_query_task = DatasourceOperator(connector=db_conn)
(
input_task
>> MapOperator(lambda x: x["user_input"])
>> retriever_task
>> content_task
>> merge_task
)
input_task >> merge_task
merge_task >> prompt_task >> req_build_task >> llm_task >> sql_parse_task
sql_parse_task >> MapOperator(lambda x: x["sql"]) >> db_query_task
two_sum_task = TwoSumOperator()
decision_task = DataDecisionOperator(
odd_task_name="odd_task", even_task_name="even_task"
)
odd_task = OddOperator(task_name="odd_task")
even_task = EvenOperator(task_name="even_task")
merge_task = MergeOperator()
db_query_task >> two_sum_task >> decision_task
decision_task >> odd_task >> merge_task
decision_task >> even_task >> merge_task
final_result = asyncio.run(
merge_task.call(
{
"user_input": "Query the name and age of users younger than 18 years old",
"db_name": "user_management",
"dialect": "SQLite",
"top_k": 1,
"display_type": display_type,
"response": json.dumps(
RESPONSE_FORMAT_SIMPLE, ensure_ascii=False, indent=4
),
}
)
)
print("The final result is:")
print(final_result)
```
## Visualize DAGs
We can visualize the DAGs with the following code:
```python
load_schema_dag.visualize_dag()
chat_data_dag.visualize_dag()
```
If you execute the code in a Jupyter notebook, you can display the DAGs inline:
```python
display(load_schema_dag)
display(chat_data_dag)
```
The graph of the `load_schema_dag` is like this:
<p align="left">
<img src={'/img/awel/cookbook/chat_data_load_schema_dag.png'} width="1000px"/>
</p>
And the graph of the `chat_data_dag` is:
<p align="left">
<img src={'/img/awel/cookbook/chat_data_chat_data_dag.png'} width="1000px"/>
</p>
# Get Started
AWEL (Agentic Workflow Expression Language) makes it easy to build complex LLM apps, and it provides great functionality and flexibility.
## Basic Example Using AWEL: HTTP Request + Output Rewrite
A basic use of AWEL is to handle an HTTP request and rewrite some output value. To see how this works, let's look at an example.
### DAG Planning
First, let's look at an introductory example of basic AWEL orchestration. The core function of the example is the handling of input and output for an HTTP request. Thus, the entire orchestration consists of only two steps:
- HTTP Request
- Processing HTTP Response Result
In DB-GPT, some basic dependent operators have already been encapsulated and can be referenced directly.
```python
from dbgpt._private.pydantic import BaseModel, Field
from dbgpt.core.awel import DAG, HttpTrigger, MapOperator
```
### Custom Operator
Define an HTTP request body that accepts two parameters: name and age.
```python
class TriggerReqBody(BaseModel):
name: str = Field(..., description="User name")
age: int = Field(18, description="User age")
```
Define a request handler operator called `RequestHandleOperator`, which extends the basic `MapOperator`. Its behavior is straightforward: parse the request body, extract the name and age fields, and concatenate them into a sentence. For example:
> "Hello, zhangsan, your age is 18."
```python
class RequestHandleOperator(MapOperator[TriggerReqBody, str]):
def __init__(self, **kwargs):
super().__init__(**kwargs)
async def map(self, input_value: TriggerReqBody) -> str:
print(f"Receive input value: {input_value}")
return f"Hello, {input_value.name}, your age is {input_value.age}"
```
### DAG Pipeline
After writing the above operators, they can be assembled into a DAG orchestration. This DAG has a total of two nodes: the first node is an `HttpTrigger`, which primarily processes HTTP requests (this operator is built into DB-GPT), and the second node is the newly defined `RequestHandleOperator` that processes the request body. The DAG code below can be used to link the two nodes together.
```python
with DAG("simple_dag_example") as dag:
trigger = HttpTrigger("/examples/hello", request_body=TriggerReqBody)
map_node = RequestHandleOperator()
trigger >> map_node
```
### Access Verification
Before performing access verification, you need to start the project first: `python dbgpt/app/dbgpt_server.py`
```bash
% curl -X GET http://127.0.0.1:5670/api/v1/awel/trigger/examples/hello\?name\=zhangsan
"Hello, zhangsan, your age is 18"
```
Of course, to make testing more convenient, we also provide a test environment that works without starting `dbgpt_server`. Add the following code below `simple_dag_example`, then run the `simple_dag_example.py` script directly to test it without starting the project.
```python
if __name__ == "__main__":
if dag.leaf_nodes[0].dev_mode:
# Development mode, you can run the dag locally for debugging.
from dbgpt.core.awel import setup_dev_environment
setup_dev_environment([dag], port=5555)
else:
# Production mode, DB-GPT will automatically load and execute the current file after startup.
pass
```
```bash
curl -X GET http://127.0.0.1:5555/api/v1/awel/trigger/examples/hello\?name\=zhangsan
"Hello, zhangsan, your age is 18"
```
[simple_dag_example](/examples/awel/simple_dag_example.py)
# Why use AWEL?
AWEL (Agentic Workflow Expression Language) is an intelligent agent workflow expression language specifically designed for the development of LLM applications. In the design of DB-GPT, agents are considered first-class citizens. RAGs, datasources (DS), SMMF (Service-oriented Multi-Model Management Framework), and plugins are all resources that agents depend on.
We also observe that the auto-orchestration capabilities of multi-agent systems are still greatly limited by model capabilities, and that many scenarios require determinism: pipeline-style tasks, for instance, do not need the auto-orchestration abilities of large models at all. Therefore, in DB-GPT, combining AWEL with agents can support both production-level pipelines and the auto-orchestration of agent systems for open-ended problems.
Through the orchestration capabilities of AWEL, it is possible to develop large language model applications with a minimal amount of code.
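As a small illustration, here is a minimal sketch of a two-step AWEL workflow. It uses only the `DAG`, `InputOperator`, `InputSource`, and `MapOperator` primitives shown in the tutorials; the DAG name and the numbers are made up for this example:
```python
import asyncio

from dbgpt.core.awel import DAG, InputOperator, InputSource, MapOperator

with DAG("minimal_awel_example") as dag:
    input_task = InputOperator(input_source=InputSource.from_callable())
    double_task = MapOperator(map_function=lambda x: x * 2)
    plus_one_task = MapOperator(map_function=lambda x: x + 1)
    input_task >> double_task >> plus_one_task

# Prints 7: (3 * 2) + 1
print(asyncio.run(plus_one_task.call(call_data=3)))
```
Each operator stays independently testable, and the same DAG can later be attached to an HTTP trigger without changing the operators.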
**AWEL and agents are all you need**.
# Released V0.5.0 | Develop native data applications through workflows and agents
## Release Notes for Version 0.5.0
After a period of intensive development, version 0.5.0 has taken over two months to come to fruition. This marks the first stable release that will be maintained over an extended period within the DB-GPT project. Concurrently, the long-term vision for DB-GPT has been officially set: it aims to be an AI native data application development framework utilizing Agentic Workflow Expression Language (AWEL) and agents.
In essence, this framework facilitates the creation of data-centric applications through an intelligent agent-based expression language.
<p align="left">
<img src={'/img/app/app_list.png'} width="720px" />
</p>
## Introduction to Version Update
In its early releases, the DB-GPT project offered six default use cases, namely:
- [ChatData](https://docs.dbgpt.site/docs/application/started_tutorial/chat_data)
- [ChatExcel](https://docs.dbgpt.site/docs/application/started_tutorial/chat_excel)
- [ChatDB](https://docs.dbgpt.site/docs/application/started_tutorial/chat_db)
- [ChatKnowledge](https://docs.dbgpt.site/docs/application/started_tutorial/chat_knowledge)
- [ChatAgents](https://docs.dbgpt.site/docs/agents)
- [ChatDashboard](https://docs.dbgpt.site/docs/application/started_tutorial/chat_dashboard)
These scenarios were designed to satisfy basic and simple use requirements. However, for large-scale production deployment, particularly when dealing with complex business scenarios, it becomes necessary to develop custom scenarios tailored to specific business conditions. This presents significant challenges in terms of flexibility and development complexity.
To further enhance the usability and flexibility of the business framework, we have built upon our existing features, including the multi-model management (SMMF), knowledge base, Agents, data sources, plugins, and Prompts. We have abstracted the capabilities of intelligent agent orchestration (AWEL) and application construction. Additionally, to facilitate application management and distribution, we have introduced the [dbgpts](https://github.com/eosphoros-ai/dbgpts) subproject, which specifically manages the construction of native intelligent data applications, AWEL common operators, AWEL generic workflow templates, and Agents on top of DB-GPT.
This version update will not affect the usage of the previously established six scenarios. However, with subsequent iterations, these default scenarios will gradually be rewritten as Data Apps. We also plan to incorporate them into the `dbgpts` project as default applications, making them readily available for installation and use.
Now, let's walk through the main updates in this release.
### Glossary of Terms
1. **Data App**: an intelligent data application built on DB-GPT.
2. **AWEL**: Agentic Workflow Expression Language.
3. **AWEL Flow**: a workflow orchestrated with AWEL.
4. **SMMF**: Service-oriented Multi-Model Management Framework.
5. **Datasource**: a data source, such as MySQL, PostgreSQL, StarRocks, or ClickHouse.
## AWEL workflow and application
As shown in the following figure, the left-side navigation pane contains an AWEL workflow menu. Open it to orchestrate workflows.
<p align="left">
<img src={'/img/app/awel_flow_list.png'} width="720px" />
</p>
After a default installation, the AWEL flow list is empty. You can populate it in two ways:
1. Install flows from the application repository provided by DB-GPT.
2. Create them yourself.

The following briefly describes both methods; for more detailed usage, see the subsequent DB-GPT tutorials.
<p align="left">
<img src={'/img/app/flow_detail.png'} width="720px" />
</p>
### Installing From the Official Repository
First, make sure DB-GPT is installed and deployed. After installation and deployment, you can use the built-in `dbgpt` command for various operations.
:::info NOTE
This process will allow you to subsequently install the AWEL workflow.
:::
<p align="left">
<img src={'/img/app/dbgpts_cli.png'} width="720px" />
</p>
As shown in the figure, the `dbgpt` command supports multiple operations, including model-related operations, knowledge base operations, and trace logs. Here we focus on app operations.
<p align="left">
<img src={'/img/app/dbgpts_apps.png'} width="720px" />
</p>
With the `dbgpt app list-remote` command, we can see that three AWEL workflows are available in the current repository. Here we install the `awel-flow-web-info-search` workflow, using the commands shown below.
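```bash
# List the AWEL workflows available in the remote repository
dbgpt app list-remote

# Install the awel-flow-web-info-search workflow
dbgpt app install awel-flow-web-info-search
```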
<p align="left">
<img src={'/img/app/dbgpts_app_install.png'} width="720px" />
</p>
After the installation succeeds, restart the DB-GPT service (dynamic hot loading is on the way) and refresh the page; you will then see the corresponding workflow on the AWEL workflow page.
<p align="left">
<img src={'/img/app/dbgpts_flow_black.png'} width="720px" />
</p>
### Building Your Own
In addition to installing the default AWEL flows using the official commands, you will often need to build your own in practical scenarios. Click `New AWEL Flow` to open the editing page shown below.
<p align="left">
<img src={'/img/app/awel_flow_node.png'} width="720px" />
</p>
During the editing process, each task's downstream nodes and operators support auto-completion. By clicking the plus sign (➕) located at the bottom right of each operator, you can bring up a list of potential downstream operators that can be connected to the current one. This feature enhances the user experience by providing suggestions and making it easier to construct complex workflows without needing to remember the exact names or types of operators that are available for use.
<p align="left">
<img src={'/img/app/awel_flow_node_plus.png'} width="720px" />
</p>
## Create a data application
We have introduced the construction and installation of AWEL workflows. Next, we will show how to create a data application based on a large model.
### Search Chat App
The core capability of the search chat application is to search for relevant knowledge through search engines (such as Baidu and Google) and then summarize the results into an answer. The effect is as follows:
<p align="left">
<img src={'/img/app/app_search.png'} width="720px" />
</p>
Creating the preceding application is very simple. On the application creation panel, click `create` and enter the parameters to complete the creation. Pay attention to two of them: the working mode and the flow. The working mode used here is `awel_layout`, and the selected AWEL workflow is `awel-flow-web-info-search`, which we installed earlier.
<p align="left">
<img src={'/img/app/app_awel.png'} width="720px" />
</p>
### Data analysis assistant
Use multi-agents to build a data analysis assistant application. The results are as follows.
<p align="left">
<img src={'/img/app/app_analysis.png'} width="720px" />
</p>
<p align="left">
<img src={'/img/app/app_analysis_black.png'} width="720px" />
</p>
## Other Update Details
- Release of the dbgpt core SDK (#1092): now includes AWEL operator orchestration capabilities. To install, use `pip install dbgpt`.
- Support for Jina Embeddings (#1105): The update integrates with Jina AI, which provides a way to create and manage embeddings for various data types, enhancing search and similarity tasks within the applications.
- New example of schema-linking using AWEL (#1081): There's a new example available demonstrating how to use AWEL for schema-linking, which can be valuable for tasks that require mapping between different data schemas.
- Unified card UI style, including knowledge base cards, model management cards, etc.: This update brings a more consistent look and feel across different UI components that display information in a card format.
## Bug Fixes
- MySQL databases no longer support automatic table creation and field auto-updates (#1133): This change may require developers to manually handle database schema changes, improving control over database migrations.
- Fixed the issue with default dialogues carrying history message records (#1117): This addresses potential privacy or performance issues by ensuring that history records are handled properly.
- Fixed the issue in examples/awel where model_name was fetched from model_config improperly (#1112): This improves the reliability of AWEL examples by ensuring that the model configuration is fetched and used correctly.
- Fixed DAGs sharing data issue (#1102): This fix might relate to data isolation in Directed Acyclic Graphs (DAGs) to ensure that workflows do not inadvertently share or overwrite data.
- Fixed issue with examples/awel default loading model text2vec-large-chinese (#1095): This fix ensures that the large Chinese text-to-vector model loads as expected in the given examples.
These changes reflect ongoing improvements to the dbgpt project, enhancing its capabilities, fixing known issues, and refining user experience. Users should refer to the official documentation or release notes for detailed instructions and information on these updates.
## Upgrade to V0.5.0
If your current version is V0.4.6 or V0.4.7, follow these steps to upgrade to V0.5.0.
1. Stop the service.
2. Upgrade the database table structure:
```sql
-- dbgpt.dbgpt_serve_flow definition
CREATE TABLE `dbgpt_serve_flow` (
`id` int NOT NULL AUTO_INCREMENT COMMENT 'Auto increment id',
`uid` varchar(128) NOT NULL COMMENT 'Unique id',
`dag_id` varchar(128) DEFAULT NULL COMMENT 'DAG id',
`name` varchar(128) DEFAULT NULL COMMENT 'Flow name',
`flow_data` text COMMENT 'Flow data, JSON format',
`user_name` varchar(128) DEFAULT NULL COMMENT 'User name',
`sys_code` varchar(128) DEFAULT NULL COMMENT 'System code',
`gmt_created` datetime DEFAULT NULL COMMENT 'Record creation time',
`gmt_modified` datetime DEFAULT NULL COMMENT 'Record update time',
`flow_category` varchar(64) DEFAULT NULL COMMENT 'Flow category',
`description` varchar(512) DEFAULT NULL COMMENT 'Flow description',
`state` varchar(32) DEFAULT NULL COMMENT 'Flow state',
`source` varchar(64) DEFAULT NULL COMMENT 'Flow source',
`source_url` varchar(512) DEFAULT NULL COMMENT 'Flow source url',
`version` varchar(32) DEFAULT NULL COMMENT 'Flow version',
`label` varchar(128) DEFAULT NULL COMMENT 'Flow label',
`editable` int DEFAULT NULL COMMENT 'Editable, 0: editable, 1: not editable',
PRIMARY KEY (`id`),
UNIQUE KEY `uk_uid` (`uid`),
KEY `ix_dbgpt_serve_flow_sys_code` (`sys_code`),
KEY `ix_dbgpt_serve_flow_uid` (`uid`),
KEY `ix_dbgpt_serve_flow_dag_id` (`dag_id`),
KEY `ix_dbgpt_serve_flow_user_name` (`user_name`),
KEY `ix_dbgpt_serve_flow_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
-- dbgpt.gpts_app definition
CREATE TABLE `gpts_app` (
`id` int NOT NULL AUTO_INCREMENT COMMENT 'autoincrement id',
`app_code` varchar(255) NOT NULL COMMENT 'Current AI assistant code',
`app_name` varchar(255) NOT NULL COMMENT 'Current AI assistant name',
`app_describe` varchar(2255) NOT NULL COMMENT 'Current AI assistant describe',
`language` varchar(100) NOT NULL COMMENT 'gpts language',
`team_mode` varchar(255) NOT NULL COMMENT 'Team work mode',
`team_context` text COMMENT 'The execution logic and team member content that teams with different working modes rely on',
`user_code` varchar(255) DEFAULT NULL COMMENT 'user code',
`sys_code` varchar(255) DEFAULT NULL COMMENT 'system app code',
`created_at` datetime DEFAULT NULL COMMENT 'create time',
`updated_at` datetime DEFAULT NULL COMMENT 'last update time',
`icon` varchar(1024) DEFAULT NULL COMMENT 'app icon, url',
PRIMARY KEY (`id`),
UNIQUE KEY `uk_gpts_app` (`app_name`)
) ENGINE=InnoDB AUTO_INCREMENT=39 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
CREATE TABLE `gpts_app_collection` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'autoincrement id',
`app_code` varchar(255) NOT NULL COMMENT 'Current AI assistant code',
`user_code` int(11) NOT NULL COMMENT 'user code',
`sys_code` varchar(255) NOT NULL COMMENT 'system app code',
`created_at` datetime DEFAULT NULL COMMENT 'create time',
`updated_at` datetime DEFAULT NULL COMMENT 'last update time',
PRIMARY KEY (`id`),
KEY `idx_app_code` (`app_code`),
KEY `idx_user_code` (`user_code`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COMMENT="gpt collections";
-- dbgpt.gpts_app_detail definition
CREATE TABLE `gpts_app_detail` (
`id` int NOT NULL AUTO_INCREMENT COMMENT 'autoincrement id',
`app_code` varchar(255) NOT NULL COMMENT 'Current AI assistant code',
`app_name` varchar(255) NOT NULL COMMENT 'Current AI assistant name',
`agent_name` varchar(255) NOT NULL COMMENT ' Agent name',
`node_id` varchar(255) NOT NULL COMMENT 'Current AI assistant Agent Node id',
`resources` text COMMENT 'Agent bind resource',
`prompt_template` text COMMENT 'Agent bind template',
`llm_strategy` varchar(25) DEFAULT NULL COMMENT 'Agent use llm strategy',
`llm_strategy_value` text COMMENT 'Agent use llm strategy value',
`created_at` datetime DEFAULT NULL COMMENT 'create time',
`updated_at` datetime DEFAULT NULL COMMENT 'last update time',
PRIMARY KEY (`id`),
UNIQUE KEY `uk_gpts_app_agent_node` (`app_name`,`agent_name`,`node_id`)
) ENGINE=InnoDB AUTO_INCREMENT=23 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
```
```SQL
ALTER TABLE `gpts_conversations`
ADD COLUMN `team_mode` varchar(255) NULL COMMENT 'agent team work mode';
ALTER TABLE `gpts_conversations`
ADD COLUMN `current_goal` text COMMENT 'The target corresponding to the current message';
```
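If you apply the schema changes manually, the statements above can be fed to the standard `mysql` client. This is only a sketch: it assumes a local MySQL instance holding the `dbgpt` database, and `upgrade_v0_5_0.sql` is a hypothetical file containing the statements above:
```bash
# Apply the upgrade statements to the dbgpt database (will prompt for a password)
mysql -h 127.0.0.1 -u root -p dbgpt < upgrade_v0_5_0.sql
```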
3. Reinstall dependencies
```shell
pip install -e ".[default]"
```
4. Start the service
## Acknowledgments
We would like to express our deepest gratitude to all the contributors who made this release possible!
@Aralhi, @Aries-ckt, @JoanFM, @csunny, @fangyinc, @Hzh_97, @junewgl, @lcxadml, @likenamehaojie, @xiuzhu9527 and @yhjun1026
## Appendix
- DB-GPT framework: https://github.com/eosphoros-ai
- Text2SQL fine tuning: https://github.com/eosphoros-ai/DB-GPT-Hub
- DB-GPT-Web : https://github.com/eosphoros-ai/DB-GPT-Web
- Official English documentation: http://docs.dbgpt.site/docs/overview
- Official Chinese documentation: https://www.yuque.com/eosphoros/dbgpt-docs/bex30nsv60ru0fmx