"vscode:/vscode.git/clone" did not exist on "89d0e9abeb31e44cccef46537cd10d8812130ef3"
Commit 396700dd authored by chenzk's avatar chenzk
Browse files

v1.0

parents
Pipeline #2603 failed with stages
in 0 seconds
# Chat With Financial Report
Financial report analysis using large models is becoming a popular application in vertical domains. Large models can not only understand complex financial rules more accurately than humans, but can also output reasonable analysis results based on professional knowledge.
Using AWEL to build a financial report knowledge construction workflow and a financial report intelligent Q&A workflow app can help users:
- answer basic information questions about financial reports
- calculate and analyze financial report indicators
- analyze financial report content
#### Financial report knowledge construction workflow
<p align="left">
<img src={'/img/chat_knowledge/fin_report/knowledge_workflow.png'} width="1000px"/>
</p>
#### Financial report intelligent Q&A workflow
<p align="left">
<img src={'/img/chat_knowledge/fin_report/financial_robot_chat.png'} width="1000px"/>
</p>
# How to Use
Upload a financial report PDF and chat with it.
Scene 1: ask for basic information in the financial report
<p align="left">
<img src={'/img/chat_knowledge/fin_report/base_info_chat.jpg'} width="1000px"/>
</p>
Scene 2: calculate financial indicators from the financial report
<p align="left">
<img src={'/img/chat_knowledge/fin_report/chat_indicator.png'} width="1000px"/>
</p>
Scene 3: analyze the financial report
<p align="left">
<img src={'/img/chat_knowledge/fin_report/report_analyze.png'} width="1000px"/>
</p>
# How to Install
Step 1: make sure your dbgpt version is >=0.5.10
Step 2: upgrade the Python dependencies
```bash
pip install pdfplumber
pip install fuzzywuzzy
```
Step 3: install the financial report app from dbgpts
```bash
# install poetry
pip install poetry
# install financial report knowledge process pipeline workflow and financial-robot-app workflow
dbgpt app install financial-robot-app financial-report-knowledge-factory
```
Step 4: download the pre-trained embedding model from https://www.modelscope.cn/models/AI-ModelScope/bge-large-zh-v1.5
```bash
git clone https://www.modelscope.cn/models/AI-ModelScope/bge-large-zh-v1.5
```
Then configure the model path (for example, in your `.env` file):
```
#*******************************************************************#
#** FINANCIAL CHAT Config **#
#*******************************************************************#
FIN_REPORT_MODEL=/app/DB-GPT/models/bge-large-zh-v1.5
```
Step 5: create a knowledge space and choose the `FinancialReport` domain type
<p align="left">
<img src={'/img/chat_knowledge/fin_report/financial_space.png'} width="1000px"/>
</p>
Step 6: upload a financial report from `docker/examples/fin_report`. If you want to use the full financial report dataset, you can download it from ModelScope:
```bash
git clone http://www.modelscope.cn/datasets/modelscope/chatglm_llm_fintech_raw_dataset.git
```
Step 7: choose automatic segmentation and wait a while
Step 8: chat with the financial report
<p align="left">
<img src={'/img/chat_knowledge/fin_report/chat.jpg'} width="1000px"/>
</p>
# Chat Knowledge Base
`Chat Knowledge Base` provides question answering over private domain knowledge; you can build intelligent Q&A systems, reading assistants and other products on top of the `knowledge base`. DB-GPT also uses `RAG` technology to enhance knowledge retrieval.
## Terminology
:::info note
`Knowledge Space`: a document space that manages one type of knowledge. Documents of the same type can be uploaded into the same knowledge space.
:::
## Steps
The knowledge base workflow is relatively simple and is mainly divided into the following steps.
1. Create a knowledge space
2. Upload documents
3. Wait for document vectorization
4. Select the Knowledge Base App
5. Chat with the App
### Create knowledge space
First, open the `Construct App` page and select `Knowledge` at the top.
<p align="center">
<img src={'/img/app/knowledge_build_v0.6.jpg'} width="800px" />
</p>
On the knowledge page, click the `Create` button and fill in the necessary information to complete the creation of the knowledge space.
<p align="center">
<img src={'/img/app/knowledge_space_v0.6.jpg'} width="800px" />
</p>
### Upload documents
Multiple document types are currently supported, such as plain text, URL crawling, and document formats such as PDF, Word, and Markdown. Select a specific document type to `upload`.
<p align="left">
<img src={'/img/chat_knowledge/upload_doc.png'} width="720px" />
</p>
Select one or more documents and click `Next`.
<p align="left">
<img src={'/img/chat_knowledge/upload_doc_finish.png'} width="720px" />
</p>
### Documents Segmentation
Choose a document segmentation strategy: you can segment the document by chunk size, separator, page, paragraph, or markdown header. The default is segmentation by chunk size (a minimal chunking sketch follows the tip below).
Then click `Process`; it will take a few minutes to complete the document segmentation.
<p align="left">
<img src={'/img/chat_knowledge/doc_segmentation.png'} width="720px" />
</p>
:::tip
**Automatic**: the document is automatically segmented according to the document type.
**Chunk size**: segmentation by chunk size.
- chunk size: the number of words in each segment of the document. The default is 512 words.
- chunk overlap: the number of words overlapped between adjacent segments. The default is 50 words.
**Separator**: segmentation by separator.
- separator: the separator of the document. The default is `\n`.
- enable_merge: whether to merge the separator chunks according to chunk_size after splitting. The default is `False`.
**Page**: page segmentation; only supports .pdf and .pptx documents.
**Paragraph**: paragraph segmentation; only supports .docx documents.
- separator: the paragraph separator of the document. The default is `\n`.
**Markdown header**: markdown header segmentation; only supports .md documents.
:::
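To make the chunk-size strategy concrete, here is a minimal, illustrative sketch of fixed-size splitting with overlap. This is not DB-GPT's actual implementation; the parameter names simply mirror the options above:
```python
def split_by_chunk_size(text: str, chunk_size: int = 512, chunk_overlap: int = 50):
    """Illustrative fixed-size splitter: each chunk holds at most `chunk_size`
    units and overlaps the previous chunk by `chunk_overlap` units."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by the overlap so adjacent chunks share context
        start = end - chunk_overlap
    return chunks

# Example: a 1200-character document yields 3 overlapping chunks
print(len(split_by_chunk_size("x" * 1200)))  # 3
```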
### Waiting for document vectorization
Click on the `knowledge space` and observe the document `slicing` + `vectorization` status in the lower left corner. When the status reaches `FINISHED`, you can start a knowledge base conversation.
<p align="left">
<img src={'/img/chat_knowledge/waiting_doc_vector.png'} width="720px" />
</p>
### Knowledge base chat
Click the `Chat` button to start a conversation with the knowledge base.
<p align="left">
<img src={'/img/chat_knowledge/chat.png'} width="720px" />
</p>
### Reading assistant
In addition to the above capabilities, you can also upload documents directly in the knowledge base chat window; the document will be summarized by default. This capability can serve as a `reading assistant` to aid document reading.
<p align="left">
<img src={'/img/chat_knowledge/read_helper.gif'} width="720px" />
</p>
# Use Data App With AWEL
## What Is AWEL?
> Agentic Workflow Expression Language (AWEL) is an intelligent agent workflow expression language specially designed for large model application
development.
You can find more information in [AWEL](../awel/awel.md) and the
[AWEL Tutorial](../awel/tutorial/).
In short, you can use AWEL to develop LLM applications with the AWEL Python API.
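For a first taste of that API, here is a minimal runnable sketch using the `MapOperator` and `DAG` covered later in this documentation:
```python
import asyncio

from dbgpt.core.awel import DAG, MapOperator

# Build a one-node DAG that greets the caller
with DAG("awel_hello_world") as dag:
    task = MapOperator(map_function=lambda x: f"Hello, {x}!")

print(asyncio.run(task.call("AWEL")))  # Hello, AWEL!
```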
## What Is AWEL Flow?
AWEL flow allows you to develop LLM applications without writing code. It is built on top of the AWEL Python API.
## Visit Your AWEL Flows in `AWEL Flow` Page
In the `AWEL Flow` page, you can see all the AWEL flows you have created. You can also create a new AWEL flow by clicking the `Create Flow` button.
<p align="left">
<img src={'/img/application/awel/awel_flow_page.png'} width="720px"/>
</p>
## Examples
### Build Your RAG Application
To build your RAG application, you need to create a knowledge space according to [Chat Knowledge Base](./apps/chat_knowledge.md) first.
Then, click the `Create Flow` button to create a new flow.
In the flow editor, you can drag and drop the nodes to build your RAG application.
1. You will see an empty flow editor like below:
<p align="left">
<img src={'/img/application/awel/flow_dev_empty_page_img.png'} width="720px"/>
</p>
2. Drag a `Streaming LLM Operator` node to the flow editor.
<p align="left">
<img src={'/img/application/awel/flow_dev_rag_llm_1.png'} width="720px"/>
</p>
3. Drag a `Knowledge Operator` node to the flow editor.
You can click the "+" button on the `Streaming LLM Operator` node's second input (`"HOContext"`);
it will show a list of nodes that can be connected to this input, from which you can select the `Knowledge Operator` node.
<p align="left">
<img src={'/img/application/awel/flow_dev_rag_llm_2_.png'} width="720px"/>
</p>
The candidate nodes that can be connected are shown below:
<p align="left">
<img src={'/img/application/awel/flow_dev_rag_llm_3.png'} width="720px"/>
</p>
Then, drag the `Knowledge Operator` node and connect it to the `Streaming LLM Operator` node.
<p align="left">
<img src={'/img/application/awel/flow_def_rag_ko_1.png'} width="720px"/>
</p>
Please select your knowledge space in the `Knowledge Operator` node's `Knowledge Space Name` option.
4. Drag a `Common LLM Http Trigger` node to the flow editor.
<p align="left">
<img src={'/img/application/awel/flow_dev_rag_ko_2.png'} width="720px"/>
</p>
5. Drag a `Common Chat Prompt Template` **resource** node to the flow editor.
<p align="left">
<img src={'/img/application/awel/flow_dev_rag_prompt_1.png'} width="720px"/>
</p>
And you can type your prompt template in the `Common Chat Prompt Template` parameters.
6. Drag an `OpenAI Streaming Output Operator` node to the flow editor.
<p align="left">
<img src={'/img/application/awel/flow_dev_rag_output_1.png'} width="720px"/>
</p>
7. Click the `Save` button in the top right corner to save your flow.
<p align="left">
<img src={'/img/application/awel/flow_dev_rag_save_1.png'} width="720px"/>
</p>
Lastly, you will see your RAG application in the `AWEL Flow` page.
<p align="left">
<img src={'/img/application/awel/flow_dev_rag_show_1.png'} width="720px"/>
</p>
After that, you can use it to build your APP according to [App Manage](./apps/app_manage.md).
## Reference
- [AWEL](../awel/awel.md)
- [AWEL CookBook](../awel/cookbook/)
- [AWEL Tutorial](../awel/tutorial/)
# Datasources
The DB-GPT data source module is designed to manage the structured and semi-structured data assets of an enterprise, connecting databases, data warehouses, data lakes, etc. to the DB-GPT framework so you can quickly build intelligent applications based on data and large models. Currently, DB-GPT supports common data sources and also supports custom extensions.
<p align="center">
<img src={'/img/app/datasource.jpg'} width="800px" />
</p>
You can add a data source via the **Add a data source** button in the upper right corner. In the pop-up dialog box, select the corresponding database type and fill in the required parameters to complete the addition.
<p align="center">
<img src={'/img/app/datasource_add.jpg'} width="800px" />
</p>
# Fine-Tuning with dbgpt_hub
The DB-GPT-Hub project has released a pip package to lower the threshold for Text2SQL training. In addition to fine-tuning through the scripts provided in the repository, you can also use the Python package we provide
for fine-tuning.
## Install
```bash
pip install dbgpt_hub
```
## Show Baseline
```python
from dbgpt_hub.baseline import show_scores
show_scores()
```
<p align="left">
<img src={'/img/ft/baseline.png'} width="720px" />
</p>
## Fine-tuning
```python
from dbgpt_hub.data_process import preprocess_sft_data
from dbgpt_hub.train import start_sft
from dbgpt_hub.predict import start_predict
from dbgpt_hub.eval import start_evaluate
```
Preprocessing data into fine-tuned data format.
```python
data_folder = "dbgpt_hub/data"
data_info = [
    {
        "data_source": "spider",
        "train_file": ["train_spider.json", "train_others.json"],
        "dev_file": ["dev.json"],
        "tables_file": "tables.json",
        "db_id_name": "db_id",
        "is_multiple_turn": False,
        "train_output": "spider_train.json",
        "dev_output": "spider_dev.json",
    }
]

preprocess_sft_data(
    data_folder=data_folder,
    data_info=data_info
)
```
Fine-tune the base model and generate the model weights:
```python
train_args = {
    "model_name_or_path": "codellama/CodeLlama-13b-Instruct-hf",
    "do_train": True,
    "dataset": "example_text2sql_train",
    "max_source_length": 2048,
    "max_target_length": 512,
    "finetuning_type": "lora",
    "lora_target": "q_proj,v_proj",
    "template": "llama2",
    "lora_rank": 64,
    "lora_alpha": 32,
    "output_dir": "dbgpt_hub/output/adapter/CodeLlama-13b-sql-lora",
    "overwrite_cache": True,
    "overwrite_output_dir": True,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 16,
    "lr_scheduler_type": "cosine_with_restarts",
    "logging_steps": 50,
    "save_steps": 2000,
    "learning_rate": 2e-4,
    "num_train_epochs": 8,
    "plot_loss": True,
    "bf16": True,
}

start_sft(train_args)
```
Run prediction to generate model outputs:
```python
predict_args = {
    "model_name_or_path": "codellama/CodeLlama-13b-Instruct-hf",
    "template": "llama2",
    "finetuning_type": "lora",
    "checkpoint_dir": "dbgpt_hub/output/adapter/CodeLlama-13b-sql-lora",
    "predict_file_path": "dbgpt_hub/data/eval_data/dev_sql.json",
    "predict_out_dir": "dbgpt_hub/output/",
    "predicted_out_filename": "pred_sql.sql",
}

start_predict(predict_args)
```
Evaluate the accuracy of the output results on the test dataset:
```python
evaluate_args = {
    "input": "./dbgpt_hub/output/pred/pred_sql_dev_skeleton.sql",
    "gold": "./dbgpt_hub/data/eval_data/gold.txt",
    "gold_natsql": "./dbgpt_hub/data/eval_data/gold_natsql2sql.txt",
    "db": "./dbgpt_hub/data/spider/database",
    "table": "./dbgpt_hub/data/eval_data/tables.json",
    "table_natsql": "./dbgpt_hub/data/eval_data/tables_for_natsql2sql.json",
    "etype": "exec",
    "plug_value": True,
    "keep_distict": False,
    "progress_bar_for_each_datapoint": False,
    "natsql": False,
}

start_evaluate(evaluate_args)
```
# Text2SQL Fine-Tuning
We have split the Text2SQL-related fine-tuning code into the `DB-GPT-Hub` sub-project, and you can also view the source code directly.
## Fine-tune pipeline
Text2SQL pipeline mainly includes the following processes:
- [Build environment](#build-environment)
- [Data processing](#data-processing)
- [Model train](#model-train)
- [Model merge](#model-merge)
- [Model predict](#model-predict)
- [Model evaluation](#model-evaluation)
## Build environment
We recommend using a conda virtual environment to build the Text2SQL fine-tuning environment:
```bash
git clone https://github.com/eosphoros-ai/DB-GPT-Hub.git
cd DB-GPT-Hub
conda create -n dbgpt_hub python=3.10
conda activate dbgpt_hub
conda install -c conda-forge "poetry>=1.4.0"
poetry install
```
The current project supports multiple LLMs and can be downloaded on demand. In this tutorial, we use `CodeLlama-13b-Instruct-hf` as the base model. The model can be downloaded from platforms such as [HuggingFace](https://huggingface.co/) and [Modelscope](https://modelscope.cn/models). Taking HuggingFace as an example, the download command is:
```bash
cd Your_model_dir
git lfs install
git clone git@hf.co:codellama/CodeLlama-13b-Instruct-hf
```
## Data processing
### Data collection
The example data in this tutorial mainly uses the `Spider` dataset:
- introduction: the `Spider` dataset is widely recognized as the most difficult large-scale cross-domain Text2SQL benchmark in the industry. It contains 10,181 natural language questions and 5,693 SQL statements, involving more than 200 databases across 138 different domains.
- download: [download](https://drive.google.com/uc?export=download&id=1TqleXec_OykOYFREKKtschzY29dUcVAQ) the dataset into the project directory at `dbgpt_hub/data/spider`.
### Data processing
The project prepares data with an information-matching generation method, i.e. the `SQL + Repository` generation method that combines table information. Combining the table information helps the model understand the structure of and relationships between tables, and generate SQL that better meets the requirements.
The relevant processing code is encapsulated in a script, which you can run with a single command. The generated training sets `example_text2sql_train.json` and `example_text2sql_dev.json` will be placed in the `dbgpt_hub/data/` directory.
```bash
# Generate train data and dev(eval) data
sh dbgpt_hub/scripts/gen_train_eval_data.sh
```
There are `8659` items in the training set and `1034` items in the dev set. The generated training set data format is as follows:
```json
{
    "db_id": "department_management",
    "instruction": "I want you to act as a SQL terminal in front of an example database, you need only to return the sql command to me.Below is an instruction that describes a task, Write a response that appropriately completes the request.\n\"\n##Instruction:\ndepartment_management contains tables such as department, head, management. Table department has columns such as Department_ID, Name, Creation, Ranking, Budget_in_Billions, Num_Employees. Department_ID is the primary key.\nTable head has columns such as head_ID, name, born_state, age. head_ID is the primary key.\nTable management has columns such as department_ID, head_ID, temporary_acting. department_ID is the primary key.\nThe head_ID of management is the foreign key of head_ID of head.\nThe department_ID of management is the foreign key of Department_ID of department.\n\n",
    "input": "###Input:\nHow many heads of the departments are older than 56 ?\n\n###Response:",
    "output": "SELECT count(*) FROM head WHERE age > 56",
    "history": []
}
```
Configure the training data file in `dbgpt_hub/data/dataset_info.json`. The outer key in the JSON file defaults to `example_text2sql`; this is the value passed to the `--dataset` parameter in the subsequent training script `train_sft.sh`. The `file_name` value in the JSON is the file name of the training set.
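For illustration, an entry in `dataset_info.json` may look roughly like the sketch below. The exact schema is defined by the project, so treat this field layout as an assumption and check the repository's `dataset_info.json`:
```json
{
    "example_text2sql_train": {
        "file_name": "example_text2sql_train.json"
    }
}
```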
### Code interpretation
The core data processing code is in `dbgpt_hub/data_process/sql_data_process.py`. The core processing class is `ProcessSqlData()`, and the core processing function is `decode_json_file()`.
`decode_json_file()` first processes the table information in the `Spider` data into a dictionary whose `key` is the `db_id` and whose `value` is the `table` and `column` information corresponding to that `db_id`, rendered in the required format, for example:
```json
{
    "department_management": "department_management contains tables such as department, head, management. Table department has columns such as Department_ID, Name, Creation, Ranking, Budget_in_Billions, Num_Employees. Department_ID is the primary key.\nTable head has columns such as head_ID, name, born_state, age. head_ID is the primary key.\nTable management has columns such as department_ID, head_ID, temporary_acting. department_ID is the primary key.\nThe head_ID of management is the foreign key of head_ID of head.\nThe department_ID of management is the foreign key of Department_ID of department."
}
```
Then the `{}` placeholder of `INSTRUCTION_PROMPT` in the config file is filled with the above text to form the final instruction. `INSTRUCTION_PROMPT` is as follows:
```python
INSTRUCTION_PROMPT = "I want you to act as a SQL terminal in front of an example database, you need only to return the sql command to me.Below is an instruction that describes a task, Write a response that appropriately completes the request.\n ##Instruction:\n{}\n"
```
Finally, the question and query corresponding to each `db_id` in the training and validation sets are processed into the format required for model SFT training, i.e. the data format shown above.
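As a hypothetical sketch of this assembly step (not the project's actual code; `build_sft_example` and its arguments are illustrative names):
```python
# Illustrative only: combine the schema text, question, and gold SQL into the
# SFT record format shown above.
INSTRUCTION_PROMPT = (
    "I want you to act as a SQL terminal in front of an example database, "
    "you need only to return the sql command to me."
    "Below is an instruction that describes a task, "
    "Write a response that appropriately completes the request.\n"
    " ##Instruction:\n{}\n"
)

def build_sft_example(schema_text: str, question: str, sql: str) -> dict:
    return {
        "instruction": INSTRUCTION_PROMPT.format(schema_text),
        "input": f"###Input:\n{question}\n\n###Response:",
        "output": sql,
        "history": [],
    }

example = build_sft_example(
    "department_management contains tables such as department, head, management. ...",
    "How many heads of the departments are older than 56 ?",
    "SELECT count(*) FROM head WHERE age > 56",
)
```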
:::info note
If you want to collect more data for training yourself, you can use the relevant code of this project to process it according to the above logic.
:::
## Model train
For the sake of simplicity, this tutorial runs LoRA fine-tuning directly as an example, but the project supports not only `LoRA` but also `QLoRA` and [deepspeed](https://github.com/microsoft/DeepSpeed) acceleration. The detailed parameters of the training script `dbgpt_hub/scripts/train_sft.sh` are as follows:
```bash
CUDA_VISIBLE_DEVICES=0 python dbgpt_hub/train/sft_train.py \
--model_name_or_path Your_download_CodeLlama-13b-Instruct-hf_path \
--do_train \
--dataset example_text2sql_train \
--max_source_length 2048 \
--max_target_length 512 \
--finetuning_type lora \
--lora_target q_proj,v_proj \
--template llama2 \
--lora_rank 64 \
--lora_alpha 32 \
--output_dir dbgpt_hub/output/adapter/code_llama-13b-2048_epoch8_lora \
--overwrite_cache \
--overwrite_output_dir \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 16 \
--lr_scheduler_type cosine_with_restarts \
--logging_steps 50 \
--save_steps 2000 \
--learning_rate 2e-4 \
--num_train_epochs 8 \
--plot_loss \
--bf16
```
Key parameters and their meanings in `train_sft.sh`:
- `model_name_or_path`: path to the LLM used.
- `dataset`: the configuration name of the training dataset, corresponding to the outer key in `dbgpt_hub/data/dataset_info.json`, such as `example_text2sql`.
- `max_source_length`: the input text length for the model. This tutorial uses `2048`, the optimal length found after multiple experiments and analyses.
- `max_target_length`: the length of the SQL output by the model; set to `512`.
- `template`: the prompt template for fine-tuning different models in the project settings. For Llama2-series models, set it to `llama2`.
- `lora_target`: the network modules updated during LoRA fine-tuning.
- `finetuning_type`: the fine-tuning type; one of `[ptuning, lora, freeze, full]`, etc.
- `lora_rank`: the rank used in LoRA fine-tuning.
- `lora_alpha`: the scaling factor in LoRA fine-tuning.
- `output_dir`: the path where the Peft module writes output during SFT fine-tuning. By default it is under `dbgpt_hub/output/adapter/`.
- `per_device_train_batch_size`: the batch size of training samples on each GPU. If computing resources allow, it can be set larger. The default is `1`.
- `gradient_accumulation_steps`: the number of steps over which gradients are accumulated before an update.
- `lr_scheduler_type`: the learning rate schedule type.
- `logging_steps`: the step interval for writing logs.
- `save_steps`: the step interval at which model checkpoints are saved.
- `num_train_epochs`: the number of training epochs.
- `learning_rate`: the learning rate; the recommended value is `2e-4`.
If you want to train with `QLoRA`, add the parameter `quantization_bit` to the script to enable quantization; valid values are `4` or `8`.
If you want to fine-tune other LLMs, change the key parameters `lora_target` and `template` for the corresponding models by referring to the project's `README.md`.
## Model merge
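This step merges the trained LoRA weights back into the base model before prediction. As a hedged sketch of what the step does, using the `peft` library's standard merge API rather than the project's own script (paths follow the training example above):
```python
# Illustrative sketch: merge LoRA adapter weights into the base model with peft.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-13b-Instruct-hf"
)
model = PeftModel.from_pretrained(
    base, "dbgpt_hub/output/adapter/code_llama-13b-2048_epoch8_lora"
)
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("dbgpt_hub/output/merged_model")
```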
## Model predict
After training completes, run prediction with the trained model by directly executing `predict_sft.sh` in the project's script directory.
Prediction run command:
```bash
sh ./dbgpt_hub/scripts/predict_sft.sh
```
The default output location of the model prediction results is the project directory `./dbgpt_hub/output/pred/` (create it if it does not exist). The detailed parameters of `predict_sft.sh` used in this tutorial are as follows:
```bash
echo " predict Start time: $(date)"
## predict
CUDA_VISIBLE_DEVICES=0 python dbgpt_hub/predict/predict.py \
--model_name_or_path Your_download_CodeLlama-13b-Instruct-hf_path \
--template llama2 \
--finetuning_type lora \
--checkpoint_dir Your_last_peft_checkpoint-4000 \
--predicted_out_filename Your_model_pred.sql
echo "predict End time: $(date)"
```
The value of the parameter `--predicted_out_filename` is the file name of the model's prediction results, which can be found in the `dbgpt_hub/output/pred` directory.
## Model evaluation
The model is evaluated on the `Spider` dataset by default. Run the following command:
```bash
python dbgpt_hub/eval/evaluation.py --plug_value --input Your_model_pred.sql
```
The results generated by large models have a certain degree of randomness because they are closely related to parameters such as `temperature` (which can be adjusted in `GeneratingArguments` in `/dbgpt_hub/configs/model_args.py`). Across multiple evaluations with the default settings, our execution accuracy is `0.789` and above. We have placed some experimental and evaluation results in the project's `docs/eval_llm_result.md` for reference.
`DB-GPT-Hub` used `LoRA` to fine-tune weights on `Spider`'s training set based on the `CodeLlama-13b-Instruct-hf` LLM, and the weight file has been released. It currently achieves an execution accuracy of about `0.789` on `Spider`'s evaluation set. The weight file `CodeLlama-13b-sql-lora` is available on [HuggingFace](https://huggingface.co/eosphoros).
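To try the released weights, here is a minimal sketch with `transformers` + `peft`. The repository id `eosphoros/CodeLlama-13b-sql-lora` is an assumption based on the organization and file name above; verify it on HuggingFace before use:
```python
# Illustrative sketch: load the released LoRA adapter on top of the base model.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "codellama/CodeLlama-13b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
# Assumed repo id; check https://huggingface.co/eosphoros for the actual one
model = PeftModel.from_pretrained(base, "eosphoros/CodeLlama-13b-sql-lora")
```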
## Appendix
The experiments in this article were run on a server with an `A100 (40G)` GPU; the total training time is about 12 hours. If your machine resources are insufficient, first consider reducing the value of `gradient_accumulation_steps`. You can also consider `QLoRA` fine-tuning (add `--quantization_bit 4` to the training script `dbgpt_hub/scripts/train_sft.sh`). In our experience, at `8` epochs the `QLoRA` results are not much different from the `LoRA` fine-tuning results.
# Tracing
You can view the help of the tracing command as follows:
```bash
dbgpt trace --help
```
# GraphRAG
[GraphRAG](../cookbook/rag/graph_rag_app_develop.md)
# LLMs
In the new version, model management has moved under the **Application Management** panel; other functional modules remain unchanged.
<p align="center">
<img src={'/img/app/llms_v0.6.jpg'} width="800px" />
</p>
For the use of multi-model management, please refer to the following documents.
- [Multi-Model Management](../application/advanced_tutorial/smmf.md)
- [Model Service deployment](../installation/model_service/)
- [Introduction to the principles of multi-model management](../modules/smmf.md)
# Prompts
In actual application development, prompts need to be customized for different scenarios and modules such as Agent and RAG. To make editing and adjusting prompts more flexible, an independent Prompt module has been created.
## Browse
As shown in the figure below, click **Application Management** -> **Prompt** to enter the corresponding management interface. The interface displays the custom prompt list by default, where you can manage all prompts.
<p align="center">
<img src={'/img/app/prompt_v0.6.jpg'} width="800px" />
</p>
## Create
Next, let's see how to create a new prompt. Click the **Add Prompt** button and the prompt edit box will pop up.
<p align="center">
<img src={'/img/app/prompt_add_v0.6.jpg'} width="800px" />
</p>
We define four types of prompts:
- AGENT: agent prompt
- SCENE: scene prompt
- NORMAL: general-purpose prompt
- EVALUATE: evaluation-mode prompt
When the AGENT type is selected, all registered agents can be seen in the drop-down list menu, and you can select an agent to set the prompt.
<p align="center">
<img src={'/img/app/agent_prompt_v0.6.jpg'} width="400px" />
</p>
After the prompt is saved, a unique UID is generated. You can bind the corresponding prompt by this ID when using it.
<p align="center">
<img src={'/img/app/agent_prompt_code_v0.6.jpg'} width="800px" />
</p>
## Usage
Enter the AWEL editing interface: as shown below, click **Application Management** -> **Create Workflow**.
<p align="center">
<img src={'/img/app/awel_create.6.jpg'} width="800px" />
</p>
Find the Agent resource and select the AWEL Layout Agent operator. We can see that each Agent contains the following information:
- Profile
- Role
- Goal
- Resource (AWELResource): the resources the Agent depends on
- AgentConfig (AWELAgentConfig): the Agent's configuration
- AgentPrompt: the Agent's prompt
<p align="center">
<img src={'/img/app/agent_prompt_awel_v0.6.jpg'} width="800px" />
</p>
Click the [+] next to **AgentPrompt**, select the Prompt operator that pops up, and in the parameter panel select the corresponding prompt name or UID. This binds our newly created prompt to the Agent, and we can then debug the Agent's behavior.
# What is AWEL?
Agentic Workflow Expression Language(AWEL) is a set of intelligent agent workflow expression language specially designed for large model application
development. It provides great functionality and flexibility. Through the AWEL API, you can focus on the development of business logic for LLMs applications
without paying attention to cumbersome model and environment details.
AWEL adopts a layered API design, whose architecture is shown in the figure below.
<p align="left">
<img src={'/img/awel.png'} width="480px"/>
</p>
## AWEL Design
AWEL is divided into three levels in its design: the operator layer, the AgentFream layer, and the DSL layer. The following is a brief introduction
to the three levels.
- **Operator layer**
The operator layer refers to the most basic operation atoms in the LLM application development process.
For example, when developing a RAG application, retrieval, vectorization, model interaction, and prompt processing
are all basic operators. In subsequent development, the framework will further abstract and standardize the design of operators.
A set of operators can be quickly implemented based on standard APIs.
- **AgentFream layer**
The AgentFream layer further encapsulates operators and can perform chained computation on top of them.
This chained computation also supports distribution, providing a set of operations such as filter, join, map, reduce, etc. More computation logic will be supported in the future.
- **DSL layer**
The DSL layer provides a standard structured representation language that can drive the AgentFream layer and operators by writing DSL statements. This makes writing large model applications around data more deterministic, avoiding the uncertainty of natural language descriptions: application programming around data and large models becomes deterministic application programming.
## Examples
A preliminary version of AWEL has also been released, and we provide some built-in usage examples.
## Operators
### Example of API-RAG
You can find the [source code](https://github.com/eosphoros-ai/DB-GPT/blob/main/examples/awel/simple_rag_example.py) in `examples/awel/simple_rag_example.py`.
```python
with DAG("simple_rag_example") as dag:
    trigger_task = HttpTrigger(
        "/examples/simple_rag", methods="POST", request_body=ConversationVo
    )
    req_parse_task = RequestParseOperator()
    # TODO should register prompt template first
    prompt_task = PromptManagerOperator()
    history_storage_task = ChatHistoryStorageOperator()
    history_task = ChatHistoryOperator()
    embedding_task = EmbeddingEngingOperator()
    chat_task = BaseChatOperator()
    model_task = ModelOperator()
    output_parser_task = MapOperator(lambda out: out.to_dict()["text"])

    (
        trigger_task
        >> req_parse_task
        >> prompt_task
        >> history_storage_task
        >> history_task
        >> embedding_task
        >> chat_task
        >> model_task
        >> output_parser_task
    )
```
The bitshift (`>>`) operations arrange the entire process as a DAG:
<p align="left">
<img src={'/img/awel_dag_flow.png'} width="360px" />
</p>
### Example of LLM + cache
<p align="left">
<img src={'/img/awel_cache_flow.png'} width="360px" />
</p>
### AgentFream Example
```python
af = AgentFream(HttpSource("/examples/run_code", method="post"))
result = (
    af
    .text2vec(model="text2vec")
    .filter(vstore, store="chromadb", db="default")
    .llm(model="vicuna-13b", temperature=0.7)
    .map(code_parse_func)
    .map(run_sql_func)
    .reduce(lambda a, b: a + b)
)
result.write_to_sink(type='source_slink')
```
### DSL Example
```sql
CREATE WORKFLOW RAG AS
BEGIN
    DATA requestData = RECEIVE REQUEST FROM
        http_source("/examples/rags", method = "post");
    DATA processedData = TRANSFORM requestData USING embedding(model = "text2vec");
    DATA retrievedData = RETRIEVE DATA
        FROM vstore(database = "chromadb", key = processedData)
        ON ERROR FAIL;
    DATA modelResult = APPLY LLM "vicuna-13b"
        WITH DATA retrievedData AND PARAMETERS (temperature = 0.7)
        ON ERROR RETRY 2 TIMES;
    RESPOND TO http_source WITH modelResult
        ON ERROR LOG "Failed to respond to request";
END;
```
## Currently supported operators
- **Basic Operators**
- BaseOperator
- JoinOperator
- ReduceOperator
- MapOperator
- BranchOperator
- InputOperator
- TriggerOperator
- **Stream Operators**
- StreamifyAbsOperator
- UnstreamifyAbsOperator
- TransformStreamAbsOperator
## Executable environment
- Stand-alone environment
- Ray environment
# 4.1 AWEL Lifecycle
## Task Lifecycle Hooks
Task lifecycle hooks are a set of methods that can be implemented in a task to perform
actions at different stages of the task lifecycle. The following hooks are available:
- `before_dag_run`: Execute before DAG run
- `after_dag_end`: Execute after DAG end
### Example
Create a new file `lifecycle_hooks.py` in the `awel_tutorial` directory and add the following code:
```python
import asyncio

from dbgpt.core.awel import DAG, MapOperator


class MyLifecycleTask(MapOperator[str, str]):
    async def before_dag_run(self):
        print("Before DAG run")

    async def after_dag_end(self):
        print("After DAG end")

    async def map(self, x: str) -> str:
        return f"Hello, {x}!"


with DAG("awel_lifecycle_hooks") as dag:
    task = MyLifecycleTask()

print(asyncio.run(task.call("world")))
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/lifecycle_hooks.py
```
And the main output should look like this:
```plaintext
Before DAG run
After DAG end
Hello, world!
```
# 2.1 Map Operator
The `MapOperator` is most commonly used to apply a function to input data.
There are two ways to use the `MapOperator`:
## Build a `MapOperator` with a map function
```python
from dbgpt.core.awel import DAG, MapOperator

with DAG("awel_hello_world") as dag:
    task = MapOperator(map_function=lambda x: print(f"Hello, {x}!"))
```
## Implement a custom `MapOperator`
```python
from dbgpt.core.awel import DAG, MapOperator


class MyMapOperator(MapOperator[str, None]):
    async def map(self, x: str) -> None:
        print(f"Hello, {x}!")


with DAG("awel_hello_world") as dag:
    task = MyMapOperator()
```
## Examples
### Double the number
Create a new file named `map_operator_double_number.py` in the `awel_tutorial` directory and add the following code:
```python
import asyncio

from dbgpt.core.awel import DAG, MapOperator


class DoubleNumberOperator(MapOperator[int, int]):
    async def map(self, x: int) -> int:
        print(f"Received {x}, returning {x * 2}")
        return x * 2


with DAG("awel_double_number") as dag:
    task = DoubleNumberOperator()

assert asyncio.run(task.call(2)) == 4
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/map_operator_double_number.py
```
And you will see "Received 2, returning 4" printed to the console.
```bash
Received 2, returning 4
```
# 2.2 Reduce Operator
The `ReduceStreamOperator` is used to reduce the streaming data to non-streaming data.
There are two ways to use the `ReduceStreamOperator`:
## Build a `ReduceStreamOperator` with a reduce function
```python
from dbgpt.core.awel import DAG, ReduceStreamOperator

with DAG("awel_reduce_operator") as dag:
    task = ReduceStreamOperator(reduce_function=lambda x, y: x + y)
```
## Implement a custom `ReduceStreamOperator`
```python
from dbgpt.core.awel import DAG, ReduceStreamOperator


class MySumOperator(ReduceStreamOperator[int, int]):
    async def reduce(self, x: int, y: int) -> int:
        return x + y


with DAG("awel_reduce_operator") as dag:
    task = MySumOperator()
```
## Examples
### Sum the numbers
Create a new file named `reduce_operator_sum_numbers.py` in the `awel_tutorial` directory and add the following code:
```python
import asyncio
from typing import AsyncIterator

from dbgpt.core.awel import DAG, ReduceStreamOperator, StreamifyAbsOperator


class NumberProducerOperator(StreamifyAbsOperator[int, int]):
    """Create a stream of numbers from 0 to `n-1`"""
    async def streamify(self, n: int) -> AsyncIterator[int]:
        for i in range(n):
            yield i


class MySumOperator(ReduceStreamOperator[int, int]):
    async def reduce(self, x: int, y: int) -> int:
        return x + y


with DAG("sum_numbers_dag") as dag:
    task = NumberProducerOperator()
    sum_task = MySumOperator()
    task >> sum_task

o1 = asyncio.run(sum_task.call(call_data=5))
if o1 == sum(range(5)):
    print(f"Success! n is 5, sum is {o1}")
else:
    print("Failed")

o2 = asyncio.run(sum_task.call(call_data=10))
if o2 == sum(range(10)):
    print(f"Success! n is 10, sum is {o2}")
else:
    print("Failed")
```
Then run the following command to execute the code:
```bash
poetry run python awel_tutorial/reduce_operator_sum_numbers.py
```
And you will see "Success! n is 5, sum is 10" and "Success! n is 10, sum is 45" printed to the console.
```bash
Success! n is 5, sum is 10
Success! n is 10, sum is 45
```
# 2.3 Join Operator
The `JoinOperator` is used to join data from multiple inputs into a single piece of
data. For example, if a task has two parents, you can join the data from both parents
into a single output.
There is one way to use the `JoinOperator`:
## Build a `JoinOperator` with a combine function
```python
from dbgpt.core.awel import DAG, JoinOperator

with DAG("awel_join_operator") as dag:
    task = JoinOperator(combine_function=lambda x, y: x + y)
```
## Examples
### Two Sum
In this example, we will create a `JoinOperator` that sums the data from two parents.
Create a new file named `join_operator_sum_numbers.py` in the `awel_tutorial` directory and add the following code:
```python
import asyncio

from dbgpt.core.awel import (
    DAG, JoinOperator, MapOperator, InputOperator, SimpleCallDataInputSource
)

with DAG("sum_numbers_dag") as dag:
    # Create an input task to receive data from call_data
    input_task = InputOperator(input_source=SimpleCallDataInputSource())
    task1 = MapOperator(map_function=lambda x: x["t1"])
    task2 = MapOperator(map_function=lambda x: x["t2"])
    sum_task = JoinOperator(combine_function=lambda x, y: x + y)
    input_task >> task1 >> sum_task
    input_task >> task2 >> sum_task

if asyncio.run(sum_task.call(call_data={"t1": 5, "t2": 8})) == 13:
    print("Success!")
else:
    print("Failed")
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/join_operator_sum_numbers.py
```
And you will see "Success!" printed to the console.
```bash
Success!
```
The graph of the DAG is like this:
<p align="left">
<img src={'/img/awel/awel_tutorial/join_operator_example_1.png'} width="1000px" />
</p>
# 2.4 Branch Operator
The `BranchOperator` is used to decide which path to run based on the input data.
For example, if you have two paths, you can decide which one to run based on the input data.
There are two ways to use the `BranchOperator`:
## Build A `BranchOperator` With A Branch Mapping
Pass a dictionary of branch functions and task names to the `BranchOperator` constructor.
```python
from dbgpt.core.awel import DAG, BranchOperator, MapOperator


def branch_even(x: int) -> bool:
    return x % 2 == 0


def branch_odd(x: int) -> bool:
    return not branch_even(x)


branch_mapping = {
    branch_even: "even_task",
    branch_odd: "odd_task"
}

with DAG("awel_branch_operator") as dag:
    task = BranchOperator(branches=branch_mapping)
    even_task = MapOperator(
        task_name="even_task",
        map_function=lambda x: print(f"{x} is even")
    )
    odd_task = MapOperator(
        task_name="odd_task",
        map_function=lambda x: print(f"{x} is odd")
    )
```
In the above example, the `BranchOperator` has two child tasks, `even_task` and `odd_task`,
and it decides which child task to run based on the input data.
We pass a dictionary of branch functions and task names to the `BranchOperator`
constructor to define the branch mapping: each key is a branch function and each value is a
task name. When the branch task runs, all branch functions are executed; if a branch
function returns `True`, the corresponding task is executed, otherwise it is skipped.
## Implement A Custom `BranchOperator`
Just override the `branches` method to return a dictionary of branch functions and task names.
```python
from dbgpt.core.awel import DAG, BranchOperator, MapOperator


def branch_even(x: int) -> bool:
    return x % 2 == 0


def branch_odd(x: int) -> bool:
    return not branch_even(x)


class MyBranchOperator(BranchOperator[int]):
    def __init__(self, even_task_name: str, odd_task_name: str, **kwargs):
        self.even_task_name = even_task_name
        self.odd_task_name = odd_task_name
        super().__init__(**kwargs)

    async def branches(self):
        return {
            branch_even: self.even_task_name,
            branch_odd: self.odd_task_name
        }


with DAG("awel_branch_operator") as dag:
    task = MyBranchOperator(even_task_name="even_task", odd_task_name="odd_task")
    even_task = MapOperator(
        task_name="even_task",
        map_function=lambda x: print(f"{x} is even")
    )
    odd_task = MapOperator(
        task_name="odd_task",
        map_function=lambda x: print(f"{x} is odd")
    )
```
## Examples
### Even Or Odd
Create a new file named `branch_operator_even_or_odd.py` in the `awel_tutorial` directory and add the following code:
```python
import asyncio

from dbgpt.core.awel import (
    DAG, BranchOperator, MapOperator, JoinOperator,
    InputOperator, SimpleCallDataInputSource,
    is_empty_data
)


def branch_even(x: int) -> bool:
    return x % 2 == 0


def branch_odd(x: int) -> bool:
    return not branch_even(x)


branch_mapping = {
    branch_even: "even_task",
    branch_odd: "odd_task"
}


def even_func(x: int) -> int:
    print(f"Branch even, {x} is even, multiply by 10")
    return x * 10


def odd_func(x: int) -> int:
    print(f"Branch odd, {x} is odd, multiply by itself")
    return x * x


def combine_function(x: int, y: int) -> int:
    print(f"Received {x} and {y}")
    # Return the first non-empty data
    return x if not is_empty_data(x) else y


with DAG("awel_branch_operator") as dag:
    input_task = InputOperator(input_source=SimpleCallDataInputSource())
    task = BranchOperator(branches=branch_mapping)
    even_task = MapOperator(task_name="even_task", map_function=even_func)
    odd_task = MapOperator(task_name="odd_task", map_function=odd_func)
    join_task = JoinOperator(combine_function=combine_function, can_skip_in_branch=False)
    input_task >> task >> even_task >> join_task
    input_task >> task >> odd_task >> join_task

print("First call, input is 5")
assert asyncio.run(join_task.call(call_data=5)) == 25
print("=" * 80)
print("Second call, input is 6")
assert asyncio.run(join_task.call(call_data=6)) == 60
```
Note: `can_skip_in_branch` controls whether the current task can be skipped in its branch.
Set it to `False` to prevent the task from being skipped.
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/branch_operator_even_or_odd.py
```
And you will see the following output printed to the console.
```bash
First call, input is 5
Branch odd, 5 is odd, multiply by itself
Received EmptyData(SKIP_DATA) and 25
================================================================================
Second call, input is 6
Branch even, 6 is even, multiply by 10
Received 60 and EmptyData(SKIP_DATA)
```
The graph of the DAG is like this:
<p align="left">
<img src={'/img/awel/awel_tutorial/branch_operator_example_1.png'} width="1000px"/>
</p>
In the above example, the `BranchOperator` has two child tasks, `even_task` and `odd_task`,
and it decides which child task to run based on the input data and the branch mapping.
We also use a `JoinOperator` to combine the data from both child tasks. If a path is
skipped, the `JoinOperator` receives an `EmptyData(SKIP_DATA)` as input data, and we
can use `dbgpt.core.awel.is_empty_data` to check whether the data is empty.
# 2.5 Streamify Operator
The `StreamifyAbsOperator` is used to convert a single piece of data into a stream of data.
There is one way to use the `StreamifyAbsOperator`:
## Implement A Custom `StreamifyAbsOperator`
Just override the `streamify` method to return an async iterable.
```python
from typing import AsyncIterator

from dbgpt.core.awel import DAG, StreamifyAbsOperator


class NumberProducerOperator(StreamifyAbsOperator[int, int]):
    """Create a stream of numbers from 0 to `n-1`"""
    async def streamify(self, n: int) -> AsyncIterator[int]:
        for i in range(n):
            yield i


with DAG("numbers_dag") as dag:
    task = NumberProducerOperator()
```
In the above example, `NumberProducerOperator` is a custom `StreamifyAbsOperator` that
creates a stream of numbers from 0 to `n-1`. It receives a single piece of data `n` and returns
a stream.
## Examples
### Create A Stream Of Numbers
Create a new file named `streamify_operator_numbers.py` in the `awel_tutorial` directory and add the following code:
```python
import asyncio
from typing import AsyncIterator

from dbgpt.core.awel import DAG, StreamifyAbsOperator


class NumberProducerOperator(StreamifyAbsOperator[int, int]):
    """Create a stream of numbers from 0 to `n-1`"""
    async def streamify(self, n: int) -> AsyncIterator[int]:
        for i in range(n):
            yield i


with DAG("numbers_dag") as dag:
    task = NumberProducerOperator()


async def print_stream(t, n: int):
    # Call the streaming operator by the `call_stream` method
    async for i in await t.call_stream(call_data=n):
        print(i)

asyncio.run(print_stream(task, 10))
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/streamify_operator_numbers.py
```
And you will see the following output printed to the console.
```bash
0
1
2
3
4
5
6
7
8
9
```
### Mock A Streaming LLM Service
Create a new file named `streamify_operator_mock_llm_service.py` in the `awel_tutorial`
directory and add the following code:
```python
import asyncio
from typing import AsyncIterator, List

from dbgpt.core.awel import DAG, StreamifyAbsOperator


class MockLLMService(StreamifyAbsOperator[str, str]):
    """Mock a streaming LLM service"""
    def __init__(self, mock_data: List[str], **kwargs):
        self.mock_data = mock_data
        super().__init__(**kwargs)

    async def streamify(self, user_input: str) -> AsyncIterator[str]:
        for data in self.mock_data:
            yield data


with DAG("mock_llm_service_dag") as dag:
    task = MockLLMService(mock_data=["Hello, ", "how ", "can ", "I ", "help ", "you?"])


async def print_stream(t, user_input: str):
    # Call the streaming operator by the `call_stream` method
    async for i in await t.call_stream(call_data=user_input):
        print(i, end="")

asyncio.run(print_stream(task, "Hi"))
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/streamify_operator_mock_llm_service.py
```
And you will see the following output printed to the console.
```bash
Hello, how can I help you?
```
# 2.6 Unstreamify Operator
The `UnstreamifyAbsOperator` is the opposite of the `StreamifyAbsOperator`. It converts
a stream of data into a single piece of data.
There is one way to use the `UnstreamifyAbsOperator`:
## Implement A Custom `UnstreamifyAbsOperator`
Just override the `unstreamify` method to return a single piece of data.
```python
from typing import AsyncIterator

from dbgpt.core.awel import DAG, UnstreamifyAbsOperator


class SumOperator(UnstreamifyAbsOperator[int, int]):
    """Unstreamify the stream of numbers"""
    async def unstreamify(self, it: AsyncIterator[int]) -> int:
        return sum([i async for i in it])


with DAG("sum_dag") as dag:
    task = SumOperator()
```
## Examples
### Sum The Numbers
Create a new file named `unstreamify_operator_sum_numbers.py` in the `awel_tutorial` directory and add the following code:
```python
import asyncio
from typing import AsyncIterator

from dbgpt.core.awel import DAG, UnstreamifyAbsOperator, StreamifyAbsOperator


class NumberProducerOperator(StreamifyAbsOperator[int, int]):
    """Create a stream of numbers from 0 to `n-1`"""
    async def streamify(self, n: int) -> AsyncIterator[int]:
        for i in range(n):
            yield i


class SumOperator(UnstreamifyAbsOperator[int, int]):
    """Unstreamify the stream of numbers"""
    async def unstreamify(self, it: AsyncIterator[int]) -> int:
        return sum([i async for i in it])


with DAG("sum_dag") as dag:
    task = NumberProducerOperator()
    sum_task = SumOperator()
    task >> sum_task

print(asyncio.run(sum_task.call(call_data=5)))
print(asyncio.run(sum_task.call(call_data=10)))
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/unstreamify_operator_sum_numbers.py
```
And you will see the following output printed to the console.
```bash
10
45
```
# 2.7 Transform Stream Operator
The `TransformStreamAbsOperator` is used to transform one stream of data into another
stream of data.
There is one way to use the `TransformStreamAbsOperator`:
## Implement a custom `TransformStreamAbsOperator`
Just override the `transform_stream` method to return a new async iterable.
```python
from typing import AsyncIterator

from dbgpt.core.awel import DAG, TransformStreamAbsOperator


class NumberDoubleOperator(TransformStreamAbsOperator[int, int]):
    async def transform_stream(self, it: AsyncIterator) -> AsyncIterator[int]:
        async for i in it:
            # Double the number
            yield i * 2


with DAG("numbers_dag") as dag:
    task = NumberDoubleOperator()
```
## Examples
### Double The Numbers
Create a new file named `transform_stream_operator_double_numbers.py` in the `awel_tutorial` directory and add the following code:
```python
import asyncio
from typing import AsyncIterator

from dbgpt.core.awel import DAG, TransformStreamAbsOperator, StreamifyAbsOperator


class NumberProducerOperator(StreamifyAbsOperator[int, int]):
    """Create a stream of numbers from 0 to `n-1`"""
    async def streamify(self, n: int) -> AsyncIterator[int]:
        for i in range(n):
            yield i


class NumberDoubleOperator(TransformStreamAbsOperator[int, int]):
    async def transform_stream(self, it: AsyncIterator) -> AsyncIterator[int]:
        async for i in it:
            # Double the number
            yield i * 2


with DAG("numbers_dag") as dag:
    task = NumberProducerOperator()
    double_task = NumberDoubleOperator()
    task >> double_task


async def print_stream(t, n: int):
    # Call the streaming operator by the `call_stream` method
    async for i in await t.call_stream(call_data=n):
        print(i)

asyncio.run(print_stream(double_task, 10))
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/transform_stream_operator_double_numbers.py
```
And you will see the following output printed to the console.
```bash
0
2
4
6
8
10
12
14
16
18
```
# 2.8 Input Operator
The input operator is used to read a value from an **input source**. It always acts as the
first operator in a DAG, and it makes it easy to write your own input source.
The input operator is special: it has no input and exactly one
output.
There is one way to use the input operator:
## Build An `InputOperator` With An Input Source
Just pass the input source to the `InputOperator` constructor.
```python
from dbgpt.core.awel import DAG, InputOperator, SimpleInputSource

with DAG("awel_input_operator") as dag:
    input_source = SimpleInputSource(data="Hello, World!")
    input_task = InputOperator(input_source=input_source)
```
## Examples
### Print The Input Data
This example shows how to use the `InputOperator` to print the input data. It uses a
`SimpleInputSource` built with a string as the input source.
Create a new file named `input_operator_print_data.py` in the `awel_tutorial` directory
and add the following code:
```python
import asyncio

from dbgpt.core.awel import DAG, MapOperator, InputOperator, SimpleInputSource

with DAG("awel_input_operator") as dag:
    input_source = SimpleInputSource(data="Hello, World!")
    input_task = InputOperator(input_source=input_source)
    print_task = MapOperator(map_function=lambda x: print(x))
    input_task >> print_task

asyncio.run(print_task.call())
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/input_operator_print_data.py
```
And you will see the following output:
```bash
Hello, World!
```
### Print Stream Data
This example shows how to use the `InputOperator` to print stream data. It uses a
`SimpleInputSource` built with a data stream as the input source.
Create a new file named `input_operator_print_stream_data.py` in the `awel_tutorial`
directory and add the following code:
```python
import asyncio

from dbgpt.core.awel import DAG, InputOperator, SimpleInputSource


async def stream_data():
    for i in range(10):
        yield i


with DAG("awel_input_operator") as dag:
    input_source = SimpleInputSource(data=stream_data())
    input_task = InputOperator(input_source=input_source)


async def print_stream(t: InputOperator):
    async for i in await t.call_stream():
        print(i)

asyncio.run(print_stream(input_task))
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/input_operator_print_stream_data.py
```
And you will see the following output printed to the console.
```bash
0
1
2
3
4
5
6
7
8
9
```
### Print Call Data
The **call data** is the data that is passed to the `call` method or `call_stream` method
of the operator.
This example shows how to use the `InputOperator` to print the call data. It uses a
`SimpleCallDataInputSource`, which reads the call data as its input source.
Create a new file named `input_operator_print_call_data.py` in the `awel_tutorial` directory and add the following code:
```python
import asyncio

from dbgpt.core.awel import DAG, MapOperator, InputOperator, SimpleCallDataInputSource

with DAG("awel_input_operator") as dag:
    input_source = SimpleCallDataInputSource()
    input_task = InputOperator(input_source=input_source)
    print_task = MapOperator(map_function=lambda x: print(x))
    input_task >> print_task

asyncio.run(print_task.call(call_data="Hello, World!"))
asyncio.run(print_task.call(call_data="AWEL is cool!"))
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/input_operator_print_call_data.py
```
And you will see the following output printed to the console.
```bash
Hello, World!
AWEL is cool!
```
## Input Source
There are two built-in input sources, `SimpleInputSource` and `SimpleCallDataInputSource`.
### `SimpleInputSource`
`SimpleInputSource` is used to create an input source with a single data or a stream data.
### `SimpleCallDataInputSource`
`SimpleCallDataInputSource` is used to create an input source from the call data, which
is passed via the `call` method or `call_stream` method of the operator.
### Create Your Own Input Source
The simplest way to create your own input source is to implement `BaseInputSource` and override the `_read_data` method.
Create a new file named `my_input_source.py` in the `awel_tutorial` directory and add the following code:
```python
import asyncio

from dbgpt.core.awel import DAG, InputOperator, MapOperator, BaseInputSource, TaskContext


class MyInputSource(BaseInputSource):
    """Create an input source with a single piece of data"""
    def _read_data(self, ctx: TaskContext) -> str:
        return "Hello, World!"


with DAG("awel_input_operator") as dag:
    input_source = MyInputSource()
    input_task = InputOperator(input_source=input_source)
    print_task = MapOperator(map_function=lambda x: print(x))
    input_task >> print_task

asyncio.run(print_task.call())
```
And run the following command to execute the code:
```bash
poetry run python awel_tutorial/my_input_source.py
```
And you will see the following output printed to the console.
```bash
Hello, World!
```