Commit cc338b7c authored by zhaoying1

llama_fastchat_pytorch
# Python
__pycache__
*.pyc
*.egg-info
dist
# Log
*.log
*.json
# Data
!**/alpaca-data-conversation.json
# Editor
.idea
*.swp
# Other
.DS_Store
wandb
output
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# FastChat
An open platform for training, serving, and evaluating large language model based chatbots.
## Release
<p align="center">
<a href="https://vicuna.lmsys.org"><img src="assets/vicuna-logo.jpeg" width="20%"></a>
</p>
- 🔥 We released **Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality**. Check out the blog [post](https://vicuna.lmsys.org) and [demo](https://chat.lmsys.org/).
<a href="https://chat.lmsys.org"><img src="assets/demo-narrow.gif" width="70%"></a>
Join our [Discord](https://discord.gg/h6kCZb72G7) server and follow our [Twitter](https://twitter.com/lmsysorg) to get the latest updates.
## Contents
- [Install](#install)
- [Vicuna Weights](#vicuna-weights)
- [Inference with Command Line Interface](#inference-with-command-line-interface)
- [Serving with Web GUI](#serving-with-web-gui)
- [Evaluation](#evaluation)
- [Fine-tuning](#fine-tuning)
## Install
### Method 1: With pip
```bash
# Install FastChat
pip3 install fschat
# Install the latest main branch of huggingface/transformers
pip3 install git+https://github.com/huggingface/transformers
```
### Method 2: From source
1. Clone this repository and navigate to the FastChat folder
```bash
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
```
If you are running on Mac:
```bash
brew install rust cmake
```
2. Install Package
```bash
pip3 install --upgrade pip # enable PEP 660 support
pip3 install -e .
```
## Vicuna Weights
We release [Vicuna](https://vicuna.lmsys.org/) weights as delta weights to comply with the LLaMA model license.
You can add our delta to the original LLaMA weights to obtain the Vicuna weights. Instructions:
1. Get the original LLaMA weights in the huggingface format by following the instructions [here](https://huggingface.co/docs/transformers/main/model_doc/llama).
2. Use the following scripts to get Vicuna weights by applying our delta. They will automatically download delta weights from our Hugging Face [account](https://huggingface.co/lmsys).
**NOTE**:
Our released weights are only compatible with the latest main branch of huggingface/transformers.
We install the correct version of transformers when fastchat is installed.
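Conceptually, applying a delta just adds each released delta tensor to the corresponding base LLaMA tensor. The snippet below is only a simplified sketch of that idea, not the actual `fastchat.model.apply_delta` implementation (which also handles loading and saving the full model):

```python
from typing import Dict
import torch

def add_delta(base: Dict[str, torch.Tensor], delta: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    # Sketch: Vicuna weight = original LLaMA weight + released delta, tensor by tensor.
    return {name: tensor + delta[name] for name, tensor in base.items()}
```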
### Vicuna-13B
This conversion command needs around 60 GB of CPU RAM.
If you do not have enough memory, you can create a large swap file that allows the operating system to automatically utilize the disk as virtual memory.
```bash
python3 -m fastchat.model.apply_delta \
--base /path/to/llama-13b \
--target /output/path/to/vicuna-13b \
--delta lmsys/vicuna-13b-delta-v0
```
### Vicuna-7B
This conversion command needs around 30 GB of CPU RAM.
If you do not have enough memory, you can create a large swap file that allows the operating system to automatically utilize the disk as virtual memory.
```bash
python3 -m fastchat.model.apply_delta \
--base /path/to/llama-7b \
--target /output/path/to/vicuna-7b \
--delta lmsys/vicuna-7b-delta-v0
```
## Inference with Command Line Interface
(Experimental Feature: You can specify `--style rich` to enable rich text output. This may not work on certain terminals.)
#### Single GPU
The command below requires around 28GB of GPU memory for Vicuna-13B and 14GB of GPU memory for Vicuna-7B.
See the "Not Enough Memory" section below if you do not have enough memory.
```
python3 -m fastchat.serve.cli --model-name /path/to/vicuna/weights
```
#### Multiple GPUs
If you do not have enough GPU memory, you can use model parallelism to aggregate memory from multiple GPUs on the same machine.
```
python3 -m fastchat.serve.cli --model-name /path/to/vicuna/weights --num-gpus 2
```
#### CPU Only
This runs on the CPU only and does not require GPU. It requires around 60GB of CPU memory for Vicuna-13B and around 30GB of CPU memory for Vicuna-7B.
```
python3 -m fastchat.serve.cli --model-name /path/to/vicuna/weights --device cpu
```
#### Metal Backend (Mac Computers with Apple Silicon or AMD GPUs)
Use `--device mps` to enable GPU acceleration on Mac computers (requires torch >= 2.0).
Use `--load-8bit` to turn on 8-bit compression.
```
python3 -m fastchat.serve.cli --model-name /path/to/vicuna/weights --device mps --load-8bit
```
Vicuna-7B can run on a 32GB M1 MacBook at 1-2 words per second.
#### Not Enough Memory or Other Platforms
If you do not have enough memory, you can enable 8-bit compression by adding `--load-8bit` to commands above.
This can reduce memory usage by around half with slightly degraded model quality.
It is compatible with the CPU, GPU, and Metal backend.
Vicuna-13B with 8-bit compression can run on a single NVIDIA 3090/4080/V100(16GB) GPU.
```
python3 -m fastchat.serve.cli --model-name /path/to/vicuna/weights --load-8bit
```
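As a rough back-of-the-envelope check of these memory numbers, the sketch below counts only the model weights and ignores activations, the KV cache, and framework overhead (so the real figures quoted above are somewhat higher):

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    # Memory occupied by the weights alone, in GiB.
    return num_params * bytes_per_param / 1024**3

print(weight_memory_gib(13e9, 2))  # ~24 GiB for a 13B model in fp16
print(weight_memory_gib(13e9, 1))  # ~12 GiB with 8-bit weights, roughly half
```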
We are also actively exploring more methods to make the model easier to run on more platforms.
Contributions and pull requests are welcome.
## Serving with Web GUI
To serve using the web UI, you need three main components: web servers that interface with users, model workers that host one or more models, and a controller to coordinate the web servers and model workers. Here are the commands to follow in your terminal:
#### Launch the controller
```bash
python3 -m fastchat.serve.controller
```
This controller manages the distributed workers.
#### Launch the model worker
```bash
python3 -m fastchat.serve.model_worker --model-path /path/to/vicuna/weights
```
Wait until the process finishes loading the model and you see "Uvicorn running on ...". You can launch multiple model workers to serve multiple models concurrently. The model worker will connect to the controller automatically.
To ensure that your model worker is connected to your controller properly, send a test message using the following command:
```bash
python3 -m fastchat.serve.test_message --model-name vicuna-13b
```
#### Launch the Gradio web server
```bash
python3 -m fastchat.serve.gradio_web_server
```
This is the user interface that users will interact with.
By following these steps, you will be able to serve your models using the web UI. You can open your browser and chat with a model now.
## Evaluation
Our AI-enhanced evaluation pipeline is based on GPT-4. This section provides a high-level summary of the pipeline. For detailed instructions, please refer to the [evaluation](fastchat/eval) documentation.
### Pipeline Steps
1. Generate answers from different models: Use `qa_baseline_gpt35.py` for ChatGPT, or specify the model checkpoint and run `get_model_answer.py` for Vicuna and other models.
2. Generate reviews with GPT-4: Use GPT-4 to generate reviews automatically. This step can also be performed manually if the GPT-4 API is not available to you.
3. Generate visualization data: Run `generate_webpage_data_from_table.py` to generate data for a static website, which allows you to visualize the evaluation data.
4. Visualize the data: Serve a static website under the `webpage` directory. You can use `python3 -m http.server` to serve the website locally.
### Data Format and Contribution
We use a data format encoded with JSON Lines for evaluation. The format includes information on models, prompts, reviewers, questions, answers, and reviews.
You can customize the evaluation process or contribute to our project by accessing the relevant [data](fastchat/eval/table/).
For detailed instructions, please refer to the [evaluation](fastchat/eval) documentation.
## Fine-tuning
### Data
Vicuna is created by fine-tuning a LLaMA base model using approximately 70K user-shared conversations gathered from ShareGPT.com with public APIs. To ensure data quality, we convert the HTML back to markdown and filter out some inappropriate or low-quality samples. Additionally, we divide lengthy conversations into smaller segments that fit the model's maximum context length. For detailed instructions to clean the ShareGPT data, check out [here](docs/commands/data_cleaning.md).
Due to some concerns, we may not release the data at the moment. If you would like to try the fine-tuning code, you can run it with our [preprocessed alpaca dataset](playground/data/alpaca-data-conversation.json) (originally from [here](https://github.com/tatsu-lab/stanford_alpaca)).
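For reference, each record in that dataset uses a simple two-role conversation schema, matching the output of the Alpaca conversion script included in this repository. The record below is an illustrative sketch; the values are placeholders and only the structure matters:

```python
# One illustrative training record in the conversation format (placeholder values).
example_record = {
    "id": "1",
    "conversations": [
        {"from": "human", "value": "Give three tips for staying healthy."},
        {"from": "gpt", "value": "1. Exercise regularly. 2. Eat a balanced diet. 3. Get enough sleep."},
    ],
}
```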
### Code and Hyperparameters
We fine-tune the model using the code from [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca), with some modifications to support gradient checkpointing and [Flash Attention](https://github.com/HazyResearch/flash-attention). We use hyperparameters similar to those of Stanford Alpaca.
| Model | Global batch size | Learning rate | Epochs | Max length | Weight decay |
| --- | ---: | ---: | ---: | ---: | ---: |
| Vicuna-13B | 128 | 2e-5 | 3 | 2048 | 0 |
### Fine-tuning on Any Cloud with SkyPilot
[SkyPilot](https://github.com/skypilot-org/skypilot) is a framework built by UC Berkeley for running ML workloads easily and cost-effectively on any cloud (AWS, GCP, Azure, Lambda, etc.).
To use SkyPilot, install it with the following command and set up your cloud credentials locally by following the instructions [here](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html).
```bash
# Install skypilot from the master branch
pip install git+https://github.com/skypilot-org/skypilot.git
```
#### Vicuna
Vicuna can be trained on 8 A100 GPUs with 80GB of memory. The following command will automatically launch a node that satisfies the requirement, then set up and run the training job on it.
```bash
sky launch -c vicuna -s scripts/train-vicuna.yaml --env WANDB_API_KEY
```
Other options are also valid:
```bash
# Launch on managed spot instances to save ~3x on cost (trains Vicuna-13B for around $300)
sky spot launch -n vicuna scripts/train-vicuna.yaml --env WANDB_API_KEY
# Train a 7B model
sky launch -c vicuna -s scripts/train-vicuna.yaml --env WANDB_API_KEY --env MODEL_SIZE=7
```
Note: Please make sure the `WANDB_API_KEY` has been set up on your local machine. You can find the API key on your [wandb profile page](https://wandb.ai/authorize). If you would like to train the model without using wandb, you can replace the `--env WANDB_API_KEY` flag with `--env WANDB_MODE=offline`.
#### Alpaca
Launch the training job with the following command (it will run on a single node with 4 A100-80GB GPUs):
```
sky launch -c alpaca -s scripts/train-alpaca.yaml --env WANDB_API_KEY
```
### Fine-tuning with Local GPUs
Vicuna can also be trained on 8 A100 GPUs with 80GB of memory with the following code. To train on fewer GPUs, you can reduce the `per_device_train_batch_size` and increase the `gradient_accumulation_steps` accordingly to keep the global batch size the same (see the batch-size sketch after the command below). To set up the environment, please see the setup section in [scripts/train-vicuna.yaml](scripts/train-vicuna.yaml).
```bash
torchrun --nnodes=1 --nproc_per_node=8 --master_port=<your_random_port> \
fastchat/train/train_mem.py \
--model_name_or_path <path-to-llama-model-weight> \
--data_path <path-to-data> \
--bf16 True \
--output_dir ./checkpoints \
--num_train_epochs 3 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1200 \
--save_total_limit 100 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True
```
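As a quick sanity check of the batch-size arithmetic mentioned above, the effective global batch size is simply the product of the per-device batch size, the number of GPUs, and the gradient accumulation steps (a small illustrative helper, not part of the training code):

```python
def global_batch_size(per_device: int, num_gpus: int, grad_accum_steps: int) -> int:
    # Effective global batch size seen by the optimizer per update.
    return per_device * num_gpus * grad_accum_steps

print(global_batch_size(4, 8, 1))  # 32 for the command above
print(global_batch_size(4, 4, 2))  # still 32 after halving the GPUs and doubling accumulation
```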
### Data cleaning
```
# Convert html to markdown
python3 -m fastchat.data.clean_sharegpt --in sharegpt_20230322_html.json --out sharegpt_20230322_clean.json
# Keep or remove specific languages
python3 -m fastchat.data.optional_clean --in sharegpt_20230322_clean.json --out sharegpt_20230322_clean_lang.json --skip-lang SOME_LANGUAGE_CODE
# Split long conversations
python3 -m fastchat.data.split_long_conversation --in sharegpt_20230322_clean_lang.json --out sharegpt_20230322_clean_lang_split.json --model-name /home/ubuntu/model_weights/llama-7b/
```
### Local GPU cluster (node-01)
```
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 10002
CUDA_VISIBLE_DEVICES=0 python3 -m fastchat.serve.model_worker --model-path ~/model_weights/vicuna-13b/ --controller http://localhost:10002 --port 31000 --worker http://localhost:31000
CUDA_VISIBLE_DEVICES=1 python3 -m fastchat.serve.model_worker --model-path ~/model_weights/vicuna-13b/ --controller http://localhost:10002 --port 31001 --worker http://localhost:31001
CUDA_VISIBLE_DEVICES=2 python3 -m fastchat.serve.model_worker --model-path ~/model_weights/bair-chat-13b/ --controller http://localhost:10002 --port 31002 --worker http://localhost:31002
CUDA_VISIBLE_DEVICES=3 python3 -m fastchat.serve.model_worker --model-path ~/model_weights/alpaca-chat-13b/ --controller http://localhost:10002 --port 31003 --worker http://localhost:31003
python3 -m fastchat.serve.test_message --model vicuna-13b --controller http://localhost:10002
```
### Web server
```
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001
python3 -m fastchat.serve.register_worker --controller http://localhost:21001 --worker-name https://
python3 -m fastchat.serve.test_message --model vicuna-13b --controller http://localhost:21001
python3 -m fastchat.serve.gradio_web_server --controller http://localhost:21001
```
### Local GPU cluster (node-02)
```
CUDA_VISIBLE_DEVICES=0 python3 -m fastchat.serve.model_worker --model-path ~/model_weights/vicuna-13b/ --controller http://node-01:10002 --host 0.0.0.0 --port 31000 --worker http://$(hostname):31000
CUDA_VISIBLE_DEVICES=1 python3 -m fastchat.serve.model_worker --model-path ~/model_weights/vicuna-13b/ --controller http://node-01:10002 --host 0.0.0.0 --port 31001 --worker http://$(hostname):31001
CUDA_VISIBLE_DEVICES=2 python3 -m fastchat.serve.model_worker --model-path ~/model_weights/vicuna-13b/ --controller http://node-01:10002 --host 0.0.0.0 --port 31002 --worker http://$(hostname):31002
CUDA_VISIBLE_DEVICES=3 python3 -m fastchat.serve.model_worker --model-path ~/model_weights/vicuna-13b/ --controller http://node-01:10002 --host 0.0.0.0 --port 31003 --worker http://$(hostname):31003
```
### Requirement
```
python3 -m pip install twine
python3 -m pip install --upgrade pip
pip3 install build
```
### Upload
```
bash scripts/upload_pypi.sh
```
### Install
```
sudo apt update
sudo apt install tmux htop
wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh
bash Anaconda3-2022.10-Linux-x86_64.sh
conda create -n fastchat python=3.9
conda activate fastchat
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
pip3 install -e .
# Install the latest main branch of huggingface/transformers
pip3 install git+https://github.com/huggingface/transformers
```
### Launch servers
```
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001
python3 -m fastchat.serve.register_worker --controller http://localhost:21001 --worker-name https://
python3 -m fastchat.serve.test_message --model vicuna-13b --controller http://localhost:21001
export OPENAI_API_KEY=
python3 -m fastchat.serve.gradio_web_server --controller http://localhost:21001 --moderate --concurrency 20
```
### Increase the limit of max open files
For one process (no reboot needed):
```
sudo prlimit --nofile=1048576:1048576 --pid=$id
```
System-wide (reboot required): add the lines below to `/etc/security/limits.conf`
```
* hard nofile 65535
* soft nofile 65535
```
CONTROLLER_HEART_BEAT_EXPIRATION = 2 * 60
WORKER_HEART_BEAT_INTERVAL = 30
LOGDIR = "."
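"""
Conversation prompt templates. The Conversation dataclass below keeps the
system prompt, the role names, and the message history, and renders them
into a single prompt string via get_prompt().
"""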
import dataclasses
from enum import auto, Enum
from typing import List, Tuple, Any
class SeparatorStyle(Enum):
"""Different separator style."""
SINGLE = auto()
TWO = auto()
@dataclasses.dataclass
class Conversation:
"""A class that keeps all conversation history."""
system: str
roles: List[str]
messages: List[List[str]]
offset: int
sep_style: SeparatorStyle = SeparatorStyle.SINGLE
sep: str = "###"
sep2: str = None
skip_next: bool = False
conv_id: Any = None
def get_prompt(self):
if self.sep_style == SeparatorStyle.SINGLE:
ret = self.system + self.sep
for role, message in self.messages:
if message:
ret += role + ": " + message + self.sep
else:
ret += role + ":"
return ret
elif self.sep_style == SeparatorStyle.TWO:
seps = [self.sep, self.sep2]
ret = self.system + seps[0]
for i, (role, message) in enumerate(self.messages):
if message:
ret += role + ": " + message + seps[i % 2]
else:
ret += role + ":"
return ret
else:
raise ValueError(f"Invalid style: {self.sep_style}")
def append_message(self, role, message):
self.messages.append([role, message])
def to_gradio_chatbot(self):
ret = []
for i, (role, msg) in enumerate(self.messages[self.offset:]):
if i % 2 == 0:
ret.append([msg, None])
else:
ret[-1][-1] = msg
return ret
def copy(self):
return Conversation(
system=self.system,
roles=self.roles,
messages=[[x, y] for x, y in self.messages],
offset=self.offset,
sep_style=self.sep_style,
sep=self.sep,
sep2=self.sep2,
conv_id=self.conv_id)
def dict(self):
return {
"system": self.system,
"roles": self.roles,
"messages": self.messages,
"offset": self.offset,
"sep": self.sep,
"sep2": self.sep2,
"conv_id": self.conv_id,
}
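# Default conversation templates. The seed messages below act as few-shot examples
# in the rendered prompt; `offset` marks how many of them to skip when displaying
# the chat history (see to_gradio_chatbot above).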
conv_v1 = Conversation(
system="A chat between a curious human and an artificial intelligence assistant. "
"The assistant gives helpful, detailed, and polite answers to the human's questions.",
roles=("Human", "Assistant"),
messages=(
("Human", "Give three tips for staying healthy."),
("Assistant",
"Sure, here are three tips for staying healthy:\n"
"1. Exercise regularly: Regular physical activity can help improve your overall health and wellbeing. "
"It can also help reduce your risk of chronic conditions such as obesity, diabetes, heart disease, "
"and certain cancers. Aim for at least 150 minutes of moderate-intensity aerobic exercise or "
"75 minutes of vigorous-intensity aerobic exercise per week, along with muscle-strengthening "
"activities at least two days per week.\n"
"2. Eat a balanced diet: Eating a balanced diet that is rich in fruits, "
"vegetables, whole grains, lean proteins, and healthy fats can help support "
"your overall health. Try to limit your intake of processed and high-sugar foods, "
"and aim to drink plenty of water throughout the day.\n"
"3. Get enough sleep: Getting enough quality sleep is essential for your physical "
"and mental health. Adults should aim for seven to nine hours of sleep per night. "
"Establish a regular sleep schedule and try to create a relaxing bedtime routine to "
"help improve the quality of your sleep.")
),
offset=2,
sep_style=SeparatorStyle.SINGLE,
sep="###",
)
conv_v1_2 = Conversation(
system="A chat between a curious human and an artificial intelligence assistant. "
"The assistant gives helpful, detailed, and polite answers to the human's questions.",
roles=("Human", "Assistant"),
messages=(
("Human", "What are the key differences between renewable and non-renewable energy sources?"),
("Assistant",
"Renewable energy sources are those that can be replenished naturally in a relatively "
"short amount of time, such as solar, wind, hydro, geothermal, and biomass. "
"Non-renewable energy sources, on the other hand, are finite and will eventually be "
"depleted, such as coal, oil, and natural gas. Here are some key differences between "
"renewable and non-renewable energy sources:\n"
"1. Availability: Renewable energy sources are virtually inexhaustible, while non-renewable "
"energy sources are finite and will eventually run out.\n"
"2. Environmental impact: Renewable energy sources have a much lower environmental impact "
"than non-renewable sources, which can lead to air and water pollution, greenhouse gas emissions, "
"and other negative effects.\n"
"3. Cost: Renewable energy sources can be more expensive to initially set up, but they typically "
"have lower operational costs than non-renewable sources.\n"
"4. Reliability: Renewable energy sources are often more reliable and can be used in more remote "
"locations than non-renewable sources.\n"
"5. Flexibility: Renewable energy sources are often more flexible and can be adapted to different "
"situations and needs, while non-renewable sources are more rigid and inflexible.\n"
"6. Sustainability: Renewable energy sources are more sustainable over the long term, while "
"non-renewable sources are not, and their depletion can lead to economic and social instability.\n")
),
offset=2,
sep_style=SeparatorStyle.SINGLE,
sep="###",
)
conv_bair_v1 = Conversation(
system="BEGINNING OF CONVERSATION:",
roles=("USER", "GPT"),
messages=(),
offset=0,
sep_style=SeparatorStyle.TWO,
sep=" ",
sep2="</s>",
)
default_conversation = conv_v1_2
conv_templates = {
"v1": conv_v1_2,
"bair_v1": conv_bair_v1,
}
if __name__ == "__main__":
print(default_conversation.get_prompt())
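"""
Convert Alpaca-style instruction data (instruction/input/output records) into
the two-role conversation format used by the fine-tuning code. Input and
output paths are configured through the argparse options below.
"""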
import argparse
import json
import pathlib
# Prompt from stanford alpaca's training script
PROMPT_DICT = {
"prompt_input": (
"Below is an instruction that describes a task, paired with an input that provides further context. "
"Write a response that appropriately completes the request.\n\n"
"### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
),
"prompt_no_input": (
"Below is an instruction that describes a task. "
"Write a response that appropriately completes the request.\n\n"
"### Instruction:\n{instruction}\n\n### Response:"
),
}
def main(args):
data_path = pathlib.Path(args.data_path)
with data_path.open() as f:
data = json.load(f)
prompt_input, prompt_no_input = PROMPT_DICT["prompt_input"], PROMPT_DICT["prompt_no_input"]
sources = [
prompt_input.format_map(example) if example.get("input", "") != "" else prompt_no_input.format_map(example)
for example in data
]
targets = [example['output'] for example in data]
new_data = []
cnt = 1
for s, t in zip(sources, targets):
new_data.append({
'id': str(cnt),
'conversations': [
{
'from': 'human',
'value': s,
},
{
'from': 'gpt',
'value': t,
}
]
})
cnt += 1
json.dump(new_data, open(args.output_path, 'w'), indent=2)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--data_path', type=str, default='alpaca-data.json')
parser.add_argument('--output_path', type=str, default='alpaca-data-conversation.json')
args = parser.parse_args()
main(args)
"""
Convert html to markdown with basic data cleaning.
Usage:
python3 -m fastchat.data.clean_sharegpt --in sharegpt_html.json --out sharegpt_clean.json
"""
import argparse
import json
import logging
import re
from typing import Dict, Union
import bs4
import markdownify # == 0.11.6
import tqdm
div_pattern = re.compile("<div.*?>")
span_pattern = re.compile("<span.*?>")
code_lang_pattern = re.compile("```\s*" + "(.*?)" + "(?:Copy code)+" + "(.+?)" + "\s*?```", re.DOTALL)
code_lang_format = "```\g<1>\n\g<2>\n```"
regenerate_pattern = re.compile("\d+ / \d+")
copy_chars_pattern = re.compile("Copy\d+ chars / \d+ words")
copy_code_pattern = re.compile("```(.*?)Copy code\s*```")
def reformat_code(val: str) -> str:
# Input code format is:
# ```
# $<language>Copy code$<exact_code_here>
#
# ```
# This function converts it into the correct markdown format.
return re.sub(code_lang_pattern, code_lang_format, val)
def html_to_markdown(val: str) -> str:
# Remove all <div>. This is required to make indentation work in code blocks.
val = re.sub(div_pattern, "", val)
# Remove all <span>. This is required to make underscores work in code blocks.
val = re.sub(span_pattern, "", val)
# Convert html to markdown
val = markdownify.markdownify(val).strip()
# Reformat code
val = reformat_code(val)
# Remove noisy "[number] / [number]" at the beginning
noise = re.search(regenerate_pattern, val)
if noise and noise.start() == 0:
val = val[noise.end():]
# Remove noisy "Copy[number] chars / [number] words"
val = re.sub(copy_chars_pattern, "", val)
# Remove empty code block ```\nCopy code\n```
val = re.sub(copy_code_pattern, "", val)
# Strip
val = val.replace("\n\n\n", "\n").strip()
if args.debug:
print(val)
exit()
return val
def should_skip(val: str) -> bool:
black_list = ["openai", "chatgpt"]
for w in black_list:
if w in val.lower():
return True
return False
def clean_html_source(content, begin, end, check_tag, check_num):
"""
Clean the input json content.
Args:
content: json content loaded in memory.
begin, end: only clean the slice content[begin:end].
check_tag: a debug-purpose arg. If a conversation contains the tag, log
it before and after cleaning.
check_num: number of matched conversations to log.
"""
BARRIER = "\n" + "=" * 20 + "\n"
skip_cnt = 0
tag_cnt = 0
content = content[begin:end]
new_content = []
for sample in tqdm.tqdm(content):
skipped = False
if len(sample["conversations"]) <= 1:
# The conversation is too short
skipped = True
else:
for c in sample["conversations"]:
if should_skip(c["value"]):
skipped = True
break
try:
new_val = html_to_markdown(c["value"])
except (bs4.builder.ParserRejectedMarkup, AssertionError):
skipped = True
break
c["value"] = new_val
# Debug
if (check_tag is not None and check_tag in c["value"]
and tag_cnt < check_num):
logging.debug(BARRIER + c["value"] + "\n" + BARRIER + new_val +
"\n" + BARRIER + "\n")
tag_cnt += 1
if tag_cnt == check_num:
break
if not skipped:
new_content.append(sample)
else:
skip_cnt += 1
print(f"total: {len(content)}, skip: {skip_cnt}, new: {len(new_content)}")
return new_content
def main(args):
content = json.load(open(args['in_file'], "r"))
content = clean_html_source(
content, args['begin'], args['end'],
args['check_tag'], args['check_num'])
json.dump(content, open(args['out_file'], "w"), indent=2)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--in-file", type=str, required=True)
parser.add_argument("--out-file", type=str, default="sharegpt_clean.json")
parser.add_argument("--begin", type=int)
parser.add_argument("--end", type=int)
parser.add_argument("--debug", action="store_true")
parser.add_argument("--check-tag", type=str)
parser.add_argument("--check-num", type=int, default=1)
args = parser.parse_args()
main(vars(args))
"""
Usage:
python3 -m fastchat.data.inspect --in sharegpt_20230322_clean_lang_split.json
"""
import argparse
import json
import tqdm
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--in-file", type=str, required=True)
parser.add_argument("--begin", type=int)
args = parser.parse_args()
content = json.load(open(args.in_file, "r"))
for sample in tqdm.tqdm(content[args.begin:]):
print(f"id: {sample['id']}")
for conv in sample["conversations"]:
print(conv["from"] + ": ")
print(conv["value"])
input()
"""
Do optional cleaning (e.g., remove some languages).
Usage:
python3 -m fastchat.data.optional_clean --in input.json --out output.json --keep-lang en
python3 -m fastchat.data.optional_clean --in input.json --out output.json --skip-lang en
Requirement:
pip3 install polyglot icu pyicu pycld2 morfessor
"""
import argparse
import json
import re
import polyglot
from polyglot.detect import Detector
import pycld2
from tqdm import tqdm
def skip(conv, args):
# Remove certain languages
if args.keep_lang != "all" or args.skip_lang is not None:
text = "\n".join([x["value"] for x in conv["conversations"]])
try:
lang_code = Detector(text).language.code
except (pycld2.error, polyglot.detect.base.UnknownLanguage):
lang_code = "unknown"
if args.keep_lang != "all" and lang_code != args.keep_lang:
return True
if lang_code == args.skip_lang:
return True
# Remove repetitive numbers
if args.reduce_rep:
for sentence in conv["conversations"]:
val = sentence["value"]
sub = re.search(r"(\d)\1{8}", val)
if sub is not None:
return True
return False
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--in-file", type=str, required=True)
parser.add_argument("--out-file", type=str)
parser.add_argument("--keep-lang", type=str, default="all", choices=["all", "en"],
help="Only keep certain languages.")
parser.add_argument("--skip-lang", type=str,
help="Skip a specific language.")
# NOTE: Be careful about reduce_rep which may remove some good data.
# For example, addresses could have long consecutive 0's
parser.add_argument("--reduce-rep", action="store_true")
args = parser.parse_args()
in_file = args.in_file
out_file = args.out_file
keep_lang = args.keep_lang
skip_lang = args.skip_lang
reduce_rep = args.reduce_rep
assert (keep_lang == "all" or skip_lang is None)
if out_file is None:
out_file = "sharegpt_clean"
if keep_lang != "all":
out_file += "_" + keep_lang
if skip_lang is not None:
out_file += "_skip_" + skip_lang
if reduce_rep:
out_file += "_reduce_rep"
out_file += ".json"
content = json.load(open(in_file, "r"))
num_conv = len(content)
new_content = []
for conv in tqdm(content):
if not skip(conv, args):
new_content.append(conv)
print(f"return {len(new_content)} out of {len(content)}, start dump ...")
json.dump(new_content, open(out_file, "w"), indent=2)
"""
Usage:
python3 pretty_json.py --in in.json --out out.json
"""
import argparse
import json
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--in-file", type=str, required=True)
parser.add_argument("--out-file", type=str, required=True)
args = parser.parse_args()
with open(args.in_file, "r") as fin:
data = json.load(fin)
with open(args.out_file, "w") as fout:
json.dump(data, fout, indent=2)
"""
Split long conversations based on certain max length.
Usage: python3 -m fastchat.data.split_long_conversation \
--in sharegpt_clean.json \
--out sharegpt_split.json \
--model-name-or-path $<model-name>
"""
import argparse
import json
from typing import Dict, Sequence, Optional
import transformers
import tqdm
from fastchat import conversation as conversation_lib
DEFAULT_PAD_TOKEN = "[PAD]"
BEGIN_SIGNAL = "### "
END_SIGNAL = "\n"
def split_sample(sample, start_idx, end_idx):
# Only end on the bot's turn; otherwise the trailing human message is useless.
end_speaker = sample["conversations"][end_idx]["from"]
end_idx = end_idx + 1 if end_speaker != "human" else end_idx
return {
"id": sample["id"] + "_" + str(start_idx),
"conversations": sample["conversations"][start_idx:end_idx]
}
def split_contents(content, begin, end, tokenizer, max_length):
"""
Keep the maximum number of conversation rounds within the max token length constraint.
"""
content = content[begin:end]
new_content = []
for sample in tqdm.tqdm(content):
tokenized_lens = []
for c in sample["conversations"]:
from_str = c["from"]
if from_str.lower() == "human":
from_str = conversation_lib.default_conversation.roles[0]
elif from_str.lower() == "gpt":
from_str = conversation_lib.default_conversation.roles[1]
else:
from_str = 'unknown'
sentence = (BEGIN_SIGNAL + from_str + ": " + c["value"] +
END_SIGNAL)
length = tokenizer(sentence, return_tensors="pt", padding="longest"
).input_ids.ne(tokenizer.pad_token_id).sum().item()
tokenized_lens.append(length)
num_tokens = 0
start_idx = 0
for idx, l in enumerate(tokenized_lens):
# TODO: should we also only start from a specific speaker?
if num_tokens + l > max_length:
new_content.append(split_sample(sample, start_idx, idx))
start_idx = idx
num_tokens = l
else:
num_tokens += l
if idx == len(tokenized_lens) - 1:
new_content.append(split_sample(sample, start_idx, idx))
print(f"total: {len(content)}, new: {len(new_content)}")
return new_content
def main(args):
content = json.load(open(args.in_file, "r"))
tokenizer = transformers.AutoTokenizer.from_pretrained(
args.model_name_or_path,
model_max_length=args.max_length,
padding_side="right",
use_fast=False,
)
if tokenizer.pad_token is None:
tokenizer.add_special_tokens(dict(pad_token=DEFAULT_PAD_TOKEN))
content = split_contents(content, args.begin, args.end,
tokenizer, args.max_length)
json.dump(content, open(args.out_file, "w"), indent=2)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--in-file", type=str, required=True)
parser.add_argument("--out-file", type=str, default="sharegpt_split.json")
parser.add_argument("--begin", type=int)
parser.add_argument("--end", type=int)
parser.add_argument("--model-name-or-path", type=str, required=True)
parser.add_argument("--max-length", type=int, default=2304)
args = parser.parse_args()
main(args)
# Evaluations
This directory contains end-to-end pipelines for AI-enhanced evaluation. We will introduce the evaluation pipeline and the data format in this document.
## Generate Answers
### ChatGPT (gpt-3.5-turbo)
Make sure you have set up the OpenAI API key in your environment. Then run:
```bash
python qa_baseline_gpt35.py --question table/question.jsonl --output table/answer/answer_gpt35.jsonl
```
### Bard
Unfortunately, Bard has not released a public API yet. You may have to enter the answers manually, or you could find a third-party project that interfaces with Bard.
### Vicuna and others
To generate answers with Vicuna or other models, specify the path to the model checkpoint and a desired model ID, then run:
```bash
python get_model_answer.py --model-id [MODEL-ID] --model-path /model/path --question-file table/question.jsonl --answer-file table/answer/answer.jsonl --num-gpus [NUM-GPUS]
```
Then the answers to the questions will be saved in `table/answer/answer.jsonl`.
Note: we assume the model can be loaded with a single GPU.
## Evaluate Answers Automatically
### Generate Reviews with GPT-4
Note: The script below requires access to the GPT-4 API. If you only have access to GPT-4 through the web interface, you can evaluate the answers by formatting the prompt manually. See the **Reviewers** and **Prompts** sections in **Data Format** for details.
It is critical to follow the prompt templates; otherwise GPT-4 may not give fair reviews. `table/review/*.jsonl` contains some example reviews generated by GPT-4; you can also view them on our eval [webpage](https://vicuna.lmsys.org/eval/).
To use the script for generating reviews with GPT-4, you need to `export` your OpenAI API key as an environment variable. Then run:
```bash
python eval_gpt_review.py -q table/question.jsonl -a /path/to/answer_1.jsonl /path/to/answer_2.jsonl -p table/prompt.jsonl -r table/reviewer.jsonl -o /path/to/review_output.jsonl
```
The GPT-4 reviews will be saved in `/path/to/review_output.jsonl`. Note: we implement some simple parsing code to extract the score pairs from GPT-4's reviews. However, you need to double-check whether the parsed score pairs are correct. Sometimes the parsing logic may fail if GPT-4 does not give a structured answer.
## Visualize Results
You can generate the data for the webpage by running:
```bash
python eval/generate_webpage_data_from_table.py
```
Then you can serve a static website in `webpage` to see the results.
## Data Format
If you want to gain a deeper understanding of our evaluation pipeline or contribute to the evaluation process, you need to learn the data format we use for evaluation.
Our evaluation data are encoded with [JSON Lines](https://jsonlines.org/).
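For readers unfamiliar with the format, every line of a JSON Lines file is a standalone JSON object. Below is a minimal reading/writing sketch; the output file name is only a placeholder:

```python
import json

# Read a JSON Lines file: one JSON object per line.
with open("table/question.jsonl") as fin:
    questions = [json.loads(line) for line in fin]

# Write the records back out, one object per line.
with open("questions_copy.jsonl", "w") as fout:  # placeholder output path
    for record in questions:
        fout.write(json.dumps(record) + "\n")
```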
### Random ID Generation
We use the `shortuuid` Python library for generating short random UUIDs.
```python
import shortuuid
shortuuid.uuid() -> str
```
### Models
`model.jsonl` contains the model information we used for generating answers.
Each row contains a record of a model with the following fields:
* `model_id` (str): A unique ID for a model. Models with different IDs are supposed to have different performance. This ID is generated as `{model_name}:{model_version}`.
* `model_name` (str): The name of a model. This is not unique, because a model could be trained and updated continuously, but it is still considered the same model with different versions.
* `model_version` (str): The version of a model.
* `model_metadata` (Any): Any metadata of a model (descriptions etc). This is optional.
For example:
```json
{
"model_id": "vicuna-13b:v1",
"model_name": "vicuna-13b",
"model_version": "v1",
"model_metadata": "learning rate 1e-5, 3 epochs, 13b"
}
```
### Prompts
We store prompts in `prompt.jsonl`. Each row contains a record of a prompt with the following fields:
* `prompt_id` (int): A unique integer ID for a prompt. Prompts with different IDs are supposed to have different purposes.
* `system_prompt` (str): The system prompt given to a model. This is the prompt that the model sees first.
* `prompt_template` (str): The prompt body. This is the user prompt that the model sees after the system prompt. It is a Python f-string template, so that we can fill in the inputs later.
* `defaults` (dict): A dictionary of default values for the prompt template. It can be empty.
* `description` (str): A description of the functionality of the prompt.
For example:
```json
{
"prompt_id": 1,
"system_prompt": "You are a helpful assistant.",
"prompt_template": "[Question]\n{question}\n\n[Assistant 1]\n{answer_1}\n\n[End of Assistant 1]\n\n[Assistant 2]\n{answer_2}\n\n[End of Assistant 2]\n\n[System]\n{prompt}\n\n",
"defaults": {"prompt": "Which assistant is more helpful?"},
"description": "Compare two assistants' answers to a question."
}
```
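To illustrate how such a record might be used, the template above can be filled with a question, two answers, and the default reviewer prompt. This is a minimal sketch with placeholder values, using `str.format` purely for illustration:

```python
prompt_template = (
    "[Question]\n{question}\n\n[Assistant 1]\n{answer_1}\n\n[End of Assistant 1]\n\n"
    "[Assistant 2]\n{answer_2}\n\n[End of Assistant 2]\n\n[System]\n{prompt}\n\n"
)
defaults = {"prompt": "Which assistant is more helpful?"}

filled = prompt_template.format(
    question="What is the capital of France?",   # placeholder question
    answer_1="Paris.",                           # placeholder answer from model 1
    answer_2="The capital of France is Paris.",  # placeholder answer from model 2
    **defaults,
)
print(filled)
```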
### Reviewers
`reviewer.jsonl` contains the reviewer information we used for reviewing answers generated by different models. Each row contains a record of a reviewer with the following fields:
* `reviewer_id` (str): A unique ID for a reviewer. Reviewers with different IDs are expected to have different reviewing performance.
* `prompt_id` (str): The ID of the prompt given to the reviewer (e.g., an AI assistant). Different prompts could result in different reviewing performance.
* `metadata` (dict): Metadata of a reviewer about its configurations.
* `description` (str): A description of the reviewer.
* `category` (str): The category that the reviewer belongs to.
For example:
```json
{
"reviewer_id": "gpt-4-0328-default",
"prompt_id": 1,
"temperature": 0.2,
"max_tokens": 8192,
"description": "GPT-4 for general questions.",
"category": "general"
}
```
### Questions
`question.jsonl` contains the questions we used for evaluation. Each row contains a record of a question with the following fields:
* `question_id` (int): A unique integer for a question. Questions with different IDs are supposed to be different.
* `text` (str): The question text.
* `category` (str): The category of the question. Questions with the same category are supposed to be similar or originate from the same source.
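For example (an illustrative record, not taken from the released question set):
```json
{
  "question_id": 1,
  "text": "Give three tips for staying healthy.",
  "category": "general"
}
```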
### Answers
`answer/xxx.jsonl` contains answers generated by different models. Each row contains a record of an answer with the following fields:
* `answer_id` (str): A unique UUID for an answer. Answers with different IDs are supposed to be different.
* `question_id` (int): The ID of the question the answer is generated for.
* `model_id` (str): The ID of the model the answer is generated by.
* `text` (str): The answer text.
* `metadata` (dict): Any metadata of the answer.
Example:
```json
{
"answer_id": "[short uuid]",
"question_id": 1,
"model_id": "vicuna-13b:v1",
"text": "Here are five tips...",
"metadata": {}
}
```
### Reviews
`review/xxx.jsonl` contains reviews given by reviewers, comparing the performance of a pair of models. Each row contains a record of a review with the following fields:
* `review_id` (str): A unique UUID for a review. Reviews with different IDs are supposed to be different.
* `question_id` (int): The ID of the question the review is given for.
* `answer1_id` (str): The ID of the first answer.
* `answer2_id` (str): The ID of the second answer.
* `text` (str): The review text.
* `score` (list): A list of scores given by the reviewer. The first score is for the first answer, and the second score is for the second answer.
* `reviewer_id` (str): The ID of the reviewer.
* `metadata` (dict): Any metadata of the review.
```json
{
"review_id": "[short uuid]",
"question_id": 1,
"answer1_id": "[answer1_id]",
"answer2_id": "[answer2_id]",
"text": "Assistant 2 is better...",
"score": [9.0, 7.5],
"reviewer_id": "gpt-4-0328-default",
"metadata": {}
}
```