Commit cc338b7c authored by zhaoying1

llama_fastchat_pytorch
# Python
__pycache__
*.pyc
*.egg-info
dist
# Log
*.log
*.json
# Data
!**/alpaca-data-conversation.json
# Editor
.idea
*.swp
# Other
.DS_Store
wandb
output
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# FastChat
An open platform for training, serving, and evaluating large language model based chatbots.
## Release
<p align="center">
<a href="https://vicuna.lmsys.org"><img src="assets/vicuna-logo.jpeg" width="20%"></a>
</p>
- 🔥 We released **Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality**. Check out the blog [post](https://vicuna.lmsys.org) and [demo](https://chat.lmsys.org/).
<a href="https://chat.lmsys.org"><img src="assets/demo-narrow.gif" width="70%"></a>
Join our [Discord](https://discord.gg/h6kCZb72G7) server and follow our [Twitter](https://twitter.com/lmsysorg) to get the latest updates.
## Contents
- [Install](#install)
- [Vicuna Weights](#vicuna-weights)
- [Inference with Command Line Interface](#inference-with-command-line-interface)
- [Serving with Web GUI](#serving-with-web-gui)
- [Evaluation](#evaluation)
- [Fine-tuning](#fine-tuning)
## Install
### Method 1: With pip
```bash
# Install FastChat
pip3 install fschat
# Install the latest main branch of huggingface/transformers
pip3 install git+https://github.com/huggingface/transformers
```
### Method 2: From source
1. Clone this repository and navigate to the FastChat folder
```bash
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
```
If you are running on Mac:
```bash
brew install rust cmake
```
2. Install Package
```bash
pip3 install --upgrade pip # enable PEP 660 support
pip3 install -e .
```
## Vicuna Weights
We release [Vicuna](https://vicuna.lmsys.org/) weights as delta weights to comply with the LLaMA model license.
You can add our delta to the original LLaMA weights to obtain the Vicuna weights. Instructions:
1. Get the original LLaMA weights in the huggingface format by following the instructions [here](https://huggingface.co/docs/transformers/main/model_doc/llama).
2. Use the following scripts to get Vicuna weights by applying our delta. They will automatically download delta weights from our Hugging Face [account](https://huggingface.co/lmsys).
**NOTE**:
Our released weights are only compatible with the latest main branch of huggingface/transformers.
We install the correct version of transformers when fastchat is installed.
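Conceptually, applying a delta just adds each released delta tensor to the corresponding base LLaMA tensor. The snippet below is only a simplified sketch of that idea, not the actual `fastchat.model.apply_delta` implementation (which also handles loading and saving the full model):

```python
from typing import Dict
import torch

def add_delta(base: Dict[str, torch.Tensor], delta: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    # Sketch: Vicuna weight = original LLaMA weight + released delta, tensor by tensor.
    return {name: tensor + delta[name] for name, tensor in base.items()}
```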
### Vicuna-13B
This conversion command needs around 60 GB of CPU RAM.
If you do not have enough memory, you can create a large swap file that allows the operating system to automatically utilize the disk as virtual memory.
```bash
python3 -m fastchat.model.apply_delta \
--base /path/to/llama-13b \
--target /output/path/to/vicuna-13b \
--delta lmsys/vicuna-13b-delta-v0
```
### Vicuna-7B
This conversion command needs around 30 GB of CPU RAM.
If you do not have enough memory, you can create a large swap file that allows the operating system to automatically utilize the disk as virtual memory.
```bash
python3 -m fastchat.model.apply_delta \
--base /path/to/llama-7b \
--target /output/path/to/vicuna-7b \
--delta lmsys/vicuna-7b-delta-v0
```
## Inference with Command Line Interface
(Experimental Feature: You can specify `--style rich` to enable rich text output. This may not work on certain terminals.)
#### Single GPU
The command below requires around 28GB of GPU memory for Vicuna-13B and 14GB of GPU memory for Vicuna-7B.
See the "Not Enough Memory" section below if you do not have enough memory.
```
python3 -m fastchat.serve.cli --model-name /path/to/vicuna/weights
```
#### Multiple GPUs
If you do not have enough GPU memory, you can use model parallelism to aggregate memory from multiple GPUs on the same machine.
```
python3 -m fastchat.serve.cli --model-name /path/to/vicuna/weights --num-gpus 2
```
#### CPU Only
This runs on the CPU only and does not require GPU. It requires around 60GB of CPU memory for Vicuna-13B and around 30GB of CPU memory for Vicuna-7B.
```
python3 -m fastchat.serve.cli --model-name /path/to/vicuna/weights --device cpu
```
#### Metal Backend (Mac Computers with Apple Silicon or AMD GPUs)
Use `--device mps` to enable GPU acceleration on Mac computers (requires torch >= 2.0).
Use `--load-8bit` to turn on 8-bit compression.
```
python3 -m fastchat.serve.cli --model-name /path/to/vicuna/weights --device mps --load-8bit
```
Vicuna-7B can run on a 32GB M1 MacBook at 1-2 words per second.
#### Not Enough Memory or Other Platforms
If you do not have enough memory, you can enable 8-bit compression by adding `--load-8bit` to commands above.
This can reduce memory usage by around half with slightly degraded model quality.
It is compatible with the CPU, GPU, and Metal backend.
Vicuna-13B with 8-bit compression can run on a single NVIDIA 3090/4080/V100(16GB) GPU.
```
python3 -m fastchat.serve.cli --model-name /path/to/vicuna/weights --load-8bit
```
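As a rough back-of-the-envelope check of these memory numbers, the sketch below counts only the model weights and ignores activations, the KV cache, and framework overhead (so the real figures quoted above are somewhat higher):

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    # Memory occupied by the weights alone, in GiB.
    return num_params * bytes_per_param / 1024**3

print(weight_memory_gib(13e9, 2))  # ~24 GiB for a 13B model in fp16
print(weight_memory_gib(13e9, 1))  # ~12 GiB with 8-bit weights, roughly half
```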
We are also actively exploring more methods to make the model easier to run on more platforms.
Contributions and pull requests are welcome.
## Serving with Web GUI
To serve using the web UI, you need three main components: web servers that interface with users, model workers that host one or more models, and a controller to coordinate the web servers and model workers. Here are the commands to follow in your terminal:
#### Launch the controller
```bash
python3 -m fastchat.serve.controller
```
This controller manages the distributed workers.
#### Launch the model worker
```bash
python3 -m fastchat.serve.model_worker --model-path /path/to/vicuna/weights
```
Wait until the process finishes loading the model and you see "Uvicorn running on ...". You can launch multiple model workers to serve multiple models concurrently. The model worker will connect to the controller automatically.
To ensure that your model worker is connected to your controller properly, send a test message using the following command:
```bash
python3 -m fastchat.serve.test_message --model-name vicuna-13b
```
#### Launch the Gradio web server
```bash
python3 -m fastchat.serve.gradio_web_server
```
This is the user interface that users will interact with.
By following these steps, you will be able to serve your models using the web UI. You can open your browser and chat with a model now.
## Evaluation
Our AI-enhanced evaluation pipeline is based on GPT-4. This section provides a high-level summary of the pipeline. For detailed instructions, please refer to the [evaluation](fastchat/eval) documentation.
### Pipeline Steps
1. Generate answers from different models: Use `qa_baseline_gpt35.py` for ChatGPT, or specify the model checkpoint and run `get_model_answer.py` for Vicuna and other models.
2. Generate reviews with GPT-4: Use GPT-4 to generate reviews automatically. This step can also be performed manually if the GPT-4 API is not available to you.
3. Generate visualization data: Run `generate_webpage_data_from_table.py` to generate data for a static website, which allows you to visualize the evaluation data.
4. Visualize the data: Serve a static website under the `webpage` directory. You can use `python3 -m http.server` to serve the website locally.
### Data Format and Contribution
We use a data format encoded with JSON Lines for evaluation. The format includes information on models, prompts, reviewers, questions, answers, and reviews.
You can customize the evaluation process or contribute to our project by accessing the relevant [data](fastchat/eval/table/).
For detailed instructions, please refer to the [evaluation](fastchat/eval) documentation.
## Fine-tuning
### Data
Vicuna is created by fine-tuning a LLaMA base model using approximately 70K user-shared conversations gathered from ShareGPT.com with public APIs. To ensure data quality, we convert the HTML back to markdown and filter out some inappropriate or low-quality samples. Additionally, we divide lengthy conversations into smaller segments that fit the model's maximum context length. For detailed instructions to clean the ShareGPT data, check out [here](docs/commands/data_cleaning.md).
Due to some concerns, we may not release the data at the moment. If you would like to try the fine-tuning code, you can run it with our [preprocessed alpaca dataset](playground/data/alpaca-data-conversation.json) (originally from [here](https://github.com/tatsu-lab/stanford_alpaca)).
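For reference, each record in that dataset uses a simple two-role conversation schema, matching the output of the Alpaca conversion script included in this repository. The record below is an illustrative sketch; the values are placeholders and only the structure matters:

```python
# One illustrative training record in the conversation format (placeholder values).
example_record = {
    "id": "1",
    "conversations": [
        {"from": "human", "value": "Give three tips for staying healthy."},
        {"from": "gpt", "value": "1. Exercise regularly. 2. Eat a balanced diet. 3. Get enough sleep."},
    ],
}
```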
### Code and Hyperparameters
We fine-tune the model using the code from [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca), with some modifications to support gradient checkpointing and [Flash Attention](https://github.com/HazyResearch/flash-attention). We use hyperparameters similar to those of Stanford Alpaca.
| Model | Global batch size | Learning rate | Epochs | Max length | Weight decay |
| --- | ---: | ---: | ---: | ---: | ---: |
| Vicuna-13B | 128 | 2e-5 | 3 | 2048 | 0 |
### Fine-tuning on Any Cloud with SkyPilot
[SkyPilot](https://github.com/skypilot-org/skypilot) is a framework built by UC Berkeley for running ML workloads easily and cost-effectively on any cloud (AWS, GCP, Azure, Lambda, etc.).
To use SkyPilot, install it with the following command and set up your cloud credentials locally by following the instructions [here](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html).
```bash
# Install skypilot from the master branch
pip install git+https://github.com/skypilot-org/skypilot.git
```
#### Vicuna
Vicuna can be trained on 8 A100 GPUs with 80GB of memory. The following command will automatically launch a node that satisfies the requirement, then set up and run the training job on it.
```bash
sky launch -c vicuna -s scripts/train-vicuna.yaml --env WANDB_API_KEY
```
Other options are also valid:
```bash
# Launch on managed spot instances to save ~3x on cost (trains Vicuna-13B for around $300)
sky spot launch -n vicuna scripts/train-vicuna.yaml --env WANDB_API_KEY
# Train a 7B model
sky launch -c vicuna -s scripts/train-vicuna.yaml --env WANDB_API_KEY --env MODEL_SIZE=7
```
Note: Please make sure the `WANDB_API_KEY` has been set up on your local machine. You can find the API key on your [wandb profile page](https://wandb.ai/authorize). If you would like to train the model without using wandb, you can replace the `--env WANDB_API_KEY` flag with `--env WANDB_MODE=offline`.
#### Alpaca
Launch the training job with the following command (it will run on a single node with 4 A100-80GB GPUs):
```
sky launch -c alpaca -s scripts/train-alpaca.yaml --env WANDB_API_KEY
```
### Fine-tuning with Local GPUs
Vicuna can also be trained on 8 A100 GPUs with 80GB of memory with the following code. To train on fewer GPUs, you can reduce the `per_device_train_batch_size` and increase the `gradient_accumulation_steps` accordingly to keep the global batch size the same (see the batch-size sketch after the command below). To set up the environment, please see the setup section in [scripts/train-vicuna.yaml](scripts/train-vicuna.yaml).
```bash
torchrun --nnodes=1 --nproc_per_node=8 --master_port=<your_random_port> \
fastchat/train/train_mem.py \
--model_name_or_path <path-to-llama-model-weight> \
--data_path <path-to-data> \
--bf16 True \
--output_dir ./checkpoints \
--num_train_epochs 3 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1200 \
--save_total_limit 100 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True
```
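As a quick sanity check of the batch-size arithmetic mentioned above, the effective global batch size is simply the product of the per-device batch size, the number of GPUs, and the gradient accumulation steps (a small illustrative helper, not part of the training code):

```python
def global_batch_size(per_device: int, num_gpus: int, grad_accum_steps: int) -> int:
    # Effective global batch size seen by the optimizer per update.
    return per_device * num_gpus * grad_accum_steps

print(global_batch_size(4, 8, 1))  # 32 for the command above
print(global_batch_size(4, 4, 2))  # still 32 after halving the GPUs and doubling accumulation
```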
### Data cleaning
```
# Convert html to markdown
python3 -m fastchat.data.clean_sharegpt --in sharegpt_20230322_html.json --out sharegpt_20230322_clean.json
# Keep or remove specific languages
python3 -m fastchat.data.optional_clean --in sharegpt_20230322_clean.json --out sharegpt_20230322_clean_lang.json --skip-lang SOME_LANGUAGE_CODE
# Split long conversations
python3 -m fastchat.data.split_long_conversation --in sharegpt_20230322_clean_lang.json --out sharegpt_20230322_clean_lang_split.json --model-name /home/ubuntu/model_weights/llama-7b/
```
### Local GPU cluster (node-01)
```
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 10002
CUDA_VISIBLE_DEVICES=0 python3 -m fastchat.serve.model_worker --model-path ~/model_weights/vicuna-13b/ --controller http://localhost:10002 --port 31000 --worker http://localhost:31000
CUDA_VISIBLE_DEVICES=1 python3 -m fastchat.serve.model_worker --model-path ~/model_weights/vicuna-13b/ --controller http://localhost:10002 --port 31001 --worker http://localhost:31001
CUDA_VISIBLE_DEVICES=2 python3 -m fastchat.serve.model_worker --model-path ~/model_weights/bair-chat-13b/ --controller http://localhost:10002 --port 31002 --worker http://localhost:31002
CUDA_VISIBLE_DEVICES=3 python3 -m fastchat.serve.model_worker --model-path ~/model_weights/alpaca-chat-13b/ --controller http://localhost:10002 --port 31003 --worker http://localhost:31003
python3 -m fastchat.serve.test_message --model vicuna-13b --controller http://localhost:10002
```
### Web server
```
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001
python3 -m fastchat.serve.register_worker --controller http://localhost:21001 --worker-name https://
python3 -m fastchat.serve.test_message --model vicuna-13b --controller http://localhost:21001
python3 -m fastchat.serve.gradio_web_server --controller http://localhost:21001
```
### Local GPU cluster (node-02)
```
CUDA_VISIBLE_DEVICES=0 python3 -m fastchat.serve.model_worker --model-path ~/model_weights/vicuna-13b/ --controller http://node-01:10002 --host 0.0.0.0 --port 31000 --worker http://$(hostname):31000
CUDA_VISIBLE_DEVICES=1 python3 -m fastchat.serve.model_worker --model-path ~/model_weights/vicuna-13b/ --controller http://node-01:10002 --host 0.0.0.0 --port 31001 --worker http://$(hostname):31001
CUDA_VISIBLE_DEVICES=2 python3 -m fastchat.serve.model_worker --model-path ~/model_weights/vicuna-13b/ --controller http://node-01:10002 --host 0.0.0.0 --port 31002 --worker http://$(hostname):31002
CUDA_VISIBLE_DEVICES=3 python3 -m fastchat.serve.model_worker --model-path ~/model_weights/vicuna-13b/ --controller http://node-01:10002 --host 0.0.0.0 --port 31003 --worker http://$(hostname):31003
```
### Requirement
```
python3 -m pip install twine
python3 -m pip install --upgrade pip
pip3 install build
```
### Upload
```
bash scripts/upload_pypi.sh
```
### Install
```
sudo apt update
sudo apt install tmux htop
wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh
bash Anaconda3-2022.10-Linux-x86_64.sh
conda create -n fastchat python=3.9
conda activate fastchat
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
pip3 install -e .
# Install the latest main branch of huggingface/transformers
pip3 install git+https://github.com/huggingface/transformers
```
### Launch servers
```
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001
python3 -m fastchat.serve.register_worker --controller http://localhost:21001 --worker-name https://
python3 -m fastchat.serve.test_message --model vicuna-13b --controller http://localhost:21001
export OPENAI_API_KEY=
python3 -m fastchat.serve.gradio_web_server --controller http://localhost:21001 --moderate --concurrency 20
```
### Increase the limit of max open files
For one process (no reboot needed):
```
sudo prlimit --nofile=1048576:1048576 --pid=$id
```
System-wide (reboot required): add the lines below to `/etc/security/limits.conf`
```
* hard nofile 65535
* soft nofile 65535
```
CONTROLLER_HEART_BEAT_EXPIRATION = 2 * 60
WORKER_HEART_BEAT_INTERVAL = 30
LOGDIR = "."
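"""
Conversation prompt templates. The Conversation dataclass below keeps the
system prompt, the role names, and the message history, and renders them
into a single prompt string via get_prompt().
"""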
import dataclasses
from enum import auto, Enum
from typing import List, Tuple, Any
class SeparatorStyle(Enum):
"""Different separator style."""
SINGLE = auto()
TWO = auto()
@dataclasses.dataclass
class Conversation:
"""A class that keeps all conversation history."""
system: str
roles: List[str]
messages: List[List[str]]
offset: int
sep_style: SeparatorStyle = SeparatorStyle.SINGLE
sep: str = "###"
sep2: str = None
skip_next: bool = False
conv_id: Any = None
def get_prompt(self):
if self.sep_style == SeparatorStyle.SINGLE:
ret = self.system + self.sep
for role, message in self.messages:
if message:
ret += role + ": " + message + self.sep
else:
ret += role + ":"
return ret
elif self.sep_style == SeparatorStyle.TWO:
seps = [self.sep, self.sep2]
ret = self.system + seps[0]
for i, (role, message) in enumerate(self.messages):
if message:
ret += role + ": " + message + seps[i % 2]
else:
ret += role + ":"
return ret
else:
raise ValueError(f"Invalid style: {self.sep_style}")
def append_message(self, role, message):
self.messages.append([role, message])
def to_gradio_chatbot(self):
ret = []
for i, (role, msg) in enumerate(self.messages[self.offset:]):
if i % 2 == 0:
ret.append([msg, None])
else:
ret[-1][-1] = msg
return ret
def copy(self):
return Conversation(
system=self.system,
roles=self.roles,
messages=[[x, y] for x, y in self.messages],
offset=self.offset,
sep_style=self.sep_style,
sep=self.sep,
sep2=self.sep2,
conv_id=self.conv_id)
def dict(self):
return {
"system": self.system,
"roles": self.roles,
"messages": self.messages,
"offset": self.offset,
"sep": self.sep,
"sep2": self.sep2,
"conv_id": self.conv_id,
}
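# Default conversation templates. The seed messages below act as few-shot examples
# in the rendered prompt; `offset` marks how many of them to skip when displaying
# the chat history (see to_gradio_chatbot above).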
conv_v1 = Conversation(
system="A chat between a curious human and an artificial intelligence assistant. "
"The assistant gives helpful, detailed, and polite answers to the human's questions.",
roles=("Human", "Assistant"),
messages=(
("Human", "Give three tips for staying healthy."),
("Assistant",
"Sure, here are three tips for staying healthy:\n"
"1. Exercise regularly: Regular physical activity can help improve your overall health and wellbeing. "
"It can also help reduce your risk of chronic conditions such as obesity, diabetes, heart disease, "
"and certain cancers. Aim for at least 150 minutes of moderate-intensity aerobic exercise or "
"75 minutes of vigorous-intensity aerobic exercise per week, along with muscle-strengthening "
"activities at least two days per week.\n"
"2. Eat a balanced diet: Eating a balanced diet that is rich in fruits, "
"vegetables, whole grains, lean proteins, and healthy fats can help support "
"your overall health. Try to limit your intake of processed and high-sugar foods, "
"and aim to drink plenty of water throughout the day.\n"
"3. Get enough sleep: Getting enough quality sleep is essential for your physical "
"and mental health. Adults should aim for seven to nine hours of sleep per night. "
"Establish a regular sleep schedule and try to create a relaxing bedtime routine to "
"help improve the quality of your sleep.")
),
offset=2,
sep_style=SeparatorStyle.SINGLE,
sep="###",
)
conv_v1_2 = Conversation(
system="A chat between a curious human and an artificial intelligence assistant. "
"The assistant gives helpful, detailed, and polite answers to the human's questions.",
roles=("Human", "Assistant"),
messages=(
("Human", "What are the key differences between renewable and non-renewable energy sources?"),
("Assistant",
"Renewable energy sources are those that can be replenished naturally in a relatively "
"short amount of time, such as solar, wind, hydro, geothermal, and biomass. "
"Non-renewable energy sources, on the other hand, are finite and will eventually be "
"depleted, such as coal, oil, and natural gas. Here are some key differences between "
"renewable and non-renewable energy sources:\n"
"1. Availability: Renewable energy sources are virtually inexhaustible, while non-renewable "
"energy sources are finite and will eventually run out.\n"
"2. Environmental impact: Renewable energy sources have a much lower environmental impact "
"than non-renewable sources, which can lead to air and water pollution, greenhouse gas emissions, "
"and other negative effects.\n"
"3. Cost: Renewable energy sources can be more expensive to initially set up, but they typically "
"have lower operational costs than non-renewable sources.\n"
"4. Reliability: Renewable energy sources are often more reliable and can be used in more remote "
"locations than non-renewable sources.\n"
"5. Flexibility: Renewable energy sources are often more flexible and can be adapted to different "
"situations and needs, while non-renewable sources are more rigid and inflexible.\n"
"6. Sustainability: Renewable energy sources are more sustainable over the long term, while "
"non-renewable sources are not, and their depletion can lead to economic and social instability.\n")
),
offset=2,
sep_style=SeparatorStyle.SINGLE,
sep="###",
)
conv_bair_v1 = Conversation(
system="BEGINNING OF CONVERSATION:",
roles=("USER", "GPT"),
messages=(),
offset=0,
sep_style=SeparatorStyle.TWO,
sep=" ",
sep2="</s>",
)
default_conversation = conv_v1_2
conv_templates = {
"v1": conv_v1_2,
"bair_v1": conv_bair_v1,
}
if __name__ == "__main__":
print(default_conversation.get_prompt())
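"""
Convert Alpaca-style instruction data (instruction/input/output records) into
the two-role conversation format used by the fine-tuning code. Input and
output paths are configured through the argparse options below.
"""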
import argparse
import json
import pathlib
# Prompt from stanford alpaca's training script
PROMPT_DICT = {
"prompt_input": (
"Below is an instruction that describes a task, paired with an input that provides further context. "
"Write a response that appropriately completes the request.\n\n"
"### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
),
"prompt_no_input": (
"Below is an instruction that describes a task. "
"Write a response that appropriately completes the request.\n\n"
"### Instruction:\n{instruction}\n\n### Response:"
),
}
def main(args):
data_path = pathlib.Path(args.data_path)
with data_path.open() as f:
data = json.load(f)
prompt_input, prompt_no_input = PROMPT_DICT["prompt_input"], PROMPT_DICT["prompt_no_input"]
sources = [
prompt_input.format_map(example) if example.get("input", "") != "" else prompt_no_input.format_map(example)
for example in data
]
targets = [example['output'] for example in data]
new_data = []
cnt = 1
for s, t in zip(sources, targets):
new_data.append({
'id': str(cnt),
'conversations': [
{
'from': 'human',
'value': s,
},
{
'from': 'gpt',
'value': t,
}
]
})
cnt += 1
json.dump(new_data, open(args.output_path, 'w'), indent=2)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--data_path', type=str, default='alpaca-data.json')
parser.add_argument('--output_path', type=str, default='alpaca-data-conversation.json')
args = parser.parse_args()
main(args)
"""
Convert html to markdown with basic data cleaning.
Usage:
python3 -m fastchat.data.clean_sharegpt --in sharegpt_html.json --out sharegpt_clean.json
"""
import argparse
import json
import logging
import re
from typing import Dict, Union
import bs4
import markdownify # == 0.11.6
import tqdm
div_pattern = re.compile("<div.*?>")
span_pattern = re.compile("<span.*?>")
code_lang_pattern = re.compile("```\s*" + "(.*?)" + "(?:Copy code)+" + "(.+?)" + "\s*?```", re.DOTALL)
code_lang_format = "```\g<1>\n\g<2>\n```"
regenerate_pattern = re.compile("\d+ / \d+")
copy_chars_pattern = re.compile("Copy\d+ chars / \d+ words")
copy_code_pattern = re.compile("```(.*?)Copy code\s*```")
def reformat_code(val: str) -> str:
# Input code format is:
# ```
# $<language>Copy code$<exact_code_here>
#
# ```
# This function converts it into the correct markdown format.
return re.sub(code_lang_pattern, code_lang_format, val)
def html_to_markdown(val: str) -> str:
# Remove all <div>. This is required to make indentation work in code blocks.
val = re.sub(div_pattern, "", val)
# Remove all <span>. This is required to make underscores work in code blocks.
val = re.sub(span_pattern, "", val)
# Convert html to markdown
val = markdownify.markdownify(val).strip()
# Reformat code
val = reformat_code(val)
# Remove noisy "[number] / [number]" at the beginning
noise = re.search(regenerate_pattern, val)
if noise and noise.start() == 0:
val = val[noise.end():]
# Remove noisy "Copy[number] chars / [number] words"
val = re.sub(copy_chars_pattern, "", val)
# Remove empty code block ```\nCopy code\n```
val = re.sub(copy_code_pattern, "", val)
# Strip
val = val.replace("\n\n\n", "\n").strip()
if args.debug:
print(val)
exit()
return val
def should_skip(val: str) -> bool:
black_list = ["openai", "chatgpt"]
for w in black_list:
if w in val.lower():
return True
return False
def clean_html_source(content, begin, end, check_tag, check_num):
"""
Clean the input json content.
Args:
content: json content loaded in memory.
begin, end: only clean the slice content[begin:end].
check_tag: a debug-purpose arg. If a conversation contains the tag, log
it before and after cleaning.
check_num: number of matched conversations to log.
"""
BARRIER = "\n" + "=" * 20 + "\n"
skip_cnt = 0
tag_cnt = 0
content = content[begin:end]
new_content = []
for sample in tqdm.tqdm(content):
skipped = False
if len(sample["conversations"]) <= 1:
# The conversation is too short
skipped = True
else:
for c in sample["conversations"]:
if should_skip(c["value"]):
skipped = True
break
try:
new_val = html_to_markdown(c["value"])
except (bs4.builder.ParserRejectedMarkup, AssertionError):
skipped = True
break
c["value"] = new_val
# Debug
if (check_tag is not None and check_tag in c["value"]
and tag_cnt < check_num):
logging.debug(BARRIER + c["value"] + "\n" + BARRIER + new_val +
"\n" + BARRIER + "\n")
tag_cnt += 1
if tag_cnt == check_num:
break
if not skipped:
new_content.append(sample)
else:
skip_cnt += 1
print(f"total: {len(content)}, skip: {skip_cnt}, new: {len(new_content)}")
return new_content
def main(args):
content = json.load(open(args['in_file'], "r"))
content = clean_html_source(
content, args['begin'], args['end'],
args['check_tag'], args['check_num'])
json.dump(content, open(args['out_file'], "w"), indent=2)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--in-file", type=str, required=True)
parser.add_argument("--out-file", type=str, default="sharegpt_clean.json")
parser.add_argument("--begin", type=int)
parser.add_argument("--end", type=int)
parser.add_argument("--debug", action="store_true")
parser.add_argument("--check-tag", type=str)
parser.add_argument("--check-num", type=int, default=1)
args = parser.parse_args()
main(vars(args))
"""
Usage:
python3 -m fastchat.data.inspect --in sharegpt_20230322_clean_lang_split.json
"""
import argparse
import json
import tqdm
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--in-file", type=str, required=True)
parser.add_argument("--begin", type=int)
args = parser.parse_args()
content = json.load(open(args.in_file, "r"))
for sample in tqdm.tqdm(content[args.begin:]):
print(f"id: {sample['id']}")
for conv in sample["conversations"]:
print(conv["from"] + ": ")
print(conv["value"])
input()
"""
Do optional cleaning (e.g., remove some languages).
Usage:
python3 -m fastchat.data.optional_clean --in input.json --out output.json --keep-lang en
python3 -m fastchat.data.optional_clean --in input.json --out output.json --skip-lang en
Requirement:
pip3 install polyglot icu pyicu pycld2 morfessor
"""
import argparse
import json
import re
import polyglot
from polyglot.detect import Detector
import pycld2
from tqdm import tqdm
def skip(conv, args):
# Remove certain languages
if args.keep_lang != "all" or args.skip_lang is not None:
text = "\n".join([x["value"] for x in conv["conversations"]])
try:
lang_code = Detector(text).language.code
except (pycld2.error, polyglot.detect.base.UnknownLanguage):
lang_code = "unknown"
if args.keep_lang != "all" and lang_code != args.keep_lang:
return True
if lang_code == args.skip_lang:
return True
# Remove repetitive numbers
if args.reduce_rep:
for sentence in conv["conversations"]:
val = sentence["value"]
sub = re.search(r"(\d)\1{8}", val)
if sub is not None:
return True
return False
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--in-file", type=str, required=True)
parser.add_argument("--out-file", type=str)
parser.add_argument("--keep-lang", type=str, default="all", choices=["all", "en"],
help="Only keep certain languages.")
parser.add_argument("--skip-lang", type=str,
help="Skip a specific language.")
# NOTE: Be careful about reduce_rep which may remove some good data.
# For example, addresses could have long consecutive 0's
parser.add_argument("--reduce-rep", action="store_true")
args = parser.parse_args()
in_file = args.in_file
out_file = args.out_file
keep_lang = args.keep_lang
skip_lang = args.skip_lang
reduce_rep = args.reduce_rep
assert (keep_lang == "all" or skip_lang is None)
if out_file is None:
out_file = "sharegpt_clean"
if keep_lang != "all":
out_file += "_" + keep_lang
if skip_lang is not None:
out_file += "_skip_" + skip_lang
if reduce_rep:
out_file += "_reduce_rep"
out_file += ".json"
content = json.load(open(in_file, "r"))
num_conv = len(content)
new_content = []
for conv in tqdm(content):
if not skip(conv, args):
new_content.append(conv)
print(f"return {len(new_content)} out of {len(content)}, start dump ...")
json.dump(new_content, open(out_file, "w"), indent=2)
"""
Usage:
python3 pretty_json.py --in in.json --out out.json
"""
import argparse
import json
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--in-file", type=str, required=True)
parser.add_argument("--out-file", type=str, required=True)
args = parser.parse_args()
with open(args.in_file, "r") as fin:
data = json.load(fin)
with open(args.out_file, "w") as fout:
json.dump(data, fout, indent=2)
"""
Split long conversations based on certain max length.
Usage: python3 -m fastchat.data.split_long_conversation \
--in sharegpt_clean.json \
--out sharegpt_split.json \
--model-name-or-path $<model-name>
"""
import argparse
import json
from typing import Dict, Sequence, Optional
import transformers
import tqdm
from fastchat import conversation as conversation_lib
DEFAULT_PAD_TOKEN = "[PAD]"
BEGIN_SIGNAL = "### "
END_SIGNAL = "\n"
def split_sample(sample, start_idx, end_idx):
# Only end on the bot's turn; otherwise the trailing human message is useless.
end_speaker = sample["conversations"][end_idx]["from"]
end_idx = end_idx + 1 if end_speaker != "human" else end_idx
return {
"id": sample["id"] + "_" + str(start_idx),
"conversations": sample["conversations"][start_idx:end_idx]
}
def split_contents(content, begin, end, tokenizer, max_length):
"""
Keep the maximum number of conversation rounds within the max token length constraint.
"""
content = content[begin:end]
new_content = []
for sample in tqdm.tqdm(content):
tokenized_lens = []
for c in sample["conversations"]:
from_str = c["from"]
if from_str.lower() == "human":
from_str = conversation_lib.default_conversation.roles[0]
elif from_str.lower() == "gpt":
from_str = conversation_lib.default_conversation.roles[1]
else:
from_str = 'unknown'
sentence = (BEGIN_SIGNAL + from_str + ": " + c["value"] +
END_SIGNAL)
length = tokenizer(sentence, return_tensors="pt", padding="longest"
).input_ids.ne(tokenizer.pad_token_id).sum().item()
tokenized_lens.append(length)
num_tokens = 0
start_idx = 0
for idx, l in enumerate(tokenized_lens):
# TODO: should we also only start from a specific speaker?
if num_tokens + l > max_length:
new_content.append(split_sample(sample, start_idx, idx))
start_idx = idx
num_tokens = l
else:
num_tokens += l
if idx == len(tokenized_lens) - 1:
new_content.append(split_sample(sample, start_idx, idx))
print(f"total: {len(content)}, new: {len(new_content)}")
return new_content
def main(args):
content = json.load(open(args.in_file, "r"))
tokenizer = transformers.AutoTokenizer.from_pretrained(
args.model_name_or_path,
model_max_length=args.max_length,
padding_side="right",
use_fast=False,
)
if tokenizer.pad_token is None:
tokenizer.add_special_tokens(dict(pad_token=DEFAULT_PAD_TOKEN))
content = split_contents(content, args.begin, args.end,
tokenizer, args.max_length)
json.dump(content, open(args.out_file, "w"), indent=2)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--in-file", type=str, required=True)
parser.add_argument("--out-file", type=str, default="sharegpt_split.json")
parser.add_argument("--begin", type=int)
parser.add_argument("--end", type=int)
parser.add_argument("--model-name-or-path", type=str, required=True)
parser.add_argument("--max-length", type=int, default=2304)
args = parser.parse_args()
main(args)
# Evaluations
This directory contains end-to-end pipelines for AI-enhanced evaluation. We will introduce the evaluation pipeline and the data format in this document.
## Generate Answers
### ChatGPT (gpt-3.5-turbo)
Make sure you have set up the OpenAI API key in your environment. Then run:
```bash
python qa_baseline_gpt35.py --question table/question.jsonl --output table/answer/answer_gpt35.jsonl
```
### Bard
Unfortunately, Bard has not released a public API yet. You may have to enter the answers manually, or you could find a third-party project that interfaces with Bard.
### Vicuna and others
To generate answers with Vicuna or other models, specify the path to the model checkpoint and a desired model ID, then run:
```bash
python get_model_answer.py --model-id [MODEL-ID] --model-path /model/path --question-file table/question.jsonl --answer-file table/answer/answer.jsonl --num-gpus [NUM-GPUS]
```
Then the answers to the questions will be saved in `table/answer/answer.jsonl`.
Note: we assume the model can be loaded with a single GPU.
## Evaluate Answers Automatically
### Generate Reviews with GPT-4
Note: The script below requires access to the GPT-4 API. If you only have access to GPT-4 through the web interface, you can evaluate the answers by formatting the prompt manually. See the **Reviewers** and **Prompts** sections in **Data Format** for details.
It is critical to follow the prompt templates; otherwise GPT-4 may not give fair reviews. `table/review/*.jsonl` contains some example reviews generated by GPT-4; you can also view them on our eval [webpage](https://vicuna.lmsys.org/eval/).
To use the script for generating reviews with GPT-4, you need to `export` your OpenAI API key as an environment variable. Then run:
```bash
python eval_gpt_review.py -q table/question.jsonl -a /path/to/answer_1.jsonl /path/to/answer_2.jsonl -p table/prompt.jsonl -r table/reviewer.jsonl -o /path/to/review_output.jsonl
```
The GPT-4 reviews will be saved in `/path/to/review_output.jsonl`. Note: we implement some simple parsing code to extract the score pairs from GPT-4's reviews. However, you need to double-check whether the parsed score pairs are correct. Sometimes the parsing logic may fail if GPT-4 does not give a structured answer.
## Visualize Results
You can generate the data for the webpage by running:
```bash
python eval/generate_webpage_data_from_table.py
```
Then you can serve a static website in `webpage` to see the results.
## Data Format
If you want to gain a deeper understanding of our evaluation pipeline or contribute to the evaluation process, you need to learn the data format we use for evaluation.
Our evaluation data are encoded with [JSON Lines](https://jsonlines.org/).
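For readers unfamiliar with the format, every line of a JSON Lines file is a standalone JSON object. Below is a minimal reading/writing sketch; the output file name is only a placeholder:

```python
import json

# Read a JSON Lines file: one JSON object per line.
with open("table/question.jsonl") as fin:
    questions = [json.loads(line) for line in fin]

# Write the records back out, one object per line.
with open("questions_copy.jsonl", "w") as fout:  # placeholder output path
    for record in questions:
        fout.write(json.dumps(record) + "\n")
```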
### Random ID Generation
We use the `shortuuid` Python library for generating short random UUIDs.
```python
import shortuuid
shortuuid.uuid() -> str
```
### Models
`model.jsonl` contains the model information we used for generating answers.
Each row contains a record of a model with the following fields:
* `model_id` (str): A unique ID for a model. Models with different IDs are supposed to have different performance. This ID is generated as `{model_name}:{model_version}`.
* `model_name` (str): The name of a model. This is not unique, because a model could be trained and updated continuously, but it is still considered the same model with different versions.
* `model_version` (str): The version of a model.
* `model_metadata` (Any): Any metadata of a model (descriptions etc). This is optional.
For example:
```json
{
"model_id": "vicuna-13b:v1",
"model_name": "vicuna-13b",
"model_version": "v1",
"model_metadata": "learning rate 1e-5, 3 epochs, 13b"
}
```
### Prompts
We store prompts in `prompt.jsonl`. Each row contains a record of a prompt with the following fields:
* `prompt_id` (int): A unique integer ID for a prompt. Prompts with different IDs are supposed to have different purposes.
* `system_prompt` (str): The system prompt given to a model. This is the prompt that the model sees first.
* `prompt_template` (str): The prompt body. This is the user prompt that the model sees after the system prompt. It is a Python f-string template, so that we can fill in the inputs later.
* `defaults` (dict): A dictionary of default values for the prompt template. It can be empty.
* `description` (str): A description of the functionality of the prompt.
For example:
```json
{
"prompt_id": 1,
"system_prompt": "You are a helpful assistant.",
"prompt_template": "[Question]\n{question}\n\n[Assistant 1]\n{answer_1}\n\n[End of Assistant 1]\n\n[Assistant 2]\n{answer_2}\n\n[End of Assistant 2]\n\n[System]\n{prompt}\n\n",
"defaults": {"prompt": "Which assistant is more helpful?"},
"description": "Compare two assistants' answers to a question."
}
```
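To illustrate how such a record might be used, the template above can be filled with a question, two answers, and the default reviewer prompt. This is a minimal sketch with placeholder values, using `str.format` purely for illustration:

```python
prompt_template = (
    "[Question]\n{question}\n\n[Assistant 1]\n{answer_1}\n\n[End of Assistant 1]\n\n"
    "[Assistant 2]\n{answer_2}\n\n[End of Assistant 2]\n\n[System]\n{prompt}\n\n"
)
defaults = {"prompt": "Which assistant is more helpful?"}

filled = prompt_template.format(
    question="What is the capital of France?",   # placeholder question
    answer_1="Paris.",                           # placeholder answer from model 1
    answer_2="The capital of France is Paris.",  # placeholder answer from model 2
    **defaults,
)
print(filled)
```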
### Reviewers
`reviewer.jsonl` contains the reviewer information we used for reviewing answers generated by different models. Each row contains a record of a reviewer with the following fields:
* `reviewer_id` (str): A unique ID for a reviewer. Reviewers with different IDs are expected to have different reviewing performance.
* `prompt_id` (str): The ID of the prompt given to the reviewer (e.g., an AI assistant). Different prompts could result in different reviewing performance.
* `metadata` (dict): Metadata of a reviewer about its configurations.
* `description` (str): A description of the reviewer.
* `category` (str): The category that the reviewer belongs to.
For example:
```json
{
"reviewer_id": "gpt-4-0328-default",
"prompt_id": 1,
"temperature": 0.2,
"max_tokens": 8192,
"description": "GPT-4 for general questions.",
"category": "general"
}
```
### Questions
`question.jsonl` contains the questions we used for evaluation. Each row contains a record of a question with the following fields:
* `question_id` (int): A unique integer for a question. Questions with different IDs are supposed to be different.
* `text` (str): The question text.
* `category` (str): The category of the question. Questions with the same category are supposed to be similar or originate from the same source.
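For example (an illustrative record, not taken from the released question set):
```json
{
  "question_id": 1,
  "text": "Give three tips for staying healthy.",
  "category": "general"
}
```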
### Answers
`answer/xxx.jsonl` contains answers generated by different models. Each row contains a record of an answer with the following fields:
* `answer_id` (str): A unique UUID for an answer. Answers with different IDs are supposed to be different.
* `question_id` (int): The ID of the question the answer is generated for.
* `model_id` (str): The ID of the model the answer is generated by.
* `text` (str): The answer text.
* `metadata` (dict): Any metadata of the answer.
Example:
```json
{
"answer_id": "[short uuid]",
"question_id": 1,
"model_id": "vicuna-13b:v1",
"text": "Here are five tips...",
"metadata": {}
}
```
### Reviews
`review/xxx.jsonl` contains reviews given by reviewers, comparing the performance of a pair of models. Each row contains a record of a review with the following fields:
* `review_id` (str): A unique UUID for a review. Reviews with different IDs are supposed to be different.
* `question_id` (int): The ID of the question the review is given for.
* `answer1_id` (str): The ID of the first answer.
* `answer2_id` (str): The ID of the second answer.
* `text` (str): The review text.
* `score` (list): A list of scores given by the reviewer. The first score is for the first answer, and the second score is for the second answer.
* `reviewer_id` (str): The ID of the reviewer.
* `metadata` (dict): Any metadata of the review.
```json
{
"review_id": "[short uuid]",
"question_id": 1,
"answer1_id": "[answer1_id]",
"answer2_id": "[answer2_id]",
"text": "Assistant 2 is better...",
"score": [9.0, 7.5],
"reviewer_id": "gpt-4-0328-default",
"metadata": {}
}
```