Unverified Commit e3f00d19 authored by Zhuohan Li, committed by GitHub

Modify README to include info on loading LLaMA (#18)

parent 09e92454
@@ -53,3 +53,19 @@ python -m cacheflow.http_frontend.fastapi_frontend
# At another terminal
python -m cacheflow.http_frontend.gradio_webserver
```
## Load LLaMA weights
Since the LLaMA weights are not fully public, they cannot be downloaded directly from Hugging Face. Instead, follow the steps below to load LLaMA:
1. Convert the LLaMA weights to Hugging Face format with [this script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py):
```bash
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
--input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path/llama-7b
```
Please make sure that `llama` is included in the output directory name; a sanity check on the converted checkpoint is sketched after this list.
2. For all the commands above, add `--model /output/path/llama-7b` to load the converted model. For example:
```bash
python simple_server.py --model /output/path/llama-7b
python -m cacheflow.http_frontend.fastapi_frontend --model /output/path/llama-7b
```
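Before launching the servers, it can help to verify that the conversion succeeded. The snippet below is a minimal sanity check, not part of CacheFlow itself; it assumes a `transformers` version with LLaMA support (4.28 or later, with `sentencepiece` installed) and the example output path from step 1:
```bash
# Sanity check (assumes transformers >= 4.28 with LLaMA support installed):
# load the converted config and tokenizer and print basic metadata.
python -c "
from transformers import AutoConfig, AutoTokenizer

path = '/output/path/llama-7b'
config = AutoConfig.from_pretrained(path)        # should report model_type 'llama'
tokenizer = AutoTokenizer.from_pretrained(path)  # requires sentencepiece
print(config.model_type, tokenizer.vocab_size)
"
```
If this prints `llama 32000`, the converted checkpoint is in the expected format and should be loadable by the commands above.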