Commit 7c6edc83 (unverified), authored by pppppM, committed by GitHub

add internlm url (#67)

parent f56f3d87

    @@ -74,6 +74,14 @@ pip install -e .
     ```shell
     # 1. Download InternLM model
    +# Make sure you have git-lfs installed (https://git-lfs.com)
    +git lfs install
    +git clone https://huggingface.co/internlm/internlm-7b /path/to/internlm-7b
    +# if you want to clone without large files – just their pointers
    +# prepend your git clone with the following env var:
    +GIT_LFS_SKIP_SMUDGE=1
     # 2. Convert InternLM model to turbomind's format, which will be in "./workspace" by default
     python3 -m lmdeploy.serve.turbomind.deploy internlm-7b /path/to/internlm-7b hf
    ......
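
The added comments say to prepend `GIT_LFS_SKIP_SMUDGE=1` to the `git clone` command when only the LFS pointer files are wanted. A minimal Python sketch of that step follows. The helper names `build_clone` and `clone` are hypothetical and not part of lmdeploy; only the repository URL, the `/path/to/internlm-7b` placeholder, and the environment variable come from the diff.

```python
import os
import subprocess

# Hypothetical helpers for illustration; only the URL and the
# GIT_LFS_SKIP_SMUDGE variable are taken from the diff.
REPO_URL = "https://huggingface.co/internlm/internlm-7b"


def build_clone(dest, pointers_only=False, base_env=None):
    """Return (command, environment) for cloning the model repository."""
    env = dict(os.environ if base_env is None else base_env)
    if pointers_only:
        # Ask git-lfs to leave pointer files in place instead of
        # downloading the large files during checkout.
        env["GIT_LFS_SKIP_SMUDGE"] = "1"
    return ["git", "clone", REPO_URL, dest], env


def clone(dest, pointers_only=False):
    """Run the clone; raises CalledProcessError if git exits with an error."""
    command, env = build_clone(dest, pointers_only)
    subprocess.run(command, env=env, check=True)
```

Calling `clone("/path/to/internlm-7b", pointers_only=True)` corresponds to running `GIT_LFS_SKIP_SMUDGE=1 git clone ...` in a shell. The large files can be fetched later by running `git lfs pull` inside the clone.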

    @@ -73,6 +73,14 @@ pip install -e .
     ```shell
     # 1. Download InternLM model
    +# Make sure you have git-lfs installed (https://git-lfs.com)
    +git lfs install
    +git clone https://huggingface.co/internlm/internlm-7b /path/to/internlm-7b
    +# if you want to clone without large files – just their pointers
    +# prepend your git clone with the following env var:
    +GIT_LFS_SKIP_SMUDGE=1
     # 2. Convert to the format required by turbomind. The default output path is ./workspace
     python3 -m lmdeploy.serve.turbomind.deploy internlm-7b /path/to/internlm-7b hf
    ......

    @@ -168,7 +168,7 @@ def main(model: str,
     save_path = out_dir / f'layers.{layer}.past_kv_scale.{tp}.weight'
     if symmetry:
         # quant: q = f / scale
         # dequant: f = q * scale
         k_scale = max(k_obs.buffer) / (2**(bits - 1) - 1)
         v_scale = max(v_obs.buffer) / (2**(bits - 1) - 1)
    ......
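
The symmetric branch above derives one scale per tensor by dividing a maximum value by `2**(bits - 1) - 1`, the largest positive signed integer for that bit width. The sketch below works through that arithmetic on toy data. It assumes `k_obs.buffer` holds absolute maxima (the diff does not show this), and the rounding and clipping steps are additions for illustration, since the comments only state `q = f / scale`. All function names are hypothetical.

```python
def symmetric_scale(abs_max, bits=8):
    """Scale so that abs_max maps to the largest signed integer (127 for 8 bits)."""
    return abs_max / (2 ** (bits - 1) - 1)


def quantize(values, scale, bits=8):
    """q = f / scale, rounded and clipped to the symmetric integer range."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """f = q * scale"""
    return [q * scale for q in q_values]


# Toy data; assumes at least one non-zero value so the scale is positive.
samples = [-1.0, 0.5, 2.54]
scale = symmetric_scale(max(abs(v) for v in samples))
q = quantize(samples, scale)
restored = dequantize(q, scale)
```

With 8 bits the largest magnitude maps to 127, and each restored value differs from its original by at most half a quantization step (`scale / 2`).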