Commit 000d7bab authored by luopl

Update README.md

parent 185e8d8c
@@ -104,28 +104,21 @@ pip install e .
 Running inference automatically connects to huggingface and downloads the latest datasets into the cache directory. If huggingface is unreachable, set a mirror endpoint with export HF_ENDPOINT=https://hf-mirror.com
 ```
 # Dataset layout
-hellaswag/
-└── default
-    └── 0.1.0
-        ├── 362ac471216900f3f7c021863caac4eb7886347d0f76d90b6b4361f59ffea4d7
-        │   ├── cache-081f361bf081c0bf.arrow
-        │   ├── cache-5d43362a7601c065.arrow
+openbookqa/
+└── main
+    └── 0.0.0
+        ├── 388097ea7776314e93a529163e0fea805b8a6454
         │   ├── dataset_info.json
-        │   ├── hellaswag-test.arrow
-        │   ├── hellaswag-train.arrow
-        │   └── hellaswag-validation.arrow
-        ├── 362ac471216900f3f7c021863caac4eb7886347d0f76d90b6b4361f59ffea4d7_builder.lock
-        └── 362ac471216900f3f7c021863caac4eb7886347d0f76d90b6b4361f59ffea4d7.incomplete_info.lock
+        │   ├── openbookqa-test.arrow
+        │   ├── openbookqa-train.arrow
+        │   └── openbookqa-validation.arrow
+        ├── 388097ea7776314e93a529163e0fea805b8a6454_builder.lock
+        └── 388097ea7776314e93a529163e0fea805b8a6454.incomplete_info.lock
 ...
 ```
-You can also download the data offline and place it in the cache directory ~/.cache/huggingface/datasets/ (adjust to your own cache location):
-SCNet fast download link for the datasets: [datasets](http://113.200.138.88:18080/aidatasets/mamba2_data_test)
 ## Training
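The cache layout on the added side of this hunk can be checked before an offline run. The following is a minimal sketch, not part of the commit: the helper name `missing_split_files` is hypothetical, and the cache root `~/.cache/huggingface/datasets/` is taken from the line this hunk removes, so it may differ on your machine.

```python
from pathlib import Path

# Split names as they appear in the Arrow file names shown in the diff.
SPLITS = ("test", "train", "validation")


def missing_split_files(snapshot_dir: Path, dataset: str = "openbookqa") -> list[str]:
    """Return the expected Arrow file names that are absent from snapshot_dir.

    Hypothetical helper: it only checks file presence, not file contents.
    """
    expected = [f"{dataset}-{split}.arrow" for split in SPLITS]
    return [name for name in expected if not (snapshot_dir / name).is_file()]


if __name__ == "__main__":
    # Assumed cache root; adjust to your own cache location.
    root = Path.home() / ".cache" / "huggingface" / "datasets"
    snapshot = (
        root / "openbookqa" / "main" / "0.0.0"
        / "388097ea7776314e93a529163e0fea805b8a6454"
    )
    print(missing_split_files(snapshot))
```

An empty list suggests the three split files are in place; any names returned are the files that still need to be downloaded or copied.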
@@ -134,7 +127,7 @@ hellaswag/
 ## Inference
-Running inference automatically connects to huggingface and downloads the model files. You can also use modelscope to download the model files into the cache directory in advance; to use a local copy, set pretrained=/path_to_model/model_name
+Running inference automatically connects to huggingface and downloads the model files. You can also use a [mirror site](https://hf-mirror.com/) to download the model files into the cache directory in advance; to use a local copy, set pretrained=/path_to_model/model_name
 SCNet download link for the model weights: [models](http://113.200.138.88:18080/aimodels/state-spaces)
@@ -145,15 +138,15 @@ Evaluate:
 To run evaluations on Mamba-1 models
 ```
-lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba-130m --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
+lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba-130m --tasks lambada_openai,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
 ```
 To run evaluations on Mamba-2 models:
 ```
-lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba2-2.7b --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
-lm_eval --model mamba_ssm --model_args pretrained=state-spaces/transformerpp-2.7b --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
-lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba2attn-2.7b --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
+lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba2-2.7b --tasks lambada_openai,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
+lm_eval --model mamba_ssm --model_args pretrained=state-spaces/transformerpp-2.7b --tasks lambada_openai,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
+lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba2attn-2.7b --tasks lambada_openai,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
 ```
 Inference :
@@ -175,7 +168,7 @@ python benchmarks/benchmark_generation_mamba_simple.py --model-name "state-space
 Multi-GPU inference uses accelerate, for example:
 ```
-HIP_VISIBLE_DEVICES=0,1 accelerate launch -m lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba2-2.7b --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
+HIP_VISIBLE_DEVICES=0,1 accelerate launch -m lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba2-2.7b --tasks lambada_openai,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
 ```
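Every `lm_eval` command touched by this commit differs from its predecessor only in the `--tasks` list, which drops `hellaswag` and `piqa`. As an illustration, the sketch below builds the new command line from one shared task list so the models cannot drift apart. The helper `lm_eval_argv` is hypothetical; it uses only the flags and values that appear in the diff.

```python
import shlex

# Task list after this commit (hellaswag and piqa removed).
TASKS = ["lambada_openai", "arc_easy", "arc_challenge", "winogrande", "openbookqa"]


def lm_eval_argv(pretrained, tasks=TASKS, batch_size=256, device="cuda"):
    """Build the argv for one evaluation run (hypothetical helper)."""
    return [
        "lm_eval",
        "--model", "mamba_ssm",
        "--model_args", f"pretrained={pretrained}",
        "--tasks", ",".join(tasks),
        "--device", device,
        "--batch_size", str(batch_size),
    ]


if __name__ == "__main__":
    for name in ("state-spaces/mamba-130m", "state-spaces/mamba2-2.7b"):
        # shlex.join renders the argv as a copy-pasteable shell command.
        print(shlex.join(lm_eval_argv(name)))
```

The resulting list can also be passed to `subprocess.run` directly; for the multi-GPU variant, the diff prefixes the same arguments with `accelerate launch -m`.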
@@ -196,35 +189,29 @@ state-spaces/mamba2-2.7b result:
 mamba_ssm (pretrained=state-spaces/mamba-130m), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 256
 | Tasks |Version|Filter|n-shot| Metric | Value | |Stderr|
 |--------------|------:|------|-----:|----------|------:|---|-----:|
 |winogrande | 1|none | 0|acc | 0.5217|± |0.0140|
-|piqa | 1|none | 0|acc | 0.6458|± |0.0112|
-| | |none | 0|acc_norm | 0.6306|± |0.0113|
-|openbookqa | 1|none | 0|acc | 0.1700|± |0.0168|
-| | |none | 0|acc_norm | 0.2880|± |0.0203|
-|lambada_openai| 1|none | 0|perplexity|16.0456|± |0.5091|
-| | |none | 0|acc | 0.4428|± |0.0069|
-|hellaswag | 1|none | 0|acc | 0.3079|± |0.0046|
-| | |none | 0|acc_norm | 0.3522|± |0.0048|
-|arc_easy | 1|none | 0|acc | 0.4794|± |0.0103|
-| | |none | 0|acc_norm | 0.4205|± |0.0101|
+|openbookqa | 1|none | 0|acc | 0.1680|± |0.0167|
+| | |none | 0|acc_norm | 0.2860|± |0.0202|
+|lambada_openai| 1|none | 0|perplexity|16.0435|± |0.5091|
+| | |none | 0|acc | 0.4421|± |0.0069|
+|arc_easy | 1|none | 0|acc | 0.4785|± |0.0103|
+| | |none | 0|acc_norm | 0.4209|± |0.0101|
 |arc_challenge | 1|none | 0|acc | 0.1988|± |0.0117|
-| | |none | 0|acc_norm | 0.2457|± |0.0126|
+| | |none | 0|acc_norm | 0.2449|± |0.0126|
 mamba_ssm (pretrained=state-spaces/mamba2-2.7b)
 | Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
 |--------------|------:|------|-----:|----------|-----:|---|-----:|
 |winogrande | 1|none | 0|acc |0.6385|± |0.0135|
-|piqa | 1|none | 0|acc |0.7628|± |0.0099|
-| | |none | 0|acc_norm |0.7617|± |0.0099|
 |openbookqa | 1|none | 0|acc |0.2940|± |0.0204|
 | | |none | 0|acc_norm |0.3880|± |0.0218|
 |lambada_openai| 1|none | 0|perplexity|4.0934|± |0.0888|
 | | |none | 0|acc |0.6951|± |0.0064|
-|hellaswag | 1|none | 0|acc |0.4961|± |0.0050|
-| | |none | 0|acc_norm |0.6660|± |0.0047|
 |arc_easy | 1|none | 0|acc |0.6957|± |0.0094|
 | | |none | 0|acc_norm |0.6481|± |0.0098|
 |arc_challenge | 1|none | 0|acc |0.3328|± |0.0138|
@@ -236,14 +223,10 @@ mamba_ssm (pretrained=state-spaces/mamba2attn-2.7b), gen_kwargs: (None), limit:
 | Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
 |--------------|------:|------|-----:|----------|-----:|---|-----:|
 |winogrande | 1|none | 0|acc |0.6519|± |0.0134|
-|piqa | 1|none | 0|acc |0.7573|± |0.0100|
-| | |none | 0|acc_norm |0.7584|± |0.0100|
 |openbookqa | 1|none | 0|acc |0.3040|± |0.0206|
 | | |none | 0|acc_norm |0.3900|± |0.0218|
 |lambada_openai| 1|none | 0|perplexity|3.8497|± |0.0810|
 | | |none | 0|acc |0.7105|± |0.0063|
-|hellaswag | 1|none | 0|acc |0.5029|± |0.0050|
-| | |none | 0|acc_norm |0.6776|± |0.0047|
 |arc_easy | 1|none | 0|acc |0.6987|± |0.0094|
 | | |none | 0|acc_norm |0.6633|± |0.0097|
 |arc_challenge | 1|none | 0|acc |0.3447|± |0.0139|
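Besides removing the `piqa` and `hellaswag` rows, the commit also changes several mamba-130m values slightly (for example openbookqa acc 0.1700 to 0.1680). The sketch below expresses each change as a fraction of the reported standard error. The numbers are copied from the mamba-130m rows of this diff; the helper name `shift_in_stderr` is hypothetical, and the diff does not state why the values moved.

```python
# (task, metric): (old value, new value, new stderr), from the mamba-130m rows.
ROWS = {
    ("openbookqa", "acc"): (0.1700, 0.1680, 0.0167),
    ("openbookqa", "acc_norm"): (0.2880, 0.2860, 0.0202),
    ("lambada_openai", "perplexity"): (16.0456, 16.0435, 0.5091),
    ("lambada_openai", "acc"): (0.4428, 0.4421, 0.0069),
    ("arc_easy", "acc"): (0.4794, 0.4785, 0.0103),
    ("arc_easy", "acc_norm"): (0.4205, 0.4209, 0.0101),
    ("arc_challenge", "acc_norm"): (0.2457, 0.2449, 0.0126),
}


def shift_in_stderr(old, new, stderr):
    """Absolute change between two runs, in units of the reported stderr."""
    return abs(new - old) / stderr


if __name__ == "__main__":
    for (task, metric), (old, new, stderr) in ROWS.items():
        print(f"{task:15s} {metric:10s} {shift_in_stderr(old, new, stderr):.3f}")
```

Every changed value moves by well under one standard error, so the updated mamba-130m numbers are statistically indistinguishable from the previous ones.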