Commit 000d7bab authored by luopl

Update README.md

parent 185e8d8c
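@@ -104,28 +104,21 @@ pip install e .
When running inference, the latest datasets are downloaded automatically from Hugging Face into the cache directory. If Hugging Face is unreachable, set a mirror with export HF_ENDPOINT=https://hf-mirror.com.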
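The `export` line only affects processes started from that shell. If you drive the evaluation from Python instead, one option is to pass `HF_ENDPOINT` in the environment of a child process. As far as we know, `huggingface_hub` reads `HF_ENDPOINT` when it is first imported, so setting it inside an interpreter that has already imported `datasets` may have no effect. `env_with_hf_mirror` and `run_with_mirror` are hypothetical helpers, not part of this repository.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence

HF_MIRROR = "https://hf-mirror.com"


def env_with_hf_mirror(base: Optional[Mapping[str, str]] = None,
                       endpoint: str = HF_MIRROR) -> Dict[str, str]:
    """Return a copy of the environment with HF_ENDPOINT pointing at a mirror.

    An HF_ENDPOINT that is already set is kept, so an explicit
    `export HF_ENDPOINT=...` in the shell still takes precedence.
    """
    env = dict(os.environ if base is None else base)
    env.setdefault("HF_ENDPOINT", endpoint)
    return env


def run_with_mirror(argv: Sequence[str]) -> None:
    """Run a command (e.g. an lm_eval invocation) with the mirror endpoint set."""
    subprocess.run(list(argv), env=env_with_hf_mirror(), check=True)
```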
```
# Dataset layout
hellaswag/
└── default
    └── 0.1.0
        ├── 362ac471216900f3f7c021863caac4eb7886347d0f76d90b6b4361f59ffea4d7
        │   ├── cache-081f361bf081c0bf.arrow
        │   ├── cache-5d43362a7601c065.arrow
        │   ├── dataset_info.json
        │   ├── hellaswag-test.arrow
        │   ├── hellaswag-train.arrow
        │   └── hellaswag-validation.arrow
        ├── 362ac471216900f3f7c021863caac4eb7886347d0f76d90b6b4361f59ffea4d7_builder.lock
        └── 362ac471216900f3f7c021863caac4eb7886347d0f76d90b6b4361f59ffea4d7.incomplete_info.lock
openbookqa/
└── main
    └── 0.0.0
        ├── 388097ea7776314e93a529163e0fea805b8a6454
        │   ├── dataset_info.json
        │   ├── openbookqa-test.arrow
        │   ├── openbookqa-train.arrow
        │   └── openbookqa-validation.arrow
        ├── 388097ea7776314e93a529163e0fea805b8a6454_builder.lock
        └── 388097ea7776314e93a529163e0fea805b8a6454.incomplete_info.lock
...
```
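You can also download the datasets offline and place them in the cache directory ~/.cache/huggingface/datasets/ (adjust this to your own cache location):
SCNet fast download link for the datasets: [datasets](http://113.200.138.88:18080/aidatasets/mamba2_data_test)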
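To check where an offline copy should go, this sketch mimics the cache lookup order we assume `datasets` uses: `HF_DATASETS_CACHE`, then `$HF_HOME/datasets`, then `$XDG_CACHE_HOME/huggingface/datasets`, and finally the default `~/.cache/huggingface/datasets/`. `resolve_datasets_cache` is a hypothetical helper. Verify the precedence against the `datasets` version you have installed.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def _expand_home(path: str, home: str) -> str:
    # Expand a leading "~" against the given home directory.
    if path == "~" or path.startswith("~/"):
        return home + path[1:]
    return path


def resolve_datasets_cache(env: Optional[Mapping[str, str]] = None,
                           home: Optional[str] = None) -> Path:
    """Directory in which `datasets` is assumed to look for cached datasets.

    Assumed precedence: HF_DATASETS_CACHE, then $HF_HOME/datasets, then
    $XDG_CACHE_HOME/huggingface/datasets, then ~/.cache/huggingface/datasets.
    """
    env = os.environ if env is None else env
    home = os.path.expanduser("~") if home is None else home
    if env.get("HF_DATASETS_CACHE"):
        return Path(_expand_home(env["HF_DATASETS_CACHE"], home))
    hf_home = env.get("HF_HOME") or os.path.join(
        env.get("XDG_CACHE_HOME") or os.path.join(home, ".cache"), "huggingface")
    return Path(_expand_home(hf_home, home)) / "datasets"
```

Place the downloaded `openbookqa/` (or `hellaswag/`) directory directly under the path returned by `resolve_datasets_cache()`.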
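## Training
@@ -134,7 +127,7 @@ hellaswag/
## Inference
When running inference, model files are downloaded automatically from Hugging Face. You can also use modelscope to download the model files into the cache directory in advance; to use a local copy, change the argument to pretrained=/path_to_model/model_name
When running inference, model files are downloaded automatically from Hugging Face. You can also download the model files from the [mirror site](https://hf-mirror.com/) into the cache directory in advance; to use a local copy, change the argument to pretrained=/path_to_model/model_name
SCNet download link for the model weights: [models](http://113.200.138.88:18080/aimodels/state-spaces)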
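Switching between a local copy and the Hub id only changes what follows `pretrained=`. This sketch assumes pre-downloaded weights live at `<local_root>/<model name>`, for example `/path_to_model/mamba2-2.7b`. Both that layout and the helper `model_args` are illustrative assumptions.

```python
import os
from typing import Optional


def model_args(repo_id: str, local_root: Optional[str] = None) -> str:
    """Build the value for lm_eval's --model_args.

    Points pretrained= at <local_root>/<last part of repo_id> when that
    directory exists (assumed layout for pre-downloaded weights), and
    otherwise falls back to the Hugging Face repo id.
    """
    if local_root:
        candidate = os.path.join(local_root, repo_id.split("/")[-1])
        if os.path.isdir(candidate):
            return f"pretrained={candidate}"
    return f"pretrained={repo_id}"
```

For example, `model_args("state-spaces/mamba2-2.7b", "/path_to_model")` yields a local path if that directory exists and the Hub id otherwise.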
@@ -145,15 +138,15 @@ Evaluate:
To run evaluations on Mamba-1 models:
```
lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba-130m --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba-130m --tasks lambada_openai,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
```
To run evaluations on Mamba-2 models:
```
lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba2-2.7b --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
lm_eval --model mamba_ssm --model_args pretrained=state-spaces/transformerpp-2.7b --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba2attn-2.7b --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba2-2.7b --tasks lambada_openai,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
lm_eval --model mamba_ssm --model_args pretrained=state-spaces/transformerpp-2.7b --tasks lambada_openai,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba2attn-2.7b --tasks lambada_openai,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
```
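Running several models with the same task list is easier to keep consistent from a script. This Python sketch rebuilds the argument lists of the commands above and runs them one after another with `subprocess`. `lm_eval_argv` and `evaluate_all` are hypothetical names, and the sketch uses only flags that already appear in those commands.

```python
import subprocess
from typing import List, Sequence

# Task list from the second set of commands above.
TASKS = ["lambada_openai", "arc_easy", "arc_challenge", "winogrande", "openbookqa"]
MAMBA2_MODELS = [
    "state-spaces/mamba2-2.7b",
    "state-spaces/transformerpp-2.7b",
    "state-spaces/mamba2attn-2.7b",
]


def lm_eval_argv(pretrained: str, tasks: Sequence[str] = TASKS,
                 device: str = "cuda", batch_size: int = 256) -> List[str]:
    """Argument list equivalent to the lm_eval commands shown above."""
    return [
        "lm_eval",
        "--model", "mamba_ssm",
        "--model_args", f"pretrained={pretrained}",
        "--tasks", ",".join(tasks),
        "--device", device,
        "--batch_size", str(batch_size),
    ]


def evaluate_all(models: Sequence[str] = MAMBA2_MODELS) -> None:
    """Evaluate each model in turn; check=True stops at the first failing run."""
    for model in models:
        subprocess.run(lm_eval_argv(model), check=True)
```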
Inference:
@@ -175,7 +168,7 @@ python benchmarks/benchmark_generation_mamba_simple.py --model-name "state-space
For multi-GPU inference, use accelerate, for example:
```
HIP_VISIBLE_DEVICES=0,1 accelerate launch -m lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba2-2.7b --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
HIP_VISIBLE_DEVICES=0,1 accelerate launch -m lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba2-2.7b --tasks lambada_openai,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256
```
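`HIP_VISIBLE_DEVICES` is the ROCm counterpart of `CUDA_VISIBLE_DEVICES` and restricts which GPUs the launched processes can see. `--device cuda` still works because PyTorch's ROCm build exposes AMD GPUs under the `cuda` device name. This sketch assembles the same launch from Python, assuming `accelerate` and `lm_eval` are on `PATH`. `accelerate_lm_eval` is hypothetical, and the `--num_processes` option of `accelerate launch` is added only when requested. Otherwise accelerate falls back to its own configuration or defaults.

```python
import os
from typing import Dict, List, Mapping, Optional, Sequence, Tuple

TASKS = ["lambada_openai", "arc_easy", "arc_challenge", "winogrande", "openbookqa"]


def accelerate_lm_eval(pretrained: str, gpus: Sequence[int],
                       tasks: Sequence[str] = TASKS, batch_size: int = 256,
                       num_processes: Optional[int] = None,
                       base_env: Optional[Mapping[str, str]] = None,
                       ) -> Tuple[Dict[str, str], List[str]]:
    """Return (env, argv) for a multi-GPU `accelerate launch -m lm_eval` run."""
    env = dict(os.environ if base_env is None else base_env)
    # ROCm's equivalent of CUDA_VISIBLE_DEVICES: only these GPUs are visible.
    env["HIP_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpus)
    argv = ["accelerate", "launch"]
    if num_processes is not None:
        argv += ["--num_processes", str(num_processes)]
    argv += [
        "-m", "lm_eval",
        "--model", "mamba_ssm",
        "--model_args", f"pretrained={pretrained}",
        "--tasks", ",".join(tasks),
        "--device", "cuda",  # PyTorch on ROCm still uses the "cuda" device name
        "--batch_size", str(batch_size),
    ]
    return env, argv
```

Usage: `env, argv = accelerate_lm_eval("state-spaces/mamba2-2.7b", [0, 1])`, then `subprocess.run(argv, env=env, check=True)`.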
@@ -196,35 +189,29 @@ state-spaces/mamba2-2.7b result:
mamba_ssm (pretrained=state-spaces/mamba-130m), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 256
| Tasks |Version|Filter|n-shot| Metric | Value | |Stderr|
|--------------|------:|------|-----:|----------|------:|---|-----:|
|winogrande | 1|none | 0|acc | 0.5217|± |0.0140|
|piqa | 1|none | 0|acc | 0.6458|± |0.0112|
| | |none | 0|acc_norm | 0.6306|± |0.0113|
|openbookqa | 1|none | 0|acc | 0.1700|± |0.0168|
| | |none | 0|acc_norm | 0.2880|± |0.0203|
|lambada_openai| 1|none | 0|perplexity|16.0456|± |0.5091|
| | |none | 0|acc | 0.4428|± |0.0069|
|hellaswag | 1|none | 0|acc | 0.3079|± |0.0046|
| | |none | 0|acc_norm | 0.3522|± |0.0048|
|arc_easy | 1|none | 0|acc | 0.4794|± |0.0103|
| | |none | 0|acc_norm | 0.4205|± |0.0101|
|openbookqa | 1|none | 0|acc | 0.1680|± |0.0167|
| | |none | 0|acc_norm | 0.2860|± |0.0202|
|lambada_openai| 1|none | 0|perplexity|16.0435|± |0.5091|
| | |none | 0|acc | 0.4421|± |0.0069|
|arc_easy | 1|none | 0|acc | 0.4785|± |0.0103|
| | |none | 0|acc_norm | 0.4209|± |0.0101|
|arc_challenge | 1|none | 0|acc | 0.1988|± |0.0117|
| | |none | 0|acc_norm | 0.2457|± |0.0126|
| | |none | 0|acc_norm | 0.2449|± |0.0126|
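
The results tables use lm_eval's markdown layout, where a row with an empty `Tasks` cell continues the task above it (for example `acc_norm` under `piqa`). Below is a minimal parser sketch that turns one such table into a `(task, metric) -> (value, stderr)` mapping, which is handy for comparing runs. `parse_lm_eval_table` is a hypothetical helper and assumes the eight-column layout shown here. If a task appears twice, as `openbookqa` does above, the later row wins.

```python
from typing import Dict, Optional, Tuple

Result = Dict[Tuple[str, str], Tuple[float, float]]


def parse_lm_eval_table(text: str) -> Result:
    """Parse one lm_eval markdown results table into {(task, metric): (value, stderr)}.

    Assumed columns: Tasks, Version, Filter, n-shot, Metric, Value, ±, Stderr.
    A row whose Tasks cell is empty belongs to the task on the row above.
    """
    results: Result = {}
    task: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # not a table row
        cells = [c.strip() for c in line[1:-1].split("|")]
        if len(cells) < 8 or cells[0] == "Tasks":
            continue  # malformed row or header row
        if cells[0] and set(cells[0]) <= set("-:"):
            continue  # |---|---:| separator row
        if cells[0]:
            task = cells[0]
        if task is None:
            continue  # continuation row with no task above it
        results[(task, cells[4])] = (float(cells[5]), float(cells[7]))
    return results
```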
mamba_ssm (pretrained=state-spaces/mamba2-2.7b)
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|--------------|------:|------|-----:|----------|-----:|---|-----:|
|winogrande | 1|none | 0|acc |0.6385|± |0.0135|
|piqa | 1|none | 0|acc |0.7628|± |0.0099|
| | |none | 0|acc_norm |0.7617|± |0.0099|
|openbookqa | 1|none | 0|acc |0.2940|± |0.0204|
| | |none | 0|acc_norm |0.3880|± |0.0218|
|lambada_openai| 1|none | 0|perplexity|4.0934|± |0.0888|
| | |none | 0|acc |0.6951|± |0.0064|
|hellaswag | 1|none | 0|acc |0.4961|± |0.0050|
| | |none | 0|acc_norm |0.6660|± |0.0047|
|arc_easy | 1|none | 0|acc |0.6957|± |0.0094|
| | |none | 0|acc_norm |0.6481|± |0.0098|
|arc_challenge | 1|none | 0|acc |0.3328|± |0.0138|
@@ -236,14 +223,10 @@ mamba_ssm (pretrained=state-spaces/mamba2attn-2.7b), gen_kwargs: (None), limit:
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|--------------|------:|------|-----:|----------|-----:|---|-----:|
|winogrande | 1|none | 0|acc |0.6519|± |0.0134|
|piqa | 1|none | 0|acc |0.7573|± |0.0100|
| | |none | 0|acc_norm |0.7584|± |0.0100|
|openbookqa | 1|none | 0|acc |0.3040|± |0.0206|
| | |none | 0|acc_norm |0.3900|± |0.0218|
|lambada_openai| 1|none | 0|perplexity|3.8497|± |0.0810|
| | |none | 0|acc |0.7105|± |0.0063|
|hellaswag | 1|none | 0|acc |0.5029|± |0.0050|
| | |none | 0|acc_norm |0.6776|± |0.0047|
|arc_easy | 1|none | 0|acc |0.6987|± |0.0094|
| | |none | 0|acc_norm |0.6633|± |0.0097|
|arc_challenge | 1|none | 0|acc |0.3447|± |0.0139|
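
The `Stderr` column gives a rough sense of whether two models really differ on a task. One common back-of-the-envelope check divides the difference in scores by the combined standard error `sqrt(se_a**2 + se_b**2)` and treats the two runs as independent. Both models are scored on the same test items, so this is only an approximation. The sketch applies the check to two rows copied from the mamba2-2.7b and mamba2attn-2.7b tables. `diff_z` is a hypothetical helper. Roughly, a |z| well below 2 suggests that the gap is within noise at these sample sizes.

```python
import math
from typing import Tuple

# (value, stderr) pairs copied from the mamba2-2.7b and mamba2attn-2.7b tables.
MAMBA2 = {
    "lambada_openai/acc": (0.6951, 0.0064),
    "winogrande/acc": (0.6385, 0.0135),
}
MAMBA2ATTN = {
    "lambada_openai/acc": (0.7105, 0.0063),
    "winogrande/acc": (0.6519, 0.0134),
}


def diff_z(a: Tuple[float, float], b: Tuple[float, float]) -> Tuple[float, float]:
    """Return (b - a, (b - a) / sqrt(se_a**2 + se_b**2)), assuming independent errors."""
    (value_a, se_a), (value_b, se_b) = a, b
    diff = value_b - value_a
    return diff, diff / math.sqrt(se_a ** 2 + se_b ** 2)


if __name__ == "__main__":
    for key in MAMBA2:
        d, z = diff_z(MAMBA2[key], MAMBA2ATTN[key])
        print(f"{key}: mamba2attn - mamba2 = {d:+.4f} (z = {z:.2f})")
```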