import argparse
import os

# Parse arguments to identify the path to the logs from
# the performance runs
parser = argparse.ArgumentParser()
parser.add_argument(
    "--results_dir",
    "-r",
    help="Specifies the path to the corresponding results directory that contains the performance subdirectories containing the submission logs, i.e. inference_results_v0.7/closed/NVIDIA/results/T4x8/resnet/Offline.",
    required=True,
)
parser.add_argument(
    "--compliance_dir",
    "-c",
    help="Specifies the path to the directory containing the logs from the compliance test run.",
    required=True,
)
parser.add_argument(
    "--output_dir",
    "-o",
    help="Specifies the path to the output directory where compliance logs will be uploaded from, i.e. inference_results_v0.7/closed/NVIDIA/compliance/T4x8/resnet/Offline.",
    required=True,
)
args = parser.parse_args()
print("Parsing arguments.")
results_dir = args.results_dir
compliance_dir = args.compliance_dir
output_dir = os.path.join(args.output_dir, "TEST04")

# Run verify performance
verify_performance_binary = os.path.join(
    os.path.dirname(__file__), "verify_performance.py"
)
verify_performance_command = (
    "python3 "
    + verify_performance_binary
    + " -r "
    + results_dir
    + "/performance/run_1/mlperf_log_summary.txt"
    + " -t "
    + compliance_dir
    + "/mlperf_log_summary.txt | tee verify_performance.txt"
)
os.system(verify_performance_command)
# Test 06 - Verify consistency of the Llama-v2-70b output
This repository provides the config files and scripts to run and verify TEST 06 - Verify consistency of the Llama-v2-70b output.
# Table of contents
1. [Introduction](#introduction)
2. [Requisites](#requisites)
3. [Instructions](#instructions)
## Introduction
The purpose of this test is to ensure the consistency of the output of the LLM models (Llama2 and Mixtral) and to guard against a potential EOS exploit. The test performs a performance run limited to 100 samples, logging them into `mlperf_log_accuracy.json`. To achieve a passing result, three criteria must be met (a sketch of the second check follows the list):
- In case the first token is reported independently (not applicable for the Offline scenario), it must match the first token of the model output for every query.
- For each query, the model output must end with at most one EOS token.
- The number of reported tokens must match the length of the output sequence.
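To illustrate the second criterion, the sketch below walks the entries of `mlperf_log_accuracy.json` and counts trailing EOS tokens. It assumes token IDs are serialized as hex-encoded int32 data and that the default EOS id is 2; it is an illustration, not the official verification script.

```python
import json
import numpy as np

def trailing_eos_ok(accuracy_log="mlperf_log_accuracy.json", eos_token_id=2):
    """Check that each query's output ends with at most one EOS token.

    Assumes token IDs are stored as hex-encoded int32 data and that
    eos_token_id matches the tokenizer in use.
    """
    with open(accuracy_log) as f:
        entries = json.load(f)
    for entry in entries:
        tokens = np.frombuffer(bytes.fromhex(entry["data"]), dtype=np.int32)
        # Count consecutive EOS tokens at the end of the sequence.
        n_trailing_eos = 0
        for tok in tokens[::-1]:
            if tok != eos_token_id:
                break
            n_trailing_eos += 1
        if n_trailing_eos > 1:
            return False
    return True
```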
## Requisites
For this test, you need to be able to run the `Llama2-70b` benchmark, so all of its requirements also apply here. Additionally, you need to have `numpy` installed:
```
pip install numpy
```
## Instructions
### Part I
Run the benchmark with the provided `audit.config` in the corresponding subdirectory. Note that `audit.config` must be copied to the directory the benchmark is run from. You can verify that `audit.config` was properly read by checking that loadgen reports finding it in `mlperf_log_detail.txt`.
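For example, a quick check of the detail log can be done along these lines; this is a minimal sketch, and matching on the substring `audit.config` (rather than loadgen's exact message wording) is an assumption.

```python
# Return True if mlperf_log_detail.txt mentions audit.config,
# which indicates loadgen picked up the audit configuration.
def audit_config_found(detail_log="mlperf_log_detail.txt"):
    with open(detail_log) as f:
        return any("audit.config" in line for line in f)

print(audit_config_found())
```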
- COMPLIANCE_DIR: Specifies the path to the directory containing the logs from the compliance test run.
- OUTPUT_DIR: Specifies the path to the output directory where compliance logs will be uploaded from, i.e. `inference_results_v0.7/closed/NVIDIA/compliance/TEST06/llama2-70b/Offline`
- SCENARIO: Specifies the scenario in which the benchmark was run. One of ["Offline", "Server", "SingleStream", "MultiStream"].
help="Specifies the path to the directory containing the logs from the compliance test run.",
required=True,
)
parser.add_argument(
"--output_dir",
"-o",
help="Specifies the path to the output directory where compliance logs will be uploaded from, i.e. inference_results_v0.7/closed/NVIDIA/compliance/T4x8/resnet/Offline.",
required=True,
)
parser.add_argument(
"--eos_token_id","-e",default=2,help="EOS token id of the tokenizer"
The benchmark implementation run command will automatically download the validation and calibration datasets and do the necessary preprocessing. In case you want to download only the datasets, you can use the below commands.
=== "Full Dataset"
R-GAT validation run uses the IGBH dataset consisting of 547,306,935 nodes and 5,812,005,639 edges.
### Get Full Dataset
```
cm run script --tags=get,dataset,igbh,_full -j
```
=== "Debug Dataset"
R-GAT debug run uses the IGBH debug dataset (tiny).
### Get Debug Dataset
```
cm run script --tags=get,dataset,igbh,_debug -j
```
## Model
The benchmark implementation run command will automatically download the required model and do the necessary conversions. In case you want to only download the official model, you can use the below commands.
The benchmark implementation run command will automatically download the validation and calibration datasets and do the necessary preprocessing. In case you want to download only the datasets, you can use the below commands.
=== "Validation"
ResNet50 validation run uses the Imagenet 2012 validation dataset consisting of 50,000 images.
### Get Validation Dataset
```
cm run script --tags=get,dataset,imagenet,validation -j
```
=== "Calibration"
The ResNet50 calibration dataset consists of 500 images selected from the Imagenet 2012 validation dataset. There are two alternative options for the calibration dataset.
### Get Calibration Dataset Using Option 1
```
cm run script --tags=get,dataset,imagenet,calibration,_mlperf.option1 -j
```
### Get Calibration Dataset Using Option 2
```
cm run script --tags=get,dataset,imagenet,calibration,_mlperf.option2 -j
```
## Model
The benchmark implementation run command will automatically download the required model and do the necessary conversions. In case you want to only download the official model, you can use the below commands.
Get the Official MLPerf ResNet50 Model
=== "Tensorflow"
### Tensorflow
```
cm run script --tags=get,ml-model,resnet50,_tensorflow -j
```
=== "Onnx"
### Onnx
```
cm run script --tags=get,ml-model,resnet50,_onnx -j
```
Install CM following the [installation page](site:install).
Mobilenet models are not official MLPerf models, so they cannot be used for a Closed division MLPerf inference submission. However, since they can be run with the Imagenet dataset, they are allowed for Open division submissions. Only CPU runs are supported at present.
## TFLite Backend
=== "Mobilenet-V1"
### Mobilenet V1
```bash
cm run script --tags=run,mobilenet-models,_tflite,_mobilenet-v1 \
--adr.compiler.tags=gcc
```
=== "Mobilenet-V2"
### Mobilenet V2
```bash
cm run script --tags=run,mobilenet-models,_tflite,_mobilenet-v2 \
--adr.compiler.tags=gcc
```
=== "Mobilenet-V2"
### Mobilenet V2
```bash
cm run script --tags=run,mobilenet-models,_tflite,_mobilenet-v2 \
--adr.compiler.tags=gcc
```
=== "Mobilenets"
### Mobilenet V1,V2,V3
```bash
cm run script --tags=run,mobilenet-models,_tflite,_mobilenet \
--adr.compiler.tags=gcc
```
=== "Efficientnet"
### Efficientnet
```bash
cm run script --tags=run,mobilenet-models,_tflite,_efficientnet \
--adr.compiler.tags=gcc
```
## ARMNN Backend
=== "Mobilenet-V1"
### Mobilenet V1
```bash
cm run script --tags=run,mobilenet-models,_tflite,_armnn,_mobilenet-v1 \
--adr.compiler.tags=gcc
```
=== "Mobilenet-V2"
### Mobilenet V2
```bash
cm run script --tags=run,mobilenet-models,_tflite,_armnn,_mobilenet-v2 \
--adr.compiler.tags=gcc
```
=== "Mobilenet-V2"
### Mobilenet V2
```bash
cm run script --tags=run,mobilenet-models,_tflite,_armnn,_mobilenet-v2 \
--adr.compiler.tags=gcc
```
=== "Mobilenets"
### Mobilenet V1,V2,V3
```bash
cm run script --tags=run,mobilenet-models,_tflite,_armnn,_mobilenet \
--adr.compiler.tags=gcc
```
=== "Efficientnet"
### Efficientnet
```bash
cm run script --tags=run,mobilenet-models,_tflite,_armnn,_efficientnet \
--adr.compiler.tags=gcc
```
The benchmark implementation run command will automatically download the validation and calibration datasets and do the necessary preprocessing. In case you want to download only the datasets, you can use the below commands.
=== "Validation"
BERT validation run uses the SQuAD v1.1 dataset.
### Get Validation Dataset
```
cm run script --tags=get,dataset,squad,validation -j
```
## Model
The benchmark implementation run command will automatically download the required model and do the necessary conversions. In case you want to only download the official model, you can use the below commands.
Get the Official MLPerf Bert-Large Model
=== "Pytorch"
### Pytorch
```
cm run script --tags=get,ml-model,bert-large,_pytorch -j
```
=== "Onnx"
### Onnx
```
cm run script --tags=get,ml-model,bert-large,_onnx -j
```
=== "Tensorflow"
### Tensorflow
```
cm run script --tags=get,ml-model,bert-large,_tensorflow -j
```
The benchmark implementation run command will automatically download the validation and calibration datasets and do the necessary preprocessing. In case you want to download only the datasets, you can use the below commands.
=== "Validation"
GPT-J validation run uses the CNNDM dataset.
### Get Validation Dataset
```
cm run script --tags=get,dataset,cnndm,validation -j
```
## Model
The benchmark implementation run command will automatically download the required model and do the necessary conversions. In case you want to only download the official model, you can use the below commands.
Get the Official MLPerf GPT-J Model
=== "Pytorch"
### Pytorch
```
cm run script --tags=get,ml-model,gptj,_pytorch -j
```
The benchmark implementation run command will automatically download the validation and calibration datasets and do the necessary preprocessing. In case you want to download only the datasets, you can use the below commands.
=== "Validation"
LLAMA2-70b validation run uses the Open ORCA dataset.
### Get Validation Dataset
```
cm run script --tags=get,dataset,openorca,validation -j
```
## Model
The benchmark implementation run command will automatically download the required model and do the necessary conversions. In case you want to only download the official model, you can use the below commands.
Get the Official MLPerf LLAMA2-70b Model
=== "Pytorch"
### Pytorch
```
cm run script --tags=get,ml-model,llama2-70b,_pytorch -j
```
!!! tip
    Downloading the llama2-70B model from Hugging Face will prompt you to enter your Hugging Face username and password. Please note that the password required is the [**access token**](https://huggingface.co/settings/tokens) generated for your account. Additionally, ensure that your account has access to the [llama2-70B](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) model.
The benchmark implementation run command will automatically download the validation and calibration datasets and do the necessary preprocessing. In case you want to download only the datasets, you can use the below commands.
=== "Validation"
### Get Validation Dataset
```
cm run script --tags=get,dataset,mlperf,inference,llama3,_validation --outdirname=<path to download> -j
```
=== "Calibration"
### Get Calibration Dataset
```
cm run script --tags=get,dataset,mlperf,inference,llama3,_calibration --outdirname=<path to download> -j
```
## Model
The benchmark implementation run command will automatically download the required model and do the necessary conversions. In case you want to only download the official model, you can use the below commands.
Get the Official MLPerf LLAMA3.1-405b Model
=== "Pytorch"
### Pytorch
```
cm run script --tags=get,ml-model,llama3 --outdirname=<path to download> --hf_token=<huggingface access token> -j
```
!!! tip
    Downloading the llama3.1-405B model from Hugging Face requires an [**access token**](https://huggingface.co/settings/tokens), which can be generated for your account. Additionally, ensure that your account has access to the [llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct) model.
The benchmark implementation run command will automatically download the preprocessed validation and calibration datasets. In case you want to download only the datasets, you can use the below commands.
=== "Validation"
The mixtral-8x7b validation run uses a combined dataset of Open ORCA, GSM8K and MBXP.
### Get Validation Dataset
```
cm run script --tags=get,dataset-mixtral,openorca-mbxp-gsm8k-combined -j
```
## Model
The benchmark implementation run command will automatically download the required model and do the necessary conversions. In case you want to only download the official model, you can use the below commands.