Updated readme with mlc commands for model, dataset, accuracy and submission generation #2143

Open · wants to merge 4 commits into `master`

4 changes: 4 additions & 0 deletions automotive/3d-object-detection/README.md
@@ -101,3 +101,7 @@ Please click [here](https://github.com/mlcommons/inference/blob/master/automotiv
```
python accuracy_waymo.py --mlperf-accuracy-file <path to accuracy file>/mlperf_log_accuracy.json --waymo-dir /waymo/kitti_format/
```

## Automated command for submission generation via MLCFlow

Please see the [new docs site](https://docs.mlcommons.org/inference/submission/) for an automated way to generate a submission through MLCFlow.
4 changes: 4 additions & 0 deletions graph/R-GAT/README.md
@@ -181,6 +181,10 @@ mlcr process,mlperf,accuracy,_igbh --result_dir=<Path to directory where files a

Please click [here](https://github.com/mlcommons/inference/blob/dev/graph/R-GAT/tools/accuracy_igbh.py) to view the Python script for evaluating accuracy for the IGBH dataset.

## Automated command for submission generation via MLCFlow

Please see the [new docs site](https://docs.mlcommons.org/inference/submission/) for an automated way to generate a submission through MLCFlow.

#### Run using docker

Not implemented yet
49 changes: 49 additions & 0 deletions language/bert/README.md
@@ -24,6 +24,44 @@ Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/l
## Disclaimer
This benchmark app is a reference implementation that is not meant to be the fastest implementation possible.

## Automated command to run the benchmark via MLCFlow

Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/bert/) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker.

You can also run `pip install mlc-scripts` and then use the `mlcr` commands given in the later sections to download the model and datasets.

### Download model through MLCFlow Automation

**Pytorch Framework**

```
mlcr get,ml-model,bert-large,_pytorch --outdirname=<path_to_download> -j
```

**Onnx Framework**

```
mlcr get,ml-model,bert-large,_onnx --outdirname=<path_to_download> -j
```

**TensorFlow Framework**

```
mlcr get,ml-model,bert-large,_tensorflow --outdirname=<path_to_download> -j
```

### Download dataset through MLCFlow Automation

**Validation**
```
mlcr get,dataset,squad,validation --outdirname=<path_to_download> -j
```

**Calibration**
```
mlcr get,dataset,squad,_calib1 --outdirname=<path_to_download> -j
```
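
For reference, a minimal end-to-end download sketch that combines the `pip install` step above with the commands in this section is shown below; the workspace layout and the choice of the ONNX model variant are illustrative only.

```bash
# Illustrative sketch only: install the MLC scripts, then download one model
# variant and both SQuAD splits into a single workspace (paths are examples).
pip install mlc-scripts

WORKSPACE=./bert_workspace
mlcr get,ml-model,bert-large,_onnx --outdirname=${WORKSPACE}/model -j
mlcr get,dataset,squad,validation --outdirname=${WORKSPACE}/data/validation -j
mlcr get,dataset,squad,_calib1 --outdirname=${WORKSPACE}/data/calibration -j
```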

## Commands

Please run the following commands:
@@ -45,6 +83,17 @@ Please run the following commands:
- The script [tf_freeze_bert.py] freezes the TensorFlow model into a pb file.
- The script [bert_tf_to_pytorch.py] converts the TensorFlow model into the PyTorch `BertForQuestionAnswering` module in [HuggingFace Transformers](https://github.com/huggingface/transformers) and also exports the model to [ONNX](https://github.com/onnx/onnx) format.

### Evaluate the accuracy through MLCFlow Automation
```bash
mlcr process,mlperf,accuracy,_squad --result_dir=<Path to directory where files are generated after the benchmark run>
```

Please click [here](https://github.com/mlcommons/inference/blob/master/language/bert/accuracy-squad.py) to view the Python script for evaluating accuracy for the SQuAD dataset.
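
As a usage sketch, `--result_dir` should point at the directory produced by the accuracy run (the path below is illustrative), which typically contains `mlperf_log_accuracy.json` among the generated logs:

```bash
# Illustrative path; substitute the directory generated by your accuracy run.
mlcr process,mlperf,accuracy,_squad --result_dir=./test_results/bert-99/offline/accuracy
```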

## Automated command for submission generation via MLCFlow

Please see the [new docs site](https://docs.mlcommons.org/inference/submission/) for an automated way to generate a submission through MLCFlow.

## Loadgen over the Network

```
32 changes: 31 additions & 1 deletion language/gpt-j/README.md
@@ -1,9 +1,28 @@
# GPT-J Reference Implementation

Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/gpt-j) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker.
## Automated command to run the benchmark via MLCFlow

Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/gpt-j/) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker.

You can also run `pip install mlc-scripts` and then use the `mlcr` commands given in the later sections to download the model and datasets.

### Download model through MLCFlow Automation

```
mlcr get,ml-model,gptj,_pytorch --outdirname=<path_to_download> -j
```

### Download dataset through MLCFlow Automation

**Validation Dataset**
```
mlcr get,dataset,cnndm,_validation --outdirname=<path_to_download> -j
```

**Calibration Dataset**
```
mlcr get,dataset,cnndm,_calibration --outdirname=<path_to_download> -j
```
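
As with the other benchmarks, the model and dataset commands can be combined into a single download step; the sketch below is illustrative and the workspace layout is not required.

```bash
# Illustrative sketch: fetch the GPT-J checkpoint and both CNN/DailyMail splits
# into one workspace (paths are examples only).
WORKSPACE=./gptj_workspace
mlcr get,ml-model,gptj,_pytorch --outdirname=${WORKSPACE}/model -j
mlcr get,dataset,cnndm,_validation --outdirname=${WORKSPACE}/data/validation -j
mlcr get,dataset,cnndm,_calibration --outdirname=${WORKSPACE}/data/calibration -j
```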


### Setup Instructions
@@ -113,6 +132,13 @@ Evaluates the ROUGE scores from the accuracy logs. Only applicable when specifyi
python evaluation.py --mlperf-accuracy-file ./build/logs/mlperf_log_accuracy.json --dataset-file ./data/cnn_eval.json
```

### Evaluate the accuracy through MLCFlow Automation
```bash
mlcr process,mlperf,accuracy,_cnndm --result_dir=<Path to directory where files are generated after the benchmark run>
```

Please click [here](https://github.com/mlcommons/inference/blob/master/language/gpt-j/evaluation.py) to view the Python script for evaluating accuracy for the CNN/DailyMail (cnndm) dataset.

### Reference Model - ROUGE scores
The following are the ROUGE scores obtained when evaluating the GPT-J fp32 model on the entire validation set (13368 samples) using beam search with beam_size=4:

@@ -122,6 +148,10 @@ ROUGE 2 - 20.1235

ROUGE L - 29.9881

## Automated command for submission generation via MLCFlow

Please see the [new docs site](https://docs.mlcommons.org/inference/submission/) for an automated way to generate a submission through MLCFlow.

### License:
Apache License Version 2.0.

49 changes: 47 additions & 2 deletions language/llama2-70b/README.md
@@ -7,6 +7,9 @@
- For the server scenario, it is necessary to call `lg.FirstTokenComplete(response)` for each query. This way the first token will be reported and its latency will be measured.
- For all scenarios, when calling `lg.QuerySamplesComplete(response)`, it is necessary that each of the elements in `response` is a `lg.QuerySampleResponse` that contains the number of tokens (it can be created this way: `lg.QuerySampleResponse(qitem.id, bi[0], bi[1], n_tokens)`). The number of tokens reported should match the number of tokens in your answer, and this will be checked in [TEST06](../../compliance/nvidia/TEST06/).


## Automated command to run the benchmark via MLCFlow

Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/llama2-70b) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker.

You can also run `pip install mlc-scripts` and then use the `mlcr` commands given in the later sections to download the model and datasets.
@@ -65,9 +68,11 @@ CPU-only setup, as well as any GPU versions for applicable libraries like PyTorc
### MLCommons Members Download
MLCommons hosts the model and preprocessed dataset for download **exclusively by MLCommons Members**. You must first agree to the [confidentiality notice](https://llama2.mlcommons.org) using your organizational email address, then you will receive a link to a directory containing Rclone download instructions. _If you cannot access the form but you are part of an MLCommons Member organization, submit the [MLCommons subscription form](https://mlcommons.org/community/subscribe/) with your organizational email address and [associate a Google account](https://accounts.google.com/SignUpWithoutGmail) with your organizational email address._

Once you have access, you can download the model automatically via the command below.

### Download model through MLCFlow Automation

```
mlcr get,ml-model,llama2 --outdirname=${CHECKPOINT_PATH} -j
mlcr get,ml-model,llama2-70b,_pytorch --outdirname=<Download path> -j
```
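
For example, the download can reuse the `${CHECKPOINT_PATH}` variable referenced elsewhere in this README; the path value below is illustrative.

```bash
# Illustrative: download the checkpoint into the same CHECKPOINT_PATH used in the other commands.
export CHECKPOINT_PATH=./models/Llama-2-70b-chat-hf
mlcr get,ml-model,llama2-70b,_pytorch --outdirname=${CHECKPOINT_PATH} -j
```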

### External Download (Not recommended for official submission)
@@ -82,6 +87,34 @@ git clone https://huggingface.co/meta-llama/Llama-2-70b-chat-hf ${CHECKPOINT_PAT

## Get Dataset

### Download Preprocessed dataset through MLCFlow Automation

**Validation**

```
mlcr get,dataset,preprocessed,openorca,_validation --outdirname=<path_to_download> -j
```

**Calibration**

```
mlcr get,dataset,preprocessed,openorca,_calibration --outdirname=<path_to_download> -j
```

### Download Unprocessed dataset through MLCFlow Automation

**Validation**

```
mlcr get,dataset,openorca,_validation --outdirname=<path_to_download> -j
```

**Calibration**

```
mlcr get,dataset,openorca,_calibration --outdirname=<path_to_download> -j
```
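
A small usage sketch (directory names are illustrative) that keeps the preprocessed and unprocessed copies separate:

```bash
# Illustrative layout: preprocessed data for the benchmark run, raw data kept alongside it.
mlcr get,dataset,preprocessed,openorca,_validation --outdirname=./data/openorca/preprocessed -j
mlcr get,dataset,openorca,_validation --outdirname=./data/openorca/unprocessed -j
```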

### Preprocessed

You can use Rclone to download the preprocessed dataset from a Cloudflare R2 bucket.
@@ -244,6 +277,18 @@ scale from a 0.0-1.0 scale):

This was run on a DGX-H100 node. Total runtime was ~4.5 days.

### Evaluate the accuracy through MLCFlow Automation
```bash
mlcr process,mlperf,accuracy,_openorca --result_dir=<Path to directory where files are generated after the benchmark run>
```

Please click [here](https://github.com/mlcommons/inference/blob/master/language/llama2-70b/evaluate-accuracy.py) to view the Python script for evaluating accuracy for the OpenOrca dataset.

## Automated command for submission generation via MLCFlow

Please see the [new docs site](https://docs.mlcommons.org/inference/submission/) for an automated way to generate a submission through MLCFlow.


# Run llama2-70b-interactive benchmark

For official Llama2-70b submissions, it is also possible to submit in the interactive category. This sets stricter latency requirements for Time to First Token (ttft) and Time per Output Token (tpot). Specifically, the interactive category requires loadgen to enforce `ttft <= 450ms` and `tpot <= 40ms`.
50 changes: 32 additions & 18 deletions language/llama3.1-405b/README.md
@@ -7,11 +7,12 @@
- For the server scenario, it is necessary to call `lg.FirstTokenComplete(response)` for each query. This way the first token will be reported and its latency will be measured.
- For all scenarios, when calling `lg.QuerySamplesComplete(response)`, it is necessary that each of the elements in `response` is a `lg.QuerySampleResponse` that contains the number of tokens (it can be created this way: `lg.QuerySampleResponse(qitem.id, bi[0], bi[1], n_tokens)`). The number of tokens reported should match the number of tokens in your answer, and this will be checked in [TEST06](../../compliance/nvidia/TEST06/).

## Automated command to run the benchmark via MLFlow
## Automated command to run the benchmark via MLCFlow

Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/llama3_1-405b/) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker.

You can also do pip install mlc-scripts and then use `mlcr` commands for downloading the model and datasets using the commands given in the later sections.
You can also run `pip install mlc-scripts` and then use the `mlcr` commands given in the later sections to download the model and datasets.


## Prepare environment

@@ -99,11 +100,24 @@ pip install -e ../../loadgen
## Get Model
### MLCommons Members Download (Recommended for official submission)

You need to request for access to [MLcommons](http://llama3-1.mlcommons.org/) and you'll receive an email with the download instructions. You can download the model automatically via the below command
You need to request access to [MLCommons](http://llama3-1.mlcommons.org/), and you'll receive an email with the download instructions.

### Download model through MLCFlow Automation

**From MLCommons Google Drive**

```
mlcr get,ml-model,llama3 --outdirname=${CHECKPOINT_PATH} -j
```

**From HuggingFace**

```
mlcr get,ml-model,llama3,_hf --outdirname=${CHECKPOINT_PATH} --hf_token=<huggingface access token> -j
```

**Note:**
Downloading the llama3.1-405B model from Hugging Face requires an [**access token**](https://huggingface.co/settings/tokens), which can be generated for your account. Additionally, ensure that your account has access to the [llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct) model.
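
One way to avoid pasting the token inline is to export it first; `HF_TOKEN` below is just an illustrative shell variable, not a name or flag required by `mlcr`.

```bash
# Sketch: keep the Hugging Face access token in a shell variable (variable name is illustrative).
export HF_TOKEN=<huggingface access token>
mlcr get,ml-model,llama3,_hf --outdirname=${CHECKPOINT_PATH} --hf_token=${HF_TOKEN} -j
```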

### External Download (Not recommended for official submission)
+ First go to [llama3.1-request-link](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and make a request, then sign in to HuggingFace (if you don't have an account, you'll need to create one). **Please note your authentication credentials**, as you may be required to provide them when cloning below.
@@ -115,16 +129,22 @@ git clone https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct ${CHECKPOINT
cd ${CHECKPOINT_PATH} && git checkout be673f326cab4cd22ccfef76109faf68e41aa5f1
```

### Download huggingface model through MLC

## Get Dataset

### Download dataset through MLCFlow Automation

**Validation**

```
mlcr get,ml-model,llama3,_hf --outdirname=${CHECKPOINT_PATH} --hf_token=<huggingface access token> -j
mlcr get,dataset,mlperf,inference,llama3,_validation --outdirname=<path to download> -j
```

**Note:**
Downloading llama3.1-405B model from Hugging Face will require an [**access token**](https://huggingface.co/settings/tokens) which could be generated for your account. Additionally, ensure that your account has access to the [llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct) model.
**Calibration**

## Get Dataset
```
mlcr get,dataset,mlperf,inference,llama3,_calibration --outdirname=<path to download> -j
```

### Preprocessed

@@ -144,23 +164,13 @@ You can then navigate in the terminal to your desired download directory and run
```
rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_405b/mlperf_llama3.1_405b_dataset_8313_processed_fp16_eval.pkl ./ -P
```
**MLC Command**

```
mlcr get,dataset,mlperf,inference,llama3,_validation --outdirname=<path to download> -j
```

You can also download the calibration dataset from the Cloudflare R2 bucket by running the following command:

```
rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_405b/mlperf_llama3.1_405b_calibration_dataset_512_processed_fp16_eval.pkl ./ -P
```

**MLC Command**
```
mlcr get,dataset,mlperf,inference,llama3,_calibration --outdirname=<path to download> -j
```


## Run Performance Benchmarks

@@ -267,3 +277,7 @@ Running the GPU implementation in FP16 precision resulted in the following FP16
}
```
The accuracy target is 99% for rougeL and exact_match, and 90% for tokens_per_sample.

## Automated command for submission generation via MLCFlow

Please see the [new docs site](https://docs.mlcommons.org/inference/submission/) for an automated way to generate a submission through MLCFlow.
41 changes: 40 additions & 1 deletion language/mixtral-8x7b/README.md
@@ -9,7 +9,11 @@
- For all scenarios, when calling `lg.QuerySamplesComplete(response)`, it is necessary that each of the elements in `response` is a `lg.QuerySampleResponse` that contains the number of tokens (it can be created this way: `lg.QuerySampleResponse(qitem.id, bi[0], bi[1], n_tokens)`). The number of tokens reported should match the number of tokens in your answer, and this will be checked in [TEST06](../../compliance/nvidia/TEST06/).


Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/mixtral-8x7b) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker.
## Automated command to run the benchmark via MLCFlow

Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/mixtral-8x7b/) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker.

You can also run `pip install mlc-scripts` and then use the `mlcr` commands given in the later sections to download the model and datasets.

## Prepare environment

@@ -66,6 +70,12 @@ CPU-only setup, as well as any GPU versions for applicable libraries like PyTorc

**Important Note:** Files and configurations of the model have changed, and might change in the future. If you are going to get the model from Hugging Face or any external source, use a version of the model that exactly matches the one in this [commit](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/commit/a60832cb6c88d5cb6e507680d0e9996fbad77050). We strongly recommend getting the model by following the steps in the next section:

### Download model through MLCFlow Automation

```
mlcr get,ml-model,mixtral --outdirname=<path_to_download> -j
```

### Get Checkpoint

#### Using Rclone
@@ -87,6 +97,22 @@ rclone copy mlc-inference:mlcommons-inference-wg-public/mixtral_8x7b/mixtral-8x7

## Get Dataset

### Download Preprocessed dataset through MLCFlow Automation

**Validation**

```
mlcr get,dataset-mixtral,openorca-mbxp-gsm8k-combined,_validation --outdirname=<path to download> -j
```

**Calibration**

```
mlcr get,dataset-mixtral,openorca-mbxp-gsm8k-combined,_calibration --outdirname=<path to download> -j
```

- Adding the `_wget` tag to the run command changes the download tool from `rclone` to `wget` (see the example below).
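
For example, assuming tags are comma-separated as in the other commands in this README, the validation download via `wget` would look like:

```
mlcr get,dataset-mixtral,openorca-mbxp-gsm8k-combined,_validation,_wget --outdirname=<path to download> -j
```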

### Preprocessed

#### Using Rclone
@@ -228,6 +254,15 @@ fi

The ServerSUT was not tested for GPU runs.

## Accuracy Evaluation

### Evaluate the accuracy through MLCFlow Automation
```bash
mlcr process,mlperf,accuracy,_openorca-gsm8k-mbxp-combined --result_dir=<Path to directory where files are generated after the benchmark run>
```

Please click [here](https://github.com/mlcommons/inference/blob/master/language/mixtral-8x7b/evaluate-accuracy.py) to view the Python script for evaluating accuracy for the combined OpenOrca, MBXP, and GSM8K dataset.
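
Usage sketch (the results path is illustrative); `--result_dir` should point at the directory containing the logs generated by the accuracy run:

```bash
# Illustrative path; substitute the directory produced by your accuracy run.
mlcr process,mlperf,accuracy,_openorca-gsm8k-mbxp-combined --result_dir=./test_results/mixtral-8x7b/offline/accuracy
```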

### Evaluation
Recreating the environment for evaluating the quality metrics can be quite tedious. Therefore, we provide a dockerfile and recommend using docker for this task.
1. Build the evaluation container
@@ -269,3 +304,7 @@ For official submissions, 99% of each reference score is enforced. Additionally,
```json
{'tokens_per_sample': 144.84}
```

## Automated command for submission generation via MLCFlow

Please see the [new docs site](https://docs.mlcommons.org/inference/submission/) for an automated way to generate a submission through MLCFlow.