From 75d0e27c8d31b72dfe3ae407e016756f774bac81 Mon Sep 17 00:00:00 2001 From: anandhu-eng Date: Mon, 3 Mar 2025 16:50:55 +0530 Subject: [PATCH 1/4] Updated with mlc commands for model,dataset,accuracy,submission --- automotive/3d-object-detection/README.md | 4 + graph/R-GAT/README.md | 4 + language/bert/README.md | 43 ++++++++++ language/gpt-j/README.md | 32 +++++++- language/llama2-70b/README.md | 49 +++++++++++- language/llama3.1-405b/README.md | 50 +++++++----- language/mixtral-8x7b/README.md | 41 +++++++++- recommendation/dlrm_v2/pytorch/README.md | 32 ++++++-- text_to_image/README.md | 12 +++ vision/classification_and_detection/README.md | 80 ++++++++++++++++++- .../medical_imaging/3d-unet-kits19/README.md | 53 ++++++++++++ 11 files changed, 369 insertions(+), 31 deletions(-) diff --git a/automotive/3d-object-detection/README.md b/automotive/3d-object-detection/README.md index e1190e8132..d0430d444c 100644 --- a/automotive/3d-object-detection/README.md +++ b/automotive/3d-object-detection/README.md @@ -101,3 +101,7 @@ Please click [here](https://github.com/mlcommons/inference/blob/master/automotiv ``` python accuracy_waymo.py --mlperf-accuracy-file /mlperf_log_accuracy.json --waymo-dir /waymo/kitti_format/ ``` + +## Automated command for submission generation via MLCFlow + +Please see the [new docs site](https://docs.mlcommons.org/inference/submission/) for an automated way to generate submission through MLCFlow. \ No newline at end of file diff --git a/graph/R-GAT/README.md b/graph/R-GAT/README.md index 1380cb047d..d9f5fafa44 100644 --- a/graph/R-GAT/README.md +++ b/graph/R-GAT/README.md @@ -181,6 +181,10 @@ mlcr process,mlperf,accuracy,_igbh --result_dir= -j +``` + +**Onnx Framework** + +``` +mlcr get,ml-model,bert-large,_onnx --outdirname= -j +``` + +**TensorFlow Framework** + +``` +mlcr get,ml-model,bert-large,_tensorflow --outdirname= -j +``` + +### Download dataset through MLCFlow Automation + +``` +mlcr get,dataset,squad,validation --outdirname= -j +``` + ## Commands Please run the following commands: @@ -45,6 +77,17 @@ Please run the following commands: - The script [tf_freeze_bert.py] freezes the TensorFlow model into pb file. - The script [bert_tf_to_pytorch.py] converts the TensorFlow model into the PyTorch `BertForQuestionAnswering` module in [HuggingFace Transformers](https://github.com/huggingface/transformers) and also exports the model to [ONNX](https://github.com/onnx/onnx) format. +### Evaluate the accuracy through MLCFlow Automation +```bash +mlcr process,mlperf,accuracy,_squad --result_dir= +``` + +Please click [here](https://github.com/mlcommons/inference/blob/master/language/bert/accuracy-squad.py) to view the Python script for evaluating accuracy for the squad dataset. + +## Automated command for submission generation via MLCFlow + +Please see the [new docs site](https://docs.mlcommons.org/inference/submission/) for an automated way to generate submission through MLCFlow. + ## Loadgen over the Network ``` diff --git a/language/gpt-j/README.md b/language/gpt-j/README.md index cfcf068791..9c952b65db 100644 --- a/language/gpt-j/README.md +++ b/language/gpt-j/README.md @@ -1,9 +1,28 @@ # GPT-J Reference Implementation -Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/gpt-j) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker. 
+## Automated command to run the benchmark via MLCFlow Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/gpt-j/) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker. +You can also do `pip install mlc-scripts` and then use `mlcr` commands for downloading the model and datasets using the commands given in the later sections. + +### Download model through MLCFlow Automation + +``` +mlcr get,ml-model,gptj,_pytorch --outdirname= -j +``` + +### Download dataset through MLCFlow Automation + +**Validation Dataset** +``` +mlcr get,dataset,cnndm,_validation --outdirname= -j +``` + +**Calibration Dataset** +``` +mlcr get,dataset,cnndm,_calibration --outdirname= -j +``` ### Setup Instructions @@ -113,6 +132,13 @@ Evaluates the ROGUE scores from the accuracy logs. Only applicable when specifyi python evaluation.py --mlperf-accuracy-file ./build/logs/mlperf_log_accuracy.json --dataset-file ./data/cnn_eval.json ``` +### Evaluate the accuracy through MLCFlow Automation +```bash +mlcr process,mlperf,accuracy,_cnndm --result_dir= +``` + +Please click [here](https://github.com/mlcommons/inference/blob/master/language/gpt-j/evaluation.py) to view the Python script for evaluating accuracy for the cnndm dataset. + ### Reference Model - ROUGE scores The following are the rouge scores obtained when evaluating the GPT-J fp32 model on the entire validation set (13368 samples) using beam search, beam_size=4 @@ -122,6 +148,10 @@ ROUGE 2 - 20.1235 ROUGE L - 29.9881 +## Automated command for submission generation via MLCFlow + +Please see the [new docs site](https://docs.mlcommons.org/inference/submission/) for an automated way to generate submission through MLCFlow. + ### License: Apache License Version 2.0. diff --git a/language/llama2-70b/README.md b/language/llama2-70b/README.md index 506423cc2c..bbd9889564 100644 --- a/language/llama2-70b/README.md +++ b/language/llama2-70b/README.md @@ -7,6 +7,9 @@ - For server scenario, it is necessary to call `lg.FirstTokenComplete(response)` for each query. This way the first token will be reported and it's latency will be measured. - For all scenarios, when calling `lg.QuerySamplesComplete(response)`, it is necessary that each of the elements in response is a `lg.QuerySampleResponse` that contains the number of tokens (can be create this way: `lg.QuerySampleResponse(qitem.id, bi[0], bi[1], n_tokens)`). The number of tokens reported should match with the number of tokens on your answer and this will be checked in [TEST06](../../compliance/nvidia/TEST06/) + +## Automated command to run the benchmark via MLCFlow + Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/llama2-70b) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker. You can also do `pip install mlc-scripts` and then use `mlcr` commands for downloading the model and datasets using the commands given in the later sections. @@ -65,9 +68,11 @@ CPU-only setup, as well as any GPU versions for applicable libraries like PyTorc ### MLCommons Members Download MLCommons hosts the model and preprocessed dataset for download **exclusively by MLCommons Members**. You must first agree to the [confidentiality notice](https://llama2.mlcommons.org) using your organizational email address, then you will receive a link to a directory containing Rclone download instructions. 
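+
+Once access is granted, the emailed steps generally follow the same Rclone pattern used for the public artifacts elsewhere in this repository. A minimal sketch, assuming placeholder credentials, remote name, and bucket path taken from your email (none of the values below are real):
+
+```bash
+# Create an Rclone remote with the credentials provided in the download email (placeholders only)
+rclone config create llama2-download s3 provider=Cloudflare access_key_id=<access_key_id> secret_access_key=<secret_access_key> endpoint=<endpoint_url>
+# Copy the checkpoint into ${CHECKPOINT_PATH}, showing progress
+rclone copy llama2-download:<bucket>/Llama-2-70b-chat-hf ${CHECKPOINT_PATH} -P
+```
+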
_If you cannot access the form but you are part of a MLCommons Member organization, submit the [MLCommons subscription form](https://mlcommons.org/community/subscribe/) with your organizational email address and [associate a Google account](https://accounts.google.com/SignUpWithoutGmail) with your organizational email address._ -Once you have the access, you can download the model automatically via the below command + +### Download model through MLCFlow Automation + ``` -mlcr get,ml-model,llama2 --outdirname=${CHECKPOINT_PATH} -j +mlcr get,ml-model,llama2-70b,_pytorch -j --outdirname= -j ``` ### External Download (Not recommended for official submission) @@ -82,6 +87,34 @@ git clone https://huggingface.co/meta-llama/Llama-2-70b-chat-hf ${CHECKPOINT_PAT ## Get Dataset +### Download Preprocessed dataset through MLCFlow Automation + +**Validation** + +``` +mlcr get,dataset,preprocessed,openorca,_validation --outdirname= -j +``` + +**Calibration** + +``` +mlcr get,dataset,preprocessed,openorca,_calibration --outdirname= -j +``` + +### Download Unprocessed dataset through MLCFlow Automation + +**Validation** + +``` +mlcr get,dataset,openorca,_validation --outdirname= -j +``` + +**Calibration** + +``` +mlcr get,dataset,openorca,_calibration --outdirname= -j +``` + ### Preprocessed You can use Rclone to download the preprocessed dataset from a Cloudflare R2 bucket. @@ -244,6 +277,18 @@ scale from a 0.0-1.0 scale): This was run on a DGX-H100 node. Total runtime was ~4.5 days. +### Evaluate the accuracy through MLCFlow Automation +```bash +mlcr process,mlperf,accuracy,_openorca --result_dir= +``` + +Please click [here](https://github.com/mlcommons/inference/blob/master/language/llama2-70b/evaluate-accuracy.py) to view the Python script for evaluating accuracy for the Waymo dataset. + +## Automated command for submission generation via MLCFlow + +Please see the [new docs site](https://docs.mlcommons.org/inference/submission/) for an automated way to generate submission through MLCFlow. + + # Run llama2-70b-interactive benchmark For official, Llama2-70b submissions it is also possible to submit in the interactive category. This sets a more strict latency requirements for Time to First Token (ttft) and Time per Output Token (tpot). Specifically, the interactive category requires loadgen to enforce `ttft <= 450ms` and `ttft <= 40ms` diff --git a/language/llama3.1-405b/README.md b/language/llama3.1-405b/README.md index 50668263c4..65cf226d03 100644 --- a/language/llama3.1-405b/README.md +++ b/language/llama3.1-405b/README.md @@ -7,11 +7,12 @@ - For server scenario, it is necessary to call `lg.FirstTokenComplete(response)` for each query. This way the first token will be reported and it's latency will be measured. - For all scenarios, when calling `lg.QuerySamplesComplete(response)`, it is necessary that each of the elements in response is a `lg.QuerySampleResponse` that contains the number of tokens (can be create this way: `lg.QuerySampleResponse(qitem.id, bi[0], bi[1], n_tokens)`). The number of tokens reported should match with the number of tokens on your answer and this will be checked in [TEST06](../../compliance/nvidia/TEST06/) -## Automated command to run the benchmark via MLFlow +## Automated command to run the benchmark via MLCFlow Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/llama3_1-405b/) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker. 
-You can also do pip install mlc-scripts and then use `mlcr` commands for downloading the model and datasets using the commands given in the later sections. +You can also do `pip install mlc-scripts` and then use `mlcr` commands for downloading the model and datasets using the commands given in the later sections. + ## Prepare environment @@ -99,11 +100,24 @@ pip install -e ../../loadgen ## Get Model ### MLCommons Members Download (Recommended for official submission) -You need to request for access to [MLcommons](http://llama3-1.mlcommons.org/) and you'll receive an email with the download instructions. You can download the model automatically via the below command +You need to request for access to [MLcommons](http://llama3-1.mlcommons.org/) and you'll receive an email with the download instructions. + +### Download model through MLCFlow Automation + +**From MLCOMMONS Google Drive** + ``` mlcr get,ml-model,llama3 --outdirname=${CHECKPOINT_PATH} -j ``` +**From HuggingFace** + +``` +mlcr get,ml-model,llama3,_hf --outdirname=${CHECKPOINT_PATH} --hf_token= -j +``` + +**Note:** +Downloading llama3.1-405B model from Hugging Face will require an [**access token**](https://huggingface.co/settings/tokens) which could be generated for your account. Additionally, ensure that your account has access to the [llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct) model. ### External Download (Not recommended for official submission) + First go to [llama3.1-request-link](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and make a request, sign in to HuggingFace (if you don't have account, you'll need to create one). **Please note your authentication credentials** as you may be required to provide them when cloning below. @@ -115,16 +129,22 @@ git clone https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct ${CHECKPOINT cd ${CHECKPOINT_PATH} && git checkout be673f326cab4cd22ccfef76109faf68e41aa5f1 ``` -### Download huggingface model through MLC + +## Get Dataset + +### Download dataset through MLCFlow Automation + +**Validation** ``` -mlcr get,ml-model,llama3,_hf --outdirname=${CHECKPOINT_PATH} --hf_token= -j +mlcr get,dataset,mlperf,inference,llama3,_validation --outdirname= -j ``` -**Note:** -Downloading llama3.1-405B model from Hugging Face will require an [**access token**](https://huggingface.co/settings/tokens) which could be generated for your account. Additionally, ensure that your account has access to the [llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct) model. 
+**Calibration** -## Get Dataset +``` +mlcr get,dataset,mlperf,inference,llama3,_calibration --outdirname= -j +``` ### Preprocessed @@ -144,11 +164,6 @@ You can then navigate in the terminal to your desired download directory and run ``` rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_405b/mlperf_llama3.1_405b_dataset_8313_processed_fp16_eval.pkl ./ -P ``` -**MLC Command** - -``` -mlcr get,dataset,mlperf,inference,llama3,_validation --outdirname= -j -``` You can also download the calibration dataset from the Cloudflare R2 bucket by running the following command: @@ -156,11 +171,6 @@ You can also download the calibration dataset from the Cloudflare R2 bucket by r rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_405b/mlperf_llama3.1_405b_calibration_dataset_512_processed_fp16_eval.pkl ./ -P ``` -**MLC Command** -``` -mlcr get,dataset,mlperf,inference,llama3,_calibration --outdirname= -j -``` - ## Run Performance Benchmarks @@ -267,3 +277,7 @@ Running the GPU implementation in FP16 precision resulted in the following FP16 } ``` The accuracy target is 99% for rougeL and exact_match, and 90% for tokens_per_sample + +## Automated command for submission generation via MLCFlow + +Please see the [new docs site](https://docs.mlcommons.org/inference/submission/) for an automated way to generate submission through MLCFlow. \ No newline at end of file diff --git a/language/mixtral-8x7b/README.md b/language/mixtral-8x7b/README.md index 74935a4dc2..2bdcceb12c 100644 --- a/language/mixtral-8x7b/README.md +++ b/language/mixtral-8x7b/README.md @@ -9,7 +9,11 @@ - For all scenarios, when calling `lg.QuerySamplesComplete(response)`, it is necessary that each of the elements in response is a `lg.QuerySampleResponse` that contains the number of tokens (can be create this way: `lg.QuerySampleResponse(qitem.id, bi[0], bi[1], n_tokens)`). The number of tokens reported should match with the number of tokens on your answer and this will be checked in [TEST06](../../compliance/nvidia/TEST06/) -Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/mixtral-8x7b) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker. +## Automated command to run the benchmark via MLCFlow + +Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/mixtral-8x7b/) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker. + +You can also do `pip install mlc-scripts` and then use `mlcr` commands for downloading the model and datasets using the commands given in the later sections. ## Prepare environment @@ -66,6 +70,12 @@ CPU-only setup, as well as any GPU versions for applicable libraries like PyTorc **Important Note:** Files and configurations of the model have changed, and might change in the future. If you are going to get the model from Hugging Face or any external source, use a version of the model that exactly matches the one in this [commit](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/commit/a60832cb6c88d5cb6e507680d0e9996fbad77050). 
We strongly recommend to get the model following the steps in the next section: +### Download model through MLCFlow Automation + +``` +mlcr get,ml-model,mixtral --outdirname= -j +``` + ### Get Checkpoint #### Using Rclone @@ -87,6 +97,22 @@ rclone copy mlc-inference:mlcommons-inference-wg-public/mixtral_8x7b/mixtral-8x7 ## Get Dataset +### Download Preprocessed dataset through MLCFlow Automation + +**Validation** + +``` +mlcr get,dataset-mixtral,openorca-mbxp-gsm8k-combined,_validation --outdirname= -j +``` + +**Calibration** + +``` +mlcr get,dataset-mixtral,openorca-mbxp-gsm8k-combined,_calibration --outdirname= -j +``` + +- Adding `_wget` tag to the run command will change the download tool from `rclone` to `wget`. + ### Preprocessed #### Using Rclone @@ -228,6 +254,15 @@ fi The ServerSUT was not tested for GPU runs. +## Accuracy Evaluation + +### Evaluate the accuracy through MLCFlow Automation +```bash +mlcr process,mlperf,accuracy,_openorca-gsm8k-mbxp-combined --result_dir= +``` + +Please click [here](https://github.com/mlcommons/inference/blob/master/language/mixtral-8x7b/evaluate-accuracy.py) to view the Python script for evaluating accuracy for the Waymo dataset. + ### Evaluation Recreating the enviroment for evaluating the quality metrics can be quite tedious. Therefore we provide a dockerfile and recommend using docker for this task. 1. Build the evaluation container @@ -269,3 +304,7 @@ For official submissions, 99% of each reference score is enforced. Additionally, ```json {'tokens_per_sample': 144.84} ``` + +## Automated command for submission generation via MLCFlow + +Please see the [new docs site](https://docs.mlcommons.org/inference/submission/) for an automated way to generate submission through MLCFlow. \ No newline at end of file diff --git a/recommendation/dlrm_v2/pytorch/README.md b/recommendation/dlrm_v2/pytorch/README.md index 6f09e26ded..1c0c6a615e 100755 --- a/recommendation/dlrm_v2/pytorch/README.md +++ b/recommendation/dlrm_v2/pytorch/README.md @@ -2,8 +2,12 @@ This is the reference implementation for MLCommons Inference benchmarks. +## Automated command to run the benchmark via MLCFlow + Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/recommendation/dlrm-v2/) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker. +You can also do `pip install mlc-scripts` and then use `mlcr` commands for downloading the model and datasets using the commands given in the later sections. + ### Supported Models **TODO: Decide benchmark name** @@ -71,7 +75,13 @@ CFLAGS="-std=c++14" python setup.py develop --user ### Download preprocessed Dataset -Download the preprocessed dataset using Rclone. +#### Download dataset through MLCFlow Automation + +``` +mlcr get,preprocessed,dataset,criteo,_validation --outdirname= -j +``` + +#### Download the preprocessed dataset using Rclone. To run Rclone on Windows, you can download the executable [here](https://rclone.org/install/#windows). To install Rclone on Linux/macOS/BSD systems, run: @@ -102,13 +112,10 @@ framework | Size in bytes (`du *`) | MD5 hash (`md5sum *`) N/A | pytorch | <2GB | - pytorch | 97.31GB | - -#### MLC method - -The following MLCommons MLC commands can be used to programmatically download the model checkpoint. 
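+
+The size and hash columns in the table above come from `du *` and `md5sum *`; after downloading the weights (via either method below) you can reproduce the same checks locally. A minimal sketch, assuming the checkpoint was placed under ./model/model_weights (an illustrative path, adjust it to wherever you downloaded the weights):
+
+```bash
+# Report the total on-disk size of the downloaded checkpoint directory
+du -sh ./model/model_weights
+# Compute per-file MD5 hashes for comparison against any published values
+md5sum ./model/model_weights/*
+```
+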
+#### Download model through MLCFlow Automation ``` -pip install mlc-scripts -mlcr get,ml-model,dlrm,_pytorch,_weight_sharded,_rclone -j +mlcr get,ml-model,get,ml-model,dlrm,_pytorch,weight_sharded,_rclone --outdirname= -j ``` #### Manual method @@ -312,6 +319,15 @@ In the reference implementation, each sample is mapped to 100-700 user-item pair ### Running accuracy script +#### Evaluate the accuracy through MLCFlow Automation + +```bash +mlcr process,mlperf,accuracy,_terabyte --result_dir= +``` + +Please click [here](https://github.com/mlcommons/inference/blob/master/recommendation/dlrm_v2/pytorch/tools/accuracy-dlrm.py) to view the Python script for evaluating accuracy for the Waymo dataset. + + To get the accuracy from a LoadGen accuracy json log file, 1. If your SUT outputs the predictions and the ground truth labels in a packed format like the reference implementation then run @@ -414,6 +430,10 @@ usage: main.py [-h] `--find-peak-performance` determine the maximum QPS for the Server, while not applicable to other scenarios. +## Automated command for submission generation via MLCFlow + +Please see the [new docs site](https://docs.mlcommons.org/inference/submission/) for an automated way to generate submission through MLCFlow. + ## License [Apache License 2.0](LICENSE) diff --git a/text_to_image/README.md b/text_to_image/README.md index b00595785b..b11873049e 100644 --- a/text_to_image/README.md +++ b/text_to_image/README.md @@ -164,3 +164,15 @@ Add the `--accuracy` to the command to run the benchmark ```bash python3 main.py --dataset "coco-1024" --dataset-path coco2014 --profile stable-diffusion-xl-pytorch --accuracy --model-path model/ [--dtype ] [--device ] [--time