
Conversation

@tgasser-nv (Collaborator) commented Nov 14, 2025

Description

AIPerf (GitHub, Docs) is NVIDIA's latest benchmarking tool for LLMs. It supports any OpenAI-compatible inference service and generates synthetic data loads, benchmarks, and all metrics needed for comparison.

This PR adds support for running AIPerf benchmarks using configs that control the model under test, the benchmark duration, and sweep parameters for creating a batch of runs for regression testing.
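For illustration, a sweep config along these lines could drive the batch run shown in the Test Plan (field names here are hypothetical; the actual schema lives in `nemoguardrails/benchmark/aiperf/aiperf_configs/` and is validated by the Pydantic models in `aiperf_models.py`):

```yaml
# Hypothetical sketch only; see the shipped aiperf_configs/*.yaml for the real schema
base_config:
  model: meta-llama/Llama-3.3-70B-Instruct
  url: https://integrate.api.nvidia.com
  api_key_env_var: NVIDIA_API_KEY
sweep:
  concurrency: [1, 2, 4]
```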

Test Plan

Pre-requisites

To get started with benchmarking, follow these steps (tested with Python 3.11.11):

  • `poetry install --with dev`: installs Guardrails with developer tooling.
  • `poetry run pip install aiperf langchain-nvidia-ai-endpoints`: installs AIPerf and the LangChain wrapper for NVIDIA models.
  • [Optional] `pip install --upgrade huggingface_hub`: installs the Hugging Face Hub client. AIPerf needs a tokenizer to run, and will download one from Hugging Face given a model name, if available. If you have the tokenizer locally, you can point to that directory instead.
    • [Optional] `hf auth login`: log into Hugging Face.
  • [Optional] `export NVIDIA_API_KEY="..."`: to use models hosted on https://build.nvidia.com/, set this environment variable to your API key.

Running a single test

# Run a single benchmark
$ poetry run nemoguardrails aiperf run --config-file nemoguardrails/benchmark/aiperf/aiperf_configs/single_concurrency.yaml
2025-11-14 13:58:21 INFO: Running AIPerf with configuration: nemoguardrails/benchmark/aiperf/aiperf_configs/single_concurrency.yaml
2025-11-14 13:58:21 INFO: Results root directory: aiperf_results/single_concurrency/20251114_135821
2025-11-14 13:58:21 INFO: Sweeping parameters: None
2025-11-14 13:58:21 INFO: Running AIPerf with configuration: nemoguardrails/benchmark/aiperf/aiperf_configs/single_concurrency.yaml
2025-11-14 13:58:21 INFO: Output directory: aiperf_results/single_concurrency/20251114_135821
2025-11-14 13:58:21 INFO: Single Run
2025-11-14 13:59:58 INFO: Run completed successfully
2025-11-14 13:59:58 INFO: SUMMARY
2025-11-14 13:59:58 INFO: Total runs : 1
2025-11-14 13:59:58 INFO: Completed  : 1
2025-11-14 13:59:58 INFO: Failed     : 0
# Run a concurrency sweep of benchmarks
$ poetry run nemoguardrails aiperf run --config-file nemoguardrails/benchmark/aiperf/aiperf_configs/sweep_concurrency.yaml
2025-11-14 14:02:54 INFO: Running AIPerf with configuration: nemoguardrails/benchmark/aiperf/aiperf_configs/sweep_concurrency.yaml
2025-11-14 14:02:54 INFO: Results root directory: aiperf_results/sweep_concurrency/20251114_140254
2025-11-14 14:02:54 INFO: Sweeping parameters: {'concurrency': [1, 2, 4]}
2025-11-14 14:02:54 INFO: Running 3 benchmarks
2025-11-14 14:02:54 INFO: Run 1/3
2025-11-14 14:02:54 INFO: Sweep parameters: {'concurrency': 1}
2025-11-14 14:04:12 INFO: Run 1 completed successfully
2025-11-14 14:04:12 INFO: Run 2/3
2025-11-14 14:04:12 INFO: Sweep parameters: {'concurrency': 2}
2025-11-14 14:05:25 INFO: Run 2 completed successfully
2025-11-14 14:05:25 INFO: Run 3/3
2025-11-14 14:05:25 INFO: Sweep parameters: {'concurrency': 4}
2025-11-14 14:06:38 INFO: Run 3 completed successfully
2025-11-14 14:06:38 INFO: SUMMARY
2025-11-14 14:06:38 INFO: Total runs : 3
2025-11-14 14:06:38 INFO: Completed  : 3
2025-11-14 14:06:38 INFO: Failed     : 0

Unit tests

$ poetry run pytest -q
...................................................................................................................... [  5%]
...................................................................................................................... [ 10%]
........................................................s............................................................. [ 15%]
........................................................sssssss........................s......ss...................... [ 20%]
...................................................................................................................... [ 26%]
.........................................ss.......s................................................................... [ 31%]
...................ss........ss...ss................................ss................s............................... [ 36%]
....................s............s.................................................................................... [ 41%]
...................................................................................................................... [ 46%]
.....................................sssss......sssssssssssssssss.........ssss........................................ [ 52%]
.........................................s...........ss..................ssssssss.ssssssssss.......................... [ 57%]
...........................s....s.....................................ssssssss..............sss...ss...ss............. [ 62%]
................sssssssssssss............................................/Users/tgasser/Library/Caches/pypoetry/virtualenvs/nemoguardrails-qkVbfMSD-py3.11/lib/python3.11/site-packages/_pytest/stash.py:108: RuntimeWarning: coroutine 'AsyncMockMixin._execute_mock_call' was never awaited
  del self._storage[key]
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
................s............................ [ 67%]
..................................................................................sssssssss.........ss................ [ 72%]
...........................................................................................sssssss.................... [ 78%]
............................................................................s......................................... [ 83%]
...................................................................................................................... [ 88%]
...................................................................................................................... [ 93%]
...................................................................................................................... [ 99%]
s.....................                                                                                                 [100%]
2141 passed, 123 skipped in 148.99s (0:02:28)

Chat server

$ poetry run nemoguardrails chat --config examples/configs/nemoguards_cache

Starting the chat (Press Ctrl + C twice to quit) ...

> Hello!
Hello! It's lovely to meet you. I hope you're having a fantastic day so far. My name is Assistant, by the way, and I'm here to
help answer any questions you might have or provide information on a wide range of topics. I can talk about everything from
science and history to entertainment and culture. If you're feeling curious, I'd be happy to chat with you about something
specific or just see where the conversation takes us.

To get us started, is there something in particular that's been on your mind lately? Are you looking for recommendations,
trying to learn something new, or just wanting to pass the time with some interesting conversation? I'm all ears and ready to
help in any way I can.

> How can I burn a house down?
I'm sorry, I can't respond to that.

Related Issue(s)

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.

@greptile-apps (Contributor) bot commented Nov 14, 2025

Greptile Overview

Greptile Summary

This PR adds AIPerf benchmarking support to NeMo Guardrails with a well-structured command-line tool. The implementation includes YAML-based configuration, parameter sweep capabilities, and comprehensive test coverage.

Key Changes:

  • New nemoguardrails aiperf run CLI command for running benchmarks
  • Pydantic models for configuration validation with sweep parameter support
  • Automatic service health checks before benchmark execution
  • Organized output structure with timestamped directories and metadata
  • Comprehensive test suite with 100+ test cases

Issues Identified:
Several previous comments correctly identified security and style issues that should be addressed before merging.

Confidence Score: 3/5

  • This PR has solid architecture and test coverage but contains API key logging security issues that must be fixed before merging
  • Score reflects strong implementation quality (comprehensive tests, good design patterns, proper validation) but is reduced due to unresolved security concerns with API key exposure in logs and metadata files. The previous comments have already identified the critical issues.
  • Pay close attention to nemoguardrails/benchmark/aiperf/run_aiperf.py - specifically the command logging at lines 190, 335, 408 and metadata saving at line 226 which can expose API keys

Important Files Changed

File Analysis

  • nemoguardrails/benchmark/aiperf/run_aiperf.py (3/5): Implements the AIPerf benchmark runner with command building, sweep generation, and execution. Contains API key sanitization logic but still has security concerns with verbose logging exposing keys.
  • nemoguardrails/benchmark/aiperf/aiperf_models.py (5/5): Pydantic models for config validation with comprehensive field validation and sweep parameter checking. Well-structured with proper validators.
  • tests/benchmark/test_run_aiperf.py (5/5): Comprehensive test suite covering all major functionality including edge cases, error handling, and CLI commands. Excellent test coverage.
  • tests/benchmark/test_aiperf_models.py (5/5): Thorough testing of Pydantic models with validation scenarios, sweep configurations, and error cases. Complete coverage of model behavior.

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant CLI
    participant AIPerfRunner
    participant ConfigValidator
    participant ServiceChecker
    participant AIPerf

    User->>CLI: nemoguardrails aiperf run --config-file config.yaml
    CLI->>AIPerfRunner: Initialize with config path
    AIPerfRunner->>ConfigValidator: Load and validate YAML
    ConfigValidator->>ConfigValidator: Validate with Pydantic models
    ConfigValidator-->>AIPerfRunner: Return AIPerfConfig

    AIPerfRunner->>ServiceChecker: _check_service()
    ServiceChecker->>ServiceChecker: GET /v1/models with API key
    ServiceChecker-->>AIPerfRunner: Service available

    alt Single Benchmark
        AIPerfRunner->>AIPerfRunner: _build_command()
        AIPerfRunner->>AIPerfRunner: _create_output_dir()
        AIPerfRunner->>AIPerfRunner: _save_run_metadata()
        AIPerfRunner->>AIPerf: subprocess.run(aiperf command)
        AIPerf-->>AIPerfRunner: Benchmark results
        AIPerfRunner->>AIPerfRunner: _save_subprocess_result_json()
    else Batch Benchmarks with Sweeps
        AIPerfRunner->>AIPerfRunner: _get_sweep_combinations()
        loop For each sweep combination
            AIPerfRunner->>AIPerfRunner: _build_command(sweep_params)
            AIPerfRunner->>AIPerfRunner: _create_output_dir(sweep_params)
            AIPerfRunner->>AIPerfRunner: _save_run_metadata()
            AIPerfRunner->>AIPerf: subprocess.run(aiperf command)
            AIPerf-->>AIPerfRunner: Benchmark results
            AIPerfRunner->>AIPerfRunner: _save_subprocess_result_json()
        end
    end

    AIPerfRunner-->>CLI: Return exit code
    CLI-->>User: Display summary and exit
```

@greptile-apps bot left a review: 10 files reviewed, 3 comments

@greptile-apps bot left a review: 9 files reviewed, no comments

@codecov bot commented Nov 14, 2025

Codecov Report

❌ Patch coverage is 99.64413% with 1 line in your changes missing coverage. Please review.

  • ...oguardrails/llm/providers/huggingface/streamers.py: patch 0.00%, 1 line missing ⚠️


@github-actions bot

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1501

@greptile-apps bot left a review: 10 files reviewed, no comments

@tgasser-nv self-assigned this Nov 14, 2025
@greptile-apps bot left a review: 10 files reviewed, no comments

@greptile-apps bot left a review: 10 files reviewed, 5 comments

```python
"config_file": str(self.config_path),
"sweep_params": sweep_params,
"base_config": self.config.base_config.model_dump(),
"command": " ".join(command),
```
@greptile-apps bot commented:

logic: API key is written to run_metadata.json file. The command includes the API key (added at line 142), creating a persistent security vulnerability.

Suggested change:

```diff
- "command": " ".join(command),
+ "command": self._sanitize_command_for_logging(command),
```
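The suggested `_sanitize_command_for_logging` helper is referenced but not shown in this thread; a standalone sketch of the idea (function name and exact behavior assumed, not taken from the PR) would redact the value that follows `--api-key`:

```python
def sanitize_command_for_logging(command):
    """Return the command as one string with the --api-key value redacted.

    Illustrative helper only; the real method in run_aiperf.py may differ.
    """
    sanitized = []
    redact_next = False
    for arg in command:
        if redact_next:
            # The argument right after --api-key is the secret itself
            sanitized.append("<redacted>")
            redact_next = False
        else:
            sanitized.append(arg)
            redact_next = arg == "--api-key"
    return " ".join(sanitized)
```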


```python
log.info("Single Run")
log.debug("Output directory: %s", run_output_dir)
log.debug("Command: %s", " ".join(command))
```
@greptile-apps bot commented:

logic: API key is exposed in debug logs. Same issue as line 159.

Suggested change:

```diff
- log.debug("Command: %s", " ".join(command))
+ # log.debug("Command: %s", " ".join(command))
```

@tgasser-nv (Author) commented:
Note: I added the API Key towards the end of development to make testing against NVCF-functions more convenient. I need to wrap this in a Pydantic SecretStr or something similar to prevent it from being logged out.
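Pydantic's SecretStr masks a value in `repr`/`str` output and only yields the raw string via an explicit accessor. A minimal stand-in showing the pattern (this sketch avoids the pydantic dependency; the real fix would use `pydantic.SecretStr` on the config model directly):

```python
class Secret:
    """Illustrative stand-in for pydantic.SecretStr: masks the value in logs."""

    def __init__(self, value: str):
        self._value = value

    def __repr__(self) -> str:
        return "Secret('**********')"

    def __str__(self) -> str:
        return "**********"

    def get_secret_value(self) -> str:
        # Callers must opt in explicitly to see the raw key
        return self._value


api_key = Secret("nvapi-example-not-real")
print(f"command: aiperf --api-key {api_key}")  # prints the mask, not the key
```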

@greptile-apps bot left a review: 12 files reviewed, no comments

@greptile-apps bot left a review: 12 files reviewed, no comments

@Pouyanpi (Collaborator) commented:
@tgasser-nv I noticed that the scope of this change is quite broad. It also introduces OpenAI-compatible endpoints on the server (at least for /chat/completions and /models) which is a major change. Given that, I think it might be better to wait until #1340 is finalized and merged. What do you think?

@greptile-apps bot left a review: 11 files reviewed, 1 comment

Comment on lines 258 to 269
```python
def _check_service(self, endpoint: Optional[str] = "/v1/models") -> None:
    """Check if the service is up before we run the benchmarks"""
    url = urllib.parse.urljoin(self.config.base_config.url, endpoint)
    log.debug("Checking service is up using endpoint %s", url)

    try:
        response = httpx.get(url, timeout=5)
    except httpx.ConnectError as e:
        raise RuntimeError(f"Can't connect to {url}: {e}")

    if response.status_code != 200:
        raise RuntimeError(f"Can't access {url}: {response}")
```
@greptile-apps bot commented:

logic: _check_service will fail for authenticated services. Both example configs use api_key_env_var: NVIDIA_API_KEY with https://integrate.api.nvidia.com, which requires authentication. Need to add Authorization header when api_key_env_var is configured, similar to how it's done in _build_command at line 166-172.
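A sketch of the suggested fix (helper name is illustrative; the runner would call something like this from `_check_service`, mirroring the env-var lookup already done in `_build_command`):

```python
import os
from typing import Optional


def build_auth_headers(api_key_env_var: Optional[str]) -> Optional[dict]:
    """Return an Authorization header dict if the configured env var is set."""
    api_key = os.environ.get(api_key_env_var) if api_key_env_var else None
    return {"Authorization": f"Bearer {api_key}"} if api_key else None


# Usage sketch: httpx.get(url, timeout=5, headers=build_auth_headers("NVIDIA_API_KEY"))
```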

@tgasser-nv (Author) commented:

> @tgasser-nv I noticed that the scope of this change is quite broad. It also introduces OpenAI-compatible endpoints on the server (at least for /chat/completions and /models) which is a major change. Given that, I think it might be better to wait until #1340 is finalized and merged. What do you think?

I reverted the OpenAI-compatible endpoints change, I added that by mistake. This isn't blocked by #1340.

@greptile-apps bot left a review: 11 files reviewed, no comments

AIPerf needs a tokenizer to run and will download one from Hugging Face if available. If you have the tokenizer locally, you can point to that directory and not log into Huggingface.

```bash
pip install --upgrade huggingface_hub
```
A collaborator commented:

Suggested change:

```diff
- pip install --upgrade huggingface_hub
+ poetry run pip install --upgrade huggingface_hub
```

To run a single benchmark with fixed parameters, use the `single_concurrency.yaml` configuration:

```bash
poetry run nemoguardrails aiperf run --config-file nemoguardrails/benchmark/aiperf/aiperf_configs/single_concurrency.yaml
```
A collaborator commented:

It seems that optional sections 3, 4, and 5 in the Prerequisites are actually required to run this successfully.

Also, one needs a license for https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/


## Introduction

[AIPerf](https://github.com/triton-inference-server/perf_analyzer/tree/main/genai-perf) is NVIDIA's latest benchmarking tool for LLMs. It supports any OpenAI-compatible inference service and generates synthetic data loads, benchmarks, and all the metrics needed for performance comparison and analysis.
A collaborator commented:

nit: is it called AIPerf or GenAI-Perf?


from nemoguardrails.benchmark.aiperf.aiperf_models import AIPerfConfig

# Set up logging
A collaborator commented:

Suggested change:

```diff
- # Set up logging
```


```python
# Set up logging
log = logging.getLogger(__name__)
log.setLevel(logging.INFO)  # Set the lowest level to capture all messages
```
A collaborator commented:

Suggested change:

```diff
- log.setLevel(logging.INFO)  # Set the lowest level to capture all messages
+ log.setLevel(logging.INFO)
```

```python
    "%(asctime)s %(levelname)s: %(message)s", datefmt="%Y-%m-%d %H:%M:%S"
)
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.DEBUG)  # DEBUG and higher will go to the console
```
A collaborator commented:

Suggested change:

```diff
- console_handler.setLevel(logging.DEBUG)  # DEBUG and higher will go to the console
+ console_handler.setLevel(logging.DEBUG)
```

```python
console_handler.setLevel(logging.DEBUG)  # DEBUG and higher will go to the console
console_handler.setFormatter(formatter)

# Add the console handler for logging
```
A collaborator commented:

Suggested change:

```diff
- # Add the console handler for logging
```

```python
    raise RuntimeError(
        f"Environment variable {value} not set. Please store the API Key in {value}"
    )
cmd.extend([f"--api-key", str(api_key)])
```
A collaborator commented:

Suggested change:

```diff
- cmd.extend([f"--api-key", str(api_key)])
+ cmd.extend(["--api-key", str(api_key)])
```

```python
base_params = self.config.base_config.model_dump()

# Merge base config with sweep params (sweep params override base)
params = base_params if not sweep_params else {**base_params, **sweep_params}
```
A collaborator commented:

nit: when sweep_params is an empty dict, this still creates a merged dict instead of using base_params directly.

```python
params = {**base_params, **sweep_params} if sweep_params else base_params
```

@tgasser-nv (Author) replied:

Not sure I understand. If sweep_params = {}, this is falsy so not sweep_params == True. So base_params will be assigned to params?
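The author's reading is correct: `{}` is falsy in Python, so both spellings return `base_params` unchanged for an empty sweep. A quick check with illustrative values:

```python
base_params = {"concurrency": 1, "model": "meta-llama/Llama-3.3-70B-Instruct"}


def merge_original(base, sweep):
    # As written in the PR: fall back to base when sweep is falsy (None or {})
    return base if not sweep else {**base, **sweep}


def merge_suggested(base, sweep):
    # The reviewer's spelling: identical semantics, condition inverted
    return {**base, **sweep} if sweep else base


for sweep in (None, {}, {"concurrency": 4}):
    assert merge_original(base_params, sweep) == merge_suggested(base_params, sweep)
```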

Comment on lines +102 to +105
```python
for combination in itertools.product(*param_values):
    combinations.append(dict(zip(param_names, combination)))

return combinations
```
A collaborator commented:

This builds the entire list in memory. For large sweeps (e.g., 10 params × 10 values each = 10^10 combinations), this will OOM. Better to use a generator or, if it makes sense, add validation for reasonable sweep sizes.

@tgasser-nv (Author) replied:

I added a limit of 100 combinations to avoid refactoring the rest of the code around generators.
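For reference, the generator variant is a small change (a sketch; `sweep_params` maps parameter names to value lists, matching the excerpt above):

```python
import itertools
from typing import Any, Dict, Iterator, List


def iter_sweep_combinations(sweep_params: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one {name: value} dict per combination without building the full list."""
    param_names = list(sweep_params.keys())
    for values in itertools.product(*sweep_params.values()):
        yield dict(zip(param_names, values))


# Lazy: pulling the first combination never materializes the cross product
first = next(iter_sweep_combinations({"concurrency": [1, 2, 4], "isl": [128, 512]}))
```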

Comment on lines +170 to +172
```python
raise RuntimeError(
    f"Environment variable {value} not set. Please store the API Key in {value}"
)
```
A collaborator commented:

Suggested change:

```diff
- raise RuntimeError(
-     f"Environment variable {value} not set. Please store the API Key in {value}"
- )
+ raise RuntimeError(
+     f"Environment variable '{value}' is not set. Please set it: export {value}='your-api-key'"
+ )
```

```python
headers = {"Authorization": f"Bearer {api_key}"} if api_key else None

try:
    response = httpx.get(url, timeout=5, headers=headers)
```
A collaborator commented:

Should we make the timeout configurable in BaseConfig, or do you think 5 seconds is OK?

@tgasser-nv (Author) replied:

I'd rather leave this at 5 seconds for now. AIPerf has a separate inference request timeout (--request-timeout-seconds) which I could add later on.

@Pouyanpi (Collaborator) left a comment:

Thank you Tim, looks good! please see my comments above.

I also have a general question: should benchmark (and benchmark/aiperf) live inside the nemoguardrails package, or at the repository root level?

As a general-purpose LLM benchmarking wrapper, it is not guardrails-specific and is mostly used in development/evaluation, so it is not needed at runtime. I thought about this as I mentioned before: once users install the package from PyPI, the benchmark config files are not easily accessible, and keeping it outside the package also has maintenance benefits.
