feat(benchmark): AIPerf run script #1501
base: develop
Conversation
Greptile Summary

This PR adds AIPerf benchmarking support to NeMo Guardrails with a well-structured command-line tool. The implementation includes YAML-based configuration, parameter sweep capabilities, and comprehensive test coverage.

Key Changes:
Issues Identified:
Confidence Score: 3/5
Important Files Changed
Sequence Diagram:

```mermaid
sequenceDiagram
    participant User
    participant CLI
    participant AIPerfRunner
    participant ConfigValidator
    participant ServiceChecker
    participant AIPerf
    User->>CLI: nemoguardrails aiperf run --config-file config.yaml
    CLI->>AIPerfRunner: Initialize with config path
    AIPerfRunner->>ConfigValidator: Load and validate YAML
    ConfigValidator->>ConfigValidator: Validate with Pydantic models
    ConfigValidator-->>AIPerfRunner: Return AIPerfConfig
    AIPerfRunner->>ServiceChecker: _check_service()
    ServiceChecker->>ServiceChecker: GET /v1/models with API key
    ServiceChecker-->>AIPerfRunner: Service available
    alt Single Benchmark
        AIPerfRunner->>AIPerfRunner: _build_command()
        AIPerfRunner->>AIPerfRunner: _create_output_dir()
        AIPerfRunner->>AIPerfRunner: _save_run_metadata()
        AIPerfRunner->>AIPerf: subprocess.run(aiperf command)
        AIPerf-->>AIPerfRunner: Benchmark results
        AIPerfRunner->>AIPerfRunner: _save_subprocess_result_json()
    else Batch Benchmarks with Sweeps
        AIPerfRunner->>AIPerfRunner: _get_sweep_combinations()
        loop For each sweep combination
            AIPerfRunner->>AIPerfRunner: _build_command(sweep_params)
            AIPerfRunner->>AIPerfRunner: _create_output_dir(sweep_params)
            AIPerfRunner->>AIPerfRunner: _save_run_metadata()
            AIPerfRunner->>AIPerf: subprocess.run(aiperf command)
            AIPerf-->>AIPerfRunner: Benchmark results
            AIPerfRunner->>AIPerfRunner: _save_subprocess_result_json()
        end
    end
    AIPerfRunner-->>CLI: Return exit code
    CLI-->>User: Display summary and exit
```
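The batch path in the diagram can be sketched in Python. This is a minimal sketch for illustration: the `aiperf profile` subcommand and the flag-name mapping are assumptions, not the PR's actual `_build_command` implementation.

```python
import itertools

def build_command(params: dict) -> list:
    """Map merged config params to CLI flags (flag spelling is an assumption)."""
    cmd = ["aiperf", "profile"]
    for key, value in params.items():
        cmd.extend([f"--{key.replace('_', '-')}", str(value)])
    return cmd

base = {"model": "meta/llama-3.3-70b-instruct", "concurrency": 1}
sweeps = {"concurrency": [1, 8]}

# One command per sweep combination, mirroring the loop in the diagram
for combo in itertools.product(*sweeps.values()):
    sweep_params = dict(zip(sweeps.keys(), combo))
    command = build_command({**base, **sweep_params})
    print(" ".join(command))
```

Each iteration would then go through output-directory creation, metadata saving, and a `subprocess.run` call as shown above.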
10 files reviewed, 3 comments
nemoguardrails/benchmark/aiperf/aiperf_configs/single_concurrency.yaml
9 files reviewed, no comments
Codecov Report: ❌ Patch coverage is
Documentation preview
10 files reviewed, no comments
10 files reviewed, no comments
10 files reviewed, 5 comments
```python
"config_file": str(self.config_path),
"sweep_params": sweep_params,
"base_config": self.config.base_config.model_dump(),
"command": " ".join(command),
```
logic: API key is written to run_metadata.json file. The command includes the API key (added at line 142), creating a persistent security vulnerability.
```diff
- "command": " ".join(command),
+ "command": self._sanitize_command_for_logging(command),
```
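A minimal sketch of the suggested helper. The `_sanitize_command_for_logging` name comes from the review suggestion above and does not exist in the PR yet; the set of sensitive flags is an assumption.

```python
SENSITIVE_FLAGS = {"--api-key"}  # assumption: flags whose values must be redacted

def sanitize_command_for_logging(command: list) -> str:
    """Return the command as a string with sensitive flag values replaced by ***."""
    sanitized = []
    redact_next = False
    for token in command:
        if redact_next:
            sanitized.append("***")
            redact_next = False
        elif token in SENSITIVE_FLAGS:
            sanitized.append(token)
            redact_next = True
        else:
            sanitized.append(token)
    return " ".join(sanitized)

print(sanitize_command_for_logging(["aiperf", "--api-key", "nvapi-secret", "--model", "m"]))
```

The same helper could be reused for the debug-log issue flagged below, so the key never reaches `run_metadata.json` or the logs.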
```python
log.info("Single Run")
log.debug("Output directory: %s", run_output_dir)
log.debug("Command: %s", " ".join(command))
```
logic: API key is exposed in debug logs. Same issue as line 159.
```diff
- log.debug("Command: %s", " ".join(command))
+ # log.debug("Command: %s", " ".join(command))
```
Note: I added the API key towards the end of development to make testing against NVCF functions more convenient. I need to wrap this in a Pydantic SecretStr or something similar to prevent it from being logged.
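A sketch of the SecretStr approach mentioned in the note above. The field names here are assumptions for illustration; the real config model lives in `aiperf_models.py`.

```python
from typing import Optional

from pydantic import BaseModel, SecretStr

class BaseConfig(BaseModel):
    """Sketch only: field names are assumptions, not the real aiperf_models."""
    url: str
    api_key: Optional[SecretStr] = None

cfg = BaseConfig(url="https://integrate.api.nvidia.com", api_key="nvapi-secret")
print(cfg.api_key)  # str() of a SecretStr is masked, so logging it is safe
api_key = cfg.api_key.get_secret_value()  # explicit unwrap only where the command is built
```

With this, accidentally logging the config (or `model_dump()`-ing it into `run_metadata.json`) emits the masked value, and the raw key is only available through an explicit `get_secret_value()` call.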
12 files reviewed, no comments
12 files reviewed, no comments
@tgasser-nv I noticed that the scope of this change is quite broad. It also introduces OpenAI-compatible endpoints on the server (at least for /chat/completions and /models), which is a major change. Given that, I think it might be better to wait until #1340 is finalized and merged. What do you think?
11 files reviewed, 1 comment
```python
def _check_service(self, endpoint: Optional[str] = "/v1/models") -> None:
    """Check if the service is up before we run the benchmarks"""
    url = urllib.parse.urljoin(self.config.base_config.url, endpoint)
    log.debug("Checking service is up using endpoint %s", url)

    try:
        response = httpx.get(url, timeout=5)
    except httpx.ConnectError as e:
        raise RuntimeError(f"Can't connect to {url}: {e}")

    if response.status_code != 200:
        raise RuntimeError(f"Can't access {url}: {response}")
```
logic: _check_service will fail for authenticated services. Both example configs use api_key_env_var: NVIDIA_API_KEY with https://integrate.api.nvidia.com, which requires authentication. Need to add Authorization header when api_key_env_var is configured, similar to how it's done in _build_command at line 166-172.
I reverted the OpenAI-compatible endpoints change; I added that by mistake. This isn't blocked by #1340.
11 files reviewed, no comments
AIPerf needs a tokenizer to run and will download one from Hugging Face if available. If you have the tokenizer locally, you can point to that directory and not log into Hugging Face.

```bash
pip install --upgrade huggingface_hub
```
```diff
- pip install --upgrade huggingface_hub
+ poetry run pip install --upgrade huggingface_hub
```
To run a single benchmark with fixed parameters, use the `single_concurrency.yaml` configuration:

```bash
poetry run nemoguardrails aiperf run --config-file nemoguardrails/benchmark/aiperf/aiperf_configs/single_concurrency.yaml
```
It seems that optional sections 3, 4, and 5 in Prerequisites are required to run it successfully. Also, one needs a license for https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/.
## Introduction

[AIPerf](https://github.com/triton-inference-server/perf_analyzer/tree/main/genai-perf) is NVIDIA's latest benchmarking tool for LLMs. It supports any OpenAI-compatible inference service and generates synthetic data loads, benchmarks, and all the metrics needed for performance comparison and analysis.
nit: is it called AIPerf or GenAI-Perf?
```python
from nemoguardrails.benchmark.aiperf.aiperf_models import AIPerfConfig

# Set up logging
```
```diff
- # Set up logging
```
```python
# Set up logging
log = logging.getLogger(__name__)
log.setLevel(logging.INFO)  # Set the lowest level to capture all messages
```
```diff
- log.setLevel(logging.INFO)  # Set the lowest level to capture all messages
+ log.setLevel(logging.INFO)
```
```python
    "%(asctime)s %(levelname)s: %(message)s", datefmt="%Y-%m-%d %H:%M:%S"
)
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.DEBUG)  # DEBUG and higher will go to the console
```
```diff
- console_handler.setLevel(logging.DEBUG)  # DEBUG and higher will go to the console
+ console_handler.setLevel(logging.DEBUG)
```
```python
console_handler.setLevel(logging.DEBUG)  # DEBUG and higher will go to the console
console_handler.setFormatter(formatter)

# Add the console handler for logging
```
```diff
- # Add the console handler for logging
```
```python
raise RuntimeError(
    f"Environment variable {value} not set. Please store the API Key in {value}"
)
cmd.extend([f"--api-key", str(api_key)])
```
```diff
- cmd.extend([f"--api-key", str(api_key)])
+ cmd.extend(["--api-key", str(api_key)])
```
```python
base_params = self.config.base_config.model_dump()

# Merge base config with sweep params (sweep params override base)
params = base_params if not sweep_params else {**base_params, **sweep_params}
```
nit: when sweep_params is an empty dict, this still creates a merged dict instead of using base_params directly.

```python
params = {**base_params, **sweep_params} if sweep_params else base_params
```
Not sure I understand. If sweep_params = {}, it is falsy, so not sweep_params is True and base_params will be assigned to params.
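A quick check of this exchange: with an empty dict, both spellings return `base_params` directly (no merged copy is built), so the two forms are equivalent:

```python
base_params = {"model": "m", "concurrency": 1}

# Empty sweep: both conditional forms return base_params itself
empty = {}
a = base_params if not empty else {**base_params, **empty}
b = {**base_params, **empty} if empty else base_params
assert a is base_params and b is base_params

# Non-empty sweep: both forms produce the same merged dict, sweep values winning
sweep = {"concurrency": 8}
a = base_params if not sweep else {**base_params, **sweep}
b = {**base_params, **sweep} if sweep else base_params
assert a == b == {"model": "m", "concurrency": 8}
```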
```python
for combination in itertools.product(*param_values):
    combinations.append(dict(zip(param_names, combination)))

return combinations
```
It is building the entire list in memory. For large sweeps (e.g., 10 params × 10 values each = 10^10 combinations), this will OOM. Better to use a generator or, if it makes sense, add validation for reasonable sweep sizes.
I added a limit of 100 to avoid refactoring the rest of the code around generators
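A sketch of the capped combination generation described in this thread. The limit of 100 comes from the reply above; the function and variable names are assumptions, not the PR's exact code.

```python
import itertools

MAX_SWEEP_COMBINATIONS = 100  # cap from the reply above, avoids runaway batches

def get_sweep_combinations(sweeps: dict) -> list:
    """Cartesian product of sweep values, rejected early if it exceeds the cap."""
    param_names = list(sweeps.keys())
    param_values = list(sweeps.values())
    total = 1
    for values in param_values:
        total *= len(values)
    if total > MAX_SWEEP_COMBINATIONS:
        raise ValueError(
            f"{total} sweep combinations exceed the limit of {MAX_SWEEP_COMBINATIONS}"
        )
    return [dict(zip(param_names, combo)) for combo in itertools.product(*param_values)]

print(get_sweep_combinations({"concurrency": [1, 8], "request_count": [100, 500]}))
```

Checking the product size before materializing the list keeps the simple list-based API while still failing fast on oversized sweeps.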
```python
raise RuntimeError(
    f"Environment variable {value} not set. Please store the API Key in {value}"
)
```
```diff
- raise RuntimeError(
-     f"Environment variable {value} not set. Please store the API Key in {value}"
- )
+ raise RuntimeError(
+     f"Environment variable '{value}' is not set. Please set it: export {value}='your-api-key'"
+ )
```
```python
headers = {"Authorization": f"Bearer {api_key}"} if api_key else None

try:
    response = httpx.get(url, timeout=5, headers=headers)
```
Should we make the timeout configurable in BaseConfig, or do you think 5 seconds is OK?
I'd rather leave this at 5 for now. AIPerf has a separate inference request timeout (--request-timeout-seconds) which I could add later on.
Thank you Tim, looks good! Please see my comments above.
I also have a general question: should benchmark (and benchmark/aiperf) live inside the nemoguardrails package or at the repository root level? It is a general-purpose LLM benchmarking wrapper, not guardrails-specific, and is mostly used in development and evaluation, so it is not needed at runtime. As I mentioned before, the benchmark config files are not easily available in the package once users install it from PyPI, and moving it out would also have maintenance benefits.
Description
AIPerf (GitHub, Docs) is NVIDIA's latest benchmarking tool for LLMs. It supports any OpenAI-compatible inference service and generates synthetic data loads, benchmarks, and all metrics needed for comparison.
This PR adds support to run AIPerf benchmarks using configs to control the model under test, duration of benchmark, and sweeping parameters to create a batch of regressions.
Test Plan
Pre-requisites
To get started with benchmarking, follow these steps. They were tested with Python 3.11.11.
1. `poetry install --with dev`: Installs Guardrails with developer tooling
2. `poetry run pip install aiperf langchain-nvidia-ai-endpoints`: Installs AIPerf and the langchain wrapper for NVIDIA models
3. `pip install --upgrade huggingface_hub`: Installs the Hugging Face hub. AIPerf needs a tokenizer to run and will download one given a model name from Hugging Face if available. If you have the tokenizer locally you can point to that directory.
4. `hf auth login`: Log into Hugging Face

Running a single test
Unit-tests
Chat server
Related Issue(s)
Checklist