Chimbuko

Introduction

The Chimbuko framework captures, analyzes and visualizes performance metrics for complex scientific workflows and relates these metrics to the context of their execution (provenance) on extreme-scale machines. The purpose of Chimbuko is to enable empirical performance studies of software or workflows, whether during development or across different computational environments.

Chimbuko enables the comparison of different runs at high and low levels of metric granularity by capturing and displaying aggregate statistics such as function profiles and counter averages, as well as maintaining detailed trace information. Because trace data can quickly escalate in volume for applications running on multi-node machines, the core of Chimbuko is an in-situ data reduction component that captures trace data from a running application instance (e.g., an MPI rank) and applies machine learning to identify anomalous function executions, so that detailed trace data is kept only for those executions. By focusing primarily on performance anomalies, Chimbuko achieves a significant reduction in data volume while retaining detailed information about the events that affect application performance.
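
The idea of keeping only anomalous function executions can be illustrated with a minimal sketch. It is not Chimbuko's actual algorithm or API: it assumes a simple running mean and standard deviation per function (Welford's method) and a fixed threshold, and the names `RunningStats`, `filter_anomalies`, `threshold` and `warmup` are hypothetical.

```python
import math
from collections import defaultdict


class RunningStats:
    """Welford's online mean/variance for the runtimes of one function."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def filter_anomalies(events, threshold=3.0, warmup=5):
    """Return only the events whose runtime is far from the running mean.

    events: iterable of (function_name, runtime) tuples in arrival order.
    threshold and warmup are illustrative parameters, not Chimbuko settings.
    """
    stats = defaultdict(RunningStats)
    kept = []
    for func, runtime in events:
        s = stats[func]
        sd = s.stddev()
        # Flag the event only once enough samples have been seen.
        if s.n >= warmup and sd > 0 and abs(runtime - s.mean) > threshold * sd:
            kept.append((func, runtime))
        # Every event, anomalous or not, updates the model.
        s.update(runtime)
    return kept
```

In this sketch all events contribute to the statistics, but only the flagged ones are retained in detail, which is where the reduction in data volume comes from.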

Alongside providing a framework to allow for offline analysis of the data collected over the run, Chimbuko also provides an online visualization tool with which aggregated statistics and individual anomalous executions can be monitored in real-time.

The following figure shows the basic layout of the Chimbuko framework.

- The ADIOS framework orchestrates the workflow and provides data streaming.
- The TAU tool provides performance metrics for instrumented components 1 and 2. The tool extracts provenance metadata and trace data.
- Trace data is dynamically analyzed to detect anomalies by the Online AD modules, and aggregate statistics are maintained on the parameter server.
- Detailed provenance information regarding the detected anomalies is stored in the provenance database, a remote UnQLite JSON document store provided by the Mochi Sonata framework.
- The visualization module allows for interaction with Chimbuko in real time.
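
As an illustration of how aggregate statistics from many analysis modules might be combined in one place, the sketch below merges per-rank summaries (count, mean, sum of squared deviations) with the standard pairwise update formula. It is an assumption about one workable approach, not a description of the parameter server's implementation; `Summary`, `summarize` and `merge` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def summarize(values):
    """Build a summary of one rank's runtimes with Welford's method."""
    s = Summary()
    for x in values:
        s.n += 1
        delta = x - s.mean
        s.mean += delta / s.n
        s.m2 += delta * (x - s.mean)
    return s


def merge(a, b):
    """Combine two summaries without revisiting the underlying samples."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Summary(n, mean, m2)
```

Because only three numbers per function travel over the network, a design like this keeps global statistics current without shipping raw trace data.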

For more information about the design and working philosophy of Chimbuko, please see the documents directory.

Documentation

Detailed documentation on the API, installation and usage of the Chimbuko "PerformanceAnalysis" backend can be found here, and documentation on the visualization module can be found here.

Releases

The current v7.0 release includes updates to the following components:

Chimbuko backend

- Added anomaly "post-pruning": a second pass over the recorded anomalies is performed at the end of the run, re-evaluating them against the final AD model and discarding those no longer considered anomalous. This significantly reduces the number of mislabeled anomalies resulting from an unconverged AD model.
- Major refactoring of the backend codebase, separating generic functionality from functionality specific to analyzing performance trace data. Additional modules can now be created to analyze other streaming data types.
- Improvements to services, including:
  - Finer control over the network interfaces used by service components
  - Optional NUMA binding of service components
  - Additional controls over the frequency of provenance database and pserver sends from the AD modules to reduce network load
- Improvements to AD algorithm robustness for several edge cases
- Various bugfixes
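
The post-pruning pass described above can be sketched as follows. This is a simplified illustration rather than the backend's code: it assumes the final AD model can be reduced to a per-function mean and standard deviation, and the names `post_prune`, `final_model` and `threshold` are hypothetical.

```python
def post_prune(recorded, final_model, threshold=3.0):
    """Re-evaluate recorded anomalies against the final model.

    recorded: list of dicts with "func" and "runtime" keys (assumed layout).
    final_model: dict mapping function name to (mean, stddev) at end of run.
    Returns only the events that the final model still considers anomalous.
    """
    kept = []
    for event in recorded:
        mean, stddev = final_model[event["func"]]
        if stddev > 0 and abs(event["runtime"] - mean) > threshold * stddev:
            kept.append(event)
    return kept
```

An event flagged early in the run, when the model had seen few samples, is dropped here if it falls within the normal range of the converged model.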

Visualization

- Updated dependencies for the visualization component

Offline analysis

- Added a new tool for converting Chimbuko's provenance database to a relational database
- Added a preliminary version of a new Python library for offline analysis built around the relational database
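
To give a sense of what converting a JSON document store to a relational form involves, here is a minimal sketch using Python's standard `json` and `sqlite3` modules. It does not reflect the actual conversion tool or the schema of the provenance database: the table name `anomalies` and the fields `mpi_rank`, `func` and `runtime_us` are assumptions made for illustration.

```python
import json
import sqlite3


def load_anomalies(conn, json_docs):
    """Flatten JSON anomaly records into a relational table.

    json_docs: iterable of JSON strings with the assumed keys
    "mpi_rank", "func" and "runtime_us".
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS anomalies "
        "(mpi_rank INTEGER, func TEXT, runtime_us REAL)"
    )
    rows = []
    for doc in json_docs:
        rec = json.loads(doc)
        rows.append((rec["mpi_rank"], rec["func"], rec["runtime_us"]))
    conn.executemany("INSERT INTO anomalies VALUES (?, ?, ?)", rows)
    conn.commit()


def summarize_by_function(conn):
    """Count anomalies and average their runtime for each function."""
    cur = conn.execute(
        "SELECT func, COUNT(*), AVG(runtime_us) "
        "FROM anomalies GROUP BY func ORDER BY func"
    )
    return cur.fetchall()
```

Once the records are in tables, offline questions such as "which functions produced the most anomalies" become ordinary SQL queries.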

Chimbuko Data Analysis

This library provides C/C++ APIs to process TAU performance profiles and traces.

Chimbuko Visualization

This is a visualization framework for online performance analysis. It focuses on visualizing anomalous behavior in a high-performance computing application in real time, so that users can detect patterns of anomalies they might not otherwise have recognized.

Citations

For citing Chimbuko, please use:

Kelly C., Xu W., Pouchard L.C., et al. "Performance analysis and data reduction for exascale scientific workflows". The International Journal of High Performance Computing Applications. 2025;39(4):553-578. doi: 10.1177/10943420251316253

C. Kelly et al., “Chimbuko: A Workflow-Level Scalable Performance Trace Analysis Tool,” in ICPS Proceedings, in ISAV’20. online: Association for Computing Machinery, Nov. 2020, pp. 15–19. doi: 10.1145/3426462.3426465.

