Command Line Interface Documentation

Please double-check that you've followed the installation instructions in the root README; the procedure has changed recently. This document assumes that you have run `pip install -e .` as instructed.

Table of Contents

  • Pipeline Types
  • ACME
  • HELM
  • FAME

Each autonomy system is broken into four pipelines. The intended use case for each pipeline type is described below. The remainder of this document describes the detailed usage for each pipeline type in each autonomy system.

| Pipeline Type | Use Case |
| --- | --- |
| Flight Pipeline | Flight pipelines implement the autonomy run onboard the instrument. This can be considered flight software. |
| Ground Pipeline | Ground pipelines implement standard ground processing run after flight pipeline data has been downlinked. This can be considered ground data system software. The flight pipeline must be run before the ground pipeline. |
| Analysis Pipeline | Analysis pipelines provide the capabilities of flight and ground pipelines, plus additional evaluation tools used by machine learning SMEs to assess the health of the overall autonomy system. |
| Simulation Pipeline | Simulation pipelines are a standalone capability used to generate synthetic instrument data for testing purposes. |

ACME

ACME_flight_pipeline

For a first-time run on raw files (this will generate .pickle files for future use):

$ ACME_flight_pipeline --data "path/to/files/*.raw" --outdir specify/directory

For pickle files (Scan all .pickle files in directory):

$ ACME_flight_pipeline --data "path/to/files/*.pickle" --outdir specify/directory

For reprocessing the database:

$ ACME_flight_pipeline --reprocess_dir "labdata/toplevel/" --reprocess_version vX.y --reprocess

Arguments

This table lists all arguments available. They are annotated with emoji flags to indicate the following:

  • ✅ Required
  • 🔼 Increased Runtime
  • 🔽 Decreased Runtime
  • 💡 Useful, often used
  • ❗ Warning, requires deliberate use
| Argument flag | Description | Default Value |
| --- | --- | --- |
| 💡 `--data` | Glob of files to be processed | None |
| 💡 `--outdir` | Output directory path | None |
| `--masses` | Path to file containing known masses | `cli/configs/compounds.yml` |
| `--params` | Path to config file for Analyzer | `cli/configs/acme_config.yml` |
| `--sue_weights` | Path to weights for Science Utility Estimate | `cli/configs/acme_sue_weights.yml` |
| `--dd_weights` | Path to weights for Diversity Descriptor | `cli/configs/acme_dd_weights.yml` |
| `--log_name` | Filename for the pipeline log | `ACME_flight_pipeline.log` |
| `--log_folder` | Folder path to store logs | `cli/logs` |
| `--kill_file` | Pipeline will halt if this file is found | `ACME_flight_kill_file` |
| `--cores` | Number of processor cores to utilize | 7 |
| `--priority_bin` | Downlink priority bin (lower number is higher priority) for generated products | 0 |
| `--manifest_metadata` | Manifest metadata (YAML string); takes precedence over file entries | None |
| `--manifest_metadata_file` | Manifest metadata file (YAML) | None |
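The documented precedence between --manifest_metadata and --manifest_metadata_file (string entries override file entries) amounts to a dictionary merge. A minimal sketch in Python; the function name and the example keys are illustrative, not part of the ACME CLI:

```python
def merge_manifest_metadata(file_entries, cli_entries):
    """Merge manifest metadata: entries from --manifest_metadata (cli_entries)
    override entries from --manifest_metadata_file (file_entries)."""
    merged = dict(file_entries)   # start from the file's entries
    merged.update(cli_entries)    # CLI string entries take precedence
    return merged

# Hypothetical example: both sources define 'operator'; the CLI value wins.
file_entries = {"operator": "lab", "instrument": "ACME"}
cli_entries = {"operator": "flight"}
print(merge_manifest_metadata(file_entries, cli_entries))
# → {'operator': 'flight', 'instrument': 'ACME'}
```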

ACME_ground_pipeline

For a first-time run on raw files (this will generate .pickle files for future use):

$ ACME_ground_pipeline --data "path/to/files/*.raw" --outdir specify/directory

For pickle files (Scan all .pickle files in directory):

$ ACME_ground_pipeline --data "path/to/files/*.pickle" --outdir specify/directory

For reprocessing the database:

$ ACME_ground_pipeline --reprocess_dir "labdata/toplevel/" --reprocess_version vX.y --reprocess

Arguments

This table lists all arguments available. They are annotated with emoji flags to indicate the following:

  • ✅ Required
  • 🔼 Increased Runtime
  • 🔽 Decreased Runtime
  • 💡 Useful, often used
  • ❗ Warning, requires deliberate use
| Argument flag | Description | Default Value |
| --- | --- | --- |
| 💡 `--data` | Glob of files to be processed | None |
| 💡 `--outdir` | Output directory path | None |
| `--masses` | Path to file containing known masses | `cli/configs/compounds.yml` |
| `--params` | Path to config file for Analyzer | `cli/configs/acme_config.yml` |
| `--sue_weights` | Path to weights for Science Utility Estimate | `cli/configs/acme_sue_weights.yml` |
| `--dd_weights` | Path to weights for Diversity Descriptor | `cli/configs/acme_dd_weights.yml` |
| `--log_name` | Filename for the pipeline log | `ACME_ground_pipeline.log` |
| `--log_folder` | Folder path to store logs | `cli/logs` |
| 🔽 `--noplots` | Disables plotting output | None |
| 🔽 `--noexcel` | Disables Excel file output | None |
| 🔼 `--debug_plots` | Enables per-peak plots for debugging purposes | None |
| 💡 🔽 `--space_mode` | Only output science products; equivalent to `--noplots --noexcel` | None |
| `--cores` | Number of processor cores to utilize | 7 |
| `--saveheatmapdata` | Save heatmap as data file in addition to image | None |
| `--priority_bin` | Downlink priority bin (lower number is higher priority) for generated products | 0 |
| `--manifest_metadata` | Manifest metadata (YAML string); takes precedence over file entries | None |
| `--manifest_metadata_file` | Manifest metadata file (YAML) | None |
| 💡 `--knowntraces` | Process only known masses specified in `configs/compounds.yml` | None |

ACME_analysis_pipeline

For a first-time run on raw files (this will generate .pickle files for future use):

$ ACME_analysis_pipeline --data "path/to/files/*.raw" --outdir specify/directory

For pickle files (Scan all .pickle files in directory):

$ ACME_analysis_pipeline --data "path/to/files/*.pickle" --outdir specify/directory

For reprocessing the database:

$ ACME_analysis_pipeline --reprocess_dir "labdata/toplevel/" --reprocess_version vX.y --reprocess

Arguments

This table lists all arguments available. They are annotated with emoji flags to indicate the following:

  • ✅ Required
  • 🔼 Increased Runtime
  • 🔽 Decreased Runtime
  • 💡 Useful, often used
  • ❗ Warning, requires deliberate use
| Argument flag | Description | Default Value |
| --- | --- | --- |
| 💡 `--data` | Glob of files to be processed | None |
| 💡 `--outdir` | Output directory path | None |
| `--masses` | Path to file containing known masses | `cli/configs/compounds.yml` |
| `--params` | Path to config file for Analyzer | `cli/configs/acme_config.yml` |
| `--sue_weights` | Path to weights for Science Utility Estimate | `cli/configs/acme_sue_weights.yml` |
| `--dd_weights` | Path to weights for Diversity Descriptor | `cli/configs/acme_dd_weights.yml` |
| `--log_name` | Filename for the pipeline log | `ACME_analysis_pipeline.log` |
| `--log_folder` | Folder path to store logs | `cli/logs` |
| 🔽 `--noplots` | Disables plotting output | None |
| 🔽 `--noexcel` | Disables Excel file output | None |
| 🔼 `--debug_plots` | Enables per-peak plots for debugging purposes | None |
| 💡 🔽 `--space_mode` | Only output science products; equivalent to `--noplots --noexcel` | None |
| `--cores` | Number of processor cores to utilize | 7 |
| `--saveheatmapdata` | Save heatmap as data file in addition to image | None |
| `--priority_bin` | Downlink priority bin (lower number is higher priority) for generated products | 0 |
| `--manifest_metadata` | Manifest metadata (YAML string); takes precedence over file entries | None |
| `--manifest_metadata_file` | Manifest metadata file (YAML) | None |
| 💡 `--knowntraces` | Process only known masses specified in `configs/compounds.yml` | None |

ACME_simulator

Simulates raw ACME samples to debug and better understand the ACME analyzer. This simulator was used to generate the Silver and Golden datasets at data_OWLS/ACME/...; the config files for those datasets are saved in those folders as well.

Arguments

| Argument flag | Description | Default Value |
| --- | --- | --- |
| `--params` | Path to config file | `cli/configs/acme_sim_params.yml` |
| `--out_dir` | Path to output directory | None |
| `--n_runs` | Number of simulation runs | 10 |
| `--log_name` | Filename for pipeline log | `ACME_simulator.log` |
| `--log_folder` | Folder path to store logs | `cli/logs` |

ACME_evaluation

ACME Evaluation measures the performance of ACME on simulator data and hand-labeled lab data. There are two versions of this script:

  • ACME_evaluation.py is the original evaluation script. It calculates the output precision and the label recall: the fraction of output peaks that match any label, and the fraction of labels that match any output, respectively. Note that an F1 score cannot be calculated because the two rates are computed over different populations.
  • ACME_evaluation_strict.py is the stricter evaluation script. It enforces a one-to-one match between output and labeled peaks, marking duplicate detections as false positives. This allows the script to calculate a formal precision, recall, and F1 score. This script is recommended.
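The strict one-to-one matching can be sketched as a greedy assignment within the mass and time thresholds. This is an illustration of the scoring idea only, not the actual ACME_evaluation_strict.py implementation:

```python
def strict_match(outputs, labels, mass_threshold=30, time_threshold=30):
    """Greedy one-to-one matching of output peaks to labeled peaks.
    Peaks are (mass_idx, time_idx) tuples. Returns precision, recall, F1."""
    unmatched_labels = set(range(len(labels)))
    true_pos = 0
    for om, ot in outputs:
        # find an unmatched label within both thresholds
        match = next((i for i in unmatched_labels
                      if abs(labels[i][0] - om) <= mass_threshold
                      and abs(labels[i][1] - ot) <= time_threshold), None)
        if match is not None:
            unmatched_labels.remove(match)  # each label matches at most once
            true_pos += 1
        # else: duplicate or spurious detection counts as a false positive
    precision = true_pos / len(outputs) if outputs else 0.0
    recall = true_pos / len(labels) if labels else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if true_pos else 0.0
    return precision, recall, f1

# Two detections of the same labeled peak: the second is a false positive.
print(strict_match([(100, 50), (101, 51)], [(100, 50)]))
# → (0.5, 1.0, 0.6666666666666666)
```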

Arguments

| Argument flag | Description | Default Value |
| --- | --- | --- |
| `acme_outputs` | Required; glob of peak outputs from the analyzer | None |
| `acme_labels` | Required; glob of peak labels | None |
| `--hand_labels` | Expect hand labels (as opposed to simulator labels) | None |
| `--mass_threshold` | Max distance (in mass indices) between matching peaks | 30 |
| `--time_threshold` | Max distance (in time indices) between matching peaks | 30 |
| `--ambiguous` | Consider peaks labeled as ambiguous to be true peaks | None |
| `--log_name` | Filename for pipeline log | `ACME_evaluation.log` |
| `--log_folder` | Folder path to store logs | `cli/logs` |

HELM

HELM_flight_pipeline

Arguments

This table lists all arguments available. They are annotated with emoji flags to indicate the following:

  • ✅ Required
  • 🔼 Increased Runtime
  • 🔽 Decreased Runtime
  • 💡 Useful, often used
  • ❗ Warning, requires deliberate use
| Argument flag | Description | Default Value |
| --- | --- | --- |
| 💡 `--config` | Filepath of configuration file | `cli/configs/helm_config.yml` |
| `--experiments` | Glob string pattern of experiment directories to process | None |
| `--steps` | Steps of the pipeline to run; see below for descriptions | None |
| `--batch_outdir` | Output directory for batch-level results | None |
| 💡 🔽 `--use_existing` | Attempt to reuse previous processing output for any steps listed here; see below for options | None |
| 💡 🔽 `--space_mode` | Only output space products; skips most plots | None |
| 💡 `--cores` | Number of processor cores to utilize | 7 |
| `--note` | String to be appended to output directory name | None |
| `--log_name` | Filename for the pipeline log | `HELM_flight_pipeline.log` |
| `--log_folder` | Folder path to store logs | `cli/logs` |
| `--kill_file` | Pipeline will halt if this file is found | `HELM_flight_kill_file` |
| `--priority_bin` | Downlink priority bin (lower number is higher priority) for generated products | 0 |
| `--manifest_metadata` | Manifest metadata (YAML string); takes precedence over file entries | None |
| `--manifest_metadata_file` | Manifest metadata file (YAML) | None |
| `--predict_model` | Path to ML model for motility classification | `cli/models/classifier_labtrain_v02.pickle` |

Steps

This table lists all steps available and indicates which steps can be used with the --use_existing flag. Steps are listed in typical order of use.

| Step Name | Description | `--use_existing` |
| --- | --- | --- |
| preproc | Lowers the resolution from 2048x2048 to 1024x1024 for analysis | TRUE |
| validate | Generates data validation products, including videos and MHIs | TRUE |
| tracker | Track particles in the experiment | TRUE |
| features | Extract features from detected tracks | FALSE |
| predict | Predict motility of all tracks with the classification model | FALSE |
| asdp | Generate ASDP products, including a visualization video | FALSE |
| manifest | Generate file manifest for JEWEL | FALSE |

Most steps depend on output from all previous steps. This table lists step prerequisites.

| Step Name | Prerequisite Steps | Other Reqs |
| --- | --- | --- |
| preproc | N/A | N/A |
| validate | preproc | N/A |
| tracker | preproc, validate | N/A |
| features | preproc, validate, tracker | track_evaluation output (optional) |
| predict | preproc, validate, tracker, features | Pretrained Model |
| asdp | preproc, validate, tracker, features, predict | Track Labels (optional) |
| manifest | N/A | Various validate, tracker, predict, and asdp products (optional) |
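Since most steps depend on all previous steps, a requested step list can be expanded to include its prerequisites. A sketch of that expansion, with the dependency dictionary transcribed from the table above (the helper itself is illustrative, not the pipeline's actual code):

```python
# Prerequisite steps transcribed from the flight-pipeline table above.
PREREQS = {
    "preproc": [],
    "validate": ["preproc"],
    "tracker": ["preproc", "validate"],
    "features": ["preproc", "validate", "tracker"],
    "predict": ["preproc", "validate", "tracker", "features"],
    "asdp": ["preproc", "validate", "tracker", "features", "predict"],
    "manifest": [],
}

def expand_steps(requested):
    """Return the requested steps plus all their prerequisites, in pipeline order."""
    needed = set()
    for step in requested:
        needed.update(PREREQS[step])
        needed.add(step)
    order = ["preproc", "validate", "tracker", "features", "predict", "asdp", "manifest"]
    return [s for s in order if s in needed]

print(expand_steps(["predict"]))
# → ['preproc', 'validate', 'tracker', 'features', 'predict']
```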

There are also pipelines available for common combinations of steps.

| Pipeline Name | Description | Steps |
| --- | --- | --- |
| pipeline_train | Pipeline to train the motility classifier | preproc, validate, tracker, track_evaluation, features, train |
| pipeline_predict | Pipeline to predict motility | preproc, validate, tracker, features, predict |
| pipeline_products | Pipeline to generate all products | preproc, validate, tracker, features, predict, asdp, manifest |
| pipeline_space | Pipeline to generate space-mode products; disables most plotting, especially in validate | preproc, validate, tracker, features, predict, asdp, manifest |

Valid Experiments

An experiment is defined by a unique directory. To be considered valid, an experiment must satisfy the following:

  • Contain the subdirectory Holograms/
  • Have at least 50 valid holograms in said subdirectory. Valid holograms are:
    • Images with extension .tif
    • Images with resolution 2048x2048
    • These values can be configured in the config.
  • The enumerated names of the images are expected to be consecutive.
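The validity rules above can be checked up front with a short script. The sketch below uses only the Python standard library and checks the directory layout, frame count, and consecutive numbering; the resolution check is omitted because it requires an image reader:

```python
import re
from pathlib import Path

def is_valid_experiment(exp_dir, min_frames=50):
    """Check an experiment directory against the rules above: a Holograms/
    subdirectory holding at least `min_frames` .tif images whose numeric
    names are consecutive."""
    holo_dir = Path(exp_dir) / "Holograms"
    if not holo_dir.is_dir():
        return False
    # pull the trailing number out of each .tif filename
    numbers = sorted(
        int(m.group(1))
        for f in holo_dir.glob("*.tif")
        if (m := re.search(r"(\d+)\.tif$", f.name))
    )
    if len(numbers) < min_frames:
        return False
    # enumerated names must be consecutive
    return numbers == list(range(numbers[0], numbers[0] + len(numbers)))
```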

Common Usage Examples

A brief reminder that these examples assume you have followed the installation instructions.

Make sure to specify desired configuration parameters in helm_config.yml before executing the pipeline. These use src/cli/configs/helm_config.yml by default.

Validate your Experiment Data

HELM_flight_pipeline \
--experiments "my/experiments/glob/wildcard_*_string" \
--batch_outdir my/experiments/batch_directory \
--steps preproc validate

Note how, by adding a wildcard, you can process multiple experiments at once.

Generate Particle Tracks

HELM_flight_pipeline \
--experiments my/experiments/glob/string \
--batch_outdir my/experiments/batch_directory \
--steps preproc validate tracker \
--use_existing preproc validate

Note how, by specifying --use_existing, the pipeline will use existing preproc and validate step output if they already exist.

Validate, Track Particles, Predict Motility, and Generate Visualization

HELM_flight_pipeline \
--experiments my/experiments/glob/string \
--batch_outdir my/experiments/batch_directory \
--steps pipeline_products \
--use_existing preproc validate tracker

Note that --config and --predict_model can also be specified, but we're just using the default values here.

Tracker outputs

In the output folder, the following subfolders are created:

/plots: plots of all tracks, with each track drawn in a distinct color

/tracks: a subfolder per case, with .track files giving particle coordinates

/configs: the configuration file used for the case

/train classifier: an empty folder, which is needed by train_model.py

Classifier outputs

In the output folder, under /train classifier, you will find the following:

track motility.csv: a log file listing each track with its motility label

yourclassifier.pickle: the trained classifier in pickle form

plots/: ROC curve, confusion matrix, and feature-importance plots (the last only when running Zaki's classifier)
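Since the classifier is stored as an ordinary pickle file, it can be loaded back with the standard pickle module. A minimal sketch (the scikit-learn-style predict() call in the comment is an assumption about the model API):

```python
import pickle

def load_classifier(path):
    """Load a trained classifier saved as a pickle file, e.g.
    'train classifier/yourclassifier.pickle'."""
    with open(path, "rb") as f:
        return pickle.load(f)

# model = load_classifier("train classifier/yourclassifier.pickle")
# A scikit-learn-style model would then expose e.g. model.predict(features).
```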

HELM_ground_pipeline

⚠️ The flight pipeline must be run before the ground pipeline.

Arguments

This table lists all arguments available. They are annotated with emoji flags to indicate the following:

  • ✅ Required
  • 🔼 Increased Runtime
  • 🔽 Decreased Runtime
  • 💡 Useful, often used
  • ❗ Warning, requires deliberate use
| Argument flag | Description | Default Value |
| --- | --- | --- |
| 💡 `--config` | Filepath of configuration file | `cli/configs/helm_config.yml` |
| `--experiments` | Glob string pattern of experiment directories to process | None |
| `--steps` | Steps of the pipeline to run; see below for descriptions | None |
| `--batch_outdir` | Output directory for batch-level results | None |
| 💡 🔽 `--use_existing` | Attempt to reuse previous processing output for any steps listed here; see below for options | None |
| 💡 `--cores` | Number of processor cores to utilize | 7 |
| `--note` | String to be appended to output directory name | None |
| `--log_name` | Filename for the pipeline log | `HELM_ground_pipeline.log` |
| `--log_folder` | Folder path to store logs | `cli/logs` |
| `--priority_bin` | Downlink priority bin (lower number is higher priority) for generated products | 0 |
| `--manifest_metadata` | Manifest metadata (YAML string); takes precedence over file entries | None |
| `--manifest_metadata_file` | Manifest metadata file (YAML) | None |

Steps

This table lists all steps available and indicates which steps can be used with the --use_existing flag. Steps are listed in typical order of use.

| Step Name | Description | `--use_existing` |
| --- | --- | --- |
| validate | Generates data validation products, including videos and MHIs | TRUE |
| asdp | Generate ASDP products, including a visualization video | FALSE |

Valid Experiments

An experiment is defined by a unique directory. To be considered valid, an experiment must satisfy the following:

  • Contain the subdirectory Holograms/
  • Have at least 50 valid holograms in said subdirectory. Valid holograms are:
    • Images with extension .tif
    • Images with resolution 2048x2048
    • These values can be configured in the config.
  • The enumerated names of the images are expected to be consecutive.

Common Usage Examples

A brief reminder that these examples assume you have followed the installation instructions.

Make sure to specify desired configuration parameters in helm_config.yml before executing the pipeline. These use src/cli/configs/helm_config.yml by default.

HELM_analysis_pipeline

Arguments

This table lists all arguments available. They are annotated with emoji flags to indicate the following:

  • ✅ Required
  • 🔼 Increased Runtime
  • 🔽 Decreased Runtime
  • 💡 Useful, often used
  • ❗ Warning, requires deliberate use
| Argument flag | Description | Default Value |
| --- | --- | --- |
| 💡 `--config` | Filepath of configuration file | `cli/configs/helm_config.yml` |
| `--experiments` | Glob string pattern of experiment directories to process | None |
| `--steps` | Steps of the pipeline to run; see below for descriptions | None |
| `--batch_outdir` | Output directory for batch-level results | None |
| 💡 🔽 `--use_existing` | Attempt to reuse previous processing output for any steps listed here; see below for options | None |
| 💡 🔽 `--space_mode` | Only output space products; skips most plots | None |
| 💡 `--cores` | Number of processor cores to utilize | 7 |
| `--note` | String to be appended to output directory name | None |
| `--log_name` | Filename for the pipeline log | `HELM_analysis_pipeline.log` |
| `--log_folder` | Folder path to store logs | `cli/logs` |
| `--priority_bin` | Downlink priority bin (lower number is higher priority) for generated products | 0 |
| `--manifest_metadata` | Manifest metadata (YAML string); takes precedence over file entries | None |
| `--manifest_metadata_file` | Manifest metadata file (YAML) | None |
| `--train_feats` | Only use tracks with labels for model training | None |
| `--predict_model` | Path to ML model for motility classification | `cli/models/classifier_labtrain_v02.pickle` |
| `--toga_config` | Override config filepath for TOGA optimization | None |

Steps

This table lists all steps available and indicates which steps can be used with the --use_existing flag. Steps are listed in typical order of use.

| Step Name | Description | `--use_existing` |
| --- | --- | --- |
| preproc | Lowers the resolution from 2048x2048 to 1024x1024 for analysis | TRUE |
| validate | Generates data validation products, including videos and MHIs | TRUE |
| tracker | Track particles in the experiment | TRUE |
| point_evaluation | Using track labels, measure point accuracy of the tracker | TRUE |
| track_evaluation | Using track labels, measure track accuracy of the tracker | TRUE |
| features | Extract features from detected tracks | FALSE |
| train | Train the motility classification model | FALSE |
| predict | Predict motility of all tracks with the classification model | FALSE |
| asdp | Generate ASDP products, including a visualization video | FALSE |
| manifest | Generate file manifest for JEWEL | FALSE |

Most steps depend on output from all previous steps. This table lists step prerequisites.

| Step Name | Prerequisite Steps | Other Reqs |
| --- | --- | --- |
| preproc | N/A | N/A |
| validate | preproc | N/A |
| tracker | preproc, validate | N/A |
| point_evaluation | preproc, validate, tracker | Track Labels |
| track_evaluation | preproc, validate, tracker | Track Labels |
| features | preproc, validate, tracker | track_evaluation output (optional) |
| train | preproc, validate, tracker, track_evaluation, features | Track Labels |
| predict | preproc, validate, tracker, features | Pretrained Model |
| asdp | preproc, validate, tracker, features, predict | Track Labels (optional) |
| manifest | N/A | Various validate, tracker, predict, and asdp products (optional) |

There are also pipelines available for common combinations of steps.

| Pipeline Name | Description | Steps |
| --- | --- | --- |
| pipeline_train | Pipeline to train the motility classifier | preproc, validate, tracker, track_evaluation, features, train |
| pipeline_predict | Pipeline to predict motility | preproc, validate, tracker, features, predict |
| pipeline_tracker_eval | Pipeline to evaluate tracker performance | preproc, validate, tracker, point_evaluation, track_evaluation |
| pipeline_products | Pipeline to generate all products | preproc, validate, tracker, point_evaluation, track_evaluation, features, predict, asdp, manifest |
| pipeline_space | Pipeline to generate space-mode products; disables most plotting, especially in validate | preproc, validate, tracker, features, predict, asdp, manifest |

Valid Experiments

An experiment is defined by a unique directory. To be considered valid, an experiment must satisfy the following:

  • Contain the subdirectory raw/
  • Have at least 50 valid holograms in said subdirectory. Valid holograms are:
    • Images with extension .tif
    • Images with resolution 2048x2048
    • These values can be configured in the config.
  • The enumerated names of the images are expected to be consecutive.

Common Usage Examples

A brief reminder that these examples assume you have followed the installation instructions.

Make sure to specify desired configuration parameters in helm_config.yml before executing the pipeline. These use src/cli/configs/helm_config.yml by default.

Validate your Experiment Data

HELM_analysis_pipeline \
--experiments "my/experiments/glob/wildcard_*_string" \
--batch_outdir my/experiments/batch_directory \
--steps preproc validate

Note how, by adding a wildcard, you can process multiple experiments at once.

Generate Particle Tracks

HELM_analysis_pipeline \
--experiments my/experiments/glob/string \
--batch_outdir my/experiments/batch_directory \
--steps preproc validate tracker \
--use_existing preproc validate

Note how, by specifying --use_existing, the pipeline will use existing preproc and validate step output if they already exist.

Train a motility model

Use the pipeline_train step bundle to run the tracker, evaluation, feature generator, and training steps. The --use_existing flag will skip any steps that were previously computed:

HELM_analysis_pipeline \
--experiments my/experiments/glob/string \
--batch_outdir my/experiments/batch_directory \
--steps pipeline_train \
--train_feats \
--use_existing preproc validate

Validate, Track Particles, Predict Motility, and Generate Visualization

HELM_analysis_pipeline \
--experiments my/experiments/glob/string \
--batch_outdir my/experiments/batch_directory \
--steps pipeline_products \
--use_existing preproc validate tracker

Note that --config and --predict_model can also be specified, but we're just using the default values here.

Tracker outputs

In the output folder, the following subfolders are created:

/plots: plots of all tracks, with each track drawn in a distinct color

/tracks: a subfolder per case, with .track files giving particle coordinates

/configs: the configuration file used for the case

/train classifier: an empty folder, which is needed by train_model.py

Classifier outputs

In the output folder, under /train classifier, you will find the following:

track motility.csv: a log file listing each track with its motility label

yourclassifier.pickle: the trained classifier in pickle form

plots/: ROC curve, confusion matrix, and feature-importance plots (the last only when running Zaki's classifier)

HELM_simulator

The HELM simulator generates synthetic DHM data. It supports sensitivity and sanity checks that probe the performance of HELM at edge cases and operational limits (such as particle speed). The tool is broken into two major steps:

  1. simulate particle tracks (specifying position, brightness, velocity through time)
  2. simulate DHM images from tracks (creating 2D tif hologram images using noise and tracks)
HELM_simulator [required-args] [optional-args]
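Step 1, simulating a particle track through time, can be sketched as a damped random walk on velocity. This is purely illustrative and not the simulator's actual model or parameters:

```python
import random

def simulate_track(n_frames=100, momentum=0.9, sigma=0.5, start=(512.0, 512.0)):
    """Simulate one 2D particle track: at each time step the velocity is
    damped by `momentum` and perturbed by Gaussian noise of scale `sigma`,
    then integrated into position. Returns a list of (x, y) positions."""
    x, y = start
    vx = vy = 0.0
    track = [(x, y)]
    for _ in range(n_frames - 1):
        vx = momentum * vx + random.gauss(0.0, sigma)
        vy = momentum * vy + random.gauss(0.0, sigma)
        x, y = x + vx, y + vy
        track.append((x, y))
    return track

track = simulate_track(n_frames=10)
print(len(track))  # → 10
```

Wider noise and higher momentum yield straighter, faster ("motile-looking") tracks, which matches the guidance on helm_simulator_config_v1.yml below.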

Arguments

This table lists all arguments available. They are annotated with emoji flags to indicate the following:

  • ✅ Required
  • 🔼 Increased Runtime
  • 🔽 Decreased Runtime
  • 💡 Useful, often used
  • ❗ Warning, requires deliberate use
| Argument flag | Description | Default Value |
| --- | --- | --- |
| 💡 `--configs` | Configuration parameters for synthetic data | `configs/helm_simulator_config.yml` |
| 💡 `--n_exp` | Number of experiments to generate with config(s) | 1 |
| `--sim_outdir` | Directory to save the synthetic data to | None |
| `--log_name` | Filename for the pipeline log | `HELM_simulator.log` |
| `--log_folder` | Folder path to store logs | `cli/logs` |

Common Usage Examples

# Single config
HELM_simulator \
--configs src/cli/configs/helm_simulator_config_v2.yml \
--n_exp 2 \
--sim_outdir <local_output_dir>

# Multiple configs
HELM_simulator \
--configs "src/cli/configs/sim_config_v*.yml" \
--n_exp 2 \
--sim_outdir <local_output_dir>

Config options

Within the configuration file, you can set items like

  • image parameters (e.g., resolution, chamber size, noise characteristics, etc.)
  • experiment parameters (number of motile/non-motile particles, length of recording, drift, etc.)
  • particle parameters (e.g., shape/size/brightness of particle, movement distribution, etc.)

There are two pre-baked configurations to choose from. They differ slightly in how they generate motile track dynamics.

  1. helm_simulator_config_v1.yml This configuration generates all tracks (motile and non-motile) by making random perturbations to a track's velocity at each time step. Motile tracks are best generated by assigning a wide movement distribution and high momentum. This approach is simple to use, but creates tracks that are less realistic than in helm_simulator_config_v2.yml.

  2. helm_simulator_config_v2.yml This configuration (the default) uses vector autoregression (VAR) models to simulate more realistic motile tracks. Non-motile tracks are simulated with the same random-perturbation approach as in helm_simulator_config_v1.yml. VAR models are multivariate autoregression models; models fitted to real particle tracks are used to generate the synthetic ones.
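The VAR approach in helm_simulator_config_v2.yml can be illustrated with a minimal VAR(1) model on velocity: each new velocity is a linear function of the previous one plus Gaussian noise. The coefficient matrix below is made up for illustration; the real models are fitted statsmodels files stored with the simulator:

```python
import random

# Illustrative VAR(1) coefficients for (vx, vy); NOT a fitted model.
A = [[0.8, 0.1],
     [-0.1, 0.8]]

def var1_track(n_frames=100, noise=0.3, start=(0.0, 0.0)):
    """Generate positions from a VAR(1) process on velocity:
    v_t = A v_{t-1} + e_t, with e_t Gaussian noise."""
    x, y = start
    vx = vy = 0.0
    positions = [(x, y)]
    for _ in range(n_frames - 1):
        # both components are updated from the previous (vx, vy)
        vx, vy = (A[0][0] * vx + A[0][1] * vy + random.gauss(0.0, noise),
                  A[1][0] * vx + A[1][1] * vy + random.gauss(0.0, noise))
        x, y = x + vx, y + vy
        positions.append((x, y))
    return positions
```

Because each velocity depends linearly on the previous one, a matrix fitted to real tracks reproduces their correlation structure, which is what makes these tracks look more realistic than independent perturbations.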

In the config, you must specify a VAR model file that was calibrated using the statsmodels package. The VAR model files are stored in helm_dhm/simulator/var_models, and an example of fitting a VAR model to data is in src/research/wronk/simulation_dynamics. See the statsmodels documentation for general information on VAR models.


FAME

FAME_flight_pipeline

Arguments

This table lists all arguments available. They are annotated with emoji flags to indicate the following:

  • ✅ Required
  • 🔼 Increased Runtime
  • 🔽 Decreased Runtime
  • 💡 Useful, often used
  • ❗ Warning, requires deliberate use
| Argument flag | Description | Default Value |
| --- | --- | --- |
| 💡 `--config` | Filepath of configuration file | `cli/configs/fame_config.yml` |
| `--experiments` | Glob string pattern of experiment directories to process | None |
| `--steps` | Steps of the pipeline to run; see below for descriptions | None |
| `--batch_outdir` | Output directory for batch-level results | None |
| 💡 🔽 `--use_existing` | Attempt to reuse previous processing output for any steps listed here; see below for options | None |
| 💡 🔽 `--space_mode` | Only output space products; skips most plots | None |
| 💡 `--cores` | Number of processor cores to utilize | 7 |
| `--note` | String to be appended to output directory name | None |
| `--log_name` | Filename for the pipeline log | `FAME_flight_pipeline.log` |
| `--log_folder` | Folder path to store logs | `cli/logs` |
| `--kill_file` | Pipeline will halt if this file is found | `FAME_flight_kill_file` |
| `--priority_bin` | Downlink priority bin (lower number is higher priority) for generated products | 0 |
| `--manifest_metadata` | Manifest metadata (YAML string); takes precedence over file entries | None |
| `--manifest_metadata_file` | Manifest metadata file (YAML) | None |
| `--predict_model` | Path to ML model for motility classification | `cli/models/classifier_labtrain_v02.pickle` |

Steps

This table lists all steps available and indicates which steps can be used with the --use_existing flag. Steps are listed in typical order of use.

| Step Name | Description | `--use_existing` |
| --- | --- | --- |
| preproc | Lowers the resolution from 2048x2048 to 1024x1024 for analysis | TRUE |
| validate | Generates data validation products, including videos and MHIs | TRUE |
| tracker | Track particles in the experiment | TRUE |
| features | Extract features from detected tracks | FALSE |
| train | Train the motility classification model | FALSE |
| predict | Predict motility of all tracks with the classification model | FALSE |
| asdp | Generate ASDP products, including a visualization video | FALSE |
| manifest | Generate file manifest for JEWEL | FALSE |

Most steps depend on output from all previous steps. This table lists step prerequisites.

| Step Name | Prerequisite Steps | Other Reqs |
| --- | --- | --- |
| preproc | N/A | N/A |
| validate | preproc | N/A |
| tracker | preproc, validate | N/A |
| features | preproc, validate, tracker | track_evaluation output (optional) |
| train | preproc, validate, tracker, track_evaluation, features | Track Labels |
| predict | preproc, validate, tracker, features | Pretrained Model |
| asdp | preproc, validate, tracker, features, predict | Track Labels (optional) |
| manifest | N/A | Various validate, tracker, predict, and asdp products (optional) |

There are also pipelines available for common combinations of steps.

| Pipeline Name | Description | Steps |
| --- | --- | --- |
| pipeline_train | Pipeline to train the motility classifier | preproc, validate, tracker, features, train |
| pipeline_predict | Pipeline to predict motility | preproc, validate, tracker, features, predict |
| pipeline_products | Pipeline to generate all products | preproc, validate, tracker, point_evaluation, features, predict, asdp, manifest |
| pipeline_space | Pipeline to generate space-mode products; disables most plotting, especially in validate | preproc, validate, tracker, features, predict, asdp, manifest |

Valid Experiments

An experiment is defined by a unique directory. To be considered valid, an experiment must satisfy the following:

  • Contain the subdirectory Holograms/
  • Have at least 50 valid frames in said subdirectory. Valid frames are:
    • Images with extension .tif
    • Images with resolution 2048x2048
    • These values can be configured in the config.
  • The enumerated names of the images are expected to be consecutive.

Common Usage Examples

A brief reminder that these examples assume you have followed the installation instructions.

Make sure to specify desired configuration parameters in fame_config.yml before executing the pipeline. These use src/cli/configs/fame_config.yml by default.

Validate your Experiment Data

FAME_flight_pipeline \
--experiments "my/experiments/glob/wildcard_*_string" \
--batch_outdir my/experiments/batch_directory \
--steps preproc validate

Note how, by adding a wildcard, you can process multiple experiments at once.

Generate Particle Tracks

FAME_flight_pipeline \
--experiments my/experiments/glob/string \
--batch_outdir my/experiments/batch_directory \
--steps preproc validate tracker \
--use_existing preproc validate

Note how, by specifying --use_existing, the pipeline will use existing preproc and validate step output if they already exist.

Validate, Track Particles, Predict Motility, and Generate Visualization

FAME_flight_pipeline \
--experiments my/experiments/glob/string \
--batch_outdir my/experiments/batch_directory \
--steps pipeline_products \
--use_existing preproc validate tracker

Note that --config and --predict_model can also be specified, but we're just using the default values here.

Tracker outputs

In the output folder, the following subfolders are created:

/plots: plots of all tracks, with each track drawn in a distinct color

/tracks: a subfolder per case, with .track files giving particle coordinates

/configs: the configuration file used for the case

/train classifier: an empty folder, which is needed by train_model.py

Classifier outputs

In the output folder, under /train classifier, you will find the following:

track motility.csv: a log file listing each track with its motility label

yourclassifier.pickle: the trained classifier in pickle form

plots/: ROC curve, confusion matrix, and feature-importance plots (the last only when running Zaki's classifier)

FAME_ground_pipeline

⚠️ The flight pipeline must be run before the ground pipeline.

Arguments

This table lists all arguments available. They are annotated with emoji flags to indicate the following:

  • ✅ Required
  • 🔼 Increased Runtime
  • 🔽 Decreased Runtime
  • 💡 Useful, often used
  • ❗ Warning, requires deliberate use
| Argument flag | Description | Default Value |
| --- | --- | --- |
| 💡 --config | Filepath of configuration file. | cli/configs/fame_config.yml |
| --experiments | Glob string pattern of experiment directories to process. | None |
| --steps | Steps of the pipeline to run. See below for a description of steps. | None |
| --batch_outdir | Output directory for batch-level results. | None |
| 💡 🔽 --use_existing | Attempt to reuse previous processing output for any steps defined here. See description below for options. | None |
| 💡 🔽 --space_mode | Only output space products. Skips most plots. | None |
| 💡 --cores | Number of processor cores to utilize. | 7 |
| --note | String to be appended to output directory name. | None |
| --log_name | Filename for the pipeline log. | FAME_ground_pipeline.log |
| --log_folder | Folder path to store logs. | cli/logs |
| --priority_bin | Downlink priority bin (lower number is higher priority) for generated products. | 0 |
| --manifest_metadata | Manifest metadata (YAML string); takes precedence over file entries. | None |
| --manifest_metadata_file | Manifest metadata file (YAML). | None |

Steps

This table lists all available steps and indicates which steps can be used with the --use_existing flag. Steps are listed in typical order of use.

| Step Name | Description | --use_existing |
| --- | --- | --- |
| validate | Generates data validation products, including videos and MHIs. | TRUE |
| asdp | Generate ASDP products, including a visualization video. | FALSE |

Valid Experiments

An experiment is defined by a unique directory. To be considered valid, an experiment must satisfy the following:

  • Contain the subdirectory Holograms/
  • Have at least 50 valid frames in that subdirectory. Valid frames are:
    • images with the extension .tif
    • images with a resolution of 2048x2048
    • Both values can be configured in the configuration file.
  • Image filenames are expected to be numbered consecutively.
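The checks above can be sketched in Python. This is an illustration only, not the pipeline's actual validator (the function name and details are assumptions); the resolution check, which requires reading each image, is omitted here:

```python
import os
import re

def is_valid_experiment(exp_dir, min_frames=50):
    """Illustrative experiment validity check: a Holograms/ subdirectory,
    enough .tif frames, and consecutively numbered filenames."""
    holo_dir = os.path.join(exp_dir, "Holograms")
    if not os.path.isdir(holo_dir):
        return False

    frames = sorted(f for f in os.listdir(holo_dir) if f.endswith(".tif"))
    if len(frames) < min_frames:
        return False

    # Filenames are expected to be numbered consecutively, e.g. 00001.tif
    numbers = []
    for f in frames:
        match = re.search(r"(\d+)", f)
        if match is None:
            return False
        numbers.append(int(match.group(1)))
    numbers.sort()
    return numbers == list(range(numbers[0], numbers[0] + len(numbers)))
```

The actual pipeline additionally checks the image resolution against the configured value.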

Common Usage Examples

A brief reminder that these examples assume you have followed the installation instructions.

Make sure to specify the desired configuration parameters in fame_config.yml before executing the pipeline. By default, the pipeline uses src/cli/configs/fame_config.yml.

Validate your Experiment Data

```
FAME_ground_pipeline \
    --experiments "my/experiments/glob/wildcard_*_string" \
    --batch_outdir my/experiments/batch_directory \
    --steps preproc validate
```

Note how, by adding a wildcard, you can process multiple experiments at once.

Validate, Track Particles, Predict Motility, and Generate Visualization

```
FAME_ground_pipeline \
    --experiments my/experiments/glob/string \
    --batch_outdir my/experiments/batch_directory \
    --steps pipeline_products \
    --use_existing preproc validate tracker
```

Note that --config and --predict_model can also be specified, but we're just using the default values here.

FAME_analysis_pipeline

Arguments

This table lists all arguments available. They are annotated with emoji flags to indicate the following:

  • ✅ Required
  • 🔼 Increased Runtime
  • 🔽 Decreased Runtime
  • 💡 Useful, often used
  • ❗ Warning, requires deliberate use
| Argument flag | Description | Default Value |
| --- | --- | --- |
| 💡 --config | Filepath of configuration file. | cli/configs/fame_config.yml |
| --experiments | Glob string pattern of experiment directories to process. | None |
| --steps | Steps of the pipeline to run. See below for a description of steps. | None |
| --batch_outdir | Output directory for batch-level results. | None |
| 💡 🔽 --use_existing | Attempt to reuse previous processing output for any steps defined here. See description below for options. | None |
| 💡 🔽 --space_mode | Only output space products. Skips most plots. | None |
| 💡 --cores | Number of processor cores to utilize. | 7 |
| --note | String to be appended to output directory name. | None |
| --log_name | Filename for the pipeline log. | FAME_analysis_pipeline.log |
| --log_folder | Folder path to store logs. | cli/logs |
| --priority_bin | Downlink priority bin (lower number is higher priority) for generated products. | 0 |
| --manifest_metadata | Manifest metadata (YAML string); takes precedence over file entries. | None |
| --manifest_metadata_file | Manifest metadata file (YAML). | None |
| --train_feats | Only uses tracks with labels for model training. | None |
| --predict_model | Path to ML model for motility classification. | cli/models/classifier_labtrain_v02.pickle |
| --toga_config | Override config filepath for TOGA optimization. | None |

Steps

This table lists all available steps and indicates which steps can be used with the --use_existing flag. Steps are listed in typical order of use.

| Step Name | Description | --use_existing |
| --- | --- | --- |
| preproc | Lowers the resolution from 2048x2048 to 1024x1024 for analysis. | TRUE |
| validate | Generates data validation products, including videos and MHIs. | TRUE |
| tracker | Track particles in the experiment. | TRUE |
| point_evaluation | Using track labels, measure point accuracy of the tracker. | TRUE |
| track_evaluation | Using track labels, measure track accuracy of the tracker. | TRUE |
| features | Extract features from detected tracks. | FALSE |
| train | Train the motility classification model. | FALSE |
| predict | Predict motility of all tracks with classification model. | FALSE |
| asdp | Generate ASDP products, including a visualization video. | FALSE |
| manifest | Generate file manifest for JEWEL. | FALSE |

Most steps depend on output from all previous steps. This table lists step prerequisites.

| Step Name | Prerequisite Steps | Other Reqs |
| --- | --- | --- |
| preproc | N/A | N/A |
| validate | preproc | N/A |
| tracker | preproc, validate | N/A |
| point_evaluation | preproc, validate, tracker | Track Labels |
| track_evaluation | preproc, validate, tracker | Track Labels |
| features | preproc, validate, tracker | track_evaluation (optional) |
| train | preproc, validate, tracker, track_evaluation, features | Track Labels |
| predict | preproc, validate, tracker, features | Pretrained Model |
| asdp | preproc, validate, tracker, features, predict | Track Labels (optional) |
| manifest | N/A | Various validate, tracker, predict, asdp products (optional) |
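The prerequisite relationships above form a small dependency graph that must be resolved before steps run. As a rough illustration only (the PREREQS mapping and expand_steps function are hypothetical names based on the table, not the pipeline's actual code), requested steps can be expanded so that prerequisites always run first:

```python
# Prerequisites as listed in the table above.
# track_evaluation is only an optional input to features, and manifest's
# inputs are optional, so neither forces extra steps to run here.
PREREQS = {
    "preproc": [],
    "validate": ["preproc"],
    "tracker": ["preproc", "validate"],
    "point_evaluation": ["preproc", "validate", "tracker"],
    "track_evaluation": ["preproc", "validate", "tracker"],
    "features": ["preproc", "validate", "tracker"],
    "train": ["preproc", "validate", "tracker", "track_evaluation", "features"],
    "predict": ["preproc", "validate", "tracker", "features"],
    "asdp": ["preproc", "validate", "tracker", "features", "predict"],
    "manifest": [],
}

def expand_steps(requested):
    """Return the requested steps plus their prerequisites, in run order."""
    ordered = []
    def visit(step):
        for dep in PREREQS[step]:
            visit(dep)
        if step not in ordered:
            ordered.append(step)
    for step in requested:
        visit(step)
    return ordered
```

For example, requesting only predict expands to preproc, validate, tracker, features, predict, which matches the pipeline_predict bundle below.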

There are also pipelines available for common combinations of steps.

| Pipeline Name | Description | Steps |
| --- | --- | --- |
| pipeline_train | Pipeline to train the motility classifier. | preproc, validate, tracker, track_evaluation, features, train |
| pipeline_predict | Pipeline to predict motility. | preproc, validate, tracker, features, predict |
| pipeline_tracker_eval | Pipeline to evaluate tracker performance. | preproc, validate, tracker, point_evaluation, track_evaluation |
| pipeline_products | Pipeline to generate all products. | preproc, validate, tracker, point_evaluation, track_evaluation, features, predict, asdp, manifest |
| pipeline_space | Pipeline to generate space-mode products. Disables most plotting, especially in validate. | preproc, validate, tracker, features, predict, asdp, manifest |

Valid Experiments

An experiment is defined by a unique directory. To be considered valid, an experiment must satisfy the following:

  • Contain the subdirectory Holograms/
  • Have at least 50 valid frames in that subdirectory. Valid frames are:
    • images with the extension .tif
    • images with a resolution of 2048x2048
    • Both values can be configured in the configuration file.
  • Image filenames are expected to be numbered consecutively.

Common Usage Examples

A brief reminder that these examples assume you have followed the installation instructions.

Make sure to specify the desired configuration parameters in fame_config.yml before executing the pipeline. By default, the pipeline uses src/cli/configs/fame_config.yml.

Validate your Experiment Data

```
FAME_analysis_pipeline \
    --experiments "my/experiments/glob/wildcard_*_string" \
    --batch_outdir my/experiments/batch_directory \
    --steps preproc validate
```

Note how, by adding a wildcard, you can process multiple experiments at once.

Generate Particle Tracks

```
FAME_analysis_pipeline \
    --experiments my/experiments/glob/string \
    --batch_outdir my/experiments/batch_directory \
    --steps preproc validate tracker \
    --use_existing preproc validate
```

Note how, by specifying --use_existing, the pipeline will use existing preproc and validate step output if they already exist.

Train a motility model

Use the pipeline_train step bundle to run the tracker, evaluation, feature generator, and training steps. The --use_existing flag will skip any steps that were previously computed:

```
FAME_analysis_pipeline \
    --experiments my/experiments/glob/string \
    --batch_outdir my/experiments/batch_directory \
    --steps pipeline_train \
    --train_feats \
    --use_existing preproc validate
```

Validate, Track Particles, Predict Motility, and Generate Visualization

```
FAME_analysis_pipeline \
    --experiments my/experiments/glob/string \
    --batch_outdir my/experiments/batch_directory \
    --steps pipeline_products \
    --use_existing preproc validate tracker
```

Note that --config and --predict_model can also be specified, but we're just using the default values here.

Tracker outputs

In the output folder, the following subfolders will be created:

/plots: plots of all the tracks, with each track drawn in a distinct color

/tracks: subfolders for each case, with .track files giving the coordinates

/configs: the configuration file for the case

/train classifier: an empty folder, which is required by train_model.py

Classifier outputs

In the output folder, under /train classifier, you will see the following:

track motility.csv: a log file listing each track with its label

yourclassifier.pickle: the trained classifier in pickle form

plots/: ROC curve, confusion matrix, and feature importance (the last only when running Zaki's classifier)


HIRAILS

HIRAILS_flight_pipeline

Arguments

This table lists all arguments available. They are annotated with emoji flags to indicate the following:

  • ✅ Required
  • 🔼 Increased Runtime
  • 🔽 Decreased Runtime
  • 💡 Useful, often used
  • ❗ Warning, requires deliberate use
| Argument flag | Description | Default Value |
| --- | --- | --- |
| 💡 --config | Filepath of configuration file. | cli/configs/hirails_config.yml |
| --experiments | Glob string pattern of experiment directories to process. | None |
| --steps | Steps of the pipeline to run. See below for a description of steps. | None |
| --batch_outdir | Output directory for batch-level results. | None |
| 💡 🔽 --use_existing | Attempt to reuse previous processing output for any steps defined here. See description below for options. | None |
| 💡 🔽 --space_mode | Only output space products. Skips most plots. | None |
| 💡 --cores | Number of processor cores to utilize. | 7 |
| --note | String to be appended to output directory name. | None |
| --log_name | Filename for the pipeline log. | HIRAILS_flight_pipeline.log |
| --log_folder | Folder path to store logs. | cli/logs |
| --priority_bin | Downlink priority bin (lower number is higher priority) for generated products. | 0 |
| --manifest_metadata | Manifest metadata (YAML string); takes precedence over file entries. | None |
| --manifest_metadata_file | Manifest metadata file (YAML). | None |

Steps

This table lists all available steps and indicates which steps can be used with the --use_existing flag. Steps are listed in typical order of use.

| Step Name | Description | --use_existing |
| --- | --- | --- |
| tracker | Identify particles in the experiment. | TRUE |
| asdp | Generate ASDP products. | FALSE |
| manifest | Generate file manifest for JEWEL. | FALSE |

Most steps depend on output from all previous steps. This table lists step prerequisites.

| Step Name | Prerequisite Steps | Other Reqs |
| --- | --- | --- |
| tracker | N/A | N/A |
| asdp | tracker | Track Labels |
| manifest | N/A | asdp products |

Valid Experiments

An experiment is defined by a unique directory. To be considered valid, an experiment must satisfy the following:

  • Contain the subdirectory Holograms/
  • Have at least 1 valid frame in that subdirectory. Valid frames are:
    • images with the extension .tif
    • images with a resolution of 2200x3208x3
    • Both values can be configured in the configuration file.

Common Usage Examples

A brief reminder that these examples assume you have followed the installation instructions.

Make sure to specify the desired configuration parameters in hirails_config.yml before executing the pipeline. By default, the pipeline uses src/cli/configs/hirails_config.yml.

Process Experiment Data

```
HIRAILS_flight_pipeline \
    --experiments "my/experiments/glob/wildcard_*_string" \
    --batch_outdir my/experiments/batch_directory \
    --steps tracker asdp manifest
```

Note how, by adding a wildcard, you can process multiple experiments at once.

HIRAILS_ground_pipeline

⚠️ The flight pipeline must be run before the ground pipeline.

Arguments

This table lists all arguments available. They are annotated with emoji flags to indicate the following:

  • ✅ Required
  • 🔼 Increased Runtime
  • 🔽 Decreased Runtime
  • 💡 Useful, often used
  • ❗ Warning, requires deliberate use
| Argument flag | Description | Default Value |
| --- | --- | --- |
| 💡 --config | Filepath of configuration file. | cli/configs/hirails_config.yml |
| --experiments | Glob string pattern of experiment directories to process. | None |
| --steps | Steps of the pipeline to run. See below for a description of steps. | None |
| --batch_outdir | Output directory for batch-level results. | None |
| 💡 🔽 --use_existing | Attempt to reuse previous processing output for any steps defined here. See description below for options. | None |
| 💡 🔽 --space_mode | Only output space products. Skips most plots. | None |
| 💡 --cores | Number of processor cores to utilize. | 7 |
| --note | String to be appended to output directory name. | None |
| --log_name | Filename for the pipeline log. | HIRAILS_ground_pipeline.log |
| --log_folder | Folder path to store logs. | cli/logs |
| --priority_bin | Downlink priority bin (lower number is higher priority) for generated products. | 0 |
| --manifest_metadata | Manifest metadata (YAML string); takes precedence over file entries. | None |
| --manifest_metadata_file | Manifest metadata file (YAML). | None |

Steps

This table lists all available steps and indicates which steps can be used with the --use_existing flag. Steps are listed in typical order of use.

| Step Name | Description | --use_existing |
| --- | --- | --- |
| tracker | Identify particles in the experiment. | TRUE |
| asdp | Generate ASDP products. | FALSE |
| manifest | Generate file manifest for JEWEL. | FALSE |

Most steps depend on output from all previous steps. This table lists step prerequisites.

| Step Name | Prerequisite Steps | Other Reqs |
| --- | --- | --- |
| tracker | N/A | N/A |
| asdp | tracker | Track Labels |
| manifest | N/A | asdp products |

Valid Experiments

An experiment is defined by a unique directory. To be considered valid, an experiment must satisfy the following:

  • Contain the subdirectory Holograms/
  • Have at least 1 valid frame in that subdirectory. Valid frames are:
    • images with the extension .tif
    • images with a resolution of 2200x3208x3
    • Both values can be configured in the configuration file.

Common Usage Examples

A brief reminder that these examples assume you have followed the installation instructions.

Make sure to specify the desired configuration parameters in hirails_config.yml before executing the pipeline. By default, the pipeline uses src/cli/configs/hirails_config.yml.

Process Experiment Data

```
HIRAILS_ground_pipeline \
    --experiments "my/experiments/glob/wildcard_*_string" \
    --batch_outdir my/experiments/batch_directory \
    --steps tracker asdp manifest
```

Note how, by adding a wildcard, you can process multiple experiments at once.

HIRAILS_analysis_pipeline

Arguments

This table lists all arguments available. They are annotated with emoji flags to indicate the following:

  • ✅ Required
  • 🔼 Increased Runtime
  • 🔽 Decreased Runtime
  • 💡 Useful, often used
  • ❗ Warning, requires deliberate use
| Argument flag | Description | Default Value |
| --- | --- | --- |
| 💡 --config | Filepath of configuration file. | cli/configs/hirails_config.yml |
| --experiments | Glob string pattern of experiment directories to process. | None |
| --steps | Steps of the pipeline to run. See below for a description of steps. | None |
| --batch_outdir | Output directory for batch-level results. | None |
| 💡 🔽 --use_existing | Attempt to reuse previous processing output for any steps defined here. See description below for options. | None |
| 💡 🔽 --space_mode | Only output space products. Skips most plots. | None |
| 💡 --cores | Number of processor cores to utilize. | 7 |
| --note | String to be appended to output directory name. | None |
| --log_name | Filename for the pipeline log. | HIRAILS_analysis_pipeline.log |
| --log_folder | Folder path to store logs. | cli/logs |
| --priority_bin | Downlink priority bin (lower number is higher priority) for generated products. | 0 |
| --manifest_metadata | Manifest metadata (YAML string); takes precedence over file entries. | None |
| --manifest_metadata_file | Manifest metadata file (YAML). | None |

Steps

This table lists all available steps and indicates which steps can be used with the --use_existing flag. Steps are listed in typical order of use.

| Step Name | Description | --use_existing |
| --- | --- | --- |
| tracker | Identify particles in the experiment. | TRUE |
| asdp | Generate ASDP products. | FALSE |
| manifest | Generate file manifest for JEWEL. | FALSE |

Most steps depend on output from all previous steps. This table lists step prerequisites.

| Step Name | Prerequisite Steps | Other Reqs |
| --- | --- | --- |
| tracker | N/A | N/A |
| asdp | tracker | Track Labels |
| manifest | N/A | asdp products |

Valid Experiments

An experiment is defined by a unique directory. To be considered valid, an experiment must satisfy the following:

  • Contain the subdirectory Holograms/
  • Have at least 1 valid frame in that subdirectory. Valid frames are:
    • images with the extension .tif
    • images with a resolution of 2200x3208x3
    • Both values can be configured in the configuration file.

Common Usage Examples

A brief reminder that these examples assume you have followed the installation instructions.

Make sure to specify the desired configuration parameters in hirails_config.yml before executing the pipeline. By default, the pipeline uses src/cli/configs/hirails_config.yml.

Process Experiment Data

```
HIRAILS_analysis_pipeline \
    --experiments "my/experiments/glob/wildcard_*_string" \
    --batch_outdir my/experiments/batch_directory \
    --steps tracker asdp manifest
```

Note how, by adding a wildcard, you can process multiple experiments at once.


CSM

CSM_flight_pipeline

Arguments

This table lists all arguments available. They are annotated with emoji flags to indicate the following:

  • ✅ Required
  • 🔼 Increased Runtime
  • 🔽 Decreased Runtime
  • 💡 Useful, often used
  • ❗ Warning, requires deliberate use
| Argument flag | Description | Default Value |
| --- | --- | --- |
| compath | Filepath of COM CSV file. | None |
| 💡 --config | Filepath of configuration file. | cli/configs/csm_config.yml |
| --log_name | Filename for the pipeline log. | CSM_flight_pipeline.log |
| --log_folder | Folder path to store logs. | cli/logs |

Valid COM CSV Files

Valid COM CSV files are generated by /owls-bus/owls-bus-fprime/blob/devel/util/log_processor.py, and contain the bp_CSM.EC_Conductivity channel name.
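A quick way to sanity-check a COM CSV is to scan it for the required channel name. This is only an illustration (the exact CSV layout produced by log_processor.py is not documented here, so the check below simply looks for the channel name anywhere in the file):

```python
def has_ec_conductivity(csv_path):
    """Return True if the COM CSV mentions the required channel name."""
    with open(csv_path) as f:
        return any("bp_CSM.EC_Conductivity" in line for line in f)
```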

Common Usage Examples

Process CSM COM output

```
CSM_flight_pipeline CSM_2_1639618003_239055.csv
```

JEWEL

JEWEL generates an ordering for downlinking ASDPs contained within an ASDP Database (ASDP DB), given a user-specified configuration file.

```
JEWEL [required-args] [optional-args]
```

| Argument flag | Description | Default |
| --- | --- | --- |
| dbfile | The path to the ASDP DB file. | None |
| outputfile | The path to the output CSV file where the ordered data products will be written. | None |
| --config | Path to a config (.yml) file. | cli/configs/jewel_default.yml |
| --log_name | Filename for the pipeline log. | JEWEL.log |
| --log_folder | Folder path to store logs. | cli/logs |

Auxiliary Scripts

Below are several scripts used to support JEWEL by managing the ASDP DB. The first updates the contents of the ASDP DB. The first time the script is invoked, an ASDP DB is initialized and populated; during subsequent invocations, the DB is updated with any new ASDPs that have been generated.

```
update_asdp_db [required-args] [optional-args]
```

| Argument flag | Description | Default |
| --- | --- | --- |
| rootdirs | A list of root directories for each of the ASDP results (e.g., for HELM or ACME); each directory should correspond to a single ASDP. | None |
| dbfile | The path to where the DB file will be stored (currently CSV format). | None |
| --log_name | Filename for the pipeline log. | update_asdp_db.log |
| --log_folder | Folder path to store logs. | cli/logs |
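The create-or-update behavior can be sketched as follows. This is a hedged illustration only: the real ASDP DB schema is not documented here, so the two-column layout (asdp_dir, downlink_status) is an assumption, not the actual format.

```python
import csv
import os

def update_asdp_db(rootdirs, dbfile):
    """Create the DB if missing, then add any ASDP directories not yet present.
    Returns the list of newly added directories."""
    existing = set()
    if os.path.exists(dbfile):
        with open(dbfile, newline="") as f:
            existing = {row["asdp_dir"] for row in csv.DictReader(f)}

    new_dirs = [d for d in rootdirs if d not in existing]
    write_header = not os.path.exists(dbfile)
    with open(dbfile, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["asdp_dir", "downlink_status"])
        if write_header:
            writer.writeheader()  # first invocation initializes the DB
        for d in new_dirs:
            # new ASDPs start out untransmitted
            writer.writerow({"asdp_dir": d, "downlink_status": "untransmitted"})
    return new_dirs
```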

The next script simulates a downlink for testing JEWEL with ground in the loop. It traverses the ordering produced by JEWEL, "downlinks" untransmitted ASDPs, and marks them as transmitted within the ASDP DB.

```
simulate_downlink [required-args] [optional-args]
```

| Argument | Description | Default |
| --- | --- | --- |
| dbfile | The path to the ASDP DB file. | None |
| orderfile | The path to the ASDP ordering file produced by JEWEL. | None |
| datavolume | The simulated downlink data volume in bytes. | None |
| -d/--downlinkdir | Simulate downlink of ASDP files by copying them to this directory. If None, files are still marked as downlinked. | None |
| --log_name | Filename for the pipeline log. | simulate_downlink.log |
| --log_folder | Folder path to store logs. | cli/logs |
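The simulation described above amounts to a greedy walk of the JEWEL ordering under a byte budget. A rough sketch, assuming each ordering entry carries an ID, a size, and a downlink status (the field names and the stop-when-full behavior are illustrative assumptions, not the script's actual logic):

```python
def simulate_downlink(ordering, data_volume):
    """Walk the JEWEL ordering, 'downlinking' untransmitted ASDPs until the
    data volume (in bytes) is exhausted. A negative volume means unlimited.
    Returns the IDs that were downlinked this session."""
    downlinked = []
    remaining = data_volume
    for entry in ordering:
        if entry["status"] == "transmitted":
            continue  # already downlinked in an earlier session
        if remaining >= 0 and entry["size_bytes"] > remaining:
            break  # next product does not fit in this session's budget
        entry["status"] = "transmitted"
        downlinked.append(entry["asdp_id"])
        if remaining >= 0:
            remaining -= entry["size_bytes"]
    return downlinked
```

Passing a negative volume mirrors the `--datavolume -1` usage in the examples below, where everything is downlinked.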

The next script is used to manually set the downlink status of individual ASDPs. This can be invoked during the downlink process, or via a ground command to manually reset the downlink status of an item that was transmitted but not received, for example.

```
set_downlink_status [required-args] [optional-args]
```

| Argument | Description | Default |
| --- | --- | --- |
| dbfile | The path to the ASDP DB file. | None |
| asdpid | Integer ASDP identifier. | None |
| status | The new downlink status, either "transmitted" or "untransmitted". | None |
| --log_name | Filename for the pipeline log. | set_downlink_status.log |
| --log_folder | Folder path to store logs. | cli/logs |

Finally, the last script plots the results of a downlink session, showing the JEWEL ordering, visualizing SUEs and DDs, and providing summary and detailed visualizations for each ASDP. Currently, the script only supports complete downlinks (not partial downlinks) performed with the simulate_downlink script.

```
JEWEL_plot_downlink [required-args] [optional-args]
```

| Argument | Description | Default |
| --- | --- | --- |
| sessiondir | The downlink session directory to which files have been copied via the simulate_downlink command. | None |
| outputdir | The output directory that will contain visualization files, with the main index.html file at the root. | None |

SUE Options

For the HELM and FAME pipelines, there are several methods for calculating science utility estimates (SUEs):

  • sum_confidence: add track motility confidence values within an observation. Statistically, this is equivalent to the expected number of motile tracks within the observation. This method uses a max_sum parameter, which is used to normalize the total sum between 0 and 1.

  • topk_confidence: computes the probability that at least one motile track is present within the top k most confident tracks. The parameter k can be adjusted.

  • intensity: uses an intensity-based SUE calculation to be used specifically with FAME data. To compute a single intensity across the entire observation, there is a track_statistic parameter (either "minimum", "maximum", "median", or "percentile") that summarizes the intensity across each track. If the "percentile" option is used, there is a percentile parameter under track_statistic_params to specify the percentile. Likewise, there is an observation_statistic parameter to summarize the intensity across tracks. Finally, there are weights for red, green, blue, or grayscale channels that can be used to combine intensities across channels. The final intensity is normalized between 0 and 1.
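The first two methods can be written down directly. A minimal sketch, assuming each track contributes a motility confidence in [0, 1] and that confidences are treated as independent probabilities (the function names are illustrative; the parameter names follow the descriptions above):

```python
def sum_confidence_sue(confidences, max_sum):
    """Expected number of motile tracks, normalized to [0, 1] by max_sum."""
    return min(sum(confidences) / max_sum, 1.0)

def topk_confidence_sue(confidences, k):
    """Probability that at least one of the top-k most confident tracks is
    motile, treating the confidences as independent probabilities."""
    top_k = sorted(confidences, reverse=True)[:k]
    p_none_motile = 1.0
    for p in top_k:
        p_none_motile *= (1.0 - p)
    return 1.0 - p_none_motile
```

For example, with two top tracks at 0.5 confidence each, topk_confidence gives 1 - 0.5 * 0.5 = 0.75.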

Common Usage Examples

The following example shows how JEWEL and associated tools can be used as part of the OWLS Autonomy pipeline. First, update the ASDP DB using:

```
$ update_asdp_db --rootdirs [path to ASDP root directories] --dbfile asdpdb.csv
```

The asdpdb.csv file will be created (or updated, if it already exists) to contain the specified ASDPs. This process should be performed before any downlink after new ASDPs are generated. At any time, JEWEL can be invoked using the following command:

```
$ JEWEL --dbfile asdpdb.csv --outputfile jewel_ordering.csv -c jewel_config.yml
```

The ASDPs ordered for downlink will be placed in jewel_ordering.csv. To simulate downlink of the ASDPs, use:

```
$ mkdir downlink
$ simulate_downlink --dbfile asdpdb.csv --orderfile jewel_ordering.csv --datavolume -1 -d downlink
```

This will create a downlink session directory containing the ASDPs. To plot and view the results for this session, use:

```
$ JEWEL_plot_downlink downlink/20220421T154736/ downlink/20220421T154736/visualization
$ open downlink/20220421T154736/visualization/index.html
```

TOGA

Running TOGA on HELM

Generic docs for installing TOGA are here: https://github-fn.jpl.nasa.gov/MLIA/TOGA/blob/master/README.md

```
git clone https://github-fn.jpl.nasa.gov/MLIA/TOGA.git
cd TOGA
conda env create -f envs/toga36-env.yml
source activate toga36
python setup.py develop
```

Using python setup.py develop (rather than pip install .) is critical here: the develop install runs the package from this directory instead of copying files into the pip library.

cli/TOGA_wrapper.py is the main interface between HELM and TOGA. This script reads in a TOGA generated config file along with the experiment directory to run on. It then calls HELM_pipeline via subprocess and reports back to TOGA via a generated metrics.csv file.

In addition to the usual TOGA parameters, a subset of point and/or track evaluation metrics, metrics_names, must be specified in the config (as a list). Each of these metrics is aggregated over the experiments via a simple mean. When multiple metrics are given, TOGA treats all but one as "fixed axes": TOGA does not optimize over fixed axes; instead, it searches for top solutions according to the single "non-fixed" axis across the full spectrum of the others. See the banana-sink example on the TOGA side for a simple multi-dimensional problem.
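The reporting contract between the wrapper and TOGA is small: average each named metric over all experiments, then write a metrics.csv for TOGA to read. A sketch of that aggregation (the exact column layout used by TOGA_wrapper.py is an assumption here; consult the wrapper for the real format):

```python
import csv

def write_metrics(per_experiment_metrics, metric_names, out_path):
    """per_experiment_metrics: one dict per experiment, mapping metric
    name -> value. Each metric is averaged over experiments and written
    as a single CSV row for TOGA to consume."""
    means = {
        name: sum(m[name] for m in per_experiment_metrics) / len(per_experiment_metrics)
        for name in metric_names
    }
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=metric_names)
        writer.writeheader()
        writer.writerow(means)
    return means
```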

Steps on the TOGA side

Since we are committed to maintaining TOGA as a project-agnostic tool, a few configuration tweaks specific to running TOGA on HELM are needed. These are mostly handled via TOGA configuration files, discussed below.

After cloning TOGA, all HELM configuration files are in TOGA/test/run_configurations/HELM/. Each of run_settings.yml, gene_performance_metrics.yml, and genetic_algorithm_settings.yml should be copied to TOGA/toga/config/ to override default TOGA configuration.

Furthermore, the following items in run_settings.yml need to be updated:

  • gene_template should be the absolute path to TOGA/test/run_configurations/HELM/helm_config.yml (the only config not copied to the TOGA/toga/config/ folder)
  • work_dir -> base_dir should name a working directory
  • command -> cmd should have the absolute path to TOGA_wrapper.py
  • command -> static_args should name a valid experiment dir

IMPORTANT: These configs must then be copied to the TOGA environment's lib directory to take effect (see https://github-fn.jpl.nasa.gov/MLIA/TOGA/issues/5, now closed, for details).

Lastly, the environment variable PYTHON needs to be set (in the TOGA virtual environment) to the python executable running HELM. This is the version of python TOGA_wrapper.py will use to run the helm pipeline.

```
which python
export PYTHON="absolute/path/to/conda/python"
```

TODO: HELM should have its own virtual environment to avoid needing the environment variable.

Thomas's debugging tips

  • Important: On MLIA machines, after updating TOGA you may need to copy config files to wherever TOGA is installed before calling toga_server or toga_client. For code changes (inserted prints, etc.), rerun pip install on TOGA. It is easy to get into a state where your working repo does not match the installed version that the toga_server/toga_client commands reference.

  • On starting the TOGA client, does the HELM pipeline run? That is, does the client start printing HELM output?

    • Yes? Continue below
    • No? Look in TOGA's experiment directory (work_dir -> base_dir specified in run_settings). Are there any YAML files in the random_config subdir?
      • Yes? TOGA should be able to call HELM. Put prints in TOGA_wrapper.py on the HELM side.
      • No? TOGA is failing to generate configs. Ensure the HELM config YAML specifies genes correctly (properly indented, with a type and range for each). Put prints in population.py -> create_individual() and mutate(). TODO: the "bool" type is broken; use int in range [0, 1] instead. Issue logged on the TOGA side.
  • Try running with a single worker on a single experiment (so runtime is short). Upon finishing the HELM script, does the client print "{'individual': [uuid], 'status': 'successfully stored'}"?

    • Yes? The client seems to be running correctly. If configs do not appear in the best TOGA experiment subdir, the server may not yet have updated (it prints "running serialization" when this happens), or it may need to be restarted for config changes to take effect.
    • No? If it looks like the HELM run was cut short, the timeout in run_settings.yml may be set too low. Otherwise, the client is probably failing to parse the output metrics after the call to HELM finishes. Prints can be placed at the bottom of TOGA_wrapper.py. Double-check that the metric_names key in helm_config.yml matches those in gene_performance_metrics.yml.

Jake's debugging tips

  • Any warnings that the shell is not configured for conda can be safely ignored.
  • The TOGA client will likely not quit on Ctrl-C; run it inside screen and use Ctrl-A, K, then Y to terminate.
  • Don't tell TOGA to use conda in the run settings on analysis or paralysis