This guide covers advanced command-line options, workflows, and configurations for the CLIF Table One tool.
The run_project.py script orchestrates the complete analysis pipeline.
# Full analysis with sampling (recommended)
uv run python run_project.py --sample --no-summary --get-ecdf
# Full dataset analysis (45-90 minutes)
uv run python run_project.py --get-ecdf
# Skip automatic app launch
uv run python run_project.py --sample --no-summary --get-ecdf --no-launch-app
# Validation only
uv run python run_project.py --validate-only --sample --no-summary
# Table One only (skip validation)
uv run python run_project.py --tableone-only
# ECDF bins computation only
uv run python run_project.py --get-ecdf-only
uv run python run_project.py --get-ecdf-only --visualize
# Specific tables validation
uv run python run_project.py --tables patient adt hospitalization
Workflow Control:
--validate-only Only run validation step
--tableone-only Only run table one generation step
--get-ecdf-only Only run ECDF bins computation step
--get-ecdf Include ECDF in workflow
--visualize Generate HTML visualizations (for ECDF)
--continue-on-error Continue even if previous step fails
--no-launch-app Skip automatic Streamlit app launch
Validation Options:
--tables TABLE [TABLE ...]
Specific tables to validate
--sample Use 1k ICU sample for faster analysis
--no-summary Skip summary statistics generation
--verbose, -v Enable verbose output
Configuration:
--config CONFIG Path to configuration file
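The file passed to `--config` is a JSON document with the keys shown in the example configuration in this guide. As a minimal sketch of how such a file could be loaded, the hypothetical helper below reads it and applies the documented default for `output_dir`. The function name `load_config` and the choice to treat the other five keys as required are assumptions of this sketch, not the tool's actual loader.

```python
import json
from pathlib import Path

# Keys taken from the example configuration in this guide. Treating these five
# as required is an assumption of this sketch.
REQUIRED_KEYS = ("site_name", "site_id", "tables_path", "filetype", "timezone")


def load_config(path):
    """Load a JSON config file and fill in the documented output_dir default."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"config is missing keys: {', '.join(missing)}")
    # output_dir is optional and defaults to "output" according to this guide.
    config.setdefault("output_dir", "output")
    return config
```

Because the file is parsed with a strict JSON parser here, it must not contain comments.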
The run_analysis.py script provides granular control over validation and summary generation.
# Single table with both validation and summary
uv run python run_analysis.py --patient --validate --summary
# Multiple tables with validation only
uv run python run_analysis.py --patient --hospitalization --validate
# All implemented tables
uv run python run_analysis.py --all --validate --summary
# Use 1k ICU sample for faster analysis
uv run python run_analysis.py --labs --validate --summary --sample
# Specify custom config file
uv run python run_analysis.py --config path/to/config.json --patient --validate
# Verbose output for debugging
uv run python run_analysis.py --patient --validate --summary --verbose
# Quiet mode (minimal output)
uv run python run_analysis.py --all --validate --summary --quiet
--patient Patient table
--hospitalization Hospitalization table
--adt ADT table
--code_status Code status table
--crrt_therapy CRRT therapy table
--ecmo_mcs ECMO/MCS table
--hospital_diagnosis Hospital diagnosis table
--labs Labs table
--medication_admin_continuous Continuous medications
--medication_admin_intermittent Intermittent medications
--microbiology_culture Microbiology culture table
--microbiology_nonculture Microbiology non-culture table
--microbiology_susceptibility Susceptibility table
--patient_assessments Patient assessments table
--patient_procedures Patient procedures table
--position Position table
--respiratory_support Respiratory support table
--vitals Vitals table
--all All implemented tables
--validate Run validation using clifpy
--summary Generate summary statistics
--verbose, -v Enable verbose output
--quiet, -q Minimize output (only errors and final summary)
--no-pdf Disable PDF report generation (JSON only)
--sample Use 1k ICU sample for faster analysis
- 0 - Success
- 1 - All tables failed
- 2 - Partial success (some tables failed)
- 130 - Interrupted by user (Ctrl+C)
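These exit codes make it possible to drive the script from automation. The sketch below assumes the codes apply to `run_analysis.py`, the script this part of the guide documents. The helper names `describe_exit_code` and `run_analysis` are hypothetical, and the command line is the `uv run python run_analysis.py` form used in the examples above.

```python
import subprocess

# Exit codes as listed in this guide.
EXIT_MEANINGS = {
    0: "success",
    1: "all tables failed",
    2: "partial success (some tables failed)",
    130: "interrupted by user (Ctrl+C)",
}


def describe_exit_code(code):
    """Translate an exit code into the meaning documented in this guide."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_analysis(*flags):
    """Run run_analysis.py with the given flags and return (code, meaning)."""
    command = ["uv", "run", "python", "run_analysis.py", *flags]
    # check=False: a non-zero exit is reported to the caller, not raised.
    result = subprocess.run(command, check=False)
    return result.returncode, describe_exit_code(result.returncode)
```

A caller could, for example, treat code 2 as a warning and code 1 as a hard failure: `code, meaning = run_analysis("--all", "--validate", "--quiet")`.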
The --sample flag creates or uses a 1k patient ICU sample for faster processing:
# Create sample and run validation
uv run python run_project.py --sample --validate-only
# Sample behavior:
# 1. First run creates sample from ADT table
# 2. Sample saved to output/final/sample_1k_icu_hospitalizations.csv
# 3. Subsequent runs reuse existing sample
# 4. Core tables (patient, hospitalization, ADT) always use full data
# 5. Other tables filter to sample hospitalization IDs
Benefits:
- Reduces runtime from 30-60 min to 5-10 min for all tables
- Maintains validation accuracy for data quality checks
- Ideal for iterative development and testing
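The create-or-reuse behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's implementation: the function names, the `hospitalization_id` and `location_category` field names, the `"icu"` category value, and the fixed random seed are all assumed for the example.

```python
import csv
import random
from pathlib import Path

# Core tables always use full data, per this guide.
CORE_TABLES = {"patient", "hospitalization", "adt"}


def load_or_create_sample(adt_rows, sample_path, n=1000, seed=42):
    """Return sampled hospitalization IDs, reusing a saved sample if present."""
    sample_path = Path(sample_path)
    if sample_path.exists():
        # Subsequent runs: reuse the existing sample file.
        with sample_path.open(newline="", encoding="utf-8") as handle:
            return {row["hospitalization_id"] for row in csv.DictReader(handle)}

    # First run: draw the sample from ICU rows of the ADT table.
    # Assumption: ICU rows are marked by location_category == "icu".
    icu_ids = sorted({
        row["hospitalization_id"]
        for row in adt_rows
        if (row.get("location_category") or "").lower() == "icu"
    })
    chosen = random.Random(seed).sample(icu_ids, min(n, len(icu_ids)))

    sample_path.parent.mkdir(parents=True, exist_ok=True)
    with sample_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["hospitalization_id"])
        writer.writerows([hid] for hid in chosen)
    return set(chosen)


def filter_to_sample(table_name, rows, sample_ids):
    """Core tables pass through unchanged; other tables keep sampled IDs only."""
    if table_name in CORE_TABLES:
        return list(rows)
    return [row for row in rows if row["hospitalization_id"] in sample_ids]
```

Deleting the saved sample file would force a fresh sample on the next run in this sketch; whether the real tool behaves the same way is not stated in this guide.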
ECDF (Empirical Cumulative Distribution Function) bins are used for visualizations in the EDA app.
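For orientation, an ECDF gives, for each value x, the fraction of observations less than or equal to x. The sketch below shows one plausible way outlier bounds and bin edges like those configured in this section could be applied. The function names are hypothetical, and the inclusive outlier bounds and left-closed bins are assumptions; the tool's own handling may differ.

```python
from bisect import bisect_right


def drop_outliers(values, lower, upper):
    """Keep values inside [lower, upper]; inclusive bounds are an assumption."""
    return [v for v in values if lower <= v <= upper]


def ecdf(values):
    """Return sorted (value, fraction of observations <= value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for rank, value in enumerate(ordered, start=1):
        if points and points[-1][0] == value:
            points[-1] = (value, rank / n)  # ties share the highest rank
        else:
            points.append((value, rank / n))
    return points


def bin_counts(values, bins, labels):
    """Count values per left-closed bin [bins[i], bins[i + 1])."""
    if len(labels) != len(bins) - 1:
        raise ValueError("need exactly one label per bin")
    counts = {label: 0 for label in labels}
    for value in values:
        if bins[0] <= value < bins[-1]:
            counts[labels[bisect_right(bins, value) - 1]] += 1
    return counts


# Heart-rate bounds and bins copied from the example configs in this guide.
rates = drop_outliers([10, 55, 60, 60, 99, 150, 400], lower=20, upper=250)
hr_counts = bin_counts(
    rates,
    bins=[0, 60, 100, 120, 150, 300],
    labels=["<60", "60-100", "100-120", "120-150", "≥150"],
)
```

Left-closed bins are chosen here because the labels ("<60", "≥150") suggest that a value equal to an edge belongs to the bin starting at that edge.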
- Outlier Configuration (get-ecdf_data/ecdf_config/outlier_config.yaml):
labs:
  albumin_g_dl:
    lower: 0.5
    upper: 10
  bicarbonate_meq_l:
    lower: 5
    upper: 50
  # ... more lab configurations
vitals:
  heart_rate_bpm:
    lower: 20
    upper: 250
  # ... more vital configurations
- Binning Configuration (get-ecdf_data/ecdf_config/lab_vital_config.yaml):
labs:
  albumin_g_dl:
    bins: [0, 2.0, 2.5, 3.0, 3.5, 4.0, 100]
    labels: ["<2.0", "2.0-2.5", "2.5-3.0", "3.0-3.5", "3.5-4.0", "≥4.0"]
  # ... more lab configurations
vitals:
  heart_rate_bpm:
    bins: [0, 60, 100, 120, 150, 300]
    labels: ["<60", "60-100", "100-120", "120-150", "≥150"]
  # ... more vital configurations
{
  "site_name": "Your Hospital Name",
  "site_id": "YOUR_ID",
  "tables_path": "/path/to/clif/data",
  "filetype": "parquet",
  "timezone": "America/Chicago",
  "output_dir": "output"
}
The output_dir key is optional and defaults to "output".
The Table One generation uses several internal configurations:
- Cohort Definition: ICU stays ≥24 hours
- MCIDE Collection: Automated collection of clinically important data elements
- Medication Analysis: Vasoactives, sedatives, paralytics with dose conversions
- SOFA Scoring: Automated calculation with missing data handling
- CCI Calculation: Charlson Comorbidity Index from ICD codes
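As an illustration of the cohort definition in the list above, the sketch below selects hospitalizations with at least 24 hours of ICU time from ADT-style rows. The field names (`hospitalization_id`, `location_category`, `in_dttm`, `out_dttm`) and the decision to sum ICU segments per hospitalization are assumptions. This guide does not say whether the tool requires a single continuous stay or accepts cumulative ICU time.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def icu_cohort(adt_rows, min_stay=timedelta(hours=24)):
    """Return hospitalization IDs whose total ICU time is at least min_stay."""
    icu_time = defaultdict(timedelta)
    for row in adt_rows:
        # Assumption: ICU locations are labelled "icu" (case-insensitive).
        if row["location_category"].lower() != "icu":
            continue
        icu_time[row["hospitalization_id"]] += row["out_dttm"] - row["in_dttm"]
    return {hid for hid, total in icu_time.items() if total >= min_stay}


example_rows = [
    {"hospitalization_id": "H1", "location_category": "icu",
     "in_dttm": datetime(2024, 1, 1, 0), "out_dttm": datetime(2024, 1, 2, 6)},
    {"hospitalization_id": "H2", "location_category": "icu",
     "in_dttm": datetime(2024, 1, 1, 0), "out_dttm": datetime(2024, 1, 1, 10)},
    {"hospitalization_id": "H2", "location_category": "ward",
     "in_dttm": datetime(2024, 1, 1, 10), "out_dttm": datetime(2024, 1, 3, 2)},
]
cohort = icu_cohort(example_rows)
```

In this example H1 qualifies with 30 ICU hours, while H2 does not: its 10 ICU hours fall short, and its ward time is not counted.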
The Table One generation includes memory optimization features:
- Chunked Processing: Large tables processed in chunks
- Selective Loading: Only required columns loaded for analysis
- Weight Data Pre-loading: Optimized medication dose conversion
- Garbage Collection: Aggressive memory cleanup between steps
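The general pattern behind chunked processing, selective loading, and explicit garbage collection might look like the sketch below. It reads CSV with the standard library only to stay dependency-free. The example configuration in this guide uses parquet, so the real tool presumably relies on a different reader, and the function names here are hypothetical.

```python
import csv
import gc
from itertools import islice


def iter_chunks(path, columns, chunk_size=100_000):
    """Yield lists of rows holding only `columns`, at most chunk_size rows each."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        while True:
            # Selective loading: keep only the columns the analysis needs.
            chunk = [
                {col: row[col] for col in columns}
                for row in islice(reader, chunk_size)
            ]
            if not chunk:
                break
            yield chunk


def count_by(path, key, chunk_size=100_000):
    """Count rows per value of `key` without holding the whole table in memory."""
    counts = {}
    for chunk in iter_chunks(path, [key], chunk_size):
        for row in chunk:
            counts[row[key]] = counts.get(row[key], 0) + 1
        # Release the processed chunk before reading the next one.
        del chunk
        gc.collect()
    return counts
```

Peak memory in this sketch is bounded by `chunk_size` rows of the selected columns plus the running result, not by the size of the full table.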
# Use sampling for initial runs
uv run python run_project.py --sample --no-summary
# Increase system swap if needed
# Monitor with: watch -n 1 free -h
# Check specific table details
uv run python run_analysis.py --patient --validate --verbose
# Review validation report
open output/final/reports/patient_validation_report.pdf
# Check configuration files
cat get-ecdf_data/ecdf_config/outlier_config.yaml
cat get-ecdf_data/ecdf_config/lab_vital_config.yaml
# Review unit mismatches
cat output/final/unit_mismatches.log
- Validation Logs: output/final/results/{table}_summary_validation.json
- Table One Execution: output/final/tableone/execution_report.txt
- ECDF Processing: output/final/unit_mismatches.log
- TABLEONE_VIEWER_GUIDE.md - Detailed guide for Table One results viewer
- CLIF Documentation - CLIF consortium documentation
- clifpy Documentation - CLIF validation library