This guide provides strategies for optimizing pipeline performance and identifying bottlenecks.
The pipeline tracks performance metrics for each stage:
- Stage Duration: Time taken for each stage
- Total Duration: Total pipeline execution time
- Bottleneck Identification: Automatic detection of slowest stages
- ETA Calculations: Estimated time remaining during execution
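The guide does not say how the ETA is derived. One simple approach, sketched here under the assumption that each remaining stage takes about as long as the average completed stage, is shown below. The function name `estimate_eta` is hypothetical and is not part of the pipeline's code.

```python
def estimate_eta(completed_durations, total_stages):
    """Estimate seconds remaining, assuming remaining stages match the average so far."""
    if not completed_durations:
        return None  # nothing has finished yet, so there is no basis for an estimate
    remaining = total_stages - len(completed_durations)
    average = sum(completed_durations) / len(completed_durations)
    return max(remaining, 0) * average


# Illustrative values: three of six stages done, taking 2 s, 40 s and 5 s.
eta = estimate_eta([2.0, 40.0, 5.0], total_stages=6)
print(f"Estimated time remaining: {eta:.0f}s")
```

An average-based estimate is crude when stage times differ widely (as they do between setup and PDF rendering), so a real implementation might weight stages by their historical durations instead.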
After pipeline completion, a summary is displayed:
Performance Metrics:
Total Execution Time: 2m 15s
Average Stage Time: 22.5s
Slowest Stage: Stage 5 - PDF Rendering (45s, 33%)
Fastest Stage: Stage 2 - Project Tests (5s)
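The figures in a summary like the one above follow directly from per-stage durations. A minimal sketch, assuming durations are held in a dict of stage name to seconds; the helper names and the six durations are made up for illustration (chosen so the totals match the sample) and are not taken from the pipeline's code.

```python
def summarize(stage_durations):
    """Return total, average, slowest and fastest stage from {name: seconds}."""
    total = sum(stage_durations.values())
    slowest = max(stage_durations, key=stage_durations.get)
    fastest = min(stage_durations, key=stage_durations.get)
    return {
        "total": total,
        "average": total / len(stage_durations),
        "slowest": (slowest, stage_durations[slowest]),
        "slowest_pct": 100 * stage_durations[slowest] / total,
        "fastest": (fastest, stage_durations[fastest]),
    }


def format_duration(seconds):
    """Format seconds as '2m 15s', or as '45s' when under a minute."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m {secs}s" if minutes else f"{secs}s"


# Hypothetical durations in seconds, invented for this example.
durations = {
    "Setup": 6,
    "Infrastructure Tests": 40,
    "Project Tests": 5,
    "Analysis": 15,
    "PDF Rendering": 45,
    "Validation": 24,
}
summary = summarize(durations)
print(f"Total Execution Time: {format_duration(summary['total'])}")
print(f"Average Stage Time: {summary['average']:.1f}s")
name, secs = summary["slowest"]
print(f"Slowest Stage: {name} ({format_duration(secs)}, {summary['slowest_pct']:.0f}%)")
name, secs = summary["fastest"]
print(f"Fastest Stage: {name} ({format_duration(secs)})")
```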
Each stage reports:
- Execution time
- Percentage of total time
- Bottleneck warnings (if >10s and >20% of total)
The pipeline automatically identifies bottlenecks:
- Stages taking >10 seconds
- Stages consuming >20% of total time
- Marked with ⚠ bottleneck indicator
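The two thresholds above can be expressed as a small check. This is a sketch that assumes both conditions must hold at once (as the per-stage report wording suggests); `find_bottlenecks` and the sample durations are hypothetical, not the pipeline's actual implementation.

```python
def find_bottlenecks(stage_durations, min_seconds=10.0, min_share=0.20):
    """Return stage names that exceed both the absolute and the relative threshold."""
    total = sum(stage_durations.values())
    if total <= 0:
        return []
    return [
        name
        for name, seconds in stage_durations.items()
        if seconds > min_seconds and seconds / total > min_share
    ]


# Hypothetical durations in seconds, invented for this example.
durations = {
    "Setup": 2,
    "Infrastructure Tests": 45,
    "Project Tests": 5,
    "Analysis": 12,
    "PDF Rendering": 60,
}
for name in find_bottlenecks(durations):
    print(f"⚠ bottleneck: {name}")
```

Here Analysis exceeds 10 seconds but accounts for under 20% of the total, so it is not flagged.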
# Run pipeline with timing
time ./run.sh --pipeline
# Check individual stage times
time uv run python scripts/00_setup_environment.py
time uv run python scripts/01_run_tests.py
time uv run python scripts/02_run_analysis.py
Bottleneck: Test execution can be slow with large test suites
Optimizations:
- Use pytest-xdist for parallel test execution
- Skip slow tests during development
- Use pytest caching for faster repeated runs
# Parallel test execution
uv run pytest tests/ -n auto
# Skip slow tests
uv run pytest tests/ -m "not slow"
Bottleneck: LaTeX compilation is CPU-intensive
Optimizations:
- Use incremental compilation (only rebuild changed sections)
- Cache LaTeX intermediate files
- Benchmark alternative LaTeX engines (xelatex vs pdflatex); compile time varies by engine and document
# Check LaTeX compilation time
time xelatex document.tex
# Use incremental builds (if supported)
Bottleneck: Data processing and figure generation
Optimizations:
- Parallelize independent analysis scripts
- Cache intermediate results
- Optimize data processing algorithms
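One common way to cache intermediate results is a make-style freshness check: skip a script when its outputs are newer than its inputs. The sketch below assumes each analysis script has known input and output files; `needs_rerun` is a hypothetical helper, not something the pipeline is documented to provide.

```python
from pathlib import Path


def needs_rerun(inputs, outputs):
    """Return True if any output is missing or older than the newest input."""
    outputs = [Path(p) for p in outputs]
    if not outputs or any(not p.exists() for p in outputs):
        return True
    # default=0.0 treats "no inputs" as "nothing can be newer than the outputs"
    newest_input = max((Path(p).stat().st_mtime for p in inputs), default=0.0)
    oldest_output = min(p.stat().st_mtime for p in outputs)
    return newest_input > oldest_output
```

Modification times are cheap to compare but can be misleading after a fresh checkout or a file copy; hashing file contents is slower and more robust.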
# Example: Parallel script execution
from concurrent.futures import ProcessPoolExecutor
with ProcessPoolExecutor() as executor:
    results = executor.map(run_script, scripts)
Bottleneck: LLM generation is slow (minutes per review)
Optimizations:
- Use faster models for initial reviews
- Stream responses for progress visibility
- Cache review results
- Skip optional reviews during development
# Skip LLM reviews during development
./run.sh --pipeline # LLM stages are optional
# Use faster model
export OLLAMA_MODEL="smollm2" # Smaller, faster model
Monitor memory consumption:
# Check memory usage during pipeline
/usr/bin/time -v uv run python scripts/execute_pipeline.py --project {name} --core-only
# Monitor continuously
watch -n 1 'ps aux | grep python'
Monitor CPU utilization:
# Check CPU usage
top -p "$(pgrep -d, -f 'python[0-9.]* scripts/')" # top expects comma-separated PIDs
# Profile CPU-intensive operations
uv run python -m cProfile -o profile.stats scripts/03_render_pdf.py
Monitor file operations:
# Check disk I/O
iostat -x 1
# Monitor specific directory
watch -n 1 'du -sh projects/{name}/output/*'
Typical pipeline execution times:
- Setup: 1-2 seconds
- Infrastructure Tests: 30-60 seconds
- Project Tests: 2-5 seconds
- Analysis: 5-15 seconds
- PDF Rendering: 30-90 seconds
- Validation: 1-3 seconds
- Copy Outputs: 1-2 seconds
- LLM Review: 5-15 minutes (optional)
Total (without LLM): ~2-3 minutes
Total (with LLM): ~7-18 minutes
Expected gains from optimization:
- Test Execution: Reduce by 30-50% with parallel execution
- PDF Rendering: Reduce by 20-30% with incremental builds
- Analysis: Reduce by 40-60% with parallelization
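To see what these reductions could mean for a whole run, the typical stage times and reduction ranges listed above can be combined arithmetically. This sketch assumes the test reduction applies to both infrastructure and project tests, and that the savings are independent and simply add up; neither assumption is stated in the guide.

```python
# (min_seconds, max_seconds) per stage, copied from the typical times above.
TYPICAL = {
    "Setup": (1, 2),
    "Infrastructure Tests": (30, 60),
    "Project Tests": (2, 5),
    "Analysis": (5, 15),
    "PDF Rendering": (30, 90),
    "Validation": (1, 3),
    "Copy Outputs": (1, 2),
}

# (min_fraction, max_fraction) saved per stage, from the reduction ranges above.
REDUCTION = {
    "Infrastructure Tests": (0.30, 0.50),
    "Project Tests": (0.30, 0.50),
    "Analysis": (0.40, 0.60),
    "PDF Rendering": (0.20, 0.30),
}


def projected_range(typical, reduction):
    """Return (best_case, worst_case) total seconds after applying the reductions."""
    best = worst = 0.0
    for stage, (t_min, t_max) in typical.items():
        r_min, r_max = reduction.get(stage, (0.0, 0.0))
        best += t_min * (1 - r_max)   # fastest run, largest saving
        worst += t_max * (1 - r_min)  # slowest run, smallest saving
    return best, worst


best, worst = projected_range(TYPICAL, REDUCTION)
print(f"Projected total without LLM: {best:.0f}-{worst:.0f} seconds")
```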
# Profile entire pipeline
uv run python -m cProfile -o pipeline.prof scripts/execute_pipeline.py --project {name} --core-only
# Analyze profile
uv run python -m pstats pipeline.prof
# Profile specific stage
uv run python -m cProfile -o stage.prof scripts/03_render_pdf.py
# Memory profiler
uv run python -m memory_profiler scripts/execute_pipeline.py --project {name} --core-only
# Enable pytest cache
uv run pytest tests/ --cache-clear # Clear cache
uv run pytest tests/ # Uses cache for faster runs
- LaTeX intermediate files (.aux, .bbl) are cached
- Figure generation results cached in projects/{name}/output/figures/
- Re-run only if source files changed
- Review results saved to projects/{name}/output/llm/
- Re-use previous reviews if manuscript unchanged
- Clear cache: rm -rf projects/{name}/output/llm/*
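Re-using a review only when the manuscript is unchanged needs some way to detect changes. A minimal sketch using a content hash stored next to the review text; the file name `review_cache.json` and all function names are assumptions made for illustration, not the pipeline's actual cache layout.

```python
import hashlib
import json
from pathlib import Path

CACHE_FILE = "review_cache.json"  # hypothetical file name


def manuscript_digest(path):
    """Return a SHA-256 hex digest of the manuscript's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def load_cached_review(manuscript, cache_dir):
    """Return the cached review if the manuscript is unchanged, else None."""
    record_path = Path(cache_dir) / CACHE_FILE
    if not record_path.exists():
        return None
    record = json.loads(record_path.read_text(encoding="utf-8"))
    if record.get("digest") != manuscript_digest(manuscript):
        return None  # manuscript changed since the review was stored
    return record.get("review")


def store_review(manuscript, cache_dir, review):
    """Save the review together with the digest of the manuscript it covers."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    record = {"digest": manuscript_digest(manuscript), "review": review}
    (cache_dir / CACHE_FILE).write_text(json.dumps(record), encoding="utf-8")
```

A caller would try `load_cached_review` first and invoke the LLM only when it returns `None`, then pass the fresh result to `store_review`.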
Some stages can run in parallel:
- Tests: Infrastructure and project tests (if independent)
- Analysis Scripts: Multiple scripts can run concurrently
- PDF Sections: Individual section PDFs can render in parallel
# Example: Parallel stage execution
from concurrent.futures import ThreadPoolExecutor
stages = [stage1, stage2, stage3] # Independent stages
with ThreadPoolExecutor(max_workers=3) as executor:
    results = executor.map(run_stage, stages)
Note: Parallel execution requires careful dependency management.
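One way to handle that dependency management is to let a topological sorter decide which stages are ready to start. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the function `run_with_dependencies` and the stage names are hypothetical, and this is not how the pipeline itself is documented to schedule stages.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_with_dependencies(stages, dependencies, max_workers=3):
    """Run callables from {name: callable}, honouring {name: {prerequisite names}}."""
    sorter = TopologicalSorter({name: dependencies.get(name, ()) for name in stages})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    results, running = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        while sorter.is_active():
            # Submit every stage whose prerequisites have all finished.
            for name in sorter.get_ready():
                running[executor.submit(stages[name])] = name
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()  # re-raises a stage's exception
                sorter.done(name)  # may unlock dependent stages
    return results


# Hypothetical stages: "render" must wait for "analysis"; "tests" is independent.
order = []
stages = {
    "tests": lambda: order.append("tests"),
    "analysis": lambda: order.append("analysis"),
    "render": lambda: order.append("render"),
}
run_with_dependencies(stages, {"render": {"analysis"}})
print(order)
```

Every prerequisite named in `dependencies` must also appear in `stages`; otherwise the lookup fails with a `KeyError`.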
# Monitor pipeline execution
watch -n 1 'ps aux | grep -E "(python|pytest|xelatex)"'
# Analyze pipeline logs for timing
grep "Completed in" projects/{name}/output/*.log | awk '{print $NF}'
- Fast Iteration: Skip slow stages during development
- Selective Execution: Run only changed stages
- Caching: Enable all caching mechanisms
- Parallel Tests: Use pytest-xdist for test execution
- Full Pipeline: Run the full pipeline for final builds
- Performance Baseline: Establish performance benchmarks
- Monitoring: Track performance over time
- Optimization: Address bottlenecks systematically
Symptoms: Tests take >60 seconds
Solutions:
- Enable parallel execution: pytest -n auto
- Skip slow tests: pytest -m "not slow"
- Optimize test data generation
- Use test fixtures for expensive setup
Symptoms: PDF rendering takes >90 seconds
Solutions:
- Check LaTeX installation and version
- Use incremental compilation
- Optimize figure sizes
- Reduce number of figures
Symptoms: Pipeline runs out of memory
Solutions:
- Process data in chunks
- Clear large objects after use
- Use generators instead of lists
- Increase system memory
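The first three suggestions can be combined: a generator hands out fixed-size chunks so that only one chunk is held in memory at a time. A minimal sketch with hypothetical helper names, computing a mean over a stream of numbers without building the full list.

```python
from itertools import islice


def read_in_chunks(items, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    iterator = iter(items)
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


def mean_in_chunks(values, chunk_size=1000):
    """Compute the mean while holding only one chunk in memory."""
    total = 0.0
    count = 0
    for chunk in read_in_chunks(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
        # the chunk goes out of scope here and can be garbage-collected
    return total / count if count else 0.0


# The generator expression is consumed lazily; no million-element list is built.
print(mean_in_chunks(x * 0.5 for x in range(1_000_000)))
```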
- ../../scripts/execute_pipeline.py - Performance tracking implementation
- infrastructure/core/pipeline/stage_monitor.py - Stage timing and progress tracking
- infrastructure/core/runtime/function_profiler.py - Function-level profiling utilities
- Troubleshooting - Performance troubleshooting