Commit 33582e5

edmundmiller and claude authored
feat: optimize S3 lifecycle management and modernize infrastructure configuration (#170)
* feat: add S3 lifecycle management for cost optimization and automated cleanup

  This commit implements comprehensive S3 lifecycle rules for the nf-core-awsmegatests bucket to optimize storage costs and automatically clean up temporary workflow files.

  Changes:
  - Add create_s3_lifecycle_configuration() function with 4 lifecycle rules:
    - Rule 1: Preserve metadata files with cost optimization (IA after 30 days, Glacier after 90 days)
    - Rule 2: Clean up temporary files after 30 days (based on the nextflow.io/temporary tag)
    - Rule 3: Clean up the work directory after 90 days (prefix-based cleanup)
    - Rule 4: Clean up incomplete multipart uploads after 7 days

  The implementation includes proper error handling to gracefully fall back to manual management if AWS permissions are insufficient.

  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>

* refactor: remove all try/except statements to let Pulumi handle errors natively

  This commit removes all manual try/except error-handling blocks throughout the codebase, allowing Pulumi to handle errors with its built-in error management system.

  Changes:
  - Remove try/except from S3 lifecycle configuration creation
  - Remove try/except from Seqera compute environment deployment
  - Remove try/except from GitHub integration resource creation
  - Remove try/except from Seqera provider initialization
  - Remove try/except from TowerForge credential upload
  - Remove try/except from configuration file loading
  - Remove try/except from workspace ID validation
  - Remove the unused validate_environment() function
  - Simplify numeric validation logic in settings

  Pulumi's native error handling provides better diagnostics and stack traces than custom exception wrapping. This simplifies the codebase while maintaining robust error reporting.
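The four lifecycle rules described above can be sketched as a boto3-style payload. The rule IDs below are illustrative, and the day counts and tag keys are taken from the commit message and the scripts README; the actual create_s3_lifecycle_configuration() implementation is not shown in this view and may differ.

```python
def build_lifecycle_rules() -> dict:
    """Sketch of the four lifecycle rules as a boto3-style configuration.

    Rule IDs are assumptions; the tag keys, prefixes, and day counts come
    from the commit message and the scripts README.
    """
    return {
        "Rules": [
            {   # Rule 1: preserve metadata files, but move them to cheaper storage
                "ID": "preserve-metadata-files",
                "Status": "Enabled",
                "Filter": {"Tag": {"Key": "nextflow.io/metadata", "Value": "true"}},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            },
            {   # Rule 2: delete files Nextflow tagged as temporary after 30 days
                "ID": "cleanup-temporary-files",
                "Status": "Enabled",
                "Filter": {"Tag": {"Key": "nextflow.io/temporary", "Value": "true"}},
                "Expiration": {"Days": 30},
            },
            {   # Rule 3: prefix-based cleanup of the work directory after 90 days
                "ID": "cleanup-work-directory",
                "Status": "Enabled",
                "Filter": {"Prefix": "work/"},
                "Expiration": {"Days": 90},
            },
            {   # Rule 4: drop incomplete multipart uploads after 7 days
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            },
        ]
    }
```

A boto3 client could apply this with `s3.put_bucket_lifecycle_configuration(Bucket="nf-core-awsmegatests", LifecycleConfiguration=build_lifecycle_rules())`.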
* feat: optimize S3 bucket lifecycle and enable CORS for Seqera Data Explorer

  - Update work directory cleanup from 90 days to 14 days for aggressive cost optimization
  - Add lifecycle rules for scratch/ (7 days) and cache/ directories (30 days)
  - Enable CORS configuration for Seqera Data Explorer compatibility
  - Add a one-time batch job script for tagging existing log files
  - Preserve tagged log files while aggressively cleaning untagged work files

* fix: adjust S3 lifecycle cleanup to be less aggressive

  - Update work directory cleanup from 14 days to 30 days
  - Update scratch directory cleanup from 7 days to 30 days
  - Maintain 30-day cleanup for cache directories
  - Keep tagged log files preserved for 90 days with storage transitions

* refactor: externalize nextflow configuration to separate files

  - Move embedded nextflowConfig strings from JSON to external .config files
  - Create modular nextflow configs with base + environment-specific settings
  - Add a load_nextflow_config() function to read external config files
  - Update compute environment creation to use the external nextflow configs
  - Clean JSON files by removing embedded nextflowConfig fields
  - Improve maintainability and readability of nextflow configurations

  Config structure:
  - nextflow-base.config: common settings (AWS Batch, error handling, fusion tags)
  - nextflow-cpu.config: CPU-specific settings (x86_64, CPU tags)
  - nextflow-gpu.config: GPU-specific settings (x86_64, GPU tags)
  - nextflow-arm.config: ARM-specific settings (arm64, ARM tags)

* chore: update Seqera Terraform provider version to 0.25.2

  - Update provider version from 0.13.0 to 0.25.2
  - Attempt to resolve pulumi_seqera module import issues
  - Provider configuration is ready, but SDK generation still needs resolution

* feat: generate Seqera Terraform provider SDK and verify infrastructure changes

  - Generate the pulumi-seqera SDK from terraform-provider registry.terraform.io/seqeralabs/seqera
  - Add the pulumi-seqera dependency with a local SDK path configuration
  - Update package dependencies and the lock file with the generated SDK
  - Verify infrastructure changes work correctly with pulumi preview:
    - S3 CORS configuration for Seqera Data Explorer ✅
    - S3 lifecycle optimization with 30-day cleanup ✅
    - External nextflow config files integration ✅
    - Compute environment replacements with updated configs ✅

  All infrastructure optimizations tested and ready for deployment.

* fix: correct Seqera provider version to 0.25.2

  - Fix the provider version that was incorrectly reverted to 0.13.0
  - Ensure we are using the latest Seqera provider version 0.25.2 as intended
  - SDK generation accidentally used the older version; now corrected

* fix: correct S3 CORS configuration to comply with AWS and Seqera requirements

  - Remove the unsupported x-amz-meta-* wildcard from ExposeHeaders
  - Simplify ExposeHeaders to only include ETag, per Seqera documentation
  - Update the documentation link to the correct Seqera Data Explorer CORS guide

* fix: add missing EC2 permissions to TowerForge IAM policy

  Add ec2:DescribeAccountAttributes, ec2:DescribeLaunchTemplateVersions, and ec2:DescribeInstanceTypeOfferings permissions to align with the official Seqera forge policy requirements. This resolves 403 Forbidden errors when Seqera Platform attempts to describe AWS account attributes.

* test: add IAM policy compliance validation against Seqera reference

  Add comprehensive unit tests to validate that the TowerForge IAM policy includes all required permissions from the official Seqera forge policy. The test:
  - Fetches the reference policy from the seqeralabs/nf-tower-aws repository
  - Compares our policy permissions against the reference
  - Validates that critical EC2 permissions are present
  - Ensures proper policy structure

  Also adds missing EFS permissions (elasticfilesystem:*) and the iam:GetRole permission that were in the reference policy but missing from ours. Includes a TODO comment for implementing Pulumi CrossGuard policy validation for automated compliance checking at deployment time.

* feat: add explicit compute environment dependencies and IAM policy change detection

  Implement proper dependency management and prevent authorization errors by:
  - Adding explicit dependencies between IAM resources → Seqera credentials → compute environments
  - Generating an IAM policy hash to force compute environment recreation on policy changes
  - Embedding the policy version hash in CE descriptions to trigger replacement when policies update
  - Passing the Seqera credential resource for explicit dependency tracking

  This ensures compute environments are always created with fully propagated IAM permissions, preventing "not authorized" errors during resource creation when policies are updated.

* feat: implement Python file injection for Nextflow config merging

  Replace includeConfig statements with programmatic config merging to resolve Seqera Platform compatibility issues. Add a comprehensive test suite with 10 test cases.
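The config-merging approach in the last commit above can be sketched as follows. The function name load_nextflow_config and the config file names come from the commit message; the concatenation strategy and directory argument are assumptions.

```python
from pathlib import Path


def load_nextflow_config(env: str, config_dir: Path) -> str:
    """Merge the shared base config with an environment-specific config.

    Instead of an `includeConfig 'nextflow-base.config'` statement (which
    Seqera Platform would have to resolve at runtime), the file contents are
    concatenated in Python so a single, self-contained config string can be
    attached to the compute environment.
    """
    base = (config_dir / "nextflow-base.config").read_text()
    specific = (config_dir / f"nextflow-{env}.config").read_text()
    return base.rstrip() + "\n\n" + specific.rstrip() + "\n"
```

For example, `load_nextflow_config("cpu", Path("conf"))` would yield the base settings followed by the CPU-specific ones.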
---------

Co-authored-by: Claude <[email protected]>
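The CORS fix described above can be sketched as a boto3-style payload. Only the restriction of ExposeHeaders to ETag comes from the commit message; the methods, headers, max age, and origins below are illustrative assumptions.

```python
def build_cors_configuration(allowed_origins: list) -> dict:
    """Sketch of the S3 CORS rules for Seqera Data Explorer.

    ExposeHeaders is limited to ETag per the commit message (the
    x-amz-meta-* wildcard is not supported by S3). Everything else here
    is an assumption; the caller supplies the Seqera origin(s).
    """
    return {
        "CORSRules": [
            {
                "AllowedOrigins": allowed_origins,
                "AllowedMethods": ["GET", "HEAD"],
                "AllowedHeaders": ["*"],
                "ExposeHeaders": ["ETag"],  # only ETag: wildcards are rejected here
                "MaxAgeSeconds": 3000,
            }
        ]
    }
```

A boto3 client could apply it with `s3.put_bucket_cors(Bucket="nf-core-awsmegatests", CORSConfiguration=build_cors_configuration([...]))`.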
1 parent fc6acda · commit 33582e5

25 files changed: +1639 −278 lines

pulumi/AWSMegatests/Pulumi.yaml

Lines changed: 3 additions & 3 deletions

```diff
@@ -9,11 +9,11 @@ config:
   # GitHub Provider Configuration
   github:owner:
     value: nf-core
-  # Note: AWS and other tokens are provided via ESC environment
-  # AWS credentials should come from ESC OIDC integration
+  # Note: AWS and other tokens are provided via ESC environment
+  # AWS credentials should come from ESC OIDC integration
 packages:
   seqera:
     source: terraform-provider
-    version: 0.13.0
+    version: 0.25.2
     parameters:
       - registry.terraform.io/seqeralabs/seqera
```
pulumi/AWSMegatests/__main__.py

Lines changed: 29 additions & 66 deletions

```diff
@@ -38,78 +38,41 @@ def main():
     # Note: lifecycle_configuration is managed manually, not used in exports

     # Step 5: Create TowerForge IAM credentials and upload to Seqera Platform
-    towerforge_access_key_id, towerforge_access_key_secret, seqera_credentials_id = (
-        create_towerforge_credentials(
-            aws_provider,
-            nf_core_awsmegatests_bucket,
-            seqera_provider,
-            float(config["tower_workspace_id"]),
-        )
+    (
+        towerforge_access_key_id,
+        towerforge_access_key_secret,
+        seqera_credentials_id,
+        seqera_credential_resource,
+        iam_policy_hash,
+    ) = create_towerforge_credentials(
+        aws_provider,
+        nf_core_awsmegatests_bucket,
+        seqera_provider,
+        float(config["tower_workspace_id"]),
     )

     # Step 6: Deploy Seqera Platform compute environments using Terraform provider
-    try:
-        pulumi.log.info(
-            "Deploying Seqera compute environments using Terraform provider"
-        )
-
-        # Deploy using Seqera Terraform provider with dynamic credentials ID
-        terraform_resources = deploy_seqera_environments_terraform(
-            config,
-            seqera_credentials_id,  # Dynamic TowerForge credentials ID from Seqera Platform
-            seqera_provider,  # Reuse existing Seqera provider
-        )
-
-        # Get compute environment IDs from Terraform provider
-        compute_env_ids = get_compute_environment_ids_terraform(terraform_resources)
-        deployment_method = "terraform-provider"
-
-        pulumi.log.info(
-            "Successfully deployed compute environments using Seqera Terraform provider"
-        )
-    except Exception as e:
-        error_msg = (
-            f"Seqera deployment failed: {e}. "
-            "Common solutions: "
-            "1. Verify TOWER_ACCESS_TOKEN has WORKSPACE_ADMIN permissions "
-            "2. Check workspace ID is correct in ESC environment "
-            "3. Ensure TowerForge credentials were successfully uploaded to Seqera Platform "
-            "4. Verify network connectivity to api.cloud.seqera.io"
-        )
-        pulumi.log.error(error_msg)
-        raise RuntimeError(error_msg)
+    # Deploy using Seqera Terraform provider with dynamic credentials ID
+    terraform_resources = deploy_seqera_environments_terraform(
+        config,
+        seqera_credentials_id,  # Dynamic TowerForge credentials ID from Seqera Platform
+        seqera_provider,  # Reuse existing Seqera provider
+        seqera_credential_resource,  # Seqera credential resource for dependency
+        iam_policy_hash,  # IAM policy hash to force CE recreation on policy changes
+    )
+
+    # Get compute environment IDs from Terraform provider
+    compute_env_ids = get_compute_environment_ids_terraform(terraform_resources)
+    deployment_method = "terraform-provider"

     # Step 8: Create GitHub resources
     # Full GitHub integration enabled - creates both variables and secrets
-    try:
-        pulumi.log.info("Creating GitHub organization variables and secrets")
-
-        github_resources = create_github_resources(
-            github_provider,
-            compute_env_ids,
-            config["tower_workspace_id"],
-            tower_access_token=config["tower_access_token"],
-        )
-
-        pulumi.log.info(
-            "Successfully created GitHub variables. Manual secret commands available in outputs."
-        )
-    except Exception as e:
-        error_msg = (
-            f"GitHub integration failed: {e}. "
-            "This is often harmless if variables already exist (409 errors). "
-            "Common issues: "
-            "1. GitHub token lacks org-level permissions "
-            "2. Variables already exist (409 Already Exists - harmless) "
-            "3. Network connectivity to api.github.com"
-        )
-        pulumi.log.warn(error_msg)
-        github_resources = {
-            "variables": {},
-            "secrets": {},
-            "gh_cli_commands": [],
-            "note": f"GitHub integration failed: {e}",
-        }
+    github_resources = create_github_resources(
+        github_provider,
+        compute_env_ids,
+        config["tower_workspace_id"],
+        tower_access_token=config["tower_access_token"],
+    )

     # Exports - All within proper Pulumi program context
     pulumi.export(
```
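The iam_policy_hash value threaded through these calls could be derived along these lines. This is a sketch; the real implementation inside create_towerforge_credentials is not shown in this diff, and the function name below is hypothetical.

```python
import hashlib
import json


def compute_iam_policy_hash(policy_document: dict) -> str:
    """Short, stable hash of an IAM policy document.

    Serializing with sorted keys makes the hash independent of dict ordering,
    so only real policy changes alter it. Embedding the hash in a compute
    environment's description then forces a replacement whenever the IAM
    policy changes, ensuring the CE is recreated with the new permissions.
    """
    canonical = json.dumps(policy_document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
```

A CE description could then embed it, e.g. `f"nf-core megatests CE (IAM policy {compute_iam_policy_hash(policy)})"`, so Pulumi sees a diff exactly when the policy document changes.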

pulumi/AWSMegatests/pyproject.toml

Lines changed: 5 additions & 4 deletions

```diff
@@ -12,11 +12,12 @@ dependencies = [
     "pulumi-seqera",
 ]

-[tool.uv.sources]
-pulumi-seqera = { path = "sdks/seqera" }
-
-
 [dependency-groups]
 dev = [
     "mypy>=1.17.1",
+    "pytest>=7.0.0",
+    "requests>=2.28.0",
 ]
+
+[tool.uv.sources]
+pulumi-seqera = { path = "sdks/seqera" }
```
Lines changed: 59 additions & 0 deletions (new file)

````markdown
# AWS Megatests Utility Scripts

This directory contains utility scripts for AWS Megatests infrastructure management.

## Log File Tagging Script

**Purpose**: One-time batch job to tag existing log files in S3 work directories for preservation during lifecycle cleanup.

### Usage

```bash
# Install dependencies
pip install -r requirements.txt

# Dry run to see what would be tagged
python tag_existing_log_files.py --bucket nf-core-awsmegatests --dry-run

# Actually tag the files
python tag_existing_log_files.py --bucket nf-core-awsmegatests

# With custom threading (default is 10 workers)
python tag_existing_log_files.py --bucket nf-core-awsmegatests --max-workers 20
```

### Prerequisites

- AWS credentials configured (AWS CLI, IAM role, or environment variables)
- S3 permissions: `s3:ListBucket`, `s3:GetObjectTagging`, `s3:PutObjectTagging`

### What it does

1. Scans the `work/` directory for log files matching Nextflow patterns
2. Tags log files with `nextflow.io/metadata=true`
3. Preserves existing tags while adding the metadata tag
4. Uses multi-threading for performance with large buckets

### Log File Patterns

The script identifies these Nextflow log files:

- `.command.log` - Main command log
- `.command.err` - Error log
- `.command.out` - Standard output
- `.exitcode` - Exit code file
- `.command.sh` - Command script
- `.command.run` - Run script
- `.command.begin` - Begin timestamp
- `trace.txt` - Trace file
- `timeline.html` - Timeline report
- `report.html` - Execution report
- `dag.html` - DAG visualization

### Integration with Lifecycle Rules

Tagged files are preserved by S3 lifecycle rules:

- Tagged log files: kept for 90 days, then moved to cheaper storage classes
- Untagged work files: deleted after 14 days for aggressive cleanup
- Future log files will be tagged automatically by Nextflow (no need to run this script again)
````
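The tag-preserving behavior described in step 3 of the README might look like the following sketch. The helper names are hypothetical (the actual tag_existing_log_files.py source is not shown in this view), and the `s3` argument is assumed to be a boto3 S3 client.

```python
def merge_metadata_tag(existing_tags: list) -> list:
    """Return a TagSet with nextflow.io/metadata=true added.

    Existing tags are preserved, and re-running is idempotent: any previous
    metadata tag is replaced rather than duplicated.
    """
    tags = [t for t in existing_tags if t["Key"] != "nextflow.io/metadata"]
    tags.append({"Key": "nextflow.io/metadata", "Value": "true"})
    return tags


def tag_log_file(s3, bucket: str, key: str) -> None:
    """Fetch the object's current tags and write back the merged set.

    `s3` is a boto3 S3 client, e.g. s3 = boto3.client("s3"); this pair of
    calls is what requires the GetObjectTagging/PutObjectTagging permissions
    listed in the prerequisites.
    """
    existing = s3.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
    s3.put_object_tagging(
        Bucket=bucket, Key=key, Tagging={"TagSet": merge_metadata_tag(existing)}
    )
```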
Lines changed: 3 additions & 0 deletions (new file)

```
# Requirements for log file tagging script
boto3>=1.26.0
botocore>=1.29.0
```
