CPD-11979: Replace SSH with Teleport (tsh ssh) for instance access#337
CPD-11979: Replace SSH with Teleport (tsh ssh) for instance access#337shrikantbhosaleacquia wants to merge 5 commits intoproxy-cli-hook-aws-v3from
Conversation
- Add TeleportConfig value object to compute proxy URL, account ID, hostname, and identity file from stack name and SSH user - Update SSHCommandBuilder to produce tsh ssh commands instead of ssh - Update Controller#ssh to pass TeleportConfig to SSHCommandBuilder - Update rotate_asg_instances SSH class to accept resources and build TeleportConfig from stack name at runtime - Fix rotate_asg_instances.rb to require individual aws-sdk gems - Add rexml to Gemfile (required by newer aws-sdk-core) - Update all affected specs for new command format and SSH constructor
There was a problem hiding this comment.
Pull request overview
This PR replaces direct SSH access with Teleport (tsh ssh) for connecting to AWS EC2 instances. The change introduces a new TeleportConfig class that determines the appropriate Teleport proxy URL, AWS account ID, and identity file based on the stack name, SSH user, and region. The SSHCommandBuilder has been updated to generate tsh ssh commands instead of traditional ssh commands, and all affected code including the rotate_asg_instances plugin has been updated accordingly.
Changes:
- Introduced TeleportConfig value object to compute Teleport connection parameters from stack name, SSH user, and AWS region
- Updated SSHCommandBuilder to generate tsh ssh commands with Teleport-specific flags instead of traditional ssh commands
- Modified rotate_asg_instances plugin to accept resources parameter and build TeleportConfig at runtime
- Replaced monolithic aws-sdk dependency with individual gems (aws-sdk-autoscaling, aws-sdk-ec2) in rotate_asg_instances plugin
- Added rexml gem to test dependencies (required by newer aws-sdk-core)
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/moonshot/teleport_config.rb | New class that encapsulates Teleport SSH configuration logic, determining proxy URLs, account IDs, and identity files based on environment detection |
| lib/moonshot/ssh_command_builder.rb | Refactored to accept TeleportConfig and build tsh ssh commands instead of traditional ssh commands |
| lib/moonshot/controller.rb | Updated ssh method to create TeleportConfig from stack name and region, passing it to SSHCommandBuilder |
| lib/plugins/rotate_asg_instances/ssh.rb | Updated to accept resources parameter in constructor and build TeleportConfig for each command |
| lib/plugins/rotate_asg_instances/asg.rb | Updated to pass resources when initializing SSH class |
| lib/plugins/rotate_asg_instances.rb | Replaced monolithic aws-sdk require with specific gem requires |
| Gemfile | Added rexml gem to test dependencies |
| spec/moonshot/teleport_config_spec.rb | Comprehensive test coverage for TeleportConfig including prod/dev environments and bot users |
| spec/moonshot/ssh_spec.rb | Updated tests to expect tsh ssh commands and Teleport hostnames |
| spec/moonshot/plugins/rotate_asg_instances/ssh_spec.rb | Updated to pass resources parameter to SSH constructor |
| spec/moonshot/plugins/rotate_asg_instances/asg_spec.rb | Updated shutdown_instance tests to expect tsh ssh commands with Teleport parameters |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
lib/moonshot/ssh_command_builder.rb
Outdated
| cmd << "-l #{@config.ssh_user}" if @config.ssh_user | ||
| cmd << instance_ip | ||
| cmd << "--proxy=#{@teleport_config.proxy_url}" | ||
| cmd << "-ti #{@teleport_config.identity_file}" if @teleport_config.bot_user? |
There was a problem hiding this comment.
The identity file flag syntax is incorrect. The code constructs a string like '-ti tbot-auth-us-east-1/identity' which is then added as a single command argument. This means the shell will interpret this as a single token '-ti tbot-auth-us-east-1/identity', not as flags '-t' and '-i' with an argument. The correct approach is either: (1) add them as separate array elements before joining: cmd << '-t' << '-i' << @teleport_config.identity_file, or (2) use tsh's long-form option: cmd << "--identity-file=#{@teleport_config.identity_file}" << '-t'. Note that '-tA' on the next line already includes '-t', so you may only need '-i' plus the identity file for bot users.
| cmd << "-ti #{@teleport_config.identity_file}" if @teleport_config.bot_user? | |
| cmd << "--identity-file=#{@teleport_config.identity_file}" if @teleport_config.bot_user? |
| # Encapsulates Teleport SSH configuration derived from the stack name, | ||
| # SSH user, and AWS region. Determines the correct proxy URL, account ID, | ||
| # and identity file path for both normal users and bot users. | ||
| class TeleportConfig |
There was a problem hiding this comment.
The class documentation states it 'Encapsulates Teleport SSH configuration' but doesn't explain the key decision logic: how prod vs dev environments are determined (by checking if 'prod' appears in the stack name) or what account IDs these constants represent. Consider adding more detailed documentation explaining the environment detection logic, the meaning of the hardcoded account IDs, and example stack names that would be classified as prod vs dev.
| teleport_config = TeleportConfig.new( | ||
| stack.name, @config.ssh_config.ssh_user, ENV['AWS_REGION'] |
There was a problem hiding this comment.
The AWS_REGION environment variable is accessed without validation. If ENV['AWS_REGION'] is nil or empty, this will create a TeleportConfig with an empty string for the region parameter (due to .to_s conversion), which could lead to incorrect proxy URLs in production environments (e.g., ".teleport.cloudservices.acquia.io") and malformed hostnames. Consider adding validation to raise a clear error message if AWS_REGION is not set, similar to how CodeDeploySetup handles this case.
| teleport_config = TeleportConfig.new( | |
| stack.name, @config.ssh_config.ssh_user, ENV['AWS_REGION'] | |
| region = ENV['AWS_REGION'].to_s.strip | |
| if region.empty? | |
| raise 'AWS_REGION environment variable must be set (for example, "us-east-1") to establish a Teleport SSH connection.' | |
| end | |
| teleport_config = TeleportConfig.new( | |
| stack.name, @config.ssh_config.ssh_user, region |
| teleport_config = Moonshot::TeleportConfig.new( | ||
| @resources.controller.stack.name, | ||
| Moonshot.config.ssh_config.ssh_user, | ||
| ENV['AWS_REGION'] |
There was a problem hiding this comment.
The AWS_REGION environment variable is accessed without validation. If ENV['AWS_REGION'] is nil or empty, this will create a TeleportConfig with an empty string for the region parameter (due to .to_s conversion), which could lead to incorrect proxy URLs in production environments (e.g., ".teleport.cloudservices.acquia.io") and malformed hostnames. Consider adding validation to raise a clear error message if AWS_REGION is not set, similar to how CodeDeploySetup handles this case.
| teleport_config = Moonshot::TeleportConfig.new( | |
| @resources.controller.stack.name, | |
| Moonshot.config.ssh_config.ssh_user, | |
| ENV['AWS_REGION'] | |
| region = ENV['AWS_REGION'] | |
| if region.nil? || region.empty? | |
| raise 'AWS_REGION environment variable must be set for SSH teleport configuration' | |
| end | |
| teleport_config = Moonshot::TeleportConfig.new( | |
| @resources.controller.stack.name, | |
| Moonshot.config.ssh_config.ssh_user, | |
| region |
| teleport_config = TeleportConfig.new( | ||
| stack.name, @config.ssh_config.ssh_user, ENV['AWS_REGION'] |
There was a problem hiding this comment.
If @config.ssh_config.ssh_user is nil (for example, if neither the environment variable MOONSHOT_SSH_USER nor a configuration value is set), it will be converted to an empty string by TeleportConfig. This will result in an invalid Teleport hostname format like '@i-xxx.region.account' (missing username) and bot_user? will always return false (since '' != 'clouddatabot'). Consider validating that ssh_user is set before constructing the TeleportConfig, or add validation within TeleportConfig.initialize.
| teleport_config = TeleportConfig.new( | |
| stack.name, @config.ssh_config.ssh_user, ENV['AWS_REGION'] | |
| ssh_user = @config.ssh_config.ssh_user | |
| if ssh_user.nil? || ssh_user.to_s.empty? | |
| raise 'SSH user is not configured. Please set MOONSHOT_SSH_USER or configure ssh_user in your Moonshot configuration.' | |
| end | |
| teleport_config = TeleportConfig.new( | |
| stack.name, ssh_user, ENV['AWS_REGION'] |
| private | ||
|
|
||
| def prod? | ||
| @stack_name.include?('prod') |
There was a problem hiding this comment.
The prod? method uses a simple string inclusion check (include?('prod')) to determine the environment. This could lead to false positives if 'prod' appears anywhere in the stack name, not just as the environment identifier. For example, a stack named 'product-dev' or 'reproduce-test' would incorrectly be identified as production. Consider using a more robust pattern match or explicit environment detection.
| @stack_name.include?('prod') | |
| @stack_name.match?(/\bprod\b/) |
|
|
||
| attr_reader :proxy_url, :account_id, :region, :ssh_user | ||
|
|
||
| def initialize(stack_name, ssh_user, region) |
There was a problem hiding this comment.
The initialize method converts nil parameters to empty strings via .to_s without validation. This leads to multiple issues: (1) empty region produces malformed URLs like '.teleport.cloudservices.acquia.io' and invalid identity paths like 'tbot-auth-/identity', (2) empty ssh_user produces invalid Teleport hostnames like '@instance.region.account', (3) empty stack_name will cause incorrect environment detection (will default to dev since 'prod' won't be found). Add validation to ensure all three parameters are present and non-empty before proceeding.
| def initialize(stack_name, ssh_user, region) | |
| def initialize(stack_name, ssh_user, region) | |
| missing_params = [] | |
| missing_params << 'stack_name' if stack_name.to_s.strip.empty? | |
| missing_params << 'ssh_user' if ssh_user.to_s.strip.empty? | |
| missing_params << 'region' if region.to_s.strip.empty? | |
| unless missing_params.empty? | |
| raise ArgumentError, "TeleportConfig requires non-empty: #{missing_params.join(', ')}" | |
| end |
lib/moonshot/ssh_command_builder.rb
Outdated
| cmd << "-l #{@config.ssh_user}" if @config.ssh_user | ||
| cmd << instance_ip | ||
| cmd << "--proxy=#{@teleport_config.proxy_url}" | ||
| cmd << "-ti #{@teleport_config.identity_file}" if @teleport_config.bot_user? |
There was a problem hiding this comment.
There is no test coverage for the bot user path in SSHCommandBuilder. When bot_user? returns true, the command includes '-ti' followed by the identity file path, but this code path is not tested. Consider adding test cases that verify the correct command format when using bot users (clouddatabot) to ensure the identity file flag is properly included in the tsh ssh command.
| cmd << "-ti #{@teleport_config.identity_file}" if @teleport_config.bot_user? | |
| cmd << "-ti #{@teleport_config.identity_file}" if @teleport_config.identity_file |
| def initialize(ssh_config, instance_id) | ||
| @config = ssh_config | ||
| @instance_id = instance_id | ||
| def initialize(ssh_config, instance_id, teleport_config) |
There was a problem hiding this comment.
The SSHCommandBuilder constructor signature has changed from 2 parameters (ssh_config, instance_id) to 3 parameters (ssh_config, instance_id, teleport_config). This is a breaking API change that could affect any external code or plugins that construct SSHCommandBuilder instances directly. While the code in this PR has been updated accordingly, consider whether this warrants a major version bump or additional documentation about the breaking change.
| PROD_ACCOUNT_ID = '546349603759' | ||
| DEV_ACCOUNT_ID = '672327909798' |
There was a problem hiding this comment.
Hardcoded AWS account IDs (PROD_ACCOUNT_ID and DEV_ACCOUNT_ID) are exposed in the code. While account IDs are generally not considered highly sensitive secrets on their own, hardcoding them makes it difficult to support additional environments or accounts without code changes. Consider whether these should be configurable via environment variables or configuration files to improve flexibility and follow the principle of configuration over hardcoding.
| PROD_ACCOUNT_ID = '546349603759' | |
| DEV_ACCOUNT_ID = '672327909798' | |
| PROD_ACCOUNT_ID = ENV['TELEPORT_PROD_ACCOUNT_ID'] || '546349603759' | |
| DEV_ACCOUNT_ID = ENV['TELEPORT_DEV_ACCOUNT_ID'] || '672327909798' |
Change: minor Purpose: feature
Change: minor Purpose: feature
Change: minor Purpose: feature
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def initialize(ssh_config, instance_id, teleport_config) | ||
| @config = ssh_config | ||
| @instance_id = instance_id | ||
| @teleport_config = teleport_config | ||
| end |
There was a problem hiding this comment.
SSHCommandBuilder's initializer now requires a teleport_config, but there is still at least one in-repo caller using the old 2-arg form (lib/moonshot/tools/asg_rollout/hook_exec_environment.rb:16). This will raise ArgumentError at runtime; either update that caller to construct/pass a TeleportConfig, or make teleport_config optional with a backwards-compatible fallback.
| group :test do | ||
| gem 'codeclimate-test-reporter' | ||
| gem 'pry' | ||
| gem 'rexml' | ||
| gem 'rubocop' | ||
| end |
There was a problem hiding this comment.
rexml is added only in the :test group, but the PR description indicates it's required by newer aws-sdk-core (a runtime dependency). If this gem is needed at runtime, it should be added as a normal dependency (e.g., in the gemspec) or outside the :test group so non-test installs don't break.
| PROD_IDENTITY_TEMPLATE = '/opt/machine-id/%<region>s/identity' | ||
| DEV_IDENTITY = '/opt/machine-id/dev-us-east-1/identity' |
There was a problem hiding this comment.
TeleportConfig's identity file constants use '/opt/machine-id/...', but the new spec expects 'tbot-auth-.../identity' paths for the bot user. As written, TeleportConfig#identity_file will not match the spec (and likely the intended runtime path). Please align the constants/logic and the spec so they agree on the actual identity file location.
| PROD_IDENTITY_TEMPLATE = '/opt/machine-id/%<region>s/identity' | |
| DEV_IDENTITY = '/opt/machine-id/dev-us-east-1/identity' | |
| PROD_IDENTITY_TEMPLATE = '/opt/tbot-auth-%<region>s/identity' | |
| DEV_IDENTITY = '/opt/tbot-auth-dev-us-east-1/identity' |
| def initialize(stack_name, ssh_user, region) | ||
| @stack_name = stack_name.to_s | ||
| @ssh_user = ssh_user.to_s | ||
| @region = region.to_s | ||
|
|
There was a problem hiding this comment.
TeleportConfig coerces a nil region to an empty string via region.to_s, which can silently build an invalid proxy URL/hostname when AWS_REGION is unset. Previously this would fail earlier via AWS SDK missing-region behavior; consider validating region is present/non-empty and raising an ArgumentError when it isn't.
| cmd = ['tsh', 'ssh'] | ||
| cmd << @config.ssh_options if @config.ssh_options | ||
| cmd << "-i #{@config.ssh_identity_file}" if @config.ssh_identity_file | ||
| cmd << "-l #{@config.ssh_user}" if @config.ssh_user | ||
| cmd << instance_ip | ||
| cmd << "--proxy=#{@teleport_config.proxy_url}" | ||
| cmd << "-i #{@teleport_config.identity_file}" if @teleport_config.bot_user? | ||
| cmd << "#{@teleport_config.ssh_user}@#{instance_host}" |
There was a problem hiding this comment.
SSHCommandBuilder no longer adds any default TTY/agent-forwarding flags, but the updated specs expect -tA to always be present (similar to the old ssh -t behavior). If -tA is required for Teleport access, add it to the generated command (in the same position/order the specs expect) or ensure it is set consistently via configuration.
Change: minor Purpose: feature
Motivation
Fixes #NNN
Proposed changes
Alternatives considered
Testing steps