Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement routable pod CIDRs in e2e tests #363

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

g-gaston
Copy link
Member

Description of changes

We now create an individual route per hybrid node on the subnet where the hybrid nodes live for the pod CIDR assigned to the node using the instance as the next hop.

In order for this work, it required swapping the peered VPC for a TGW. Not because the individual routes couldn't be added, but because the VPC networking would refuse to forward a packet through the peered connection with a dst IP that doesn't belong to the VPC'S CIDR blocks on the other side. This is a problem because the hybrid pod CIDR is not part of the second VPC. And it can't be if we want to be able to add the previously mentioned per-node routes.

This enables CP to pod communication or east-west pod to pod communication.

Testing (if applicable):
Running this in my stack.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@g-gaston g-gaston force-pushed the e2e-test-full-pod-networking branch 3 times, most recently from c90c265 to 08caebf Compare February 21, 2025 20:57
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most of these are just moved from test/e2e/kubernetes/kubernetes.go
the only thing new are the last 2 methods

Copy link
Member

@jaxesn jaxesn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably going to need new perms. otherwise, just a couple questions mainly

@@ -145,6 +145,20 @@ func (e *InstanceConfig) Create(ctx context.Context, ec2Client *ec2.Client, ssmC
}, nil
}

// DisableSourceDestCheck disables the source/destination check for the given instance.
func DisableSourceDestCheck(ctx context.Context, ec2Client *ec2.Client, instanceID string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so by default an ec2 instance blocks the packet if the src/dest is outside the vpc(?) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah

You can enable or disable source/destination checks, which ensure that the instance is either the source or the destination of any traffic that it receives. Source/destination checks are enabled by default. You must disable source/destination checks if the instance runs services such as network address translation, routing, or firewalls.

@@ -224,6 +234,12 @@ func (c *NodeCleanup) Cleanup(ctx context.Context, instance ec2.Instance) error
c.Logger.Info("Skipping EC2 Instance deletion", "instanceID", instance.ID)
return nil
}

c.Logger.Info("Deleting routes for EC2 Instance", "instanceID", instance.ID, "subnetID", c.Cluster.SubnetID)
if err := ec2.DeleteRoutesForInstance(ctx, c.EC2, c.Cluster.SubnetID, instance.ID); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any issue to other tests, potentially, if this fails to cleanup or just in general?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe

if the route is not deleted and we try to create another route for the same cidr, it will fail. So it's possible for:

  1. a node to be deleted
  2. its route not be deleted
  3. then another node is created
  4. cilium happens to assign the same pod cidr to this new node that it had assigned to the previous one
  5. when the test gets to create the vpc route, it will be duplicated and fail.

The alternative is to detect the duplicated route error in the test and delete it first.

I opted not to do it because that could cause another running test to fail if it happens that the other node hasn't been deleted yet. That code could be masking a different type of issue. Moreover, the test error would be very difficult to debug.

With the current code, the error will be obvious. If this starts being a problem, we can reconsider it.

We now create an individual route per hybrid node on the subnet where
the hybrid nodes live for the pod CIDR assigned to the node using the
instance as the next hop.

In order for this work, it required swapping the peered VPC for a TGW.
Not because the individual routes couldn't be added, but because the VPC
networking would refuse to forward a packet through the peered
connection with a dst IP that doesn't belong to the VPC'S CIDR blocks on
the other side. This is a problem because the hybrid pod CIDR is not
part of the second VPC. And it can't be if we want to be able to add
the previously mentioned per-node routes.
@junpengdev junpengdev force-pushed the e2e-test-full-pod-networking branch from 90b5514 to c7466dc Compare April 4, 2025 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants