Support connection initiation direction in adv_forward_bytes metric #1426

Open
mmckeen opened this issue Mar 12, 2025 · 17 comments

mmckeen commented Mar 12, 2025

Is your feature request related to a problem? Please describe.

We are attempting to use Retina to distinguish bytes ingressed/egressed through NAT gateways (for pods deployed on a private network) from bytes ingressed/egressed through NLBs (via LoadBalancer Services).

Currently the packetparser direction label appears to be based on the direction of each packet relative to the container rather than the direction of the connection (i.e. EGRESS for connections initiated from the container and INGRESS for connections initiated towards the container).

Describe the solution you'd like

We'd like to see if it would be possible to expose the connection direction, potentially as a separate label connection_direction or a separate metric adv_connection_bytes with a different definition of the direction label (with rx/tx metrics).
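To make the distinction concrete, here is a small illustrative sketch (in Go, purely for this discussion; these types are not part of Retina) of the two notions of "direction" involved:

```go
package directions

// PacketDirection is what the existing direction label appears to capture:
// the direction of an individual packet relative to the pod.
type PacketDirection string

const (
	TX PacketDirection = "tx" // packet leaving the pod
	RX PacketDirection = "rx" // packet arriving at the pod
)

// ConnectionDirection is what this request asks to expose: the direction in
// which the connection was initiated, fixed for the connection's lifetime.
type ConnectionDirection string

const (
	ConnEgress  ConnectionDirection = "EGRESS"  // connection initiated from the pod
	ConnIngress ConnectionDirection = "INGRESS" // connection initiated towards the pod
)
```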


nddq commented Mar 13, 2025

Hi @mmckeen, just to clarify: does direction here mean the traffic_direction? We have been using conntrack for packetparser for a while now, in which the traffic_direction is set once in a connection's lifetime based on where we observed the first packet of the connection. So, if we observed that the first packet is leaving the pod/host, it will be EGRESS, and vice versa. You can see the implementation details here.
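For readers following along, a minimal userspace sketch of that "set the direction once, on the first packet" idea (illustrative only; the actual implementation is the eBPF/conntrack code linked above):

```go
package conntrack

import "sync"

type fiveTuple struct {
	srcIP, dstIP     string
	srcPort, dstPort uint16
	proto            uint8
}

type direction int

const (
	egress  direction = iota // first packet observed leaving the pod/host
	ingress                  // first packet observed arriving at the pod/host
)

// tracker pins a traffic direction to a connection the first time any packet
// of that connection is observed; later packets of the connection reuse it.
type tracker struct {
	mu    sync.Mutex
	conns map[fiveTuple]direction
}

func newTracker() *tracker {
	return &tracker{conns: make(map[fiveTuple]direction)}
}

// observe returns the connection's direction, assigning it from the observed
// direction of the first packet if the connection is new.
func (t *tracker) observe(key fiveTuple, pktLeavingHost bool) direction {
	t.mu.Lock()
	defer t.mu.Unlock()
	if d, ok := t.conns[key]; ok {
		return d
	}
	d := ingress
	if pktLeavingHost {
		d = egress
	}
	t.conns[key] = d
	return d
}
```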

If possible, could you provide a screenshot of the metrics showing that the direction is being set wrong? Thanks!


mmckeen commented Mar 13, 2025

Hm 🤔 indeed, it does look like the conntrack direction is ultimately used for the metric: https://github.com/microsoft/retina/blob/main/pkg/plugin/packetparser/packetparser_linux.go#L603.

Let me try and find some example data to show the behavior I'm seeing 🙇.


mmckeen commented Mar 13, 2025

I'm trying to understand this flow; it should be a Prometheus instance initiating a connection through a NAT gateway to a public IP.

The highlighted main metric makes sense; I believe it is capturing the TX traffic from the pod to the remote IP.

What I'm having difficulty capturing is the RX traffic. It appears to get varying directions, and it doesn't capture the proper pod destination because the masquerade applied to the outgoing packet means the return traffic comes back addressed to the node IP (10.128.70.70 in this pod's case; several other nodes receiving traffic from 3.17.54.38 show up as well).

There's a similar problem for connections established from NLBs to the pod, this time in reverse (and further complicated by externalTrafficPolicy: Cluster).

[screenshot omitted]


mmckeen commented Mar 13, 2025

Here's an example of a request through an NLB into a LoadBalancer Service with externalTrafficPolicy: Local.

We observe the node IP as the destination of the INGRESS instead of the pod IP, and the response is incorrectly classified as EGRESS.

[screenshot omitted]


nddq commented Mar 13, 2025

This is the point where I would highly recommend trying out Retina with the Hubble control plane. It won't provide metrics similar to adv_forward_bytes (although we are working on something similar here with the conntrack metrics), but it will give you the ability to view the connection flow logs via hubble observe in the Hubble CLI, which will help us make more sense of what we are seeing here.

Returning to the issue at hand: Retina installs the packetparser BPF programs at four locations by default: the pod's veth (both ingress and egress sides) and the host's eth0 (also both ingress and egress sides). This means that an outgoing packet from a specific pod will be observed at two locations as it travels to a public destination: first, when it moves from the pod network namespace to the host network namespace via the veth, and second, when it exits the host via eth0. At that point we should generate two flow events, i.e. two metric datapoints. Given that the source IP in the metrics was replaced with the node's IP, it suggests that some NAT is involved within the host.

Regarding the variation in direction that you're seeing, I would expect the direction to be consistent for a single connection. Could it be possible that the host at the public IP is initiating other connections towards the host where the Prometheus instance is running? Regardless, I'm keen on resolving this issue to smooth out the kinks in conntrack. Feel free to join our office hours if you'd like to discuss it further 🙂
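To make the "observed twice" point concrete, here is a small self-contained sketch (conceptual only; the event shape and the IPs other than those already mentioned in this thread are made up, not Retina's actual flow format):

```go
package main

import "fmt"

// observation is a simplified stand-in for one packetparser flow event.
type observation struct {
	hook      string // where the BPF program saw the packet
	srcIP     string
	dstIP     string
	direction string // traffic_direction pinned by conntrack
}

func main() {
	podIP, nodeIP, remoteIP := "10.0.1.23", "10.128.70.70", "3.17.54.38"

	// One outgoing packet from the pod is seen twice on its way out:
	events := []observation{
		// 1) leaving the pod network namespace via its veth
		{hook: "veth (host side)", srcIP: podIP, dstIP: remoteIP, direction: "EGRESS"},
		// 2) leaving the node via eth0 -- if the host SNATs/masquerades,
		// the source IP is now the node IP, so the pod attribution is lost
		{hook: "eth0 (egress)", srcIP: nodeIP, dstIP: remoteIP, direction: "EGRESS"},
	}

	for _, e := range events {
		fmt.Printf("%-18s %s -> %s  traffic_direction=%s\n",
			e.hook, e.srcIP, e.dstIP, e.direction)
	}
}
```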


mmckeen commented Mar 14, 2025

I'll gather some Flows with the Hubble control plane so we can debug further 🙇.


mmckeen commented Mar 14, 2025

Here are some results from an alternate cluster which doesn't do SNAT on the host for egress.

[screenshot omitted]


nddq commented Mar 17, 2025

#1417 might be related to your issue, but in the meantime, would it be possible for you to get some sample Hubble flow logs?


mmckeen commented Mar 18, 2025

Okay, I found some Hubble flows for the SNAT-enabled cluster:

hubble observe -f --to-ip 3.17.54.38

Mar 18 22:42:08.236: platform/prometheus-shard-0-1:60280 (ID:25638) -> 3.17.54.38:443 (world) to-stack FORWARDED (TCP Flags: PSH:true  ACK:true)
Mar 18 22:42:08.236: platform/prometheus-shard-0-1:60280 (ID:25638) -> 3.17.54.38:443 (world) to-stack FORWARDED (TCP Flags: PSH:true  ACK:true)
Mar 18 22:42:08.236: platform/prometheus-shard-0-1:60280 (ID:25638) -> 3.17.54.38:443 (world) to-stack FORWARDED (TCP Flags: PSH:true  ACK:true)
Mar 18 22:42:08.245: 3.17.54.38:443 (world) <- platform/prometheus-shard-0-1:60280 (ID:25638) to-stack FORWARDED (TCP Flags: ACK:true)
Mar 18 22:42:08.245: 3.17.54.38:443 (world) <- platform/prometheus-shard-0-1:60280 (ID:25638) to-stack FORWARDED (TCP Flags: PSH:true  ACK:true)
Mar 18 22:42:08.245: 3.17.54.38:443 (world) <- platform/prometheus-shard-0-1:60280 (ID:25638) to-stack FORWARDED (TCP Flags: FIN:true  ACK:true)

hubble observe -f --from-ip 3.17.54.38

Mar 18 22:46:08.156: platform/prometheus-shard-0-1:55376 (ID:25638) <- 3.17.54.38:443 (world) to-endpoint FORWARDED (TCP Flags: SYN:true  ACK:true)
Mar 18 22:46:08.157: platform/prometheus-shard-0-1:55376 (ID:25638) <- 3.17.54.38:443 (world) to-endpoint FORWARDED (TCP Flags: ACK:true)
Mar 18 22:46:08.159: platform/prometheus-shard-0-1:55376 (ID:25638) <- 3.17.54.38:443 (world) to-endpoint FORWARDED (TCP Flags: PSH:true  ACK:true)
Mar 18 22:46:37.910: platform/prometheus-shard-0-1:55376 (ID:25638) <- 3.17.54.38:443 (world) to-endpoint FORWARDED (TCP Flags: PSH:true  ACK:true)
Mar 18 22:46:38.127: platform/prometheus-shard-0-1:55376 (ID:25638) <- 3.17.54.38:443 (world) to-endpoint FORWARDED (TCP Flags: FIN:true  ACK:true)
Mar 18 22:46:38.128: platform/prometheus-shard-0-1:55376 (ID:25638) <- 3.17.54.38:443 (world) to-endpoint FORWARDED (TCP Flags: ACK:true)

The second set doesn't make much sense; there shouldn't be connections initiated from the public IP.


mmckeen commented Mar 18, 2025

I can do the same for the non-SNAT cluster if it would be useful.

This is just focused on internet egress originating from the pod; right now that's the most important use case I'm looking to solve.


mmckeen commented Mar 19, 2025

It appears that Hubble might handle the SNAT properly via https://github.com/cilium/cilium/blob/4912f7a79eabc8e7bd3eec5a0364cde15fe87ec5/pkg/hubble/parser/threefour/parser.go#L177, but we don't provide this info in the Flow?


nddq commented Mar 19, 2025

> Okay, I found some Hubble flows for the SNAT enabled cluster […] The second set doesn't make much sense, there shouldn't be connections initiating from the public IP.

The second set of flows has a SYN-ACK, meaning these are reply packets, so it does align with the fact that the connection was initiated from the pod to the public IP, don't you think? For a TCP connection, we would have sent flows (client -> server) and reply flows (server -> client), so with the filter --from-ip 3.17.54.38 we are seeing the reply flows from the public IP. One thing that has helped me when working on/debugging Hubble traffic flows: the endpoint on the left will always be the initiator of the connection 🙂
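For anyone consuming these flows programmatically, a rough sketch of that interpretation, assuming the Cilium flow API's TrafficDirection and IsReply fields and the conntrack semantics described earlier (illustrative, not Retina code):

```go
package flowutil

import (
	flowpb "github.com/cilium/cilium/api/v1/flow"
)

// describe derives "who initiated the connection" from a Hubble flow:
// traffic_direction is pinned to the connection's first observed packet,
// and is_reply tells us whether this particular packet travels back
// towards the initiator.
func describe(f *flowpb.Flow) string {
	dir := f.GetTrafficDirection().String() // e.g. EGRESS or INGRESS
	if f.GetIsReply().GetValue() {
		return dir + " (reply packet, server -> client)"
	}
	return dir + " (forward packet, client -> server)"
}
```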

W.r.t. your first set of flows, it looks like they are affected by the bug I mentioned previously, for which I've opened a fix in #1438.

Going back to the original issue, we are interested in the traffic direction, so can you add the flag --output json to get the full flow information with the traffic_direction field?


mmckeen commented Mar 19, 2025

When using JSON output, it appears I'm running into #1080; not sure if there's a workaround.


nddq commented Mar 19, 2025

Oh, it seems like we haven't implemented the MarshalJSON interface for RetinaMetadata; let me open a fix item for it.


mmckeen commented Apr 2, 2025

I tested with your fix in #1438 and things are looking good!

I'm gonna also test with a fix for MarshalJSON so we can see the traffic_direction.


nddq commented Apr 3, 2025

That's good to hear! Regarding the MarshalJSON issue, I did some preliminary investigation, and it doesn't seem like a simple fix, so this will take some more time 😕


mmckeen commented Apr 8, 2025

As far as this issue is concerned, the fix in #1438 resolves any concerns I have with connection direction tracking.

What remains as a nice-to-have is to expose is_reply as a label on the existing adv_forward_bytes metrics.

This would make it a lot easier to reason about the direction of the traffic.

What do you think about that?
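For concreteness, here is a hypothetical sketch of what that label could look like on the metrics side (the metric/label wiring is invented for illustration and reduced to the two relevant labels; it is not the actual Retina metrics code):

```go
package metrics

import (
	"strconv"

	flowpb "github.com/cilium/cilium/api/v1/flow"
	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical variant of the forwarded-bytes counter carrying an extra
// is_reply label, so forward and reply bytes of a connection can be told
// apart without changing the existing direction semantics.
var forwardBytes = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "adv_forward_bytes",
		Help: "Forwarded bytes by traffic direction and reply flag.",
	},
	[]string{"direction", "is_reply"},
)

func init() {
	prometheus.MustRegister(forwardBytes)
}

// observeFlow records a flow's byte count under its conntrack-assigned
// direction plus whether the observed packet was a reply.
func observeFlow(f *flowpb.Flow, bytes float64) {
	forwardBytes.WithLabelValues(
		f.GetTrafficDirection().String(),
		strconv.FormatBool(f.GetIsReply().GetValue()),
	).Add(bytes)
}
```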
