Support connection initiation direction in adv_forward_bytes metric #1426
Comments
Hi @mmckeen, just to clarify: if possible, could you provide a screenshot of the metrics showing that the direction is being set wrong? Thanks!
Hm 🤔 indeed it does look like we finally use the … Let me try and find some example data to show the behavior I'm seeing 🙇.
So this is the point where I would highly recommend you try out Retina with the Hubble control plane. It won't provide you with metrics similar to …

Returning to the issue at hand, Retina installs the packetparser BPF programs at four locations by default: the pod's veth (both ingress and egress sides) and the host's eth0 (also both ingress and egress sides). This means that an outgoing packet from a specific pod will be observed at two locations as it travels to a public destination: first, when it moves from the pod network namespace to the host network namespace via the veth, and second, when it exits the host via eth0. At that point, we should generate two flow events, a.k.a. two metric datapoints. Given that the source IP in the metrics was replaced with the node's IP, it suggests that some NAT might be involved within the host.

Regarding the variation in direction that you're seeing, I would expect the direction to be consistent with respect to a single connection. Could it be that the host at the public IP is initiating other connections towards the host where the Prometheus instance is running?

Regardless, I'm keen on resolving this issue to smooth out the kinks in conntrack. Feel free to join our office hours if you'd like to discuss it further 🙂
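To make the two-observation behavior concrete, here is a minimal, purely illustrative Go sketch. The type and constant names are invented for this example and are not Retina's actual API; it only shows how one egress packet from a pod yields two flow events, with the source IP rewritten by SNAT before the packet leaves the host:

```go
// Illustrative sketch only; observation-point names and IPs are made up.
package main

import "fmt"

type ObservationPoint string

const (
	PodVethIngress ObservationPoint = "pod-veth-ingress" // traffic entering the pod
	PodVethEgress  ObservationPoint = "pod-veth-egress"  // traffic leaving the pod
	HostEthIngress ObservationPoint = "host-eth0-ingress"
	HostEthEgress  ObservationPoint = "host-eth0-egress"
)

type FlowEvent struct {
	SrcIP string
	DstIP string
	Point ObservationPoint
}

func main() {
	// A pod sends a packet to a public IP; the host SNATs it to the node IP
	// before it leaves eth0, which is why the metric shows the node IP.
	events := []FlowEvent{
		{SrcIP: "10.0.1.5", DstIP: "203.0.113.9", Point: PodVethEgress},
		{SrcIP: "192.168.1.10" /* node IP after SNAT */, DstIP: "203.0.113.9", Point: HostEthEgress},
	}
	for _, e := range events {
		fmt.Printf("flow %s -> %s observed at %s\n", e.SrcIP, e.DstIP, e.Point)
	}
}
```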
I'll gather some Flows with the Hubble control plane so we can debug further 🙇.
#1417 might be related to your issue, but in the meantime, is it possible at all for you to get some sample Hubble flow logs?
Okay, I found some Hubble flows for the SNAT-enabled cluster: …
The second set doesn't make much sense; there shouldn't be connections initiating from the public IP.
I can do the same for the non-SNAT cluster if it would be useful. This is just focused on internet egress originating from the pod; right now that's the most important use case I'm looking to solve.
It appears that Hubble might handle the SNAT properly via https://github.com/cilium/cilium/blob/4912f7a79eabc8e7bd3eec5a0364cde15fe87ec5/pkg/hubble/parser/threefour/parser.go#L177, but we don't provide this info in the Flow?
The second set of flows has a SYN-ACK, meaning that this is a reply packet, so it does align with the fact that the connection was initiated from the pod to the public IP, don't you think? For a TCP connection, we would have sent flows (client -> server) and reply flows (server -> client). So, with the filter w.r.t. your first set of flows, it looks like it is being affected by the bug I mentioned previously, for which I've opened a fix here: #1438.

Going back to the original issue: we are interested in the traffic direction, so can you add the flag …
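For illustration, here is a minimal Go sketch of the conntrack-style reasoning above (this is my own example, not Retina's actual implementation): a bare SYN opens a connection, while a SYN-ACK is the server's reply, so its flow runs opposite to the connection's direction.

```go
// Illustrative heuristic only; not Retina's actual conntrack code.
package main

import "fmt"

// TCPFlags models the handshake flags we care about here.
type TCPFlags struct {
	SYN, ACK bool
}

// isReply reports whether a packet looks like a reply in a TCP handshake:
// SYN alone is the client initiating, SYN-ACK is the server replying.
func isReply(f TCPFlags) bool {
	return f.SYN && f.ACK
}

func main() {
	fmt.Println(isReply(TCPFlags{SYN: true}))            // false: connection initiation
	fmt.Println(isReply(TCPFlags{SYN: true, ACK: true})) // true: reply from the server
}
```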
When using JSON output it appears I'm running into #1080; not sure if there's a workaround.
Oh, it seems like we haven't implemented the MarshalJSON interface for RetinaMetadata; let me open a fix item for it.
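For a plain struct, implementing json.Marshaler looks roughly like the sketch below. The struct and fields here are invented for illustration; the real RetinaMetadata type is different (and, as noted later in the thread, the actual fix is not this simple).

```go
// Minimal sketch of the json.Marshaler pattern; not Retina's actual type.
package main

import (
	"encoding/json"
	"fmt"
)

type RetinaMetadata struct {
	Bytes   uint64
	IsReply bool
}

// MarshalJSON satisfies json.Marshaler so the metadata renders as JSON
// instead of being dropped or printed opaquely.
func (m RetinaMetadata) MarshalJSON() ([]byte, error) {
	return json.Marshal(map[string]any{
		"bytes":    m.Bytes,
		"is_reply": m.IsReply,
	})
}

func main() {
	out, _ := json.Marshal(RetinaMetadata{Bytes: 1500, IsReply: true})
	fmt.Println(string(out)) // {"bytes":1500,"is_reply":true}
}
```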
I tested with your fix in #1438 and things are looking good! I'm gonna also test with a fix for MarshalJSON so we can see the …
That's good to hear! Regarding the MarshalJSON issue, I did some preliminary investigation, and it doesn't seem like a simple fix, so this will take some more time 😕
As far as this issue is concerned, the fix in #1438 resolves any concerns I have with the connection direction tracking.

What remains as a nice-to-have is to expose is_reply as a label on the existing adv_forward_bytes metric. This would make it a lot easier to reason about the direction of the traffic. What do you think about that?
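As a rough sketch of what that could look like (metric name and label scheme here are illustrative, not Retina's actual implementation), an is_reply label on the counter would let queries split bytes by connection role:

```go
// Illustrative sketch of exposing is_reply as a label; not Retina's code.
package metrics

import "github.com/prometheus/client_golang/prometheus"

var advForwardBytes = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "adv_forward_bytes",
		Help: "Bytes forwarded, partitioned by packet direction and whether the flow is a reply.",
	},
	[]string{"direction", "is_reply"},
)

func init() {
	prometheus.MustRegister(advForwardBytes)
}

// ObserveFlow would be called once per flow event.
func ObserveFlow(direction string, isReply bool, bytes uint64) {
	label := "false"
	if isReply {
		label = "true"
	}
	advForwardBytes.WithLabelValues(direction, label).Add(float64(bytes))
}
```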
Is your feature request related to a problem? Please describe.

We are attempting to use Retina to distinguish bytes ingressed/egressed through NAT gateways (for pods deployed on a private network) from bytes ingressed/egressed through NLBs (via LoadBalancer Services).

Currently the packetparser direction label appears to be based on the direction of the packet in relation to the container rather than the direction of the connection (e.g. EGRESS for connections initiated from the container and INGRESS for connections initiated towards the container).

Describe the solution you'd like

We'd like to see if it would be possible to expose the connection direction, potentially as a separate label connection_direction, or a separate metric adv_connection_bytes with a different definition of the direction label (with rx/tx metrics).
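To illustrate the distinction being requested (this is a made-up example, not Retina behavior): for a single pod-initiated connection, the per-packet direction flips with every reply packet, while the connection direction stays fixed.

```go
// Illustrative only: per-packet direction vs. per-connection direction
// for one pod-initiated connection. Names and values are hypothetical.
package main

import "fmt"

type Packet struct {
	Desc            string
	PacketDirection string // what the current direction label reflects
}

func main() {
	// A connection initiated from the pod to a public IP: an "egress" connection.
	connectionDirection := "EGRESS"
	packets := []Packet{
		{Desc: "SYN (pod -> public IP)", PacketDirection: "EGRESS"},
		{Desc: "SYN-ACK (public IP -> pod)", PacketDirection: "INGRESS"},
		{Desc: "ACK (pod -> public IP)", PacketDirection: "EGRESS"},
	}
	for _, p := range packets {
		fmt.Printf("%-28s packet direction=%s connection direction=%s\n",
			p.Desc, p.PacketDirection, connectionDirection)
	}
}
```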