Skip to content

✨ add NetworkPolicy objects for catalogd and operator-controller #1942

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

joelanford
Copy link
Member

Description

This PR adds NetworkPolicy objects, 1 each for operator-controller and catalogd.

Both network policies allow:

  • all egress traffic (for communication with arbitrary image registries)
  • ingress traffic to the respective metrics ports

In addition, catalogd's network policy also allows

  • ingress traffic to the mutating webhook server
  • ingress traffic to the catalogd http server

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@joelanford joelanford requested a review from a team as a code owner April 29, 2025 13:30
@openshift-ci openshift-ci bot requested review from anik120 and trgeiger April 29, 2025 13:30
Copy link

netlify bot commented Apr 29, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit ebad7c5
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/681100f7e4c28700091567ab
😎 Deploy Preview https://deploy-preview-1942--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@joelanford
Copy link
Member Author

I did verify that the NetworkPolicy objects have an effect in kind clusters. For example, I:

  • changed the egress policy to be empty (which means allow no egress). This broke both catalogd and operator-controller.
  • changed the ingress policy of catalogd such that the catalogd http server was denied. This broke operator-controller.

- protocol: TCP
port: 9443 # webhook
egress:
- {} # Allows all egress traffic (needed to pull catalog images from arbitrary image registries)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Add space at the end of the file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link

codecov bot commented Apr 29, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 66.67%. Comparing base (1171691) to head (ebad7c5).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1942   +/-   ##
=======================================
  Coverage   66.67%   66.67%           
=======================================
  Files          75       75           
  Lines        6326     6326           
=======================================
  Hits         4218     4218           
  Misses       1843     1843           
  Partials      265      265           
Flag Coverage Δ
e2e 45.12% <ø> (ø)
unit 56.25% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelanford Just a quick heads-up — the network policies won’t be tested at all with the default Kind configuration.

To ensure we properly test the NetworkPolicy resources scaffolded by us, we’ll need to follow the same or a similar approach used in Kubebuilder’s own e2e setup. That means disabling the default CNI and installing Calico manually.

You can see how they handle this here:

Before moving forward, we should make sure the changes are tested end-to-end. I’d suggest we replicate the setup done there to ensure coverage.

Let me know if you’d like help setting it up!

control-plane: catalogd-controller-manager
policyTypes:
- Ingress
- Egress
Copy link
Contributor

@camilamacedo86 camilamacedo86 Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we just need Ingress, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First I did like you there, and I received this feedback from community: kubernetes-sigs/kubebuilder#3853 (comment)

So, we might need to reconsider.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We explicitly want to allow all egress traffic both catalogd and operator-controller because they need to be able to communicate with unknown-in-advance image registries. If we are not explicitly granting that access, then another NetworkPolicy that matches our pod that is more restrictive could break us.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll need to be more restrictive to make sense.
But I get your point.
We can also enhance it on follow ups. So, 👍

- protocol: TCP
port: 8443 # metrics
egress:
- {} # Allows all egress traffic (needed to pull bundle images from arbitrary image registries)
Copy link
Contributor

@camilamacedo86 camilamacedo86 Apr 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a bit of context — back when we added this to Kubebuilder, there was a request from the community for us to use labels.

The idea was to allow scenarios where, for example, a Pod wants to scrape metrics. In those cases, having specific labels becomes necessary to permit that traffic through the NetworkPolicy.

You might want to take a look at this example for reference:
https://github.com/kubernetes-sigs/kubebuilder/blob/master/testdata/project-v4/config/network-policy/allow-metrics-traffic.yaml#L19-L24

So, we might want to consider the same approach. Sharing to allow you to think about the pros and cons.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we are fully open now, I think keeping it simple and slightly more open is better for this first pass. If we find later that we can ratchet down access to metrics and the catalogd webhook port in a general way, we can do that later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, we can do a follow-up after checking that this config works on the downstream as well.
It is fine.

@joelanford
Copy link
Member Author

Installing a separate CNI is no longer necessary. See:

camilamacedo86
camilamacedo86 previously approved these changes Apr 29, 2025
Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approved

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 29, 2025
@bentito
Copy link
Contributor

bentito commented Apr 29, 2025

Should this have a new e2e specific to validating the reduced network access are working as intended?

This change proves that the NetworkPolicy for catalogd and
operator-controller allows scraping metrics from outside the namespace
in which catalogd and operator-controller are running.

Signed-off-by: Joe Lanford <[email protected]>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 29, 2025
@joelanford
Copy link
Member Author

Should this have a new e2e specific to validating the reduced network access are working as intended?

Brett and I discussed this a bit offline and couldn't immediately figure out a concrete test to add.

  • What we can say is that the current NetworkPolicy objects are working because the e2e's are still passing.
  • But, we cannot say that the current NetworkPolicy objects are the minimum necessary for the e2e's to pass.

It would be nice to be able to introduce a higher barrier for changing network policy because it is a security-sensitive change. We would not want to accidentally open up network access unnecessarily.

From a technical perspective, I'm not sure tests that probing blocked ports would work all that well because we may need to have something actually listening on the blocked port and matched by the network policy selector in order to ever see something other than a network failure (would the client see a difference between "NetworkPolicy blocked the traffic" and "nothing is listening there"?)

Perhaps we could have a meta-test like "make sure the NetworkPolicy looks like $this", where "$this" is some sort of annotated list of rules where a justification is provided. If we had that, then:

  • PR authors would have to change "$this" if they wanted to change the NetworkPolicy.
  • Reviewers would be confronted with the change and would have a chance to see the justification for the change, which would theoretically provide more context to a review than an opaque two-line YAML change in the NetworkPolicy definition.

@bentito
Copy link
Contributor

bentito commented Apr 30, 2025

/lgtm

Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants