Skip to content

Fix entitlements in internalClusterTest #131539

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mosche
Copy link
Contributor

@mosche mosche commented Jul 18, 2025

  • Previously, entitlement checks got disabled when resetting the policy manager (which defaults to inactive).
  • The shared data dir is granted as additional data base directory.
  • Due to the lack of entitlement delegation and wipePendingDataDirectories using server's FileSystemUtils, node base directories won't be removed until after the test.
  • Disable entitlement checks for some command tests.
  • Disable entitlement checks for some tests requiring entitlement delegation.

I'll follow up with better managing the lifecycle of test entitlement state, as discussed on Slack.

* Previously, entitlement checks got disabled when resetting the policy
  manager (which defaults to inactive).
* The shared data dir is granted as additional data base directory.
* Due to the lack of entitlement delegation and wipePendingDataDirectories using
  server's FileSystemUtils, node base directories won't be removed until
  after the test.
* Disable entitlement checks for some command tests.
* Disable entitlement checks for some tests requiring entitlement delegation.
@mosche mosche requested a review from a team July 18, 2025 15:12
@mosche mosche added >refactoring test-windows Trigger CI checks on Windows :Core/Infra/Entitlements Entitlements infrastructure test-fips Trigger CI checks for FIPS labels Jul 18, 2025
@elasticsearchmachine elasticsearchmachine added v9.2.0 Team:Core/Infra Meta label for core/infra team labels Jul 18, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@@ -367,9 +367,6 @@ private ModuleEntitlements getModuleScopeEntitlements(
* @return true if permission is granted regardless of the entitlement
*/
boolean isTriviallyAllowed(Class<?> requestingClass) {
if (generalLogger.isTraceEnabled()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logging in entitlements can be sketchy, especially on such a critical path. this breaks in a settings test that sets log level to trace

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have encountered other such tests before, and managed to get them working without removing this logging. I'm not sure it's reasonable for a test to turn on trace logging globally and then make assertions about the logs that result.

An example is here. There was no particular reason that test needed to use the root logger, so I changed it to use a more specific logger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trouble to me isn't the test, but (file) logging itself triggering further entitlement checks, especially on this path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This blew the logging framework in this case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds familiar too. 🤔

Could that happen in production? If so, we ought to fix it; if not, we ought to arrange things to make those entitlements trivially allowed. In either case, we shouldn't need to remove logging from PolicyManager.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can absolutely happen in production, though it should be very rare. Nevertheless I'd consider logging in isTriviallyAllowed rather dangerous and would rather remove, especially if it contains an exception.

What happens is:

  1. Java loads a new class via URLClassPath.
  2. We emit a trace log containing an exception in isTriviallyAllowed
  3. Serialization of that exception requires log4j to load another class (ThrowableProxy)
  4. We emit another trace log containing an exception in isTriviallyAllowed
  5. That blows log4j with a ClassCircularityError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you insist, we can move the stacktrace down into

generalLogger.trace("Entitlement not trivially allowed");

once we've granted trusted system classes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah your scenario is a bit concerning. I'm good with moving it down if you are.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Oh and if you feel like you want to delete it, I can live with that too. I was just hesitating because being unable to log is a big problem for a system as complex as entitlements, so we just want to take some care before we resign ourselves to having "no logging zones" in the code.)

@@ -115,20 +134,31 @@ private static Collection<Path> dataDirs(Settings settings, Path homeDir) {
: dataDirs.stream().map(TestEntitlementBootstrap::absolutePath).toList();
}

private static Path sharedDataDir(Settings settings) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required for some tests, though it looks like we never grant PATH_SHARED_DATA_SETTING in production.
Is this a test-only thing? Or is that a bug?

@@ -93,6 +94,7 @@
import static org.hamcrest.Matchers.startsWith;

@ESIntegTestCase.ClusterScope(scope = ESIntegTestCase.Scope.TEST, numDataNodes = 0)
@ESTestCase.WithoutEntitlements // commands don't run with entitlements enforced
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit unfortunate how often we need to do this. Makes me wonder if there's a more general rule we could apply so that WithoutEntitlements only needs to be used in exceptional cases. 🤔

Copy link
Contributor

@prdoyle prdoyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're on the same page about what's needed. I'll proactively approve to avoid delays.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Entitlements Entitlements infrastructure >refactoring Team:Core/Infra Meta label for core/infra team test-fips Trigger CI checks for FIPS test-windows Trigger CI checks on Windows v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants