Introduce CachedSupplier for BasePersistence objects #1765

Open · wants to merge 3 commits into main
Conversation

adnanhemani (Collaborator)

I came across an interesting bug yesterday that we need to fix to ensure that tasks can use the BasePersistence object, as they run outside of user call contexts.

What I was trying to do:

  1. Create and run a Task which dumps some information to the persistence. In order to do this, I was using the following line of code to get a BasePersistence object: metaStoreManagerFactory.getOrCreateSessionSupplier(CallContext.getCurrentContext().getRealmContext()).get();
  2. Get the following error message when executing the last .get() call:
jakarta.enterprise.context.ContextNotActiveException: RequestScoped context was not active when trying to obtain a bean instance for a client proxy...

When digging deeper into why this was happening, I realized that due to the Supplier's lazy loading at https://github.com/apache/polaris/blob/main/extension/persistence/relational-jdbc/src/main/java/org/apache/polaris/extension/persistence/relational/jdbc/JdbcMetaStoreManagerFactory.java#L100-L105, the .get() call was actually using a RequestScoped realmContext bean provided by the previously-run TokenBroker initialization (which is a RequestScoped object here: https://github.com/apache/polaris/blob/main/quarkus/service/src/main/java/org/apache/polaris/service/quarkus/config/QuarkusProducers.java#L290-L299). Given this is a relatively new addition, that may be why we haven't seen this bug before.

As Tasks run asynchronously, likely after the original request has already completed, this error actually makes sense - we should not be able to use a request-scoped bean inside a Task execution. But on closer inspection, we do not actually need realmContext for anything other than resolving the realmIdentifier once, during the BasePersistence object's initialization. As a result, we can cache the BasePersistence object using a supplier that memoizes the original result instead of constantly creating new objects. This also solves our issue, as the original request-scoped RealmContext bean will not be touched again during the Task's call to get a BasePersistence object.
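A memoizing supplier along these lines could look like the sketch below (hypothetical; the actual CachedSupplier in this PR may differ in details):

```java
import java.util.function.Supplier;

// Sketch of a memoizing supplier. The delegate runs at most once, on the
// first get(); every later call returns the cached result, so anything the
// delegate captured (e.g. a RequestScoped proxy) is never touched again.
final class CachedSupplier<T> implements Supplier<T> {
  private final Supplier<T> delegate;
  private volatile T cached;

  CachedSupplier(Supplier<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public T get() {
    T result = cached;
    if (result == null) {
      // Double-checked locking: only one thread runs the delegate.
      synchronized (this) {
        result = cached;
        if (result == null) {
          result = delegate.get();
          cached = result;
        }
      }
    }
    return result;
  }
}
```

Wrapping the lambda stored in sessionSupplierMap in such a supplier means the RequestScoped RealmContext is dereferenced at most once, while the request is still active.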

I've added a test case that shows the difference between the out-of-the-box supplier and my preferred way to solve this problem, a CachedSupplier. If there is significant concern about caching the BasePersistence object, we can instead materialize the RealmContext before building the supplier, so that at a minimum the RequestScoped RealmContext object is not used again - but I'm not sure there's an easy way to test that, given that the MetastoreFactories are Quarkus ApplicationScoped objects.

Please note, this is an issue in both EclipseLink and JDBC, as they have almost identical code paths here.

Many thanks to @singhpk234 for being my debugging rubber ducky :)

adnanhemani (Collaborator, Author) commented May 31, 2025

cc @dimas-b (as you are looking at the similar issue at #1758), @eric-maynard , @collado-mike

edit: sorry, wrong PR number

adnanhemani (Collaborator, Author)

cc @adutra as well, as you are also looking at #1758.

adutra (Contributor) commented Jun 2, 2025

@adnanhemani thanks for bringing my attention to this PR.

I realized that due to the Supplier's lazy-loading [...] the .get() was actually using a RequestScoped realmContext bean given by the previously-ran TokenBroker initialization

Hmm I looked at your code snippets but I don't see the connection between the TokenBroker bean production and the lazy loading of JdbcBasePersistenceImpl. But assuming that this is happening inside a task executor thread, and the problem is RealmContext, why don't you resolve the realmId eagerly? E.g.:

  private void initializeForRealm(
      RealmContext realmContext, RootCredentialsSet rootCredentialsSet, boolean isBootstrap) {
    String realmId = realmContext.getRealmIdentifier(); // resolve realm ID eagerly
    DatasourceOperations databaseOperations = getDatasourceOperations(isBootstrap);
    sessionSupplierMap.put(
        realmId,
        () ->
            new JdbcBasePersistenceImpl(
                databaseOperations,
                secretsGenerator(() -> realmId, rootCredentialsSet),
                storageIntegrationProvider,
                realmId));

    PolarisMetaStoreManager metaStoreManager = createNewMetaStoreManager();
    metaStoreManagerMap.put(realmId, metaStoreManager);
  }

adnanhemani (Collaborator, Author)

@adutra thanks for taking a look :)

Hmm I looked at your code snippets but I don't see the connection between the TokenBroker bean production and the lazy loading of JdbcBasePersistenceImpl

The connection is that the TokenBroker bean is RequestScoped, and its initialization creates a BasePersistence Supplier using the realmContext from that RequestScoped bean initialization. That Supplier is then stored in the sessionSupplierMap; when lazy loading later invokes it, it tries to dereference the bean's (now-expired) realmContext. Does that make it clearer? If not, let me know which part is still unclear!

But assuming that this is happening inside a task executor thread, and the problem is RealmContext, why don't you resolve the realmId eagerly?

Yes, this was my original idea - but it was hard for me to construct a test case for that type of fix. Maybe this is something you have more experience with, but I just wasn't able to use a request-scoped realmContext bean in a test at all. Additionally, I'm not sure we gain anything from continuously re-creating JdbcBasePersistenceImpl objects - is there really a good reason to lazy-load this? If not, why not cache the object as-is?

As a result, I'm promoting the CachedSupplier as our preferred way to solve this issue instead. But I'm not heavily tied to this approach if we can find a good way to test the fix you suggested.

adutra (Contributor) commented Jun 2, 2025

The connection is that the TokenBroker bean is RequestScoped and it does create a BasePersistence Supplier

I still don't see any TokenBroker creating any BasePersistence anywhere in the code 🤔

@adnanhemani as it stands, this PR is imo not mergeable: it has no clear error description, no stack trace we can investigate, no reproducer, and no real test case (CachedSupplierTest is just a unit test; there is no test that shows evidence of a broken behavior that would be "fixed" by the proposed changes).

adnanhemani (Collaborator, Author) commented Jun 5, 2025

@adutra - I've reproduced the issue on a branch in my fork: https://github.com/adnanhemani/polaris/tree/ahemani/show_failure_1765

You can read the full diff there; I made a really simple case that creates a task when you create a catalog. The task only tries to get the BasePersistence object - which is where the call blows up due to the poisoned cache. Feel free to attach a debugger and you'll see that it is caused by the lazy loading of the JdbcBasePersistenceImpl class, and that the cache poisoning happened during the creation of the TokenBroker (RequestScoped) bean.

Steps to reproduce the error using the code linked above:

  1. [This can only be reproduced using JDBC or EclipseLink.] Create a Persistence instance and set application.properties to the right set of configurations.
  2. Run: ./polaris --client-id <CLIENT_ID> --client-secret <CLIENT_SECRET> catalogs create polaris1 --storage-type FILE --default-base-location "/var/tmp/polaris1/" (you must try this
  3. Wait for the Task to execute. It will fail and retry until it runs out of retries altogether, then log that the task could not be completed successfully. The stack trace is also visible there.

You can then apply this PR on top of that code and retry these steps and see that you will no longer see this issue.

More on how the TokenBroker creates the poisoned cache:

  1. tokenBrokerFactory.apply(realmContext): https://github.com/apache/polaris/blob/main/quarkus/service/src/main/java/org/apache/polaris/service/quarkus/config/QuarkusProducers.java#L289. Note this is a RequestScoped bean - and so is realmContext.
  2. createTokenBroker(realmContext): https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/JWTRSAKeyPairFactory.java#L53
  3. metaStoreManagerFactory.getOrCreateMetaStoreManager(realmContext): https://github.com/apache/polaris/blob/main/service/common/src/main/java/org/apache/polaris/service/auth/JWTRSAKeyPairFactory.java#L65-L66
  4. initializeForRealm(realmContext, null, false);: https://github.com/apache/polaris/blob/main/persistence/relational-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/JdbcMetaStoreManagerFactory.java#L177

And that call is where the sessionSupplierMap stores the poisoned lambda that creates JdbcBasePersistenceImpl. At no point in this call trace was realmContext replaced with a materialized version of the realmIdentifier - which is why a RequestScoped bean made its way into the sessionSupplierMap.
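To make the capture problem in step 4 concrete, here is an isolated sketch (names are hypothetical; RealmContext here stands in for the CDI client proxy): a lambda that captures the proxy dereferences it again on every call, while resolving the realm id eagerly captures only a plain String.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the CDI client proxy.
interface RealmContext {
  String getRealmIdentifier();
}

final class CaptureDemo {
  // Poisoned variant: the lambda captures the proxy itself, so every
  // get() dereferences it. That works while the request is active but
  // fails in a task thread, where the RequestScoped context is gone.
  static Supplier<String> lazyCapture(RealmContext proxy) {
    return () -> "persistence-for-" + proxy.getRealmIdentifier();
  }

  // Eager variant (as suggested above): the realm id is resolved while
  // the request is still active; the lambda captures only the String.
  static Supplier<String> eagerCapture(RealmContext proxy) {
    String realmId = proxy.getRealmIdentifier();
    return () -> "persistence-for-" + realmId;
  }
}
```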

Again, your suggestion above to change this behavior by materializing the realmContext (perhaps from the tokenBroker itself) would solve this issue. But I have no idea how to write a test that ensures something like this cannot happen again. If you have one, I'd be glad to switch to that approach.
