@vdegans commented Sep 11, 2025

DRILL-8529: Caching QueryPlan Results

Description

Implements a caching mechanism for query plans and transformations, to shorten the prepare phase.

Documentation

The cache behavior can be customized via drill-override.conf:

planner {
  query {
    cache {
      max_entries_amount: 100       # Maximum number of cached query plans (default: 100)
      plan_cache_ttl_minutes: 300   # Time-to-live for cached query plans in minutes (default: 300)
    }
  }
  transform {
    cache {
      max_entries_amount: 100       # Maximum number of cached transform plans (default: 100)
      plan_cache_ttl_minutes: 300   # Time-to-live for cached transform plans in minutes (default: 300)
    }
  }
}
  • max_entries_amount: limits the number of cached plans. Older entries are evicted when the limit is reached.
  • plan_cache_ttl_minutes: sets the lifetime of cached plans. Expired entries are recomputed on next use.
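
To make the two settings concrete, here is a minimal sketch of a size-bounded cache with a per-entry time-to-live. It is not the code in this PR: the class name PlanCacheSketch, the injected clock, and the least-recently-used eviction order are assumptions made for illustration, and it uses only java.util so that it stays self-contained. A caching library may choose which entry to evict differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical sketch: at most maxEntries plans, each valid for ttlMinutes. */
final class PlanCacheSketch<K, V> {

  /** A cached value plus the clock reading at which it stops being valid. */
  private static final class Slot<V> {
    final V value;
    final long expiresAtNanos;

    Slot(V value, long expiresAtNanos) {
      this.value = value;
      this.expiresAtNanos = expiresAtNanos;
    }
  }

  private final long ttlNanos;
  private final LongSupplier clock; // injected so that expiry can be tested
  private final LinkedHashMap<K, Slot<V>> slots;

  PlanCacheSketch(int maxEntries, long ttlMinutes, LongSupplier clock) {
    this.ttlNanos = TimeUnit.MINUTES.toNanos(ttlMinutes);
    this.clock = clock;
    // accessOrder = true keeps the least recently used entry first,
    // so removeEldestEntry drops it once the size limit is exceeded.
    this.slots = new LinkedHashMap<K, Slot<V>>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, Slot<V>> eldest) {
        return size() > maxEntries;
      }
    };
  }

  /** Returns the cached plan, or computes and stores one if missing or expired. */
  synchronized V get(K key, Function<K, V> planner) {
    long now = clock.getAsLong();
    Slot<V> cached = slots.get(key);
    if (cached != null && now - cached.expiresAtNanos < 0) {
      return cached.value; // still within its time-to-live
    }
    V fresh = planner.apply(key);
    slots.put(key, new Slot<>(fresh, now + ttlNanos));
    return fresh;
  }

  synchronized int size() {
    return slots.size();
  }
}
```

With max_entries_amount set to 2, for example, planning a third distinct query drops the least recently used plan, and a plan older than plan_cache_ttl_minutes is recomputed the next time the same query arrives.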

At runtime, caching can also be toggled with:
planner.cache.enable (true = enabled, false = disabled)

Testing

  • Manual testing shows reduced query planning time for repeated large query plans.
  • Automated tests are being added to verify correctness and cache eviction behavior.

@cgivre added the doc-impacting and performance labels Sep 11, 2025

@cgivre (Contributor) commented Sep 11, 2025

@vdegans Wow! This is an impressive first contribution to Drill! Before we start review, would you please do a clean rebase on master? I'm sure you didn't mean to pull in all those old versions.

@vdegans force-pushed the rebased-cache branch 2 times, most recently from 5ac8803 to c49c581, on September 11, 2025 13:18

@vdegans (Author) commented Sep 11, 2025

@cgivre Thanks! I fixed the rebase.

@cgivre (Contributor) commented Sep 11, 2025

> @cgivre Thanks! I fixed the rebase.

Thanks. Before I start the review, I have a few questions:

  1. Is there ever a case where someone might want different cache settings for different storage plugins?
  2. Or is there a situation where a user might want to disable caching entirely for certain plugins but not others?

What I'm getting at is this: would it make sense to keep the global settings you already have, but also let users set custom settings for specific plugins if they want to? I genuinely don't know whether that is worth the effort. I could imagine this being more of an issue with data whose schema can change (MongoDB or JSON files, for instance), where queries like SELECT * might bring back different data every time you run them.

@vdegans (Author) commented Sep 11, 2025

Thanks, that’s a great point. I agree that giving users control over caching at the per-plugin level makes sense. Some plugins, especially ones where the underlying data might change frequently, like MongoDB or JSON files, could benefit from having caching disabled or customized independently from the global settings. I think supporting per-plugin cache configuration would give users the flexibility to optimize caching behavior for their specific use cases and improve the overall user experience.

@cgivre (Contributor) commented Sep 18, 2025

@vdegans Hi Vincent, any update?

@vdegans (Author) commented Sep 19, 2025

Hi @cgivre, I was thinking about the suggestion for a setting to enable/disable caching per plugin, and I got stuck on the question of when to cache and when not to.
I think the best solution right now is to disable caching altogether when any plugin used by the query has caching disabled, since I am not sure how a partially cached query plan would work (if it could work at all).

I didn't get much time to look at the code yet, but I would like to hear your thoughts about the settings per plugin.
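
The all-or-nothing rule described above can be sketched as a single predicate. This is only an illustration: per-plugin cache settings do not exist in this PR, so the class name, the method, and the map of per-plugin flags are all hypothetical, and plugins without an explicit flag are assumed to allow caching.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of an all-or-nothing plan-cache policy. */
final class PluginCachePolicySketch {

  private PluginCachePolicySketch() {}

  /**
   * A plan is cacheable only if caching is enabled globally and no plugin
   * referenced by the query has caching explicitly disabled.
   */
  static boolean isPlanCacheable(
      boolean globalCacheEnabled,
      Set<String> pluginsInQuery,
      Map<String, Boolean> cacheEnabledByPlugin) {
    if (!globalCacheEnabled) {
      return false;
    }
    for (String plugin : pluginsInQuery) {
      // Assumption: a plugin with no entry in the map allows caching.
      if (Boolean.FALSE.equals(cacheEnabledByPlugin.get(plugin))) {
        return false;
      }
    }
    return true;
  }
}
```

Under this rule, a join between a cacheable plugin and one with caching disabled is never cached, which sidesteps the question of partially cached plans.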

@cgivre (Contributor) commented Sep 22, 2025

> Hi @cgivre, I was thinking about the suggestion for a setting to enable/disable caching per plugin, and I got stuck on the question of when to cache and when not to. I think the best solution right now is to disable caching altogether when any plugin used by the query has caching disabled, since I am not sure how a partially cached query plan would work (if it could work at all).
>
> I didn't get much time to look at the code yet, but I would like to hear your thoughts about the settings per plugin.

The whole idea of partially cached query plans is extremely tricky. I could be wrong, but I think the Calcite team may have done some work on that at one point.

In any event, my suggestion would be to start simple. Let's get all the unit tests to pass and start with simple caching, i.e., an exact query match. Once that's done and merged, we can iterate and find improvements.

@vdegans marked this pull request as ready for review October 9, 2025 11:46

@cgivre (Contributor) commented Oct 9, 2025

@vdegans You should rebase on current master. I think that will solve the size limit issue you're running into.

@vdegans (Author) commented Oct 9, 2025

I think Caffeine might be causing this. Should I add Caffeine to the exclude list?

@cgivre (Contributor) commented Oct 9, 2025

> I think Caffeine might be causing this. Should I add Caffeine to the exclude list?

You can either exclude Caffeine or bump up the max size. Either is fine.

@vdegans (Author) commented Oct 9, 2025

Locally, this seems to have fixed the issue.
