Improve index cleanup #5709

nickva · 2025-10-23T07:33:31Z

Fix these issues:

* Index cleanup triggered by smoosh only cleaned view indexes and purge checkpoints. Make sure it also cleans nouveau and search indexes and checkpoints.
* The `/_search_cleanup` endpoint only cleaned indexes on the coordinator node, despite being a clustered fabric endpoint. Fix it so it also cleans indexes on the other nodes, as most users would expect.
* Nouveau cleanup didn't clean purge checkpoints, so make it do so. Use the mrview purge checkpoint strategy for all indexes. This improves dreyfus logic to avoid traversing internal disk paths from clouseau.
* Make `_view_cleanup` clean all the index types. This is a bit dirty, but it may be better than adding another cleanup `_index_cleanup` API. Left the `_search_cleanup` and `_nouveau_cleanup` APIs as is for now.

Some optimizations:

* For each index cleanup request, fetch ddocs once. Calculate signatures and then call the remote node cleanup logic with them. This avoids fetching design documents multiple times or sending all of them to the worker nodes. This is what Nouveau is doing, so stick with that nice pattern.
* Use erpc for remote calls. Our Erlang version is high enough (25+) to use the multiple requests pattern from erpc. This is more compact than rexi. The absolute timeout pattern makes it simpler to have a global timeout for the whole request.
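The erpc pattern mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the module name `cleanup_fanout_sketch` and the functions `cleanup_all/3` and `wait/2` are hypothetical. It sends one request per node up front, then waits on each with the same absolute deadline, so the whole fan-out shares a single global timeout.

```erlang
-module(cleanup_fanout_sketch).
-export([cleanup_all/3]).

%% Hypothetical helper (not part of CouchDB): call M:F(A...) on every node
%% and collect one result per node, bounded by a single overall timeout.
cleanup_all(Nodes, {M, F, A}, TimeoutMsec) ->
    %% Absolute deadline in Erlang monotonic time, shared by all waits.
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMsec,
    %% Send all requests first so the nodes work in parallel.
    Reqs = [{Node, erpc:send_request(Node, M, F, A)} || Node <- Nodes],
    [{Node, wait(ReqId, Deadline)} || {Node, ReqId} <- Reqs].

wait(ReqId, Deadline) ->
    try
        %% {abs, Deadline} means later waits do not restart the clock.
        {ok, erpc:receive_response(ReqId, {abs, Deadline})}
    catch
        %% Covers erpc timeouts, lost connections and remote exceptions.
        Class:Reason ->
            {error, {Class, Reason}}
    end.
```

With a relative timeout each `receive_response` call would get its own full budget, so the worst case grows with the number of nodes. The absolute form keeps the total bounded by `TimeoutMsec`.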

Cleanups:

* Make purge checkpoint fetching and cleanup more uniform. Use common utility logic in `couch_index_util` for all indexes.
* Make index cleanup similar between all three indexes. Use the same erpc pattern for all of them.
* Move index cleanup functions from `fabric.erl` to their own module, `fabric_index_cleanup.erl`.
* Rename fabric functions so they are more uniform and clearly indicate whether they clean indexes on all nodes or just the current node.
* Add the db name to the dreyfus `#index{}` record when it initializes so it matches Nouveau.

rnewson

very nice cleanup of cleanup code.

src/couch_mrview/src/couch_mrview_cleanup.erl

rnewson · 2025-10-23T10:08:50Z

src/dreyfus/src/clouseau_rpc.erl

    ok.

-cleanup(DbName, ActiveSigs) ->
+cleanup(DbName, #{} = SigMap) ->


a _search_cleanup during a cluster upgrade will crash then? I think that's fine as they are rare but might be worth noting somewhere.

We can add an upgrade clause as well. It's easy enough in this case

Added an upgrade clause
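For readers following along, an upgrade clause of this kind might look something like the sketch below. It is an assumption about the shape, not the clause that was actually pushed: the module name `cleanup_upgrade_sketch`, the `true` map values, and the returned tuple (standing in for the real call to clouseau) are all hypothetical.

```erlang
-module(cleanup_upgrade_sketch).
-export([cleanup/2]).

%% New-style callers pass a map keyed by index signature.
%% The returned tuple is a stand-in for the real cleanup call.
cleanup(DbName, #{} = SigMap) ->
    {cleanup, DbName, SigMap};
%% Upgrade clause: a node still running the old code sends a plain list
%% of active signatures, so convert it to the map form instead of crashing.
cleanup(DbName, ActiveSigs) when is_list(ActiveSigs) ->
    cleanup(DbName, maps:from_list([{Sig, true} || Sig <- ActiveSigs])).
```

Once every node in the cluster runs the new code, the list clause is dead code and can be dropped in a later release.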

nickva force-pushed the fix-index-cleanup branch from 5b5ca2b to b1751b9 on October 23, 2025 07:35

rnewson reviewed Oct 23, 2025

nickva force-pushed the fix-index-cleanup branch 2 times, most recently from e79027c to 1ccf4c6 on October 24, 2025 21:44

nickva requested a review from rnewson October 25, 2025 05:29

nickva force-pushed the fix-index-cleanup branch from 1ccf4c6 to 912b00c on October 26, 2025 04:52
