Contributor

@nickva nickva commented Oct 23, 2025

Fix these issues:

  • Index cleanup triggered by smoosh only cleaned view indexes and purge checkpoints. Make sure it also cleans nouveau and search indexes and checkpoints.

  • The /_search_cleanup endpoint only cleaned indexes on the coordinator node, despite being a clustered fabric endpoint. Fix it so it also cleans indexes on the other nodes, as most users would expect.

  • Nouveau cleanup didn't clean purge checkpoints, so make it do so. Use the mrview purge checkpoint strategy for all indexes. This also lets the dreyfus logic avoid traversing internal disk paths from clouseau.

  • Make _view_cleanup clean all the index types. This is a bit dirty, but it may be better than adding another cleanup API such as _index_cleanup. Left the _search_cleanup and _nouveau_cleanup APIs as is for now.

Some optimizations:

  • For each index cleanup request, fetch ddocs once, calculate signatures, and then call the remote node cleanup logic with them. This avoids fetching design documents multiple times or sending all of them to the worker nodes. This is what Nouveau already does, so stick with that pattern.

  • Use erpc for remote calls. Our Erlang version is high enough (25+) to use the multiple requests pattern from erpc. This is more compact than rexi. The absolute timeout pattern makes it simpler to enforce a global timeout for the whole request.
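
The two optimizations above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the module name `cleanup_sketch`, the worker `local_cleanup/2`, and the 5000 ms budget are all hypothetical, and the signature map is assumed to have been computed once by the caller. The exception shapes handled in `collect/3` follow my reading of the erpc documentation for request id collections (OTP 25+) and should be checked against it.

```erlang
-module(cleanup_sketch).
-export([cleanup_all/3, local_cleanup/2]).

%% Hypothetical per-node worker. A real worker would delete index files
%% whose signature is not in ActiveSigs; this one only echoes its input.
local_cleanup(DbName, ActiveSigs) ->
    {ok, node(), DbName, maps:size(ActiveSigs)}.

%% Send one request per node, then collect all responses under a single
%% absolute deadline shared by the whole operation.
cleanup_all(Nodes, DbName, ActiveSigs) ->
    %% Assumed overall budget of 5000 ms for the whole request.
    Deadline = erlang:monotonic_time(millisecond) + 5000,
    Reqs = lists:foldl(
        fun(Node, Acc) ->
            %% The node name doubles as the label of the request.
            erpc:send_request(
                Node, ?MODULE, local_cleanup, [DbName, ActiveSigs], Node, Acc
            )
        end,
        erpc:reqids_new(),
        Nodes
    ),
    collect(Reqs, {abs, Deadline}, #{}).

collect(Reqs, Timeout, Acc) ->
    try erpc:receive_response(Reqs, Timeout, true) of
        no_request ->
            %% The collection is empty: every request has been handled.
            Acc;
        {Result, Node, Reqs1} ->
            collect(Reqs1, Timeout, Acc#{Node => Result})
    catch
        error:{erpc, timeout} ->
            %% Deadline reached: return what was collected so far.
            Acc;
        Class:{Reason, Node, Reqs1} ->
            %% A failure of one request, tagged with its label.
            collect(Reqs1, Timeout, Acc#{Node => {Class, Reason}})
    end.
```

Calling `cleanup_sketch:cleanup_all([node() | nodes()], <<"db">>, #{<<"sig">> => true})` would return a map from node name to that node's result. Because the deadline is absolute, time spent waiting on one node is not granted again to the next.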

Cleanups:

  • Make purge checkpoint fetching and cleanup more uniform. Use common utility logic in couch_index_util for all indexes.

  • Make index cleanup similar across all three index types. Use the same erpc pattern for all of them.

  • Move index cleanup functions from fabric.erl to their own module -- fabric_index_cleanup.erl

  • Make fabric function names more uniform and have them clearly indicate whether they clean indexes on all nodes or just the current node.

  • Add the db name to dreyfus #index{} record when it initializes so it matches Nouveau.

@nickva nickva force-pushed the fix-index-cleanup branch from 5b5ca2b to b1751b9 Compare October 23, 2025 07:35
Member

@rnewson rnewson left a comment

very nice cleanup of cleanup code.

ok.

- cleanup(DbName, ActiveSigs) ->
+ cleanup(DbName, #{} = SigMap) ->
Member

a _search_cleanup during a cluster upgrade will crash then? I think that's fine as they are rare but might be worth noting somewhere.

Contributor Author

We can add an upgrade clause as well. It's easy enough in this case

Contributor Author

Added an upgrade clause
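
For readers following the thread, an upgrade clause of this kind might look roughly like the sketch below. It is illustrative only and not the code added in this PR: the module name is hypothetical, the function body is a stand-in, and it assumes the previous release passed the active signatures as a plain list.

```erlang
-module(upgrade_clause_sketch).
-export([cleanup/2]).

%% Current format: callers pass a map keyed by signature.
cleanup(DbName, #{} = SigMap) ->
    %% Stand-in body: report which signatures would be kept.
    {ok, DbName, lists:sort(maps:keys(SigMap))};
%% Upgrade clause: a node still running the previous release may send
%% the old format (assumed here to be a list), so convert it to a map
%% instead of crashing with a function_clause error.
cleanup(DbName, ActiveSigs) when is_list(ActiveSigs) ->
    cleanup(DbName, maps:from_keys(ActiveSigs, true)).
```

With both clauses in place, a mixed-version cluster can call `cleanup/2` with either argument shape, and the extra clause can be dropped once every node has been upgraded.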

@nickva nickva force-pushed the fix-index-cleanup branch 2 times, most recently from e79027c to 1ccf4c6 Compare October 24, 2025 21:44
@nickva nickva requested a review from rnewson October 25, 2025 05:29
@nickva nickva force-pushed the fix-index-cleanup branch from 1ccf4c6 to 912b00c Compare October 26, 2025 04:52