colgrep index update takes a long time on certain repo states #116

@cktang88

I have a large repo that I updated from the latest remote, which changed ~276 files. I ran colgrep "hello" right afterwards and it appeared to hang; it eventually finished after more than 9 minutes.

I believe this particular diff is problematic because there were lots of large files that were deleted (.rbi files in this case). (Happy to provide any more info you may need!)

FWIW Codex says:

Your colgrep hello is not stuck in the search. It is holding the updater lock and burning CPU inside:

search -> run_indexing -> incremental_update -> delete_file_from_index -> next_plaid::filtering::delete

The bad part is that colgrep found ~276 changed indexed files on master. For each changed file, it calls delete_file_from_index separately at /tmp/next-plaid-colgrep-debug/colgrep/src/index/mod.rs:2585. That calls delete_from_index and filtering::delete at /tmp/next-plaid-colgrep-debug/colgrep/src/index/mod.rs:2594.

Both of those are expensive full-index rewrites:

next-plaid/src/delete.rs:91 loops through every chunk and rewrites chunk files.
next-plaid/src/delete.rs:187 rebuilds the IVF from all remaining codes.
next-plaid/src/filtering.rs:1139 creates a temp table for the whole metadata table.
next-plaid/src/filtering.rs:1148 deletes all metadata rows.
next-plaid/src/filtering.rs:1152 inserts everything back with renumbered subset.

So with 276 changed files, it is effectively doing hundreds of full-ish rewrites of a ~1.2GB index. That looks like “hangs forever” because it is O(changed_files * index_size).
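
To make the cost argument concrete, here is a minimal sketch that compares per-file deletion with a single batched pass over a toy in-memory index. Everything in it (`Row`, `build_index`, `delete_one_at_a_time`, `delete_batched`) is a hypothetical name made up for illustration, not a colgrep or next-plaid API, and "rows scanned" is only a rough proxy for the real rewrite cost.

```rust
use std::collections::HashSet;

/// Toy stand-in for one indexed chunk, tagged with its source file.
/// Hypothetical type: this is NOT the real next-plaid on-disk layout.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub file: String,
    pub chunk: u32,
}

/// Build a fake index with `files` files and `chunks_per_file` rows each.
pub fn build_index(files: usize, chunks_per_file: u32) -> Vec<Row> {
    let mut rows = Vec::new();
    for i in 0..files {
        for chunk in 0..chunks_per_file {
            rows.push(Row { file: format!("file_{i}.rbi"), chunk });
        }
    }
    rows
}

/// Per-file strategy: each changed file triggers its own full pass.
/// Returns the total number of rows scanned across all passes.
pub fn delete_one_at_a_time(rows: &mut Vec<Row>, changed: &[&str]) -> usize {
    let mut scanned = 0;
    for file in changed {
        scanned += rows.len(); // one full pass per changed file
        rows.retain(|r| r.file != *file);
    }
    scanned
}

/// Batched strategy: collect all changed files, then do a single pass.
/// Returns the number of rows scanned in that one pass.
pub fn delete_batched(rows: &mut Vec<Row>, changed: &[&str]) -> usize {
    let doomed: HashSet<&str> = changed.iter().copied().collect();
    let scanned = rows.len();
    rows.retain(|r| !doomed.contains(r.file.as_str()));
    scanned
}
```

Both functions leave the same rows behind. The difference is that the per-file version scans roughly `changed.len()` times the index size, while the batched version scans the index once. Whether the real delete path can be batched this way depends on details of the index format that this sketch does not model.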
