feat(index): log resume progress when continuing an interrupted build #118

Merged
raphaelsty merged 1 commit into main from feat/resume-progress-log on Jun 1, 2026

@raphaelsty
Collaborator

Summary

When a resumable build continues after an interruption (timeout, Ctrl-C, crash), it now prints a one-line summary of how much work is being skipped:

📋 Resuming interrupted build: 100 files already indexed, 200 remaining

The already-committed files are excluded from the todo set in build_resumable, so they are never re-parsed or re-embedded. Previously a resume looked identical to a fresh build from the outside, which made it hard to tell that prior work was being reused. This just surfaces the existing behavior.
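A minimal sketch of the idea, not the actual `build_resumable` code: the helper names `remaining_files` and `resume_message` are hypothetical, and returning `None` for a fresh build is an assumption made here for illustration. It shows the set difference that keeps committed files out of the todo set and the formatting of the summary line.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

/// Hypothetical helper: files that still need indexing, i.e. everything
/// discovered minus what an earlier (interrupted) run already committed.
fn remaining_files(
    all: &BTreeSet<PathBuf>,
    committed: &BTreeSet<PathBuf>,
) -> BTreeSet<PathBuf> {
    all.difference(committed).cloned().collect()
}

/// Hypothetical helper: the one-line resume summary.
/// Assumption: a fresh build (nothing committed yet) prints nothing.
fn resume_message(already_indexed: usize, remaining: usize) -> Option<String> {
    if already_indexed == 0 {
        return None;
    }
    Some(format!(
        "📋 Resuming interrupted build: {} files already indexed, {} remaining",
        already_indexed, remaining
    ))
}
```

With 300 discovered files and 100 committed, `remaining_files` yields 200 paths and `resume_message(100, 200)` produces the line shown above.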

Why

This came out of verifying the resumable-build guarantee end-to-end. I generated a ~12,000-unit synthetic repo (300 files) and interrupted indexing mid-build in two ways:

| Interrupt | Committed at cut | After resume | Re-encoded on resume |
| --- | --- | --- | --- |
| SIGKILL (hard crash) | 100 files / 4,100 docs | 300 files / 12,300 docs | only the 200 remaining files |
| SIGINT (graceful Ctrl-C) | 100 files / 4,100 docs | 300 files / 12,300 docs | only the 200 remaining files |

In both cases:

  • The resume re-embedded only the not-yet-committed files — the 100 already-committed files were skipped entirely.
  • The fully-committed chunk files were left byte-identical across the resume (only the trailing partial chunk was reopened to append new docs).
  • The final index was identical to a clean full build — same file count (300) and document count (12,300), no duplication, no loss — and searched correctly.

So the build correctly resumes from the last checkpoint and never recomputes committed work.
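The byte-identity check described above could be automated roughly as follows. This is a sketch under assumptions, not the procedure used for this PR: the `Snapshot` type, the function name, and the chunk names are hypothetical, and it assumes the trailing partial chunk is extended strictly by appending, so its old bytes remain a prefix of the new ones.

```rust
use std::collections::BTreeMap;

/// Hypothetical snapshot of the index directory: chunk file name -> raw bytes.
type Snapshot = BTreeMap<String, Vec<u8>>;

/// Returns true when every chunk present before the resume still exists
/// afterwards and is byte-identical, except for the `trailing` partial chunk,
/// which is only required to keep its old bytes as a prefix (append-only).
fn committed_chunks_preserved(
    before: &Snapshot,
    after: &Snapshot,
    trailing: Option<&str>,
) -> bool {
    before.iter().all(|(name, old)| match after.get(name) {
        // A committed chunk disappeared: the resume lost work.
        None => false,
        // The trailing partial chunk may grow, but must not be rewritten.
        Some(new) if Some(name.as_str()) == trailing => new.starts_with(old),
        // Fully committed chunks must be untouched.
        Some(new) => new == old,
    })
}
```

Taking one snapshot at the moment of interruption and another after the resume finishes, the function should return `true` when only the trailing chunk is named as allowed to grow.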

Testing

  • cargo build -p colgrep, cargo fmt/clippy -D warnings clean (pre-commit CI passed)
  • Manual SIGKILL + SIGINT interrupt/resume cycles as above

When a resumable build picks up after an interruption, print how many files
were already committed and how many remain. The already-committed files are
excluded from the `todo` set, so they are never re-parsed or re-embedded —
this just surfaces that to the user (previously a resume looked identical to a
fresh build from the outside).

Verified end-to-end by hard-killing (SIGKILL) and gracefully interrupting
(SIGINT) a ~12k-unit build mid-way: in both cases the resume re-encoded only
the not-yet-committed files and produced an index identical to a clean full
build (same file and document counts), with the already-committed chunks left
byte-identical.
@raphaelsty raphaelsty merged commit 4e02558 into main Jun 1, 2026
20 checks passed