
Stop watching gitignored directories in the repo file watcher #12122

Merged
kevinyang372 merged 3 commits into master from kevin/watcher-gitignore-prune
Jun 4, 2026
Conversation

@kevinyang372
Member

Description

What: Make Warp's repository file watcher stop registering filesystem watches on gitignored directories (node_modules, build output, vendored deps, etc.), while still watching directories that consumers explicitly care about via registered "ignored-path interests".

Why: On large monorepos the watcher recursively registered an inotify watch for every non-.git directory, including gitignored ones. On a customer's remote-server daemon this ballooned to ~11 GB RSS, with ~98% of the live heap inside notify::inotify::EventLoop::add_watch (the per-directory watch table + cloned path buffers). The descend filter only pruned .git/ internals, never gitignored content.

How: The watcher's descend predicate (repo_watch_filter, used by both DirectoryWatcher and LocalRepoMetadataModel) is now gitignore-aware:

  • New should_watch_repo_directory(path, gitignores, interests):
    1. .git/ internals keep the existing allowlist (should_watch_directory_in_git_path).
    2. Any directory on the path to — or inside — a registered ignored-path interest is always watched (reuses matches_ignored_path_interest, which also matches the ancestor prefixes leading to an interest).
    3. Otherwise, gitignored directories are pruned.
  • Pruning uses ancestor-aware matching (matched_path_or_any_parents), which guarantees the watcher's monotonicity invariant: a child of an ignored directory is also reported ignored, so we never accept a descendant after rejecting its parent. Directory-only re-include negations (e.g. parentdir/* + !parentdir/*/) keep working because the re-included directory matches as not-ignored on its own path (last-match-wins).
  • repo_watch_filter(gitignores, interests) is now parameterized; both watch-registration sites pass the repo's root + global gitignores (matching the existing is_ignored tagging) and their interest list.
  • DirectoryWatcher gains an ignored_path_interests list + register_ignored_path_interests; app/src/lib.rs registers the skill-provider paths on it too (they were already registered on RepoMetadataModel), so gitignored skill directories (e.g. .agents/skills) stay watched on both the LSP/MCP path and the file-tree/skills path.
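
The decision order above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: `IgnoredDirs`, `on_interest_path`, `git_path_allowed`, and `should_watch` are hypothetical stand-ins, the gitignore is modeled as a flat list of ignored directory paths rather than real pattern matching, and the `.git` allowlist shown here is invented for the example.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the repo's compiled gitignores: a flat list of
/// ignored directory paths, relative to the repo root (no glob patterns).
struct IgnoredDirs(Vec<PathBuf>);

impl IgnoredDirs {
    /// Ancestor-aware match: a path counts as ignored when it, or any of its
    /// parents, is an ignored directory. This is what keeps the filter
    /// monotonic: a child of a rejected directory is rejected too.
    fn matches_path_or_any_parent(&self, path: &Path) -> bool {
        self.0.iter().any(|ignored| path.starts_with(ignored))
    }
}

/// True when `path` is an ancestor prefix leading to an interest, the
/// interest itself, or a directory inside it.
fn on_interest_path(path: &Path, interests: &[PathBuf]) -> bool {
    interests
        .iter()
        .any(|interest| interest.starts_with(path) || path.starts_with(interest))
}

/// Invented allowlist for the example; the real rules live in
/// `should_watch_directory_in_git_path` and may differ.
fn git_path_allowed(path: &Path) -> bool {
    path == Path::new(".git") || path.starts_with(".git/refs")
}

/// Descend predicate: should the watcher register a watch on `path`?
fn should_watch(path: &Path, ignored: &IgnoredDirs, interests: &[PathBuf]) -> bool {
    // 1. `.git/` internals are governed only by the allowlist.
    if path.starts_with(".git") {
        return git_path_allowed(path);
    }
    // 2. Anything on the way to, or inside, an interest is always watched.
    if on_interest_path(path, interests) {
        return true;
    }
    // 3. Otherwise gitignored directories (and their descendants) are pruned.
    !ignored.matches_path_or_any_parent(path)
}
```

With `.agents` ignored and `.agents/skills` registered as an interest, this sketch descends through `.agents` into `.agents/skills` but still prunes a sibling such as `.agents/cache`.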

Consumer impact: Skills are covered on both watcher paths via interests. Project rules already skip gitignored files on live updates; MCP config files are normally tracked. The emit predicate is unchanged (gitignored files inside a watched, non-ignored directory are still emitted and tagged is_ignored).

Known follow-ups (not in this PR):

  • LSP servers that register didChangeWatchedFiles globs pinned to a gitignored base directory: plumb those base dirs in as interests. Default behavior otherwise matches typical editors (no events for gitignored paths). LSP already filters to server-registered globs.
  • Nested per-directory .gitignore files are not consulted by the descend filter (same limitation as the existing is_ignored tagging); this can only over-watch, never miss events.
  • Consolidating the two repo_metadata watchers and adding a watch-count metric/cap.

Linked Issue

  • The linked issue is labeled ready-to-spec or ready-to-implement.
  • Where appropriate, screenshots or a short video of the implementation are included below (especially for user-visible or UI changes).

No linked GitHub issue. This is an internal reliability fix surfaced by heap profiling of a customer's remote-server daemon.

Testing

Added unit tests in crates/repo_metadata/src/entry_tests.rs for should_watch_repo_directory:

  • gitignored directory is pruned (and its descendants, ancestor-aware);
  • a registered interest is descended to and into, even under a gitignored ancestor;
  • nested case: a/b ignored but a/b/c is an interest → full prefix descended, ignored sibling pruned;
  • directory-only re-include negation (parentdir/* + !parentdir/*/) descends subdirectories while the loose ignored file stays filtered;
  • .git allowlist behavior preserved.

Validation run locally:

  • cargo test -p repo_metadata --features local_fs — 72 passed (2 pre-existing flaky ignored), including the 5 new tests.
  • cargo clippy -p repo_metadata --features local_fs --all-targets — clean.
  • ./script/format / cargo fmt -- --check — clean.
  • cargo check --workspace --all-targets — compiles (incl. lsp, warp app crate).
  • Targeted cargo-clippy-diff origin/master HEAD — no new lint violations.

Full cargo clippy --all-targets --all-features -D warnings and cargo nextest run --workspace will run in CI.

  • I have manually tested my changes locally with ./script/run

Agent Mode

  • Warp Agent Mode - This PR was created via Warp's AI Agent Mode

CHANGELOG-BUG-FIX: Reduced memory usage of the file watcher on large projects by no longer watching gitignored directories (e.g. node_modules, build output).

Co-Authored-By: Warp <agent@warp.dev>

@cla-bot cla-bot Bot added the cla-signed label Jun 3, 2026
@kevinyang372 kevinyang372 marked this pull request as ready for review June 3, 2026 03:15
@oz-for-oss
Contributor

oz-for-oss Bot commented Jun 3, 2026

@kevinyang372

I'm starting a first review of this pull request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Contributor

@oz-for-oss oz-for-oss Bot left a comment


Overview

This PR updates the repository file watcher to prune gitignored directories during recursive watch registration while preserving explicitly registered ignored-path interests such as skill directories. The implementation wires the new gitignore-aware descend filter into both repository watcher paths and adds focused unit coverage for ignored directories, interests, nested interests, re-include negations, and .git allowlist behavior.

Concerns

  • No blocking correctness, security, or spec-alignment concerns found in the annotated diff.

Verdict

Found: 0 critical, 0 important, 0 suggestions

Approve

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

@kevinyang372 kevinyang372 force-pushed the kevin/watcher-gitignore-prune branch from ca30a85 to ec4a5b5 Compare June 3, 2026 18:04
Member Author

kevinyang372 commented Jun 3, 2026

@kevinyang372 kevinyang372 requested a review from alokedesai June 3, 2026 18:05
Member

@alokedesai alokedesai left a comment


My main question is whether this still allows for watching a gitignored directory if it's lazy loaded? That's obviously important for us to continue to support.

Comment thread crates/repo_metadata/src/entry.rs Outdated
Comment thread crates/repo_metadata/src/entry.rs Outdated
Comment thread crates/repo_metadata/src/watcher.rs Outdated
Comment thread crates/repo_metadata/src/watcher.rs Outdated
Comment thread crates/repo_metadata/src/watcher.rs Outdated
kevinyang372 and others added 2 commits June 3, 2026 18:03
Make the repository file watcher's descend predicate gitignore-aware so we
no longer register inotify watches on gitignored directories (node_modules,
build output, vendored deps), while still watching registered ignored-path
interests (e.g. skill provider dirs). Fixes excessive memory usage observed
on large monorepos.

Co-Authored-By: Warp <agent@warp.dev>
@kevinyang372 kevinyang372 force-pushed the kevin/watcher-gitignore-prune branch from ec4a5b5 to 10c4d2e Compare June 3, 2026 22:25
@kevinyang372
Member Author

> My main question is whether this still allows for watching a gitignored directory if it's lazy loaded? That's obviously important for us to continue to support.

@alokedesai There shouldn't be a regression in most cases: macOS and Windows don't go through the tree-descent code path to register inode watchers.

The one case that won't work after this change is expanding a gitignored folder in the file tree on Linux. That will be fixed in a follow-up to this PR, which will register a non-recursive watcher as the user expands the folder.

@kevinyang372 kevinyang372 enabled auto-merge (squash) June 4, 2026 04:10
@kevinyang372 kevinyang372 merged commit 3497d18 into master Jun 4, 2026
27 checks passed
@kevinyang372 kevinyang372 deleted the kevin/watcher-gitignore-prune branch June 4, 2026 04:26
kevinyang372 added a commit that referenced this pull request Jun 5, 2026
## Description
On Linux, navigating a remote (or local) session into a large non-git
directory such as `/workspaces` registered a **recursive** filesystem
watch over the entire subtree. Because a non-git parent has no root
`.gitignore`, the gitignore-based prune (PR #12122) has nothing to
prune, so every nested repo's `target/`, `node_modules/`, etc. got
watched — 20,000+ inotify watches and the ~11 GB remote-server daemon
memory blowup a customer reported.

This PR makes the watch for lazy-loaded (non-git) standalone roots as
lazy as the directory tree itself, **on Linux only**: instead of one
recursive watch on the root, we watch only the directories that are
actually loaded, each `NonRecursive`, and add a watch for each
subdirectory as the user expands it. Memory now scales with what the
user expands rather than the whole subtree.

### Platform scoping
The per-directory inotify cost is Linux-specific. macOS (FSEvents) and
Windows (ReadDirectoryChangesW) watch a tree recursively with a single
OS handle, so recursive watching there is cheap and unaffected. Git
repos remain recursively watched on all platforms (they rely on
gitignore pruning for correctness).
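
As a rough illustration of the scoping rule above, the choice of watch depth for a root might look like the sketch below. `WatchDepth` and `root_watch_depth` are hypothetical names, and `is_linux` is passed as a parameter only so the example can be exercised on any host; the real code presumably decides this per platform at build time.

```rust
/// Hypothetical mirror of a watcher's recursive/non-recursive setting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WatchDepth {
    Recursive,
    NonRecursive,
}

/// Picks the watch depth for a root directory.
/// Git repos stay recursive everywhere (they rely on gitignore pruning);
/// only non-git lazy roots on Linux switch to per-directory watches.
fn root_watch_depth(is_git_repo: bool, is_linux: bool) -> WatchDepth {
    if !is_git_repo && is_linux {
        WatchDepth::NonRecursive
    } else {
        WatchDepth::Recursive
    }
}
```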

### Key changes (`crates/repo_metadata/src/local_model.rs`)
- Introduced a `RepoWatchMode` enum that is the stored per-repo watch
state: `RecursiveOnRoot`, or `LazyNonRecursive { watched_dirs }` which
carries the set of directories currently watched under a lazy root (root
+ expanded subdirs).
- `add_repository_internal` takes the mode, maps it to the watcher's
`RecursiveMode`, records it, and (when replacing a prior lazy
registration, e.g. a lazy root upgraded to a git repo) unregisters stale
per-directory watches.
- `index_lazy_loaded_path` registers the root `NonRecursive` on Linux
(`Recursive` elsewhere).
- `load_directory` and the watcher-event handler register a
`NonRecursive` watch on each newly loaded subdirectory and record it.
- `remove_repository` matches on the recorded mode to unregister every
tracked per-directory watch (lazy) or just the root (recursive).
- Event processing is unchanged: a single
`BulkFilesystemWatcher`/debouncer funnels all events, and
`handle_watcher_event` routes each path to its owning root by
longest-prefix match, so non-recursive watches need no new plumbing.
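
A minimal sketch of the state tracking and routing described in the list above, under stated assumptions: the variant names follow this description, but the method names (`record_loaded_dir`, `paths_to_unregister`), the free function `owning_root`, and the use of a `BTreeSet` are hypothetical and only illustrate the shape of the logic.

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

/// Per-repo watch state, modeled on the description; the real type may differ.
#[derive(Debug, PartialEq)]
enum RepoWatchMode {
    /// One recursive watch registered on the root.
    RecursiveOnRoot,
    /// Non-recursive watches on the root plus each expanded subdirectory.
    LazyNonRecursive { watched_dirs: BTreeSet<PathBuf> },
}

impl RepoWatchMode {
    /// Records a newly loaded directory. Returns true when the caller needs
    /// to register a new non-recursive watch for it.
    fn record_loaded_dir(&mut self, dir: &Path) -> bool {
        match self {
            // A recursive root watch already covers every subdirectory.
            RepoWatchMode::RecursiveOnRoot => false,
            RepoWatchMode::LazyNonRecursive { watched_dirs } => {
                watched_dirs.insert(dir.to_path_buf())
            }
        }
    }

    /// Paths whose watches must be unregistered when the repo is removed.
    fn paths_to_unregister(&self, root: &Path) -> Vec<PathBuf> {
        match self {
            RepoWatchMode::RecursiveOnRoot => vec![root.to_path_buf()],
            RepoWatchMode::LazyNonRecursive { watched_dirs } => {
                watched_dirs.iter().cloned().collect()
            }
        }
    }
}

/// Longest-prefix routing: among the roots containing `event_path`, pick the
/// one with the most path components.
fn owning_root<'a>(roots: &'a [PathBuf], event_path: &Path) -> Option<&'a PathBuf> {
    roots
        .iter()
        .filter(|root| event_path.starts_with(root))
        .max_by_key(|root| root.components().count())
}
```

Because routing depends only on the event path and the set of roots, events from a non-recursive watch on an expanded subdirectory reach the same owner as events from a recursive root watch would.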

## Linked Issue
- [ ] The linked issue is labeled `ready-to-spec` or
`ready-to-implement`.
- [ ] Where appropriate, screenshots or a short video of the
implementation are included below (especially for user-visible or UI
changes).

This is a follow-up to #12122 (gitignore-aware pruning for git-repo
recursive watches); this PR bounds non-git lazy roots, which that prune
cannot help.

## Testing
- Added unit tests in `crates/repo_metadata/src/local_model_tests.rs`:
  - `index_lazy_loaded_path_tracks_only_root`: lazy root records
    `LazyNonRecursive` with only the root on Linux, `RecursiveOnRoot`
    elsewhere.
  - `load_directory_tracks_expanded_subdir_for_lazy_root`: expanding a
    subdir adds its watch on Linux.
  - `recursive_repo_uses_recursive_watch_mode`: git repos record
    `RecursiveOnRoot` and aren't tracked as lazy.
  - `remove_lazy_loaded_path_clears_tracked_watches`: teardown clears all
    tracked watches and repo state.
- `cargo nextest run -p repo_metadata --features local_fs` (new +
adjacent tests pass), `cargo check -p repo_metadata` (default,
non-`local_fs`), `./script/format`, and `cargo clippy -p repo_metadata
--features local_fs --all-targets -- -D warnings` all clean.
- Manual: built/deployed the remote-server daemon to a Linux devbox, `cd
/workspaces`, and confirmed via `[watch_inode_debug]` logging (added in
#12122) and `/proc/<pid>/fdinfo` that the watched-directory count drops
from ~20,000 to a small number that grows only as folders are expanded;
git repos opened at their root are unaffected.

- [x] I have manually tested my changes locally

## Agent Mode
- [x] Warp Agent Mode - This PR was created via Warp's AI Agent Mode

CHANGELOG-BUG-FIX: Fixed excessive memory usage (and inotify watch
exhaustion) on Linux when a Warp session navigated into a large non-git
directory; lazy directory roots are now watched incrementally as folders
are expanded instead of recursively up front.