fix: harden task status locks for transient ENOENT #47

Merged
tctinh merged 2 commits into tctinh:main from imarshallwidjaja:fix/harden-worktree-lock-handling
Feb 15, 2026

@imarshallwidjaja
Collaborator

Fix ENOENT During hive_worktree_create Status Lock Update

Summary

This PR hardens task status locking so that hive_worktree_create no longer fails when lock-file creation hits a transient ENOENT (observed on Windows setups).

Previously, the flow could fail after the worktree had already been created successfully, at the point where TaskService.update updates the task status and takes status.json.lock.

Root Cause

Two issues combined:

  1. Lock acquisition (acquireLock/acquireLockSync) treated non-EEXIST errors as fatal and did not recover from transient ENOENT at lock-file creation time.
  2. TaskService.update used read-then-lock/write ordering, leaving a race window between reading status.json and acquiring the file lock.

What Changed

1) Lock acquisition hardened for transient ENOENT

File: packages/hive-core/src/utils/paths.ts

  • acquireLock and acquireLockSync now:
    • Ensure lock parent directory exists before the first lock attempt.
    • Treat ENOENT as retriable (re-ensure directory and retry until timeout).
    • Keep existing EEXIST stale-lock handling unchanged.
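
The retry behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the code in paths.ts: the name acquireLockSketch, the options object, the default timeout and delay values, and the injectable open function are all assumptions made for the example, and the existing EEXIST stale-lock detection is left out (EEXIST is simply retried until the timeout).

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical options; the real acquireLockSync signature may differ.
interface LockSketchOptions {
  timeoutMs?: number;
  retryDelayMs?: number;
  // Injection point so a test can simulate a transient failure.
  open?: (file: string, flags: string) => number;
}

// Busy-wait for a few milliseconds; crude, but enough for a synchronous sketch.
function sleepSync(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin
  }
}

function acquireLockSketch(lockPath: string, opts: LockSketchOptions = {}): () => void {
  const { timeoutMs = 2000, retryDelayMs = 10, open = fs.openSync } = opts;
  const deadline = Date.now() + timeoutMs;
  const lockDir = path.dirname(lockPath);

  // Ensure the parent directory exists before the first attempt.
  fs.mkdirSync(lockDir, { recursive: true });

  for (;;) {
    try {
      // "wx" creates the file exclusively and fails with EEXIST if it already exists.
      fs.closeSync(open(lockPath, "wx"));
      return () => fs.rmSync(lockPath, { force: true });
    } catch (err) {
      const code = (err as { code?: string } | null)?.code;
      // Anything other than ENOENT or EEXIST stays fatal.
      if (code !== "ENOENT" && code !== "EEXIST") throw err;
      if (Date.now() >= deadline) {
        throw new Error(`Timed out acquiring lock: ${lockPath}`);
      }
      // ENOENT: the directory may be missing or not yet visible, so re-ensure it.
      if (code === "ENOENT") fs.mkdirSync(lockDir, { recursive: true });
      sleepSync(retryDelayMs);
    }
  }
}

// Usage: the nested directory does not exist yet; the sketch creates it.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "lock-sketch-"));
const releaseDemo = acquireLockSketch(path.join(demoRoot, "tasks", "01", "status.json.lock"));
releaseDemo();
```

Passing a custom open function is one way to reproduce a transient ENOENT in a test without depending on real Windows timing.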

2) TaskService.update moved to lock-first read-modify-write

File: packages/hive-core/src/services/taskService.ts

  • TaskService.update now:
    • Acquires lock first.
    • Reads current status while lock is held.
    • Applies updates and writes atomically.
    • Releases lock in finally.

This removes the TOCTOU window from the update path.
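
A rough sketch of the lock-first ordering, for illustration only. The helpers lockSync, writeJsonAtomicSketch, and updateStatusSketch are hypothetical stand-ins rather than the real TaskService code; the write-to-temp-then-rename strategy and the status values are assumptions.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

type TaskStatus = Record<string, unknown>;

// Stand-in lock helper: exclusively create a lock file and return a release function.
function lockSync(lockPath: string): () => void {
  fs.mkdirSync(path.dirname(lockPath), { recursive: true });
  fs.closeSync(fs.openSync(lockPath, "wx"));
  return () => fs.rmSync(lockPath, { force: true });
}

// Assumed strategy: write a temp file, then rename it over the target,
// so readers never observe a partially written file.
function writeJsonAtomicSketch(file: string, data: unknown): void {
  const tmp = `${file}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(data, null, 2));
  fs.renameSync(tmp, file);
}

function updateStatusSketch(statusPath: string, updates: TaskStatus): TaskStatus {
  // 1. Acquire the lock first, so nothing can change between read and write.
  const release = lockSync(`${statusPath}.lock`);
  try {
    // 2. Read the current status while the lock is held.
    const current = JSON.parse(fs.readFileSync(statusPath, "utf8")) as TaskStatus;
    // 3. Apply the updates and write atomically.
    const next = { ...current, ...updates };
    writeJsonAtomicSketch(statusPath, next);
    return next;
  } finally {
    // 4. Always release, even if the read or the write throws.
    release();
  }
}

// Usage with made-up status values.
const statusRoot = fs.mkdtempSync(path.join(os.tmpdir(), "status-sketch-"));
const statusFile = path.join(statusRoot, "status.json");
fs.writeFileSync(statusFile, JSON.stringify({ status: "pending", origin: "demo" }));
const updatedStatus = updateStatusSketch(statusFile, { status: "in_progress" });
```

With the read inside the locked region, a concurrent writer cannot slip in between the read and the write, which was the gap the earlier read-then-lock ordering left open.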

3) Regression tests added

File: packages/hive-core/src/utils/paths.test.ts

  • Added test to verify parent directories are created before lock acquisition.
  • Added test to simulate transient ENOENT from openSync and verify lock acquisition retries successfully.

Impact

  • Makes task status persistence more robust to transient filesystem timing on Windows.
  • Prevents false-negative failures where worktree creation succeeds but status transition fails.
  • No behavior change to lock semantics for normal paths.

Testing

Executed and passing:

  • bun test packages/hive-core/src/utils/paths.test.ts packages/hive-core/src/services/taskService.test.ts
  • bun run test (from packages/hive-core)
  • bun run build (from packages/hive-core)

Issues

Should close issue #45

Ensure lock directories exist and retry ENOENT during lock acquisition to handle transient filesystem behavior on Windows.

Switch TaskService.update to lock-first read-modify-write to avoid race windows between read and lock/write.

Copilot AI left a comment

Pull request overview

Hardens Hive’s on-disk task status locking to avoid transient ENOENT failures (notably on Windows) and removes a TOCTOU window in TaskService.update by switching to lock-first read/modify/write.

Changes:

  • Ensure lock parent directories exist and retry lock acquisition on transient ENOENT in acquireLock / acquireLockSync.
  • Change TaskService.update to acquire the lock before reading and then write via writeJsonAtomic while the lock is held.
  • Add regression tests covering lock parent directory creation and transient ENOENT retry behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File | Description
packages/hive-core/src/utils/paths.ts | Harden lock acquisition by ensuring lock directories exist and treating ENOENT as retriable.
packages/hive-core/src/utils/paths.test.ts | Add tests for nested directory lock acquisition and transient ENOENT retry simulation.
packages/hive-core/src/services/taskService.ts | Update task status with lock-first read/modify/write to remove TOCTOU window.

Guard TaskService.update before lock acquisition so missing tasks fail fast without creating task directories via lock path setup.

Also adds a regression assertion and aligns new test literals with existing quote style.
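
The reason for the guard can be illustrated with a small sketch: because lock setup now creates parent directories, locking first for a nonexistent task would create its directory as a side effect. The names below (lockWithMkdirSketch, updateWithGuardSketch) and the use of a missing status.json as the "task not found" signal are assumptions for illustration; the actual check in TaskService.update may differ.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Stand-in lock helper: like the hardened lock, it creates the parent directory first.
function lockWithMkdirSketch(lockPath: string): () => void {
  fs.mkdirSync(path.dirname(lockPath), { recursive: true });
  fs.closeSync(fs.openSync(lockPath, "wx"));
  return () => fs.rmSync(lockPath, { force: true });
}

function updateWithGuardSketch(statusPath: string, updates: Record<string, unknown>): void {
  // Guard before locking: a missing status file is treated as "no such task",
  // so we fail before the lock helper can create any directories.
  if (!fs.existsSync(statusPath)) {
    throw new Error(`Task status not found: ${statusPath}`);
  }
  const release = lockWithMkdirSketch(`${statusPath}.lock`);
  try {
    const current = JSON.parse(fs.readFileSync(statusPath, "utf8")) as Record<string, unknown>;
    fs.writeFileSync(statusPath, JSON.stringify({ ...current, ...updates }, null, 2));
  } finally {
    release();
  }
}

// Usage: updating a task that was never created fails fast and leaves no directory behind.
const guardRoot = fs.mkdtempSync(path.join(os.tmpdir(), "guard-sketch-"));
const missingTaskDir = path.join(guardRoot, "tasks", "missing-task");
let guardError: unknown;
try {
  updateWithGuardSketch(path.join(missingTaskDir, "status.json"), { status: "done" });
} catch (err) {
  guardError = err;
}
```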
@tctinh tctinh merged commit 37030d9 into tctinh:main Feb 15, 2026
1 check passed