feat(database): add WAL checkpoint on shutdown and startup recovery #3353

Open
Mustaqeem66 wants to merge 1 commit into tailcallhq:main from Mustaqeem66:fix/wal-checkpoint-on-shutdown

Conversation

@Mustaqeem66

Summary

This fix addresses .forge.db corruption issues that cause ForgeCode to get stuck or lose data.

Changes

  1. Startup WAL Recovery - Checkpoints any WAL left over from a previously crashed session and runs a database integrity check on startup

  2. Auto-Checkpoint Threshold Reduced - Changed from 1000 to 100 frames (roughly 0.4 MB of WAL before an automatic checkpoint instead of roughly 4 MB, assuming SQLite's default 4 KiB page size), limiting WAL growth during long sessions

  3. Async Checkpoint Method - Added checkpoint_async() for graceful shutdown scenarios; it uses a pool-based connection, so it is async-safe

  4. Drop Checkpoint - Checkpoints WAL when DatabasePool is dropped to prevent data loss

  5. Comprehensive Tests - Added 7 tests covering all new functionality
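
As a rough illustration of items 1 and 2, the startup sequence can be sketched with SQLite pragmas. This is a minimal sketch in Python's standard sqlite3 module, not the PR's Rust code: the function name open_with_wal_recovery and its parameter are hypothetical, and only the pragmas themselves (journal_mode, wal_checkpoint, integrity_check, wal_autocheckpoint) are standard SQLite.

```python
import os
import sqlite3
import tempfile


def open_with_wal_recovery(db_path: str, autocheckpoint_pages: int = 100) -> sqlite3.Connection:
    """Open a database, fold any leftover WAL into it, and verify integrity.

    Hypothetical helper for illustration only.
    """
    # isolation_level=None selects autocommit mode, so the PRAGMA statements
    # below never run inside an implicit transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")

    # TRUNCATE copies committed WAL frames into the main file and then
    # shrinks the -wal file to zero bytes. The result row is
    # (busy, frames in WAL, frames checkpointed).
    busy, _log_frames, _checkpointed = conn.execute(
        "PRAGMA wal_checkpoint(TRUNCATE)"
    ).fetchone()
    if busy:
        raise RuntimeError("checkpoint was blocked by another connection")

    # integrity_check returns a single row containing 'ok' when healthy.
    result = conn.execute("PRAGMA integrity_check").fetchone()[0]
    if result != "ok":
        raise RuntimeError(f"integrity check failed: {result}")

    # Auto-checkpoint once the WAL reaches this many pages
    # (SQLite's default is 1000).
    conn.execute(f"PRAGMA wal_autocheckpoint={int(autocheckpoint_pages)}")
    return conn


# Usage: a throwaway database in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
demo_conn = open_with_wal_recovery(demo_path)
print(demo_conn.execute("PRAGMA wal_autocheckpoint").fetchone()[0])
demo_conn.close()
```

Raising on a failed integrity check is one possible policy; an application could equally log the problem and fall back to a backup.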

Motivation

This addresses issues similar to #3260 where:

  • WAL files accumulate without proper checkpointing
  • A crash or interrupt leaves data in the WAL that was never checkpointed into the main database file
  • Database corruption occurs during long sessions

Testing

All existing tests pass, along with 7 new tests:

  • test_checkpoint_method_exists
  • test_drop_calls_checkpoint
  • test_in_memory_pool_has_checkpoint
  • test_checkpoint_truncates_wal
  • test_wal_recovery_on_startup
  • test_async_checkpoint_method
  • test_autocheckpoint_threshold_reduced
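
To make the intent of a test such as test_checkpoint_truncates_wal concrete, the check might look like the following. This is a hedged sketch in Python's standard sqlite3 module rather than the PR's actual test; the helper names and the notes table are invented for illustration.

```python
import os
import sqlite3
import tempfile


def wal_size(db_path: str) -> int:
    """Size in bytes of the -wal sidecar file, or 0 if it does not exist."""
    wal_path = db_path + "-wal"
    return os.path.getsize(wal_path) if os.path.exists(wal_path) else 0


def check_checkpoint_truncates_wal() -> tuple[int, int]:
    """Return the WAL size before and after a TRUNCATE checkpoint."""
    db_path = os.path.join(tempfile.mkdtemp(), "test.db")
    conn = sqlite3.connect(db_path, isolation_level=None)  # autocommit mode
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO notes (body) VALUES (?)", [("x" * 100,)] * 50)

    before = wal_size(db_path)
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
    after = wal_size(db_path)

    # The rows must still be readable once the WAL has been emptied.
    assert conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0] == 50
    conn.close()
    return before, after


before_bytes, after_bytes = check_checkpoint_truncates_wal()
print(f"WAL before: {before_bytes} bytes, after: {after_bytes} bytes")
```

The writes stay well below the auto-checkpoint threshold, so the WAL is non-empty before the explicit checkpoint and empty afterwards.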

This fix addresses .forge.db corruption issues in ForgeCode by:

1. Startup WAL Recovery:
   - Checkpoints any leftover WAL from previous crashed sessions
   - Runs database integrity check on startup
   - Ensures data is recovered before new session starts

2. Auto-Checkpoint Threshold Reduced:
   - Changed from 1000 to 100 frames (roughly 0.4 MB of WAL before an automatic checkpoint instead of roughly 4 MB, assuming SQLite's default 4 KiB page size)
   - Prevents massive WAL files during long sessions

3. Async Checkpoint Method:
   - Added checkpoint_async() for graceful shutdown scenarios
   - Uses pool-based connection (async-safe)

4. Drop Checkpoint:
   - Checkpoints WAL when DatabasePool is dropped
   - Logs a warning if the checkpoint fails (expected on force-kill)

5. Comprehensive Tests:
   - test_checkpoint_method_exists
   - test_drop_calls_checkpoint
   - test_in_memory_pool_has_checkpoint
   - test_checkpoint_truncates_wal
   - test_wal_recovery_on_startup
   - test_async_checkpoint_method
   - test_autocheckpoint_threshold_reduced
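
The best-effort behaviour described in item 4 (checkpoint when the pool goes away, log a warning rather than fail) can be sketched as follows. This is a Python analogue using a context manager in place of Rust's Drop; the CheckpointingConnection class is hypothetical and is not the PR's DatabasePool.

```python
import logging
import os
import sqlite3
import tempfile

log = logging.getLogger("db")


class CheckpointingConnection:
    """Hypothetical wrapper that checkpoints the WAL when it is closed."""

    def __init__(self, db_path: str) -> None:
        self.conn = sqlite3.connect(db_path, isolation_level=None)  # autocommit
        self.conn.execute("PRAGMA journal_mode=WAL")

    def checkpoint(self) -> bool:
        """Run a TRUNCATE checkpoint; return True if it was not blocked."""
        busy, _log_frames, _done = self.conn.execute(
            "PRAGMA wal_checkpoint(TRUNCATE)"
        ).fetchone()
        return busy == 0

    def close(self) -> None:
        # Best effort: a failed checkpoint is logged, never raised, because
        # committed frames left in the WAL are still read on the next open.
        try:
            if not self.checkpoint():
                log.warning("WAL checkpoint was blocked; WAL left in place")
        except sqlite3.Error as exc:
            log.warning("WAL checkpoint failed on close: %s", exc)
        finally:
            self.conn.close()

    def __enter__(self) -> "CheckpointingConnection":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()


# Usage: the checkpoint runs automatically when the block exits.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
with CheckpointingConnection(demo_path) as db:
    db.conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY)")
    db.conn.execute("INSERT INTO events DEFAULT VALUES")
```

Neither this wrapper nor a Drop implementation can run on a force-kill, which is why the startup recovery step is still needed.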

Addresses corruption issues related to tailcallhq#3260 by preventing WAL accumulation
and checking database integrity on startup.
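
How much WAL accumulates before an automatic checkpoint can be estimated from the WAL file layout: a 32-byte file header, then one 24-byte frame header plus one page per frame. The sketch below assumes SQLite's default 4096-byte page size; the project's actual page size is not stated here, and the function name is illustrative.

```python
WAL_HEADER_BYTES = 32    # fixed header at the start of a WAL file
FRAME_HEADER_BYTES = 24  # header stored in front of each page image


def wal_bytes_at_threshold(frames: int, page_size: int = 4096) -> int:
    """Approximate WAL file size once `frames` frames have been written."""
    return WAL_HEADER_BYTES + frames * (FRAME_HEADER_BYTES + page_size)


for frames in (1000, 100):
    size = wal_bytes_at_threshold(frames)
    print(f"{frames:>4} frames -> {size / 1_000_000:.2f} MB")
```

Under that assumption the thresholds correspond to about 4.1 MB and 0.4 MB. The threshold is a trigger, not a hard cap: the WAL can grow past it if a checkpoint cannot complete, for example while a long-running reader is active.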

Co-authored-by: Mustaqeem66 <ageisnode@gmail.com>
github-actions bot added the "type: feature" label (Brand new functionality, features, pages, workflows, endpoints, etc.) on May 18, 2026
@CLAassistant

CLAassistant commented May 18, 2026

CLA assistant check
All committers have signed the CLA.
