Background
OLake currently supports Change Data Capture (CDC) for MongoDB using Change Streams, which track inserts, updates, and deletes in real time. However, many users, especially those on MongoDB shared clusters (e.g., Atlas M0, M2, M5 tiers) or self-hosted deployments without replica sets, may not have access to Change Streams, which require a replica set or sharded cluster configuration.
For these users, OLake should support an alternative incremental sync method that does not rely on Change Streams but still efficiently captures newly inserted and updated documents.
Feature Scope
This feature will introduce a query-based incremental sync mechanism for MongoDB that does not require Change Streams. Possible approaches include:
1. Timestamp-based Sync (updatedAt field tracking)
Users specify a timestamp field (e.g., updatedAt, lastModified, or a custom field) to track changes.
OLake will store the last synced timestamp and query documents where updatedAt > last_synced_timestamp.
This method requires users to ensure that the updatedAt field is updated on every modification (e.g., using MongoDB triggers or application logic).
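As a rough illustration of the timestamp-based approach, the sketch below builds the query filter and advances the checkpoint. It is not OLake's implementation: the function names `build_incremental_filter` and `next_checkpoint` are hypothetical, and documents are represented as plain dictionaries so the example runs without a database.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Optional


def build_incremental_filter(cursor_field: str,
                             last_synced: Optional[datetime]) -> Dict[str, Any]:
    """Build a MongoDB filter for documents changed after the checkpoint.

    Hypothetical helper: with no checkpoint yet, the empty filter
    selects every document (an initial full load).
    """
    if last_synced is None:
        return {}
    return {cursor_field: {"$gt": last_synced}}


def next_checkpoint(docs: Iterable[Dict[str, Any]],
                    cursor_field: str,
                    last_synced: Optional[datetime]) -> Optional[datetime]:
    """Return the largest cursor value seen in this batch.

    Documents missing the cursor field are skipped, and the previous
    checkpoint is kept when the batch contains nothing newer.
    """
    latest = last_synced
    for doc in docs:
        value = doc.get(cursor_field)
        if value is not None and (latest is None or value > latest):
            latest = value
    return latest


if __name__ == "__main__":
    checkpoint = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(build_incremental_filter("updatedAt", checkpoint))
```

With a real driver, the filter would be passed to a call such as PyMongo's `collection.find(filter).sort(cursor_field, 1)`, so that documents arrive in cursor order and the checkpoint can be advanced safely after each batch. An index on the tracking field is usually needed for this query to stay efficient.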
2. ObjectId-based Sync (For Insert-Only Collections)
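One way the ObjectId-based approach could work, sketched under assumptions: a MongoDB ObjectId begins with a 4-byte big-endian Unix timestamp (in seconds), so driver-generated `_id` values increase roughly with insertion time, and a filter of `_id > last_seen_id` picks up newer inserts. The helper names below are hypothetical, and the ordering is only approximate across machines or within the same second, which is why this suits insert-only collections.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def objectid_timestamp(oid_hex: str) -> datetime:
    """Decode the creation time embedded in a 24-character ObjectId hex string.

    The first 8 hex characters (4 bytes) hold seconds since the Unix epoch.
    """
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId hex string has exactly 24 characters")
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def build_objectid_filter(last_seen_id: Optional[Any]) -> Dict[str, Any]:
    """Build a filter for documents inserted after the last seen _id.

    `last_seen_id` is treated as opaque here; with a real driver it would
    be an ObjectId value (for example bson.ObjectId in PyMongo), not a string.
    """
    if last_seen_id is None:
        return {}  # no checkpoint yet: read the whole collection
    return {"_id": {"$gt": last_seen_id}}


if __name__ == "__main__":
    # Hypothetical ObjectId whose timestamp bytes encode 2024-01-01T00:00:00Z.
    print(objectid_timestamp("65920080" + "0" * 16))
```

Because updates do not change `_id`, this method cannot detect modified documents.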
3. Soft Delete Handling (Optional)
Implementation Details
1. User Configuration:
2. Efficient Query Execution:
3. Checkpointing & State Management:
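For the checkpointing step, a minimal sketch of crash-safe state persistence is shown below. The issue does not specify how OLake stores state, so the JSON file layout and the `load_state` / `save_state` names are assumptions for illustration only. The idea is to write the new checkpoint to a temporary file and atomically rename it, so an interrupted sync never leaves a half-written state file.

```python
import json
import os
import tempfile
from typing import Any, Dict


def load_state(path: str) -> Dict[str, Any]:
    """Read the saved checkpoints, or return an empty state on first run."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def save_state(path: str, state: Dict[str, Any]) -> None:
    """Persist checkpoints atomically: write a temp file, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes reach disk first
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    state_path = os.path.join(tempfile.mkdtemp(), "state.json")
    save_state(state_path, {"orders": {"cursor_field": "updatedAt",
                                       "last_synced": "2024-01-01T00:00:00+00:00"}})
    print(load_state(state_path))
```

Saving the checkpoint only after a batch has been written to the destination means a failed run re-reads some documents rather than skipping them.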
Fallback Mechanisms:
If the specified tracking field is missing or unreliable, OLake should:
Schema Evolution Handling:
Deliverables
Impact