-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implement incremental update system with snapshot restore #232
base: main
Are you sure you want to change the base?
feat: Implement incremental update system with snapshot restore #232
Conversation
WalkthroughThe pull request introduces an Changes
Sequence DiagramsequenceDiagram
participant PS as ParsingService
participant IUS as IncrementalUpdateService
participant DB as Database
participant Graph as Knowledge Graph
PS->>IUS: analyze_directory(changed_files)
IUS->>IUS: create_snapshot()
IUS->>IUS: update_files(changed_files)
IUS->>Graph: identify affected nodes
IUS->>Graph: remove old nodes
IUS->>Graph: create new nodes/relationships
IUS->>IUS: update_inferences(affected_nodes)
alt Update Successful
IUS-->>PS: return update results
else Update Failed
IUS->>IUS: restore_snapshot()
IUS-->>PS: return error
end
Assessment against linked issues
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
app/modules/parsing/graph_construction/parsing_service.py (1)
516-529
: Consider attaching the original exception for clarity.Static analysis suggests raising exceptions with
raise ... from e
to keep the original traceback context. You might do the following to preserve the chain:-raise HTTPException( - status_code=500, - detail=f"Update failed and restored to previous state: {str(e)}" -) +raise HTTPException( + status_code=500, + detail=f"Update failed and restored to previous state: {str(e)}" +) from e🧰 Tools
🪛 Ruff (0.8.2)
525-528: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
app/modules/parsing/incremental_update_service.py (2)
89-110
: Handy retrieval of change history.Limiting the results to 100 is a pragmatic safeguard. Provide pagination or filtering if your users need deeper logs.
201-213
: Snapshot listing is concise.Returning snapshot objects sorted by creation date is straightforward. Consider storing the snapshot creation pipeline for quick retrieval or UI display.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
app/modules/parsing/graph_construction/parsing_service.py
(9 hunks)app/modules/parsing/incremental_update_service.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
app/modules/parsing/graph_construction/parsing_service.py
525-528: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
535-535: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
🔇 Additional comments (28)
app/modules/parsing/graph_construction/parsing_service.py (14)
7-8
: Nice import additions for incremental updates.Using
Optional[List[str]]
forchanged_files
is a clean approach, and importingtime
for snapshot naming seems appropriate.
25-25
: Good modular design.Importing
IncrementalUpdateService
here centralizes the incremental logic, helping maintain separation of concerns between parsing and incremental updates.
47-54
: Great initialization of the incremental update service.Fetching Neo4j config from
config_provider
and passing it intoIncrementalUpdateService
ensures consistent configuration usage across services. The approach is clean and maintains your existing injection pattern.
174-179
: Docstring clarity is appreciated.These lines provide a clear explanation of the new
changed_files
parameter and how it impacts the parsing process.
194-201
: Proactive snapshot creation is wise.Capturing the graph state before updating files is essential for ensuring rollback safety. This design choice enhances resilience.
202-214
: Incremental update logic is well-structured.Invoking
update_files
, then aggregating node and relationship counts, provides clear feedback on the scope of changes. Logging the results is helpful for debugging and tracking.
218-220
: Good use of project status updates.Updating the project status to
READY
upon successful incremental updates aligns well with user expectations.
224-236
: Robust failure handling.Catching exceptions, attempting snapshot restoration, and gracefully falling back to a full parse is a solid approach to error recovery. This ensures minimal data loss and a consistent final state.
239-244
: Smart snapshot creation again.Even for full parses, retaining a snapshot before the heavy-lift parsing ensures a safe fallback point. Great consistency.
280-286
: Additional safety net for full parse failures.Restoring from a snapshot on parse failure helps maintain data integrity. Great to see consistent fallback steps.
300-327
: Parallel incremental logic for other languages.Using the same fallback approach for non-Python/JS languages fosters a uniform strategy across the codebase. Maintain consistency with the next lines if you plan to expand supported languages.
468-474
: Newupdate_files
method is well-defined.Accepting
project_id
,file_paths
, and user info is consistent with your existing patterns. This method neatly encapsulates incremental file updates for external callers.
485-490
: Snapshot creation before partial updates.Continuing the snapshot pattern ensures safe rollbacks for multi-file updates as well. Great design reuse.
492-507
: Incremental update block is cohesive.Using
update_files
fromIncrementalUpdateService
and returning the results object with node/relationship counts fosters clarity for API consumers.app/modules/parsing/incremental_update_service.py (14)
13-24
: Constructor’s scope is clear.Injecting Neo4j driver,
CodeGraphService
, andInferenceService
fosters straightforward usage across methods. Consider gracefully closing resources if used in a long-lived context.
25-38
: Correct node retrieval method.Querying Neo4j by
file_path
is a direct, efficient filter. Remember to guard for directory separators or OS-specific path nuances if that becomes relevant (e.g., Windows vs. Unix).
39-55
: Good approach for identifying connected nodes.Pulling connected nodes up to two hops away ensures capturing relevant references for partial inference updates. For large graphs, watch performance cost.
56-67
: Straightforward node deletion.Detaching and removing the file’s nodes helps ensure a clean slate for subsequent insertion. Confirm that any global references that aren’t in the same invoice are not inadvertently removed.
68-88
: Solid change logging mechanism.Creating a
CHANGE
node in Neo4j for each update event is a nice way to track revision history. Consider indexing ontimestamp
orfile_path
to optimize large change logs.
111-200
: Snapshot creation thoroughly captures nodes and relationships.Storing node labels, properties, and edges ensures a robust restore path. This approach is a good foundation for partial rollbacks or advanced versioning features down the line.
214-300
: Restore workflow is thorough.Deleting all existing nodes before re-inserting from snapshot ensures a consistent state. A chunk-based approach with transactions is wise to prevent partial restores.
301-326
: Snapshot deletion is correct.The straightforward query with
DELETE s
effectively cleans up the node. You could add checks to remove any associated indexes if you store them in separate structures, but that’s optional.
327-358
: Snapshot info retrieval.This method is valuable for introspection, returning node/relationship counts along with a timestamp.
359-454
: Incremental file updates with rollback approach.
- Good use of
_remove_file_nodes
to purge outdated references.- Incorporating
RepoMap
for file-level graph generation is consistent.- The final call to
_create_change_log
ensures traceability.Watch out for concurrency issues if multiple updates happen simultaneously, but your single-transaction pattern likely mitigates it for now.
455-482
: Targeted inference updates.Limiting inference updates to affected nodes conserves resources. The chunked approach (
batch_size = 50
) is a good balance for large data sets.
483-507
: Bulk file updates.Looping over file paths and calling
update_file
individually is a simple, maintainable approach. Results are aggregated in a dict, which is easy to parse and interpret.
508-547
: File deletion logic.
- Calculating
affected_nodes
then removing them is correct.- The final call to
_update_inferences
preserves data integrity for connected references.Consider partial verification that removing certain files doesn't break references in unrelated files, though your design likely addresses that with the multi-hop approach.
548-582
: File status retrieval.Provides a concise summary: total nodes, types, and last update. Great for an at-a-glance check.
hey @onyedikachi-david thanks for contributing! can you please fix merge conflicts, I will pick up the review today |
Fixes: #221
/claim #221
Summary by CodeRabbit
Release Notes
New Features
Improvements
Performance