-
Notifications
You must be signed in to change notification settings - Fork 2
feat(#81): add snapshot capture for fault debugging #119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Capture topic data when faults transition to CONFIRMED status, enabling
post-mortem debugging of system state at the time of fault occurrence.
- Add SnapshotCapture class with configurable topic resolution
(fault_specific > patterns > default_topics priority)
- Support on-demand and background capture modes
- Store snapshots as JSON in SQLite with indexed fault_code lookup
- Add GetSnapshots service to FaultManager
- Add REST endpoints: GET /api/v1/faults/{code}/snapshots and
GET /api/v1/components/{id}/faults/{code}/snapshots
- Add unit tests for snapshot storage operations
Closes #81
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adds snapshot capture functionality to enable post-mortem debugging of system state when faults transition to CONFIRMED status. The implementation captures topic data (serialized as JSON) and stores it in SQLite, with REST API endpoints for retrieval.
Note: There is an inconsistency in the PR description - the title references issue #81, but the description's "Issue" section says "closes #82". Please verify and correct the issue number.
Changes:
- Add SnapshotCapture class with configurable topic resolution (fault_specific > patterns > default_topics priority) and support for on-demand and background capture modes
- Implement snapshot storage in SQLite with indexed lookup and in-memory storage for tests
- Add GetSnapshots ROS 2 service and REST API endpoints for retrieving snapshots
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ros2_medkit_msgs/srv/GetSnapshots.srv | New service definition for retrieving fault snapshots with optional topic filtering |
| src/ros2_medkit_msgs/CMakeLists.txt | Register GetSnapshots service in build system |
| src/ros2_medkit_gateway/src/http/rest_server.cpp | Add two new REST endpoints for snapshot access (system-wide and component-scoped) |
| src/ros2_medkit_gateway/src/http/handlers/fault_handlers.cpp | Implement HTTP handlers for snapshot retrieval with validation and error handling |
| src/ros2_medkit_gateway/src/fault_manager.cpp | Add client-side implementation for GetSnapshots service calls |
| src/ros2_medkit_gateway/include/ros2_medkit_gateway/http/handlers/fault_handlers.hpp | Add handler method declarations for snapshot endpoints |
| src/ros2_medkit_gateway/include/ros2_medkit_gateway/fault_manager.hpp | Add get_snapshots method and service client member |
| src/ros2_medkit_fault_manager/test/test_sqlite_storage.cpp | Add unit tests for snapshot storage operations (store, retrieve, filter, multiple) |
| src/ros2_medkit_fault_manager/src/sqlite_fault_storage.cpp | Implement snapshot table schema and storage operations with indexes |
| src/ros2_medkit_fault_manager/src/snapshot_capture.cpp | New class implementing topic capture logic with on-demand and background modes |
| src/ros2_medkit_fault_manager/src/fault_storage.cpp | Add snapshot storage to in-memory implementation |
| src/ros2_medkit_fault_manager/src/fault_manager_node.cpp | Integrate snapshot capture triggering on fault confirmation and add GetSnapshots service handler |
| src/ros2_medkit_fault_manager/package.xml | Add dependencies for ros2_medkit_serialization and nlohmann-json-dev |
| src/ros2_medkit_fault_manager/include/ros2_medkit_fault_manager/sqlite_fault_storage.hpp | Add snapshot method declarations to SQLite storage interface |
| src/ros2_medkit_fault_manager/include/ros2_medkit_fault_manager/snapshot_capture.hpp | New header defining SnapshotCapture class and configuration structures |
| src/ros2_medkit_fault_manager/include/ros2_medkit_fault_manager/fault_storage.hpp | Add SnapshotData struct and snapshot methods to storage interface |
| src/ros2_medkit_fault_manager/include/ros2_medkit_fault_manager/fault_manager_node.hpp | Add snapshot capture member and service handler declaration |
| src/ros2_medkit_fault_manager/CMakeLists.txt | Add snapshot_capture.cpp to build, link serialization and JSON libraries |
- Add snapshot endpoints to Gateway README with examples - Add 3 snapshot requests to Postman collection - Add REQ_INTEROP_088 for fault snapshots in requirements - Add note about SOVD "environment data" terminology mapping - Add @verifies tags to snapshot storage tests
c69117e to
2c1209e
Compare
- Fix reentrancy: use dedicated callback group and local executor instead of spin_some on main node (avoids service callback issues) - Improve regex handling: log ERROR for invalid patterns, track count - Improve error message: specify 1-256 character limit for fault codes - Fix captured_at: omit field when no snapshots exist (was epoch 0.0) - Add yaml-cpp comment explaining transitive dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 8 comments.
postman/collections/ros2-medkit-gateway.postman_collection.json
Outdated
Show resolved
Hide resolved
- Fix README: document correct response format (topics object, not array) - Fix Postman: update description to reflect actual response structure - Fix executor cleanup: use local callback group per capture operation (ensures clean lifecycle - both executor and callback group destroyed together when function returns)
- Add 14 unit tests for SnapshotCapture class (constructor, config, regex, topic resolution) - Add 6 unit tests for FaultManager::get_snapshots client (timeout, JSON parsing, errors) - Add 5 integration tests for REST snapshot endpoints (404/400 error handling) - Add snapshot endpoints to root endpoint response
2c1209e to
d9135b2
Compare
- Add snapshots.config_file parameter for loading fault_specific and patterns - Implement load_snapshot_config_from_yaml() using yaml-cpp - Add example config file (config/snapshots.yaml) - Add 5 integration tests for snapshot capture functionality - Tests verify capture on fault confirmation, error handling, and topic filtering
6049fcf to
511f2fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 5 comments.
- Add validation for timeout_sec parameter (must be positive) - Add validation for max_message_size parameter (must be positive) - Document snapshots.config_file parameter in README - Remove outdated TODO markers, add YAML config example - Use assertGreater instead of assertTrue for better error messages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated no new comments.
- Add snapshot cleanup to SqliteFaultStorage::clear_fault - Add snapshot cleanup to InMemoryFaultStorage::clear_fault - Add unit test verifying snapshots are deleted on clear
9745d8a to
95f2208
Compare
- Add ~/get_snapshots service to Services table - Add snapshot capture feature description - Add Snapshot Parameters section with all config options - Include YAML config example and topic resolution priority
- Create docs/tutorials/snapshots.rst with full configuration guide - Cover quick start, advanced YAML config, querying via REST API - Include example workflow and troubleshooting section - Add to tutorials index in Basic Tutorials section
Capture topic data when faults transition to CONFIRMED status, enabling post-mortem debugging of system state at the time of fault occurrence.
Closes #81
Pull Request
Summary
Briefly describe what changed and why.
Issue
Link the related issue (required):
Type
Testing
How was this tested / how should reviewers verify it?
Checklist