-
Notifications
You must be signed in to change notification settings - Fork 0
Add system info and health API #113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Introduced a new `system` module with endpoints for retrieving system information, health checks, and statistics. - Implemented data models for OS info, version info, paths, and health check responses using Pydantic. - Added health check logic to verify the existence and writability of critical paths and configurations. - Updated the main server API to include the new system router, enhancing the overall API structure.
- Moved system API endpoints and data models from `system.py` to `__init__.py` for better organization. - Retained functionality for retrieving system information, health checks, and statistics. - Enhanced code structure by centralizing related components within the same module.
- Added CUDA version retrieval to the `get_versions_info` function for better compatibility insights. - Introduced new paths for Ollama models and Hugging Face cache in the `get_paths_info` function to support additional functionalities. - Updated data models in `__init__.py` to include new fields for CUDA, Ollama models directory, and Hugging Face cache directory, improving system information retrieval.
- Added logic to retrieve the CUDA version from nvcc via the torch library if the standard CUDA version is not available, improving compatibility insights for users.
- Added checks for the existence and writability of Nodetool configuration and data directories, including settings and secrets files. - Improved error handling for determining safe directories, including Electron user data and logs paths across different platforms. - Updated health check responses to provide clearer guidance on required actions for users regarding directory permissions and file existence.
- Introduced a new path for the main log file in the `get_paths_info` function for Windows, macOS, and Linux platforms. - Enhanced the system API to provide more comprehensive logging information, improving user access to log files across different operating systems.
- Added functionality to open the parent directory when a file path is provided, improving user experience in path management. - Updated the system router handling to ensure system endpoints are only available in non-production environments, enhancing security and clarity. - Implemented caching for system information retrieval to optimize performance and reduce redundant processing.
|
Claude finished @heavy-d's task —— View job PR Review: Add system info and health APIReview Checklist:
🔍 Review SummaryThis PR adds comprehensive system information and health monitoring endpoints. The implementation is well-structured with good separation of concerns, proper error handling, and thoughtful security considerations. However, there are several areas that would benefit from improvements. ✅ Positive Aspects1. Good Architecture & Organization
2. Security Considerations
3. Error Handling
🚨 Issues Found1. Critical: Path Information Leakage
|
| # Expand env vars (%APPDATA%, $HOME) and ~ | ||
| expanded = os.path.expandvars(path) | ||
| expanded = os.path.expanduser(expanded) | ||
| requested_path = Path(expanded).resolve() |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 months ago
To fully mitigate the risk of path traversal and ensure that only paths within the allowed directories can be opened, we should:
- Ensure that all entries in
safe_rootsare resolved to absolute paths before comparison. - When checking if the requested path is within a safe root, use the resolved absolute paths for both.
- Optionally, handle the case where
is_relative_tois not available (Python <3.9), but since the code uses it, we assume Python 3.9+. - Make these changes in the
/open_in_explorerendpoint, specifically in the block wheresafe_rootsare used and compared.
No new imports are needed, but we should add a line to resolve each root directory before the comparison loop.
-
Copy modified lines R378-R379 -
Copy modified line R381 -
Copy modified line R397
| @@ -375,8 +375,10 @@ | ||
| or an error (e.g., {"status": "error", "message": "..."}). | ||
| """ | ||
| safe_roots = _get_valid_explorable_roots() | ||
| # Ensure all safe_roots are resolved to absolute paths | ||
| resolved_safe_roots = [Path(root).resolve() for root in safe_roots] | ||
|
|
||
| if not safe_roots: | ||
| if not resolved_safe_roots: | ||
| return { | ||
| "status": "error", | ||
| "message": "Cannot open path: No safe directories (like Ollama or Hugging Face cache) could be determined.", | ||
| @@ -393,7 +394,7 @@ | ||
| if requested_path.is_file(): | ||
| requested_path = requested_path.parent | ||
| is_safe_path = False | ||
| for root_dir in safe_roots: | ||
| for root_dir in resolved_safe_roots: | ||
| if requested_path.is_relative_to(root_dir): | ||
| is_safe_path = True | ||
| break |
| requested_path = Path(expanded).resolve() | ||
|
|
||
| # If a file is passed, open the parent directory for better UX | ||
| if requested_path.is_file(): |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 months ago
To fix the problem, the code should ensure that all manipulations of the user-provided path are performed before the safety check. Specifically, after expanding environment variables and resolving the path, and after replacing the path with its parent directory if it is a file, the code should then check if the final path is within one of the safe root directories using is_relative_to. This ensures that no matter what the user provides, the path that is ultimately opened is always within an allowed directory. Only after passing this check should the path be used.
Steps to fix:
- Move the "is file" check and possible parent directory replacement before the safety check.
- Only perform the
is_relative_tocheck on the final path that will be opened. - No additional imports are needed, as all required modules are already imported.
Edit only the relevant region in src/nodetool/api/model.py (lines 393–401).
-
Copy modified line R395 -
Copy modified line R401
| @@ -392,12 +392,13 @@ | ||
| # If a file is passed, open the parent directory for better UX | ||
| if requested_path.is_file(): | ||
| requested_path = requested_path.parent | ||
|
|
||
| is_safe_path = False | ||
| for root_dir in safe_roots: | ||
| if requested_path.is_relative_to(root_dir): | ||
| is_safe_path = True | ||
| break | ||
|
|
||
| if not is_safe_path: | ||
| log.warning( | ||
| f"Path traversal attempt: User path {requested_path} is not within any of the configured safe directories: {safe_roots}" |
- Improved health check endpoint by adding validation for the structure of the health check results, ensuring that only valid checks are processed. - Enhanced error handling to provide a safe fallback response in case of health check failures, improving user feedback. - Updated CPU usage retrieval in system stats to use a non-blocking call, addressing performance issues. - Ensured proper cleanup of NVML resources after GPU memory checks, enhancing resource management.
|
Claude finished @heavy-d's task —— View job PR Review: Add system info and health APIReview Checklist:
🔍 Review SummaryThis PR adds comprehensive system information and health monitoring endpoints. The implementation shows good architectural design with proper separation of concerns, but has several critical security and performance issues that need to be addressed. ✅ Positive Aspects1. Good Architecture & Organization
2. Security Measures
3. Robust Error Handling
🚨 Critical Issues Found1. 🔴 CRITICAL: Path Information Security RiskLocation: The return {
"settings_path": settings_path, # Exposes user directory structure
"secrets_path": secrets_path, # 🚨 CRITICAL: Exposes secret file locations
"data_dir": data_dir, # Could reveal system layout
# ... other potentially sensitive paths
}Security Impact:
GitHub Security Alerts: This issue is already flagged by GitHub Advanced Security as "Uncontrolled data used in path expression" on lines 380 and 383 in Recommendation: Either remove this endpoint entirely or sanitize paths to show only relative/generic references without revealing actual filesystem locations. 2. 🔴 Model-Response MismatchLocation: The # In PathsInfo model - MISSING this field
class PathsInfo(BaseModel):
# ... other fields
# electron_main_log_file NOT DEFINED HERE
# But returned in get_paths_info():
"electron_main_log_file": electron_main_log_file # This will cause validation errorsThis will cause Pydantic validation failures at runtime. 3. 🔴 Performance IssuesPrevious Issue Addressed: Good fix in Remaining Issue: Cache TTL of 3 seconds ( 🟡 Medium Priority Issues4. Input Validation GapsLocation: While the health endpoint has good validation, there are still edge cases not handled:
5. Error Response InconsistencyHealth checks use custom status format ( 6. Resource ManagementThe health check's 🟢 Minor Issues7. Code Quality
📝 Recommendations🔴 Must Fix Before Merge:
🟡 Should Fix:
🟢 Nice to Have:
🧪 Missing Test CoverageThe PR adds significant functionality without corresponding tests. Based on existing patterns in
🎯 Overall AssessmentThis PR provides valuable system monitoring functionality with good architectural design, but has critical security vulnerabilities that must be addressed before merging. The path information exposure is a significant security risk that could enable attacks. The code quality is generally good and follows repository conventions well. With the recommended security fixes and improvements, this would be a solid addition to the API. Recommendation: 🚫 Request Changes - Must address critical security issues and model validation problems before approval. |
- Updated the `get_paths_info` function to provide template paths for settings, secrets, and log files based on the operating system (Windows, macOS, Linux). - Enhanced user experience by avoiding exposure of usernames in file paths while maintaining helpfulness. - Retained existing functionality for Hugging Face cache path retrieval.
|
Claude finished @heavy-d's task —— View job PR Review: Add system info and health APIReview Checklist:
🔍 Review SummaryThis PR adds comprehensive system information and health monitoring endpoints to the API. The implementation shows good architectural design with proper separation of concerns, but has critical security issues that must be addressed before merging. ✅ Positive Aspects1. Excellent Architecture & Organization
2. Strong Security Awareness
3. Robust Error Handling
4. Performance Improvements
🚨 Critical Issues Found1. 🔴 SECURITY: Path Information Exposure RiskLocation: While the PR author made good progress addressing path security by using template paths for most system directories, two critical vulnerabilities remain: # Line 83 - STILL EXPOSES REAL PATHS
huggingface_cache_dir = str(Path(HF_HUB_CACHE).resolve())
# Line 93 - STILL EXPOSES REAL PATHS
ollama_models_dir = str(ollama_path) if ollama_path else ""Security Impact:
Recommendation:
2. 🔴 MODEL VALIDATION: Missing Field DefinitionLocation: The # PathsInfo model missing this field:
class PathsInfo(BaseModel):
# ... other fields
# electron_main_log_file: str # <-- MISSING
# But returned in get_paths_info():
return {
# ... other fields
"electron_main_log_file": electron_main_log_file, # <-- Will cause validation error
}Impact: This will cause Pydantic validation failures at runtime. Fix: Add 🟡 Medium Priority Issues3. Performance: Cache TTL Too AggressiveLocation: _TTL_SECONDS = 3.0 # Too short for system info that rarely changesSystem information (OS, versions, paths) changes very infrequently. A 3-second cache defeats the purpose and wastes resources. Recommendation: Increase to 30-60 seconds or make configurable. 4. Input Validation EnhancementsLocation: While the health endpoint has good validation, there are edge cases:
5. Resource Management Edge CasesLocation: The 🟢 Minor Issues6. Code Quality
7. API ConsistencyHealth checks use custom status format ( 📝 Specific Recommendations🔴 Must Fix Before Merge:
🟡 Should Fix:
🟢 Nice to Have:
🧪 Missing Test CoverageThe PR adds significant functionality without corresponding tests. Based on existing patterns in
🎯 Overall AssessmentThis PR provides valuable system monitoring functionality with excellent architectural design and follows repository conventions well. The security-conscious approach (template paths for most directories, non-production restriction) shows good awareness. However, critical security vulnerabilities remain that must be addressed before merging. The path information exposure via resolved HuggingFace and Ollama directories creates the same security risks that were addressed for other paths. The code quality is generally good with comprehensive error handling and proper resource management. Recommendation: 🚫 Request Changes - Must address critical security issues and model validation problems before approval. |
- Increased the TTL for cached system stats from 3 seconds to 30 seconds to reduce redundant processing. - Implemented a caching mechanism in the `get_stats` endpoint to return cached results if available, improving performance. - Updated comments in `get_system_stats` to clarify the use of non-blocking CPU usage retrieval.
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
- Introduced a new function to retrieve GPU and VRAM information using the pynvml library, enhancing system information capabilities. - Updated the VersionsInfo model to include fields for GPU name, VRAM total, and driver version. - Enhanced health check functions to use a path template conversion method, improving security by avoiding exposure of user paths in logs and error messages.
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
|
|
||
| # Debug logging to help troubleshoot path template conversion | ||
| if path_str != original_path: | ||
| print(f"[DEBUG] Path template conversion: {original_path} -> {path_str}") |
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
sensitive data (secret)
This expression logs
sensitive data (secret)
This expression logs
sensitive data (secret)
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 months ago
To fix the problem, we should avoid logging the original path, which may contain sensitive information. If debug logging is necessary, only the masked (template) path should be logged, as it is specifically designed to avoid exposing sensitive details. The change should be made in the _path_to_template function in src/nodetool/api/system/health.py, specifically on line 54. The print statement should be modified to only log the masked path, not the original. No new imports or methods are required.
-
Copy modified line R54
| @@ -51,7 +51,7 @@ | ||
|
|
||
| # Debug logging to help troubleshoot path template conversion | ||
| if path_str != original_path: | ||
| print(f"[DEBUG] Path template conversion: {original_path} -> {path_str}") | ||
| print(f"[DEBUG] Path template conversion: {path_str}") | ||
|
|
||
| return path_str | ||
|
|
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
new system endpoint that returns