Feature/auth and deployment polish#13
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request introduces an in-memory logging system with a /system/logs API endpoint, adds comprehensive deployment testing infrastructure through bash and Python scripts, and migrates from KBUtilLib to direct HTTP requests for KBase Workspace API interactions. The changes improve observability and testing capabilities but introduce several critical security and reliability issues that must be addressed.
Changes:
- Added in-memory logging system (
MemoryLogHandler) with/system/logsendpoint for real-time log access - Removed KBUtilLib dependency and migrated to direct HTTP requests with
requests.Session(trust_env=False) - Added 10+ test scripts covering deployment validation, integration testing, and end-to-end scenarios
- Updated documentation with testing guides and deployment instructions
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 33 comments.
Show a summary per file
| File | Description |
|---|---|
| app/services/logger.py | New in-memory log handler storing up to 1000 recent log entries |
| app/routes.py | Added /system/logs endpoint to retrieve recent logs with filtering |
| app/main.py | Integrated logging setup into app lifespan |
| app/utils/workspace.py | Migrated from KBUtilLib to direct HTTP API calls with custom session handling |
| app/utils/init.py | Updated documentation to reflect direct HTTP usage |
| Dockerfile | Removed KBUtilLib cloning and PYTHONPATH configuration |
| validate_deployment.sh | New deployment validation script with health checks and API tests |
| tests/integration/*.sh | Multiple integration test scripts for comprehensive backend validation |
| scripts/test_deployment.sh | Comprehensive deployment test suite with 13 test categories |
| scripts/run_e2e.sh | End-to-end test script for deployment simulation |
| scripts/README.md | Documentation for deployment test workflow |
| tests/integration/README.md | Integration test usage guide and troubleshooting |
| tests/integration/test_deployment.py | Python-based deployment integration tests |
| tests/integration/test_docker_deployment.py | Docker environment simulation tests |
| scripts/test_production_ready.py | Production readiness validation script |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def __init__( | ||
| self, | ||
| token: str, | ||
| token: str | None, |
There was a problem hiding this comment.
The type annotation 'str | None' uses Python 3.10+ union syntax, but the Dockerfile specifies python3.13-alpine which supports this. However, the inconsistency with line 8's 'str | None' vs line 30's 'str | None' and line 244's 'str | None' in routes.py suggests the codebase may have mixed usage. Verify that all deployment targets support Python 3.10+ or use 'Optional[str]' for broader compatibility.
app/services/logger.py
Outdated
| except Exception: | ||
| self.handleError(record) |
There was a problem hiding this comment.
The handleError method call on line 29 will be called when emit() encounters an exception, but the parent class's handleError will silently ignore it by default. This could mask logging failures. Consider adding explicit error logging or raising to ensure logging issues are visible during development and debugging.
| # Get token from .env | ||
| TOKEN=$(grep KB_SERVICE_AUTH_TOKEN .env | cut -d'=' -f2 | tr -d '"' | tr -d "'") |
There was a problem hiding this comment.
The TOKEN variable extraction uses a fragile parsing approach that could fail if the .env file format varies. Line 45 uses cut/tr to parse the token, but doesn't handle cases where the value is quoted with spaces inside quotes (e.g., 'KB_SERVICE_AUTH_TOKEN="token value"'). Consider using a more robust parsing method or a dedicated .env file parser.
| # Get token from .env | |
| TOKEN=$(grep KB_SERVICE_AUTH_TOKEN .env | cut -d'=' -f2 | tr -d '"' | tr -d "'") | |
| # Get token from .env (handle spaces and quoted values robustly) | |
| TOKEN=$(awk ' | |
| /^[[:space:]]*KB_SERVICE_AUTH_TOKEN[[:space:]]*=/ { | |
| line = $0 | |
| # Remove the key and '=' (with optional surrounding spaces) | |
| sub(/^[[:space:]]*KB_SERVICE_AUTH_TOKEN[[:space:]]*=/, "", line) | |
| # Trim leading/trailing whitespace | |
| gsub(/^[[:space:]]*/, "", line) | |
| gsub(/[[:space:]]*$/, "", line) | |
| # Strip a single pair of surrounding quotes, if present | |
| if (line ~ /^".*"$/ || line ~ /^'\''.*'\''$/) { | |
| line = substr(line, 2, length(line) - 2) | |
| } | |
| print line | |
| exit | |
| }' .env) |
| # Use a session to ensure clean connection handling and bypass proxies | ||
| with requests.Session() as session: | ||
| session.trust_env = False # Critical for avoiding stalls |
There was a problem hiding this comment.
Setting trust_env=False disables reading proxy settings from environment variables (HTTP_PROXY, HTTPS_PROXY, etc.). While the comment indicates this is intentional to avoid stalls, this could break deployments in environments that require proxy configuration. Consider making this configurable via a setting rather than hardcoding, or document this behavior clearly for deployment scenarios.
| # Use a session to ensure clean connection handling and bypass proxies | |
| with requests.Session() as session: | |
| session.trust_env = False # Critical for avoiding stalls | |
| # Use a session to ensure clean connection handling and (by default) bypass proxies | |
| with requests.Session() as session: | |
| # Allow overriding trust_env via environment for proxy-dependent deployments. | |
| # Default remains False to avoid stalls in environments where proxies misbehave. | |
| trust_env_override = os.getenv("WORKSPACE_TRUST_ENV") | |
| if trust_env_override is not None: | |
| session.trust_env = trust_env_override.lower() in ("1", "true", "yes", "on") | |
| else: | |
| session.trust_env = False |
| def _workspace_auth_header(self) -> str: | ||
| """Return Authorization header value. | ||
|
|
||
| Args: | ||
| handle_ref: Handle ID (KBH_xxxxx format) | ||
| target_path: Where to save the file | ||
|
|
||
| Returns: | ||
| Path to downloaded file | ||
| KBase workspace API expects just the token string, NOT "Bearer <token>". | ||
| This method ensures we send the raw token without any prefix. | ||
| """ | ||
| # Ensure directory exists | ||
| target_path = Path(target_path) | ||
| target_path.parent.mkdir(parents=True, exist_ok=True) | ||
|
|
||
| if self._use_kbutillib and self._client: | ||
| try: | ||
| # Ensure KBUtilLib has the token set | ||
| if hasattr(self._client, 'save_token'): | ||
| self._client.save_token(self.token, namespace="kbase") | ||
| result = self._client.download_blob_file(handle_ref, str(target_path)) | ||
| if result: | ||
| return Path(result) | ||
| except Exception as e: | ||
| logger.warning(f"KBUtilLib download_blob_file failed: {e}. Using fallback.", exc_info=True) | ||
|
|
||
| return Path(self._download_blob_fallback(handle_ref, str(target_path))) | ||
|
|
||
| # ========================================================================= | ||
| # FALLBACK METHODS (Direct API calls) | ||
| # ========================================================================= | ||
|
|
||
| def _workspace_auth_header(self) -> str: | ||
| """Return Authorization header value for KBase Workspace API. Workspace expects Bearer token.""" | ||
| t = self.token or "" | ||
| if t.startswith("Bearer ") or t.startswith("OAuth "): | ||
| return t | ||
| return f"Bearer {t}" if t else "" | ||
| if not self.token: | ||
| return "" | ||
| # Ensure we strip any Bearer prefix that might have been passed incorrectly | ||
| token = self.token.strip() | ||
| if token.startswith("Bearer "): | ||
| token = token[7:].strip() | ||
| return token |
There was a problem hiding this comment.
The _workspace_auth_header method strips "Bearer " prefix from tokens (lines 56-58), but this may break authentication if the KBase Workspace API actually expects "Bearer " in some contexts. The comment on line 50 states the API expects just the token, but this should be verified. If both formats are valid in different contexts, the method should preserve the original format or clearly document which format is expected where.
| from pathlib import Path | ||
| from typing import Any | ||
| import time | ||
| import shutil |
There was a problem hiding this comment.
Import of 'shutil' is not used.
| import shutil |
| import requests | ||
| from app.utils.cache import get_upa_cache_path | ||
| from pathlib import Path | ||
| from typing import Any, Optional |
There was a problem hiding this comment.
Import of 'Optional' is not used.
| from typing import Any, Optional | |
| from typing import Any |
|
|
||
| def test_file_permissions(self): | ||
| """Test that cache directory has correct permissions.""" | ||
| app = create_app() |
There was a problem hiding this comment.
Variable app is not used.
| app = create_app() | |
| create_app() |
| print('=' * 70) | ||
|
|
||
| # Test 2: Workspace access | ||
| pangenome_data = test_workspace_access(upa) |
There was a problem hiding this comment.
Variable pangenome_data is not used.
| pangenome_data = test_workspace_access(upa) | |
| test_workspace_access(upa) |
app/services/logger.py
Outdated
| level = level.lower() | ||
| # Filter by level severity logic could be added here, | ||
| # for now just exact match or simple inclusion | ||
| # But usually we want min level. | ||
| # Let's just return all for clients to filter or implement simple filtering | ||
| pass |
There was a problem hiding this comment.
Unnecessary 'pass' statement.
| level = level.lower() | |
| # Filter by level severity logic could be added here, | |
| # for now just exact match or simple inclusion | |
| # But usually we want min level. | |
| # Let's just return all for clients to filter or implement simple filtering | |
| pass | |
| # Normalize level; actual filtering can be implemented here later. | |
| level = level.lower() |
This pull request introduces a new in-memory logging system for the backend, allowing recent logs to be accessed via a new API endpoint. It also adds comprehensive documentation and scripts for deployment and integration testing, improving observability and test coverage. The most significant changes are grouped below:
In-memory logging and log API:
MemoryLogHandlerinapp/services/logger.pyto store up to 1000 recent log entries in memory, with a setup function to attach it to the root logger. (F3833d7R1)app/main.py, ensuring logs are captured from startup./system/logsendpoint inapp/routes.pyto retrieve recent logs, with optional filtering by log level and limit. Includes aLogEntryresponse model. [1] [2] [3]Testing and deployment documentation:
scripts/test_deployment.shand its documentation inscripts/README.md, providing a comprehensive deployment test suite covering all major features, authentication methods, and object-specific tests.scripts/run_e2e.sh, a full end-to-end test script simulating deployment scenarios and validating data integrity, authentication, and multi-database support.tests/integration/README.mddescribing available integration test scripts, usage, features, and troubleshooting, and referencing the consolidated deployment test.Other improvements:
app/utils/__init__.pyto clarify that KBase Workspace API interactions now use direct HTTP requests instead ofKBUtilLib.… testsThis pull request introduces a new in-memory logging system with a corresponding system logs API endpoint, improves developer documentation and deployment testing, and cleans up legacy dependencies. The main highlights are the addition of a
/system/logsendpoint for real-time log access, a newMemoryLogHandlerfor capturing logs in memory, and comprehensive end-to-end deployment test scripts for validating backend functionality.System Logging and API Enhancements:
MemoryLogHandlerinapp/services/logger.pyto capture and store recent log records in memory, along with utility functions for setup and retrieval.setup_logging()inapp/main.py) and exposed a new/system/logsendpoint inapp/routes.pyfor retrieving recent logs, with support for filtering and limiting results. [1] [2] [3]Developer Experience and Testing:
scripts/run_e2e.sh) to perform end-to-end backend validation, including authentication, table access, pagination, statistics, and multi-database support.scripts/README.mdto document available scripts and describe the deployment test workflow in detail.Dependency and Documentation Cleanup:
KBUtilLibdependency from the Dockerfile, reflecting a move to direct HTTP requests for KBase Workspace API interactions. [1] [2]These changes improve observability, streamline deployment validation, and simplify the codebase for future maintenance.