fix: ensure KBase Workspace auth header uses Bearer prefix#12
fix: ensure KBase Workspace auth header uses Bearer prefix#12VibhavSetlur merged 1 commit intodevelopfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Updates the KBase Workspace utility client to consistently send properly formatted Authorization headers (Bearer) for Workspace API calls, while also removing some legacy/unused code in the workspace database download path.
Changes:
- Added
_workspace_auth_header()and switched Workspace JSON-RPC calls to use it (Bearer prefix handling). - Removed
sanitize_idusage for per-database cache subdirectories indownload_db_multi. - Removed
was_cachedfromdownload_db_multi’s return payload.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| t = self.token or "" | ||
| if t.startswith("Bearer ") or t.startswith("OAuth "): | ||
| return t |
There was a problem hiding this comment.
_workspace_auth_header currently passes through tokens prefixed with OAuth , but the Workspace API expects Bearer auth (per the docstring and PR intent). If a caller accidentally supplies an OAuth ... header value, this will preserve the wrong scheme and likely fail. Consider normalizing by stripping any leading Bearer / OAuth prefix from self.token and always returning Bearer <token> for Workspace requests.
| t = self.token or "" | |
| if t.startswith("Bearer ") or t.startswith("OAuth "): | |
| return t | |
| t = (self.token or "").strip() | |
| # Normalize any existing scheme to Bearer by stripping known prefixes | |
| if t.startswith("Bearer "): | |
| t = t[len("Bearer ") :].lstrip() | |
| elif t.startswith("OAuth "): | |
| t = t[len("OAuth ") :].lstrip() |
|
|
||
| # Fast path: if already cached, return without hitting Workspace/Shock | ||
| db_dir = base_dir / safe_db_name | ||
| db_dir = base_dir / db_name |
There was a problem hiding this comment.
There are integration/security tests for path traversal protections (e.g. sanitize_id), but nothing appears to cover traversal via the multi-db db_name parameter. Adding a test that exercises download_db_multi (or the /db/{db_name}/... routes) with a malicious db_name like ..%2F..%2F... would help prevent regressions once db_name sanitization is restored.
| db_dir = base_dir / db_name | |
| # Ensure that db_name cannot be used for directory traversal outside base_dir. | |
| db_dir = (base_dir / db_name).resolve() | |
| base_dir_resolved = base_dir.resolve() | |
| try: | |
| db_dir.relative_to(base_dir_resolved) | |
| except ValueError: | |
| raise ValueError(f"Invalid database name: {db_name!r}") |
| # Fast path: if already cached, return without hitting Workspace/Shock | ||
| db_dir = base_dir / safe_db_name | ||
| db_dir = base_dir / db_name | ||
| db_path = db_dir / "tables.db" | ||
| if db_path.exists(): |
There was a problem hiding this comment.
download_db_multi now joins base_dir / db_name directly. Since db_name ultimately comes from a request path parameter, this reintroduces a path traversal risk (e.g. ../..) in the cache lookup and can lead to returning/reading a DB outside the cache dir on the fast-path. Use sanitize_id(db_name) (or an equivalent strict validation) before using it in any filesystem path, and ideally perform the cache path computation only after confirming db_name matches one of the pangenome_id values from the Workspace object.
| return { | ||
| "db_name": db_name, | ||
| "db_display_name": db_name, | ||
| "db_path": db_path, | ||
| "handle_ref": None, | ||
| "was_cached": True, | ||
| } |
There was a problem hiding this comment.
This function no longer returns the was_cached key, but at least one caller relies on it to report the response source (see app/routes.py where it checks target_db.get("was_cached")). Either keep was_cached in the return contract (recommended for backward compatibility) or update all callers and any API response fields derived from it.
app/utils/workspace.py
Outdated
| # Disable KBUtilLib for now - use direct API calls with proper timeouts | ||
| # KBUtilLib can hang indefinitely and doesn't respect timeouts | ||
| logger.info(f"Using direct API calls (KBUtilLib disabled) for {self.kb_env}") | ||
| self._use_kbutillib = False | ||
| return | ||
|
|
||
| # Original KBUtilLib code (disabled): | ||
| # try: | ||
| # if not HAS_KBUTILLIB: | ||
| # raise ImportError("KBUtilLib not found") | ||
| # ... | ||
| # except Exception as e: | ||
| # logger.warning(f"KBUtilLib not available: {e}. Using fallback.") | ||
| # self._use_kbutillib = False | ||
|
|
There was a problem hiding this comment.
This comment appears to contain commented-out code.
| # Disable KBUtilLib for now - use direct API calls with proper timeouts | |
| # KBUtilLib can hang indefinitely and doesn't respect timeouts | |
| logger.info(f"Using direct API calls (KBUtilLib disabled) for {self.kb_env}") | |
| self._use_kbutillib = False | |
| return | |
| # Original KBUtilLib code (disabled): | |
| # try: | |
| # if not HAS_KBUTILLIB: | |
| # raise ImportError("KBUtilLib not found") | |
| # ... | |
| # except Exception as e: | |
| # logger.warning(f"KBUtilLib not available: {e}. Using fallback.") | |
| # self._use_kbutillib = False | |
| # Disable KBUtilLib for now - use direct API calls with proper timeouts. | |
| # KBUtilLib can hang indefinitely and doesn't respect timeouts, so we | |
| # always fall back to the direct API implementation in this method. | |
| logger.info(f"Using direct API calls (KBUtilLib disabled) for {self.kb_env}") | |
| self._use_kbutillib = False | |
| return | |
| # Note: A previous implementation attempted to use KBUtilLib first and | |
| # fall back to direct API calls if the library was unavailable or failed. | |
| # That behavior has been intentionally removed due to stability concerns. |
app/utils/workspace.py
Outdated
| # Disable KBUtilLib for now - use direct API calls with proper timeouts | ||
| # KBUtilLib can hang indefinitely and doesn't respect timeouts | ||
| logger.info(f"Using direct API calls (KBUtilLib disabled) for {self.kb_env}") | ||
| self._use_kbutillib = False | ||
| return | ||
|
|
||
| # Original KBUtilLib code (disabled): | ||
| # try: | ||
| # if not HAS_KBUTILLIB: | ||
| # raise ImportError("KBUtilLib not found") | ||
| # ... | ||
| # except Exception as e: | ||
| # logger.warning(f"KBUtilLib not available: {e}. Using fallback.") | ||
| # self._use_kbutillib = False |
There was a problem hiding this comment.
This comment appears to contain commented-out code.
| # Disable KBUtilLib for now - use direct API calls with proper timeouts | |
| # KBUtilLib can hang indefinitely and doesn't respect timeouts | |
| logger.info(f"Using direct API calls (KBUtilLib disabled) for {self.kb_env}") | |
| self._use_kbutillib = False | |
| return | |
| # Original KBUtilLib code (disabled): | |
| # try: | |
| # if not HAS_KBUTILLIB: | |
| # raise ImportError("KBUtilLib not found") | |
| # ... | |
| # except Exception as e: | |
| # logger.warning(f"KBUtilLib not available: {e}. Using fallback.") | |
| # self._use_kbutillib = False | |
| # Disable KBUtilLib for now - use direct API calls with proper timeouts. | |
| # KBUtilLib can hang indefinitely and does not respect timeouts; if | |
| # KBUtilLib-based initialization is needed in the future, refer to | |
| # version control history for the previous implementation details. | |
| logger.info(f"Using direct API calls (KBUtilLib disabled) for {self.kb_env}") | |
| self._use_kbutillib = False | |
| return |
c7e2ae5 to
ba251d3
Compare
This pull request makes several updates to the
app/utils/workspace.pyfile, primarily focused on improving authentication handling for Workspace API calls and cleaning up some legacy or unnecessary code. The most significant changes include introducing a method to consistently generate the correct authorization header, updating API calls to use this method, and removing unused code related to path sanitization and cache flags.Authentication and API Call Improvements:
_workspace_auth_headermethod to generate the correctAuthorizationheader for Workspace API requests, ensuring Bearer tokens are handled properly._workspace_auth_headermethod instead of using the raw token, improving security and compatibility. [1] [2]Code Cleanup and Refactoring:
sanitize_idimport and eliminated the path sanitization logic for database names indownload_db_multi, simplifying the code and relying on trusted inputs. [1] [2]was_cachedflag from the return value ofdownload_db_multi, streamlining the return object. [1] [2]These changes improve the maintainability, security, and clarity of the Workspace utility code.