Skip to content

fix: ensure KBase Workspace auth header uses Bearer prefix#12

Merged
VibhavSetlur merged 1 commit intodevelopfrom
fix/kbase-auth-bearer
Feb 12, 2026
Merged

fix: ensure KBase Workspace auth header uses Bearer prefix#12
VibhavSetlur merged 1 commit intodevelopfrom
fix/kbase-auth-bearer

Conversation

@VibhavSetlur
Copy link
Collaborator

This pull request makes several updates to the app/utils/workspace.py file, primarily focused on improving authentication handling for Workspace API calls and cleaning up some legacy or unnecessary code. The most significant changes include introducing a method to consistently generate the correct authorization header, updating API calls to use this method, and removing unused code related to path sanitization and cache flags.

Authentication and API Call Improvements:

  • Added a new _workspace_auth_header method to generate the correct Authorization header for Workspace API requests, ensuring Bearer tokens are handled properly.
  • Updated all direct Workspace API calls to use the new _workspace_auth_header method instead of using the raw token, improving security and compatibility. [1] [2]

Code Cleanup and Refactoring:

  • Removed the unused sanitize_id import and eliminated the path sanitization logic for database names in download_db_multi, simplifying the code and relying on trusted inputs. [1] [2]
  • Removed the was_cached flag from the return value of download_db_multi, streamlining the return object. [1] [2]
  • Clarified comments and disabled the legacy KBUtilLib client initialization code, making it clear that only direct API calls are used for now.

These changes improve the maintainability, security, and clarity of the Workspace utility code.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the KBase Workspace utility client to consistently send properly formatted Authorization headers (Bearer) for Workspace API calls, while also removing some legacy/unused code in the workspace database download path.

Changes:

  • Added _workspace_auth_header() and switched Workspace JSON-RPC calls to use it (Bearer prefix handling).
  • Removed sanitize_id usage for per-database cache subdirectories in download_db_multi.
  • Removed was_cached from download_db_multi’s return payload.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +133 to +135
t = self.token or ""
if t.startswith("Bearer ") or t.startswith("OAuth "):
return t
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_workspace_auth_header currently passes through tokens prefixed with OAuth , but the Workspace API expects Bearer auth (per the docstring and PR intent). If a caller accidentally supplies an OAuth ... header value, this will preserve the wrong scheme and likely fail. Consider normalizing by stripping any leading Bearer / OAuth prefix from self.token and always returning Bearer <token> for Workspace requests.

Suggested change
t = self.token or ""
if t.startswith("Bearer ") or t.startswith("OAuth "):
return t
t = (self.token or "").strip()
# Normalize any existing scheme to Bearer by stripping known prefixes
if t.startswith("Bearer "):
t = t[len("Bearer ") :].lstrip()
elif t.startswith("OAuth "):
t = t[len("OAuth ") :].lstrip()

Copilot uses AI. Check for mistakes.

# Fast path: if already cached, return without hitting Workspace/Shock
db_dir = base_dir / safe_db_name
db_dir = base_dir / db_name
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are integration/security tests for path traversal protections (e.g. sanitize_id), but nothing appears to cover traversal via the multi-db db_name parameter. Adding a test that exercises download_db_multi (or the /db/{db_name}/... routes) with a malicious db_name like ..%2F..%2F... would help prevent regressions once db_name sanitization is restored.

Suggested change
db_dir = base_dir / db_name
# Ensure that db_name cannot be used for directory traversal outside base_dir.
db_dir = (base_dir / db_name).resolve()
base_dir_resolved = base_dir.resolve()
try:
db_dir.relative_to(base_dir_resolved)
except ValueError:
raise ValueError(f"Invalid database name: {db_name!r}")

Copilot uses AI. Check for mistakes.
Comment on lines 659 to 662
# Fast path: if already cached, return without hitting Workspace/Shock
db_dir = base_dir / safe_db_name
db_dir = base_dir / db_name
db_path = db_dir / "tables.db"
if db_path.exists():
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

download_db_multi now joins base_dir / db_name directly. Since db_name ultimately comes from a request path parameter, this reintroduces a path traversal risk (e.g. ../..) in the cache lookup and can lead to returning/reading a DB outside the cache dir on the fast-path. Use sanitize_id(db_name) (or an equivalent strict validation) before using it in any filesystem path, and ideally perform the cache path computation only after confirming db_name matches one of the pangenome_id values from the Workspace object.

Copilot uses AI. Check for mistakes.
Comment on lines 663 to 668
return {
"db_name": db_name,
"db_display_name": db_name,
"db_path": db_path,
"handle_ref": None,
"was_cached": True,
}
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function no longer returns the was_cached key, but at least one caller relies on it to report the response source (see app/routes.py where it checks target_db.get("was_cached")). Either keep was_cached in the return contract (recommended for backward compatibility) or update all callers and any API response fields derived from it.

Copilot uses AI. Check for mistakes.
Comment on lines 69 to 83
# Disable KBUtilLib for now - use direct API calls with proper timeouts
# KBUtilLib can hang indefinitely and doesn't respect timeouts
logger.info(f"Using direct API calls (KBUtilLib disabled) for {self.kb_env}")
self._use_kbutillib = False
return

# Original KBUtilLib code (disabled):
# try:
# if not HAS_KBUTILLIB:
# raise ImportError("KBUtilLib not found")
# ...
# except Exception as e:
# logger.warning(f"KBUtilLib not available: {e}. Using fallback.")
# self._use_kbutillib = False

Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment appears to contain commented-out code.

Suggested change
# Disable KBUtilLib for now - use direct API calls with proper timeouts
# KBUtilLib can hang indefinitely and doesn't respect timeouts
logger.info(f"Using direct API calls (KBUtilLib disabled) for {self.kb_env}")
self._use_kbutillib = False
return
# Original KBUtilLib code (disabled):
# try:
# if not HAS_KBUTILLIB:
# raise ImportError("KBUtilLib not found")
# ...
# except Exception as e:
# logger.warning(f"KBUtilLib not available: {e}. Using fallback.")
# self._use_kbutillib = False
# Disable KBUtilLib for now - use direct API calls with proper timeouts.
# KBUtilLib can hang indefinitely and doesn't respect timeouts, so we
# always fall back to the direct API implementation in this method.
logger.info(f"Using direct API calls (KBUtilLib disabled) for {self.kb_env}")
self._use_kbutillib = False
return
# Note: A previous implementation attempted to use KBUtilLib first and
# fall back to direct API calls if the library was unavailable or failed.
# That behavior has been intentionally removed due to stability concerns.

Copilot uses AI. Check for mistakes.
Comment on lines +69 to +82
# Disable KBUtilLib for now - use direct API calls with proper timeouts
# KBUtilLib can hang indefinitely and doesn't respect timeouts
logger.info(f"Using direct API calls (KBUtilLib disabled) for {self.kb_env}")
self._use_kbutillib = False
return

# Original KBUtilLib code (disabled):
# try:
# if not HAS_KBUTILLIB:
# raise ImportError("KBUtilLib not found")
# ...
# except Exception as e:
# logger.warning(f"KBUtilLib not available: {e}. Using fallback.")
# self._use_kbutillib = False
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment appears to contain commented-out code.

Suggested change
# Disable KBUtilLib for now - use direct API calls with proper timeouts
# KBUtilLib can hang indefinitely and doesn't respect timeouts
logger.info(f"Using direct API calls (KBUtilLib disabled) for {self.kb_env}")
self._use_kbutillib = False
return
# Original KBUtilLib code (disabled):
# try:
# if not HAS_KBUTILLIB:
# raise ImportError("KBUtilLib not found")
# ...
# except Exception as e:
# logger.warning(f"KBUtilLib not available: {e}. Using fallback.")
# self._use_kbutillib = False
# Disable KBUtilLib for now - use direct API calls with proper timeouts.
# KBUtilLib can hang indefinitely and does not respect timeouts; if
# KBUtilLib-based initialization is needed in the future, refer to
# version control history for the previous implementation details.
logger.info(f"Using direct API calls (KBUtilLib disabled) for {self.kb_env}")
self._use_kbutillib = False
return

Copilot uses AI. Check for mistakes.
@VibhavSetlur VibhavSetlur merged commit 3013d5d into develop Feb 12, 2026
6 checks passed
@VibhavSetlur VibhavSetlur deleted the fix/kbase-auth-bearer branch February 13, 2026 04:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants