@hiroyukinakazato-db

Changes

This PR adds the llm-transpile command for LLM-powered SQL conversion using the Switch transpiler.

What does this PR do?

Adds the llm-transpile CLI command, which runs Switch transpiler jobs and supports passing parameters to them.

Relevant implementation details

CLI Integration:

  • Add llm-transpile command to Lakebridge CLI
  • Input source validation (workspace paths and local files)
  • Parameter passing to Switch job runs

Switch Runner Implementation:

  • SwitchConfig: manages Switch resources and job ID retrieval from InstallState
  • SwitchRunner: orchestrates Switch job execution with parameters

Testing:

  • Unit tests for Switch runner with parameter verification
  • Integration tests for Switch installation lifecycle

Development Environment:

  • Add .env to .gitignore for local development credentials

Caveats/things to watch out for when reviewing:

  • Parameter design: Follows transpile and recon command patterns
  • Catalog/schema usage: Uses values configured during Switch installation (following recon pattern)
  • Output parameter naming: Uses --output-ws-folder (not --output-folder) to explicitly indicate workspace folder
  • Dependencies: Requires PR #2066 ("Add Switch transpiler with --include-llm-transpiler flag", the Switch installation) to be merged first

Linked issues

Resolves #2047

Functionality

  • added relevant user documentation
  • added new CLI command: databricks labs lakebridge llm-transpile
  • modified existing command

Tests

  • manually tested
  • added unit tests
  • added integration tests

@hiroyukinakazato-db added the enhancement (New feature or request) and feat/cli (actions that are visible to the user) labels on Oct 6, 2025
@github-actions

github-actions bot commented Oct 6, 2025

✅ 46/46 passed, 7 flaky, 3m5s total

Flaky tests:

  • 🤪 test_validate_non_empty_tables (10ms)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[False] (14.375s)
  • 🤪 test_transpile_teradata_sql_non_interactive[True] (15.492s)
  • 🤪 test_transpile_teradata_sql (19.182s)
  • 🤪 test_transpile_teradata_sql_non_interactive[False] (4.998s)
  • 🤪 test_transpiles_informatica_to_sparksql_non_interactive[True] (8.237s)
  • 🤪 test_transpiles_informatica_to_sparksql (9.502s)

Running from acceptance #2826

Implement llm-transpile command for LLM-based code transpilation:
- Add SwitchInstaller for Switch transpiler package management
  - Install Switch package and deploy to workspace
  - Create and manage Databricks jobs with job-level parameters
  - Configure Switch resources (catalog, schema, volume)
- Add SwitchRunner for executing Switch transpilation jobs
  - Upload source files to workspace volume
  - Execute transpilation via Databricks job
  - Download results and handle job lifecycle
- Add llm-transpile CLI command with Switch transpiler support
- Add comprehensive unit and integration tests

Move _get_switch_package_path() from WorkspaceInstallation to SwitchDeployment
as a protected method, following the Single Responsibility Principle. SwitchDeployment
now resolves its own package path internally.

Changes:
- Add _get_switch_package_path() protected method to SwitchDeployment
- Update SwitchDeployment.install() signature to remove path parameter
- Remove duplicate _get_switch_package_path() from WorkspaceInstallation
- Remove unused sys and TranspilerRepository imports from installation.py
- Update tests to use new interface with mocked path resolution
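
The shape of this refactor can be pictured with a minimal sketch. Only the names SwitchDeployment, install() and _get_switch_package_path() come from the commit message; the class body, the shape of the resources dict, and the return value are assumptions made for illustration and are not the actual Lakebridge code.

```python
from pathlib import Path
from unittest.mock import patch


class SwitchDeployment:
    """Illustrative stand-in; not the real Lakebridge class."""

    def _get_switch_package_path(self) -> Path:
        # The real lookup is not shown in this PR description.
        raise NotImplementedError

    def install(self, resources: dict) -> tuple:
        # install() takes only resources; the path is resolved internally.
        package_path = self._get_switch_package_path()
        return package_path, resources


# Tests mock the path resolution instead of passing a path in.
with patch.object(
    SwitchDeployment, "_get_switch_package_path", return_value=Path("/tmp/switch")
):
    result = SwitchDeployment().install({"catalog": "c", "schema": "s", "volume": "v"})
```

Keeping the lookup behind a protected method means callers such as WorkspaceInstallation no longer need to know where the package lives, and tests only have to patch one method.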

Update test_installation.py to match the refactored SwitchDeployment.install()
interface, which now takes only the resources parameter (path resolution is internal).

Changes:
- Remove switch_repository fixture parameter from test methods
- Delete unused _StubTranspilerRepository stub class
- Remove unused imports (Path, TranspilerRepository)
- Update assertions to check only resources argument

The tests verify that:
1. Switch installation uses configured resources correctly
2. Missing resources log an appropriate error message

Sync with main branch to incorporate the latest documentation updates

# Conflicts:
#	labs.yml

The wait_for_completion option is intended for local CLI execution only
and should not be included in Databricks job parameters. This change
filters it out when building job parameter definitions.

Changes:
- Add excluded_options set to filter local-only options
- Skip wait_for_completion when converting config.yml options
- Add test using FriendOfSwitchDeployment pattern to verify exclusion
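
A minimal sketch of the filtering described in this commit, assuming the options arrive as a plain dict loaded from config.yml. The names EXCLUDED_OPTIONS and build_job_parameters, the example option, and the output dict shape are hypothetical and do not reflect the actual SwitchDeployment code.

```python
# Hypothetical illustration; names and shapes are assumptions.
EXCLUDED_OPTIONS = {"wait_for_completion"}  # options meant for the local CLI only


def build_job_parameters(options: dict[str, object]) -> list[dict[str, str]]:
    """Convert config options to job parameter definitions, skipping local-only ones."""
    return [
        {"name": name, "default": str(value)}
        for name, value in options.items()
        if name not in EXCLUDED_OPTIONS
    ]


params = build_job_parameters(
    {"example_option": "value", "wait_for_completion": "true"}
)
print([p["name"] for p in params])
```

Using a set for the exclusions keeps the check cheap and makes it easy to add further local-only options later.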
…ntrol

Add test_switch_install_with_transpile for full workflow testing including
job execution and output verification. Test automatically skips without
LAKEBRIDGE_SWITCH_E2E=true environment variable.

Refactor existing test to test_switch_install and extract helper functions
for DRY implementation. All changes in single file with no CI impact.
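
The environment gate can be sketched as a small helper. The variable name LAKEBRIDGE_SWITCH_E2E comes from the commit message; the helper name e2e_enabled and the case-insensitive comparison are assumptions.

```python
import os
from collections.abc import Mapping


def e2e_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True only when LAKEBRIDGE_SWITCH_E2E is set to 'true'."""
    # Assumption: the comparison ignores case and surrounding whitespace;
    # the commit message only mentions the literal value "true".
    return environ.get("LAKEBRIDGE_SWITCH_E2E", "").strip().lower() == "true"
```

With pytest, such a helper could back a skip marker, for example `@pytest.mark.skipif(not e2e_enabled(), reason="set LAKEBRIDGE_SWITCH_E2E=true to run")`, so the test is reported as skipped rather than failed when the variable is absent.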

Merged latest changes from main branch including:
- Profiler skeleton and Synapse profiler scripts
- Transpiler product_name to transpiler_id rename
- Additional transpile command arguments support
- Test improvements for MSSQL and transpiler repository
@sundarshankar89 changed the base branch from main to switch-integration on October 27, 2025 06:31
@sundarshankar89 added the stacked PR (Should be reviewed, but not merged) label on Oct 27, 2025
@sundarshankar89 changed the base branch from switch-integration to main on October 27, 2025 06:32
Collaborator

@sundarshankar89 left a comment

LGTM

) -> None:
    """Transpile source code to Databricks using LLM Transpiler (Switch)"""
    ctx = ApplicationContext(w)
    ctx.add_user_agent_extra("cmd", "llm-transpile")
Collaborator

Suggested change
- ctx.add_user_agent_extra("cmd", "llm-transpile")
+ ctx.add_user_agent_extra("cmd", "switch")

from databricks.labs.lakebridge.transpiler.switch_runner import SwitchConfig, SwitchRunner


class FriendOfSwitchRunner(SwitchRunner):
Collaborator

Suggested change
- class FriendOfSwitchRunner(SwitchRunner):
+ class MockSwitchRunner(SwitchRunner):

Collaborator

@gueniai left a comment

LGTM

    if ctx is None:
        ctx = ApplicationContext(w)
    del w
    ctx.add_user_agent_extra("cmd", "transpile-switch")
Collaborator

The command is llm-transpile.

Collaborator

I will modify this.

Collaborator

Done.

Labels

enhancement: New feature or request
feat/cli: actions that are visible to the user
stacked PR: Should be reviewed, but not merged

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Add Switch transpiler CLI integration and testing

4 participants