<div align="center">

### 🤖 **Built by [Markov](https://markov.bot)**
**When AI changes everything, you start from scratch.**

*Markov specializes in cutting-edge AI solutions and automation. From neural ledgers to MCP servers,
we're building the tools that power the next generation of AI-driven applications.*

💼 **We're always hiring exceptional engineers!** Join us in shaping the future of AI.

**[🌐 Visit markov.bot](https://markov.bot) • [✉️ Get in Touch](mailto:[email protected]) • [🚀 Careers](mailto:[email protected]?subject=Engineering%20Career%20Opportunity)**

</div>

<br>

# Databricks MCP Server

A Model Context Protocol (MCP) server for Databricks that exposes Databricks functionality over the MCP protocol. This allows LLM-powered tools to interact with Databricks clusters, jobs, notebooks, and more.

> **Version 0.3.1** - Latest release with issue #9 fix and enhanced MCP client compatibility.

## 🚀 One-Click Install

### For Cursor Users
**Click to install instantly:**

**[→ Install Databricks MCP in Cursor ←](cursor://anysphere.cursor-deeplink/mcp/install?name=databricks-mcp&config=eyJjb21tYW5kIjoidXZ4IiwiYXJncyI6WyJkYXRhYnJpY2tzLW1jcC1zZXJ2ZXIiXSwiZW52Ijp7IkRBVEFCUklDS1NfSE9TVCI6IiR7REFUQUJSSUNLU19IT1NUfSIsIkRBVEFCUklDS1NfVE9LRU4iOiIke0RBVEFCUklDS1NfVE9LRU59IiwiREFUQUJSSUNLU19XQVJFSE9VU0VfSUQiOiIke0RBVEFCUklDS1NfV0FSRUhPVVNFX0lEfSJ9fQ==)**

If the link doesn't open, copy and paste this deeplink into your browser:

```
cursor://anysphere.cursor-deeplink/mcp/install?name=databricks-mcp&config=eyJjb21tYW5kIjoidXZ4IiwiYXJncyI6WyJkYXRhYnJpY2tzLW1jcC1zZXJ2ZXIiXSwiZW52Ijp7IkRBVEFCUklDS1NfSE9TVCI6IiR7REFUQUJSSUNLU19IT1NUfSIsIkRBVEFCUklDS1NfVE9LRU4iOiIke0RBVEFCUklDS1NfVE9LRU59IiwiREFUQUJSSUNLU19XQVJFSE9VU0VfSUQiOiIke0RBVEFCUklDS1NfV0FSRUhPVVNFX0lEfSJ9fQ==
```

This project is maintained by Olivier Debeuf De Rijcker <[email protected]>.

Credit for the initial version goes to [@JustTryAI](https://github.com/JustTryAI/databricks-mcp-server), and for the original databricks-mcp repo to [@markov-kernel](https://github.com/markov-kernel/databricks-mcp).

## Features

The Databricks MCP Server exposes the following tools:

### Cluster Management

- **list_clusters**: List all Databricks clusters
- **create_cluster**: Create a new Databricks cluster
- **terminate_cluster**: Terminate a Databricks cluster
- **get_cluster**: Get information about a specific Databricks cluster
- **start_cluster**: Start a terminated Databricks cluster
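
The cluster tools above take JSON arguments. As a rough sketch, a client payload might look like this (field names are borrowed from the Databricks Clusters API and are assumptions here — check the tool schemas the server actually exposes):

```python
# Illustrative arguments for create_cluster / get_cluster; verify the exact
# parameter names against the server's tool schemas before relying on them.
create_cluster_params = {
    "cluster_name": "dev-cluster",
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
}

get_cluster_params = {"cluster_id": "1234-567890-abcde123"}

# With an active MCP client session, a call would look like:
#   result = await session.call_tool("create_cluster", create_cluster_params)
```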

### Job Management

- **list_jobs**: List all Databricks jobs
- **run_job**: Run a Databricks job
- **run_notebook**: Submit and wait for a one-time notebook run
- **cancel_run**: Cancel a running job
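
For example, a one-time notebook run via `run_notebook` could be invoked like this (parameter names are assumptions — inspect the tool schema for the real ones):

```python
# Hypothetical arguments for the run_notebook tool.
run_notebook_params = {
    "notebook_path": "/Workspace/Repos/me/my-repo/notebooks/etl",
    # base_parameters would typically be surfaced to the notebook as widgets
    "base_parameters": {"run_date": "2024-01-01"},
}

# result = await session.call_tool("run_notebook", run_notebook_params)
```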

### Workspace Files

- **list_notebooks**: List notebooks in a workspace directory
- **export_notebook**: Export a notebook from the workspace
- **import_notebook**: Import a notebook into the workspace
- **get_workspace_file_info**: Get metadata about workspace files

### File System

- **list_files**: List files and directories in a DBFS path
- **dbfs_put**: Upload a small file to DBFS
- **dbfs_delete**: Delete a DBFS file or directory
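
`dbfs_put` is intended for small files. The underlying DBFS API transfers file contents as base64, so an upload payload plausibly looks like the following (parameter names are assumptions — check the tool schema):

```python
import base64

# Encode a small file for upload; DBFS transfers contents base64-encoded.
raw = b"id,name\n1,alice\n"
dbfs_put_params = {
    "dbfs_path": "/tmp/example.csv",
    "contents_base64": base64.b64encode(raw).decode("ascii"),
}

# await session.call_tool("dbfs_put", dbfs_put_params)
```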

### Cluster Libraries

- **install_library**: Install libraries on a cluster
- **uninstall_library**: Remove libraries from a cluster
- **list_cluster_libraries**: Check installed libraries on a cluster

### Repos

- **create_repo**: Clone a Git repository
- **update_repo**: Update an existing repo
- **list_repos**: List repos in the workspace
- **pull_repo**: Pull the latest commit for a Databricks repo

### Unity Catalog

- **list_catalogs**: List catalogs
- **create_catalog**: Create a catalog
- **list_schemas**: List schemas in a catalog
- **get_table_lineage**: Fetch lineage information for a table
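
Unity Catalog objects live in a three-level namespace (`catalog.schema.table`), which is how the tools above address tables:

```python
# Tiny helper showing how Unity Catalog identifiers compose.
def full_table_name(catalog: str, schema: str, table: str) -> str:
    """Compose the three-level identifier Unity Catalog tools operate on."""
    return f"{catalog}.{schema}.{table}"

# A table "orders" in catalog "main", schema "sales" is "main.sales.orders".
```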

### Composite

- **sync_repo_and_run_notebook**: Pull a repo and execute a notebook in one call

### SQL Execution
- **execute_sql**: Execute a SQL statement (warehouse_id optional if DATABRICKS_WAREHOUSE_ID env var is set)

## 🎉 Recent Updates (v0.3.0)

**New Features - Repo Sync & Notebook Execution:**
- ✅ **Repository Management**: Pull latest commits from Databricks repos with `pull_repo` tool
- ✅ **One-time Notebook Execution**: Submit and wait for notebook runs with `run_notebook` tool
- ✅ **Composite Operations**: Combined repo sync + notebook execution with `sync_repo_and_run_notebook` tool
- ✅ **Enhanced Job Management**: Extended job APIs with submit, status checking, and run management
- ✅ **Comprehensive Testing**: Full test coverage for all new functionality

**Bug Fixes:**
- ✅ **Issue #9 Fixed**: Resolved "Missing required parameter 'params'" error in Cursor and other MCP clients
- ✅ **Parameter Handling**: All MCP tools now correctly handle both nested and flat parameter structures
- ✅ **Cursor Compatibility**: Full compatibility with Cursor's MCP implementation

**Previous Updates:**
- **v0.2.1**: Enhanced Codespaces support, documentation improvements, publishing process streamlining
- **v0.2.0**: Major package refactoring from `src/` to `databricks_mcp/` structure

**Backwards Compatibility:** All existing MCP tools continue to work unchanged. New features extend functionality without breaking changes.

## Installation

### Quick Install (Recommended)

Use the link above to install with one click:

**[→ Install Databricks MCP in Cursor ←](cursor://anysphere.cursor-deeplink/mcp/install?name=databricks-mcp&config=eyJjb21tYW5kIjoidXZ4IiwiYXJncyI6WyJkYXRhYnJpY2tzLW1jcC1zZXJ2ZXIiXSwiZW52Ijp7IkRBVEFCUklDS1NfSE9TVCI6IiR7REFUQUJSSUNLU19IT1NUfSIsIkRBVEFCUklDS1NfVE9LRU4iOiIke0RBVEFCUklDS1NfVE9LRU59IiwiREFUQUJSSUNLU19XQVJFSE9VU0VfSUQiOiIke0RBVEFCUklDS1NfV0FSRUhPVVNFX0lEfSJ9fQ==)**

This will automatically install the MCP server using `uvx` and configure it in Cursor. You'll need to set these environment variables:

- `DATABRICKS_HOST` - Your Databricks workspace URL
- `DATABRICKS_TOKEN` - Your Databricks personal access token
- `DATABRICKS_WAREHOUSE_ID` - (Optional) Your default SQL warehouse ID
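
A quick way to sanity-check that the required variables are set before launching the server (an illustrative helper, not part of this package):

```python
import os

REQUIRED = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")

def missing_required(env) -> list:
    """Return names of required Databricks variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]

if __name__ == "__main__":
    missing = missing_required(os.environ)
    print("OK" if not missing else f"Missing: {', '.join(missing)}")
```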

### Manual Installation

#### Prerequisites

1. Install `uv` (used for environment and dependency management):

   ```bash
   # MacOS/Linux
   curl -LsSf https://astral.sh/uv/install.sh | sh

   # Windows (PowerShell)
   powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
   ```

   Restart your terminal after installation.

2. Clone the repository:

```bash
git clone https://github.com/markov-kernel/databricks-mcp.git
cd databricks-mcp
```

3. Run the setup script:

```bash
# Linux/Mac
./scripts/setup.sh
```

The setup script will:
- Install all project dependencies
- Verify the installation works

**Alternative manual setup:**
```bash
# Create and activate virtual environment
uv venv

# On Windows
.\.venv\Scripts\activate

# On Linux/Mac
source .venv/bin/activate

# Install dependencies in development mode
uv pip install -e .

# Install development dependencies
uv pip install -e ".[dev]"
```
4. Set up environment variables:

```bash
# Windows
set DATABRICKS_HOST=https://your-databricks-instance.azuredatabricks.net
set DATABRICKS_TOKEN=your-personal-access-token

# Linux/Mac
export DATABRICKS_HOST=https://your-databricks-instance.azuredatabricks.net
export DATABRICKS_TOKEN=your-personal-access-token

# Optional: Set default SQL warehouse (makes warehouse_id optional in execute_sql)
export DATABRICKS_WAREHOUSE_ID=sql_warehouse_12345
```

You can also create an `.env` file based on the `.env.example` template.

## Running the MCP Server

### Standalone

To start the MCP server directly for testing or development, run:

```bash
# Activate your virtual environment if not already active
source .venv/bin/activate

# Run the start script (handles finding env vars from .env if needed)
./scripts/start_mcp_server.sh
```

This is useful for seeing direct output and logs.

### Integrating with AI Clients

To use this server with AI clients such as VS Code, Cursor, or the Claude CLI, you need to register it.

#### VS Code Setup

1. Install the **GitHub Copilot** extension in VS Code if you haven't already.

2. Open your VS Code MCP configuration file. You can do this by:
   - Opening the Command Palette (`Ctrl+Shift+P` / `Cmd+Shift+P`)
   - Running the command: `MCP: Open User Configuration`
   - Or manually editing the file at:
     - **Windows**: `%APPDATA%\Code\User\mcp.json`
     - **macOS**: `~/Library/Application Support/Code/User/mcp.json`
     - **Linux**: `~/.config/Code/User/mcp.json`

3. Add the following entry to the `servers` object in your `mcp.json` file:

```json
{
  "servers": {
    "databricks-mcp-local": {
      "type": "stdio",
      "command": "pwsh",
      "args": [
        "-NoProfile",
        "-File",
        "D:\\path\\to\\your\\databricks-mcp\\scripts\\start_mcp_server.ps1",
        "-SkipPrompt"
      ],
      "gallery": true,
      "env": {
        "DATABRICKS_HOST": "https://your-databricks-instance.azuredatabricks.net",
        "DATABRICKS_TOKEN": "dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
        "DATABRICKS_WAREHOUSE_ID": "sql_warehouse_12345"
      }
    }
  }
}
```

4. **Important Configuration Notes:**
   - Replace `D:\\path\\to\\your\\databricks-mcp\\` with the actual absolute path to this project directory
   - Replace the environment variable values with your actual Databricks credentials
   - On **macOS/Linux**, change the command to `"bash"` and update the script path to use forward slashes and the `.sh` extension:

     ```json
     "command": "bash",
     "args": [
       "/absolute/path/to/your/databricks-mcp/scripts/start_mcp_server.sh"
     ]
     ```

5. Save the file and **restart VS Code**.

6. You can now use Databricks tools in VS Code through GitHub Copilot by mentioning them in your prompts (e.g., "list my Databricks clusters" or "execute this SQL query on Databricks").

#### Cursor Setup

1. Open your global MCP configuration file located at `~/.cursor/mcp.json` (create it if it doesn't exist).
2. Add the following entry within the `mcpServers` object, replacing placeholders with your actual values and ensuring the path to `start_mcp_server.sh` is correct:

```json
{
  "mcpServers": {
    "databricks-mcp-local": {
      "command": "/absolute/path/to/your/project/databricks-mcp-server/scripts/start_mcp_server.sh",
      "env": {
        "DATABRICKS_HOST": "https://your-databricks-instance.azuredatabricks.net",
        "DATABRICKS_TOKEN": "dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
        "DATABRICKS_WAREHOUSE_ID": "sql_warehouse_12345"
      }
    }
  }
}
```

3. **Important:** Replace `/absolute/path/to/your/project/databricks-mcp-server/` with the actual absolute path to this project directory on your machine.
4. Replace the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` values with your credentials.
5. Save the file and **restart Cursor**.

6. You can now invoke tools using `databricks-mcp-local:<tool_name>` (e.g., `databricks-mcp-local:list_jobs`).

#### Claude CLI Setup

1. Use the `claude mcp add` command to register the server. Provide your credentials using the `-e` flag for environment variables and point the command to the `start_mcp_server.sh` script using `--` followed by the absolute path:

```bash
claude mcp add databricks-mcp-local \
  -e DATABRICKS_HOST="https://your-databricks-instance.azuredatabricks.net" \
  -e DATABRICKS_TOKEN="your-personal-access-token" \
  -e DATABRICKS_WAREHOUSE_ID="sql_warehouse_12345" \
  -- /absolute/path/to/your/project/databricks-mcp-server/start_mcp_server.sh
```

2. **Important:** Replace `/absolute/path/to/your/project/databricks-mcp-server/` with the actual absolute path to this project directory on your machine.
3. Replace the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` values with your credentials.

4. You can now invoke tools using `databricks-mcp-local:<tool_name>` in your Claude interactions.

## Querying Databricks Resources

The repository includes helper scripts for inspecting Databricks resources, for example:

```bash
uv run scripts/show_notebooks.py
```
## Usage Examples

### SQL Execution with Default Warehouse

```python
# With DATABRICKS_WAREHOUSE_ID set, warehouse_id is optional
await session.call_tool("execute_sql", {
    "statement": "SELECT * FROM my_table LIMIT 10"
})

# Or specify a warehouse explicitly
await session.call_tool("execute_sql", {
    "statement": "SELECT * FROM my_table LIMIT 10",
    "warehouse_id": "sql_warehouse_12345"
})
```

### Workspace File Content Retrieval

```python
# Get JSON file content from workspace
await session.call_tool("get_workspace_file_content", {
    "workspace_path": "/Workspace/path/to/config.json"
})

# Get metadata about a workspace file
await session.call_tool("get_workspace_file_info", {
    "workspace_path": "/Workspace/path/to/script.py"
})
```

### Repo Sync and Notebook Execution

```python
await session.call_tool("sync_repo_and_run_notebook", {
    "repo_id": 123,
    "notebook_path": "/Workspace/Repos/user/repo/notebook"
})
```

### Create Nightly ETL Job

```python
job_conf = {
    "name": "Nightly ETL",
    # ... job settings (schedule, tasks, cluster spec) ...
}
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## About

A Model Context Protocol (MCP) server for interacting with Databricks services. Maintained by markov.bot.