diff --git a/.cursor/rules/default.md b/.cursor/rules/default.md
new file mode 100644
index 0000000..88973c5
--- /dev/null
+++ b/.cursor/rules/default.md
@@ -0,0 +1,54 @@
+# BigQuery MCP Development Standards
+
+## Code Quality
+- Write clean, efficient, minimal code
+- No unnecessary complexity or over-engineering
+- Follow existing patterns and conventions
+- Use descriptive variable names, avoid abbreviations
+
+## Version Control
+- Update version numbers consistently across all files when making releases
+- Use semantic versioning (MAJOR.MINOR.PATCH)
+- Keep commits focused and atomic
+- Write clear, concise commit messages
+
+## Code Standards
+- Run `ruff format` before committing
+- Run `ruff check --fix` to resolve linting issues
+- Add type hints for function parameters and returns
+- Include docstrings for public functions and classes
+
+## Testing
+- Add tests for new functionality
+- Run `pytest` before committing
+- Ensure all tests pass in CI
+
+## Documentation
+- Keep documentation minimal and focused
+- Update relevant docs when changing functionality
+- Use clear, direct language - no fluff or emojis
+- Focus on what users need to know, not implementation details
+
+## Changelog Formatting
+- Use clean, elegant formatting without bold text
+- Write concise, descriptive entries
+- No unnecessary formatting or emphasis
+- Keep entries focused on what actually changed
+
+## Architecture
+- Prefer command-line arguments over config files
+- Keep dependencies minimal
+- Follow MCP protocol standards
+- Maintain backwards compatibility when possible
+
+## Error Handling
+- Use custom exception classes for specific error types
+- Provide actionable error messages
+- Log errors appropriately for debugging
+
+## User Preferences
+- Avoid over-engineering and unnecessary complexity
+- Don't overthink security implementations unless specifically requested
+- Focus on what's needed, not what could be possible
+- Maintain high repository standards suitable for GitHub
+- Keep configuration clean with proper .example files and gitignored personal configs
\ No newline at end of file
diff --git a/.env.example b/.env.example
index cc679f1..4e54942 100644
--- a/.env.example
+++ b/.env.example
@@ -1,23 +1,16 @@
 # BigQuery MCP Server Environment Variables
+# These variables override config file settings
 
-# Response formatting (optional)
-# Set to 'true' for compact responses optimized for LLMs
-COMPACT_FORMAT=false
-
-# Override billing project (optional)
-# If not set, uses the value from config.yaml
-# BIGQUERY_BILLING_PROJECT=your-project-id
+# BigQuery configuration
+BIGQUERY_BILLING_PROJECT=your-project-id
+BIGQUERY_LOCATION=EU
 
-# Service account credentials (optional)
-# If not set, uses Application Default Credentials
-# GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
+# Google Cloud credentials
+GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
 
-# Logging level
-# Options: DEBUG, INFO, WARNING, ERROR
-LOG_LEVEL=INFO
-
-# Maximum query execution time in seconds (optional)
-# MAX_QUERY_TIMEOUT=60
+# Response formatting
+COMPACT_FORMAT=false
 
-# Maximum rows to return in query results (optional)
-# MAX_LIMIT=10000
+# Logging configuration
+LOG_QUERIES=true
+LOG_RESULTS=false
diff --git a/.gitignore b/.gitignore
index 67a9bd2..0f57359 100644
--- a/.gitignore
+++ b/.gitignore
@@ -51,11 +51,11 @@ coverage.xml
 .env.local
 .env.*.local
 
-# Configuration
+# Configuration files (contain personal project IDs)
 config/config.yaml
 config/config.dev.yaml
 
-# Docker Compose (contains personal project IDs)
+# Docker Compose files (contain personal project IDs)
 docker-compose.yml
 docker-compose.dev.yml
 
@@ -69,6 +69,7 @@ development-notes/
 # Tasks file
 TASKS.md
 TESTING.md
+SETUP-GUIDE.md
 
 # OS files
 .DS_Store
diff --git a/CHANGELOG.md b/CHANGELOG.md
index fd68960..e96d617 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,56 +5,49 @@ All notable changes to the BigQuery MCP Server project will be documented in thi
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.1.1] - 2025-07-17
+
+### Added
+- 20-second default timeout (reduced from 60 seconds) with configurable `--timeout` CLI argument
+- Clean configuration structure with proper .example files and gitignored personal configs
+- Comprehensive CLI arguments for all configuration options with proper precedence (CLI > Config > Env > Defaults)
+- Enterprise pattern system supporting multiple projects with dataset patterns
+- AI-friendly error messages with source classification and actionable suggestions
+
+### Fixed
+- Runtime crashes from missing `log_results` attribute in Config class
+- Error handling bugs in QueryExecutionError constructor
+- Configuration inconsistencies between config files and examples
+- Version management removed from config files (now code-only)
+- Configuration file structure inconsistencies
+- Docker setup confusion - simplified to single service architecture
+- Broken documentation links and outdated examples
+
+### Changed
+- Simplified Docker setup to single `bigquery-mcp` service
+- Moved setup guide from root to docs/setup.md
+- Updated README to be more concise and to-the-point
+- Cleaned up repository structure - removed redundant files
+
 ## [1.1.0] - 2025-07-16
 
 ### Added
-- Command-line argument support for project configuration
-  - Direct CLI specification of project:dataset patterns
-  - Preferred over config file approach for easier deployment
-  - Example: `python src/server.py project1:dataset_* project2:table_*`
-- Query progress indication and complexity estimation
-  - Query complexity estimation (simple, moderate, complex, very_complex)
-  - Execution time tracking and logging for performance monitoring
-  - Progress feedback for long-running queries
-- Comprehensive parameter documentation in tools.md
-  - All tool parameters documented with types and examples
-  - Automatic type conversion explanation for MCP protocol compatibility
-  - Error response format documentation
+- CLI argument support for project configuration with `project:dataset` patterns
+- Query complexity estimation and execution time tracking
+- Comprehensive parameter documentation
 
 ### Fixed
-- Critical parameter type validation errors ("max_rows must be integer")
-  - Automatic string-to-integer conversion for max_rows, timeout, sample_size parameters
-  - Enhanced parameter validation in execute_query() and analyze_columns()
-  - Fixed MCP protocol compatibility where agents pass strings instead of integers
-- analyze_columns intermittent failures ("No result received from client-side tool execution")
-  - Added SAFE.* functions to prevent calculation errors in BigQuery
-  - Implemented 60-second query timeouts with proper error handling
-  - Enhanced sampling queries with better NULL handling
-  - Improved fallback analysis for failed queries
-- Enhanced error handling with more specific and actionable error messages
-- Complex data type display issues (JSON/Array serialization)
-  - Enhanced _serialize_value() function with proper NULL filtering
-  - Fixed "Array cannot have a null element" errors in BigQuery results
+- Parameter type validation errors for MCP protocol compatibility
+- analyze_columns intermittent failures with enhanced NULL handling
+- Complex data type serialization issues
 - Parameter naming inconsistencies across configuration files
-  - Standardized to use `default_limit` and `max_limit` consistently
-  - Updated all config files, tests, and environment examples
 
 ### Changed
-- Consolidated validation logic - removed redundant query validation
-  - Eliminated duplicate validation between _validate_query_safety() and SQLValidator
-  - All SQL validation now consolidated into SQLValidator class
-- Improved tool registration with enhanced debugging and reliability
-- Configuration file approach marked as deprecated in favor of CLI arguments
+- Consolidated validation logic into SQLValidator class
+- Improved tool registration with enhanced debugging
 
 ### Removed
 - Duplicate and redundant tools for cleaner architecture
-  - Removed `list_allowed_projects()` - redundant with `list_projects()`
-  - Removed `list_accessible_projects()` - redundant functionality
-  - Removed `get_current_context()` - identified as unnecessary complexity
-  - Deleted entire `context.py` file containing overkill context management
-- Cleaned up temporary test files from repository root
-  - Deleted test_banned_keywords.py, test_execution_check.py, test_fix.py, test_union_validation.py
-  - All functionality preserved in proper unit test suite
 
 ## [1.0.0] - 2025-07-10
 
diff --git a/README-Docker.md b/README-Docker.md
deleted file mode 100644
index 2b33612..0000000
--- a/README-Docker.md
+++ /dev/null
@@ -1,30 +0,0 @@
-# Docker Setup
-
-## Configuration Templates
-
-This repository includes template files for Docker configuration:
-
-- `docker-compose.yml.example` - Template for production Docker Compose setup
-- `docker-compose.dev.yml.example` - Template for development Docker Compose setup
-
-## Setup Instructions
-
-1. Copy the template files to create your local configuration:
-   ```bash
-   cp docker-compose.yml.example docker-compose.yml
-   cp docker-compose.dev.yml.example docker-compose.dev.yml
-   ```
-
-2. Edit the files to replace placeholder values with your actual project IDs:
-   - Replace `your-project-id` with your actual BigQuery project ID
-   - Replace `your-billing-project` with your billing project ID
-   - Update dataset patterns as needed
-
-3. The actual `docker-compose.yml` and `docker-compose.dev.yml` files are in `.gitignore` to prevent committing personal project information.
-
-## CLI vs Config File
-
-- **CLI approach (recommended)**: Use `docker-compose.yml` with command-line arguments
-- **Config file approach (deprecated)**: Use `docker-compose.yml` with config files
-
-Both approaches are supported for backward compatibility.
\ No newline at end of file
diff --git a/README.md b/README.md
index 1428e4a..54bdd1f 100644
--- a/README.md
+++ b/README.md
@@ -1,97 +1,51 @@
 # BigQuery MCP Server
 
-A production-ready Model Context Protocol server that provides secure, cross-project access to BigQuery datasets. Built with FastMCP for Python, enabling LLMs to explore data, analyze schemas, and execute queries across multiple Google Cloud projects.
+Production-ready Model Context Protocol server for secure BigQuery access across multiple Google Cloud projects.
 
-## Key Features
+## Features
 
-- **Cross-Project Access** - Query data across multiple BigQuery projects with a single connection
-- **Advanced Analytics** - Column-level analysis for nulls, cardinality, and data quality
-- **Safety Controls** - SQL validation, query limits, and read-only operations
-- **Token Optimization** - Compact response formats designed for LLM efficiency
-- **Flexible Configuration** - YAML-based project and dataset access control
-- **Docker Support** - Containerized deployment for easy integration
+- **Multi-Project Access** - Query across BigQuery projects with pattern matching
+- **Advanced Analytics** - Column analysis, data quality checks, schema exploration
+- **Security Controls** - SQL validation, query limits, read-only operations
+- **CLI-First Configuration** - Command-line arguments with config file fallback
+- **Docker Ready** - Containerized deployment for easy integration
 
 ## Quick Start
 
 ### Prerequisites
-
 - Python 3.11+
-- Google Cloud SDK with BigQuery access
-- Docker (optional, for containerized deployment)
+- Google Cloud SDK
+- Docker (optional)
 
-### Installation
+### Setup
 
-1. **Clone the repository:**
+1. **Clone and install:**
    ```bash
    git clone https://github.com/aicayzer/bigquery-mcp.git
    cd bigquery-mcp
-   ```
-
-2. **Install dependencies:**
-   ```bash
    pip install -r requirements.txt
    ```
 
-3. **Configure authentication:**
+2. **Authenticate:**
    ```bash
-   # Option 1: Application Default Credentials (recommended)
    gcloud auth application-default login
-
-   # Option 2: Service Account (for production)
-   export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
    ```
 
-4. **Run the server:**
+3. **Run:**
    ```bash
-   # Using command-line arguments (recommended)
-   python src/server.py sandbox-dev:dev_* sandbox-main:main_*
+   # CLI (recommended)
+   python src/server.py --project "your-project:*" --billing-project "your-project"
 
-   # Or using config file (deprecated)
-   cp config/config.yaml.example config/config.yaml
-   # Edit config.yaml with your project details
-   python src/server.py
+   # Docker
+   docker build -t bigquery-mcp .
+   docker run -v ~/.config/gcloud:/home/mcpuser/.config/gcloud:ro bigquery-mcp \
+     python src/server.py --project "your-project:*" --billing-project "your-project"
    ```
 
-### Docker Deployment
-
-```bash
-# Using CLI arguments (recommended)
-docker-compose up bigquery-mcp-cli --build
-
-# Using config file (deprecated)
-docker-compose up bigquery-mcp-config --build
-
-# Custom project patterns
-docker run -it --rm \
-  -v ~/.config/gcloud:/home/mcpuser/.config/gcloud:ro \
-  -e BIGQUERY_BILLING_PROJECT=your-project \
-  bigquery-mcp:latest \
-  python src/server.py your-project:your_dataset_*
-```
-
-## Available Tools
-
-The server provides 6 core tools for BigQuery interaction:
-
-### Discovery Tools
-- **`list_projects()`** - List configured BigQuery projects
-- **`list_datasets(project)`** - List datasets in a project
-- **`list_tables(dataset, table_type)`** - List tables in a dataset
+## MCP Client Setup
 
-### Analysis Tools
-- **`analyze_table(table)`** - Get table structure and statistics
-- **`analyze_columns(table, columns, include_examples, sample_size)`** - Deep column analysis
-
-### Query Execution
-- **`execute_query(query, format, limit, timeout, dry_run, parameters)`** - Execute SELECT queries
-
-## Integration Examples
-
-### MCP Client Setup
-
-For complete setup instructions with Claude Desktop, Cursor IDE, and other MCP clients, see the **[Client Setup Guide](docs/setup.md)**.
-
-Quick Docker configuration example:
+### Claude Desktop
+Add to `~/.config/claude/claude_desktop_config.json`:
 ```json
 {
   "mcpServers": {
@@ -99,89 +53,99 @@ Quick Docker configuration example:
       "command": "docker",
       "args": [
         "run", "--rm", "-i",
-        "--env", "BIGQUERY_BILLING_PROJECT=your-project",
-        "--volume", "~/.config/gcloud:/home/mcpuser/.config/gcloud:ro",
+        "--volume", "/Users/YOUR_USERNAME/.config/gcloud:/home/mcpuser/.config/gcloud:ro",
+        "--volume", "/ABSOLUTE/PATH/TO/bigquery-mcp/logs:/app/logs",
         "bigquery-mcp:latest",
-        "python", "src/server.py", "your-project:your_dataset_*"
+        "python", "src/server.py",
+        "--project", "your-project:*",
+        "--billing-project", "your-project"
       ]
     }
   }
 }
 ```
 
-## Documentation
+### Cursor IDE
+Add to MCP settings:
+```json
+{
+  "bigquery": {
+    "command": "docker",
+    "args": [
+      "run", "--rm", "-i",
+      "--volume", "/Users/YOUR_USERNAME/.config/gcloud:/home/mcpuser/.config/gcloud:ro",
+      "bigquery-mcp:latest",
+      "python", "src/server.py",
+      "--project", "your-project:*",
+      "--billing-project", "your-project"
+    ]
+  }
+}
+```
 
-📚 **Complete Documentation**
+## Tools
 
-- **[Installation Guide](docs/installation.md)** - Detailed installation and setup
-- **[Client Setup Guide](docs/setup.md)** - Claude Desktop, Cursor IDE, and other MCP clients
-- **[Tools Reference](docs/tools.md)** - Complete tool documentation with examples
-- **[Configuration Guide](docs/configuration.md)** - YAML configuration and environment variables
+- **`list_projects()`** - List configured BigQuery projects
+- **`list_datasets(project)`** - List datasets in a project
+- **`list_tables(dataset, table_type)`** - List tables in a dataset
+- **`analyze_table(table)`** - Get table structure and statistics
+- **`analyze_columns(table, columns, sample_size)`** - Deep column analysis
+- **`execute_query(query, format, limit, timeout)`** - Execute SELECT queries
 
-### Building Documentation Locally
+## Configuration
 
+### CLI Arguments
 ```bash
-# Install documentation dependencies
-pip install mkdocs mkdocs-material
-
-# Serve documentation locally
-mkdocs serve
-
-# Open http://localhost:8000 in your browser
+python src/server.py \
+  --project "analytics-prod:user_*,session_*" \
+  --project "logs-prod:application_*" \
+  --billing-project "my-billing-project" \
+  --log-level INFO \
+  --timeout 300 \
+  --max-limit 50000
 ```
 
-## Testing
-
-```bash
-# Run all tests
-pytest
+### Config File (Optional)
+```yaml
+# config/config.yaml
+bigquery:
+  billing_project: "your-project"
+  location: "US"
+
+projects:
+  - project_id: "analytics-prod"
+    datasets: ["user_*", "session_*"]
+  - project_id: "logs-prod"
+    datasets: ["application_*"]
+
+limits:
+  max_limit: 10000
+  max_query_timeout: 60
+```
 
-# Run with coverage
-pytest --cov=src tests/
+## Documentation
 
-# Run specific test file
-pytest tests/unit/test_discovery.py
-```
+- **[Setup Guide](docs/setup.md)** - Detailed installation and configuration
+- **[AI Setup Assistant](docs/ai-setup.md)** - ChatGPT-powered configuration helper
+- **[Tools Reference](docs/tools.md)** - Complete API documentation
+- **[Configuration](docs/configuration.md)** - All configuration options
 
 ## Development
 
+```bash
+# Install dev dependencies
+pip install -r requirements.txt
 
+# Run tests
+pytest
 
-### Code Quality
-
-```bash
 # Format code
-ruff format src tests
+ruff format
 
-# Check and fix linting issues
-ruff check src tests --fix
+# Build docs
+mkdocs serve
 ```
 
-For development, follow the installation guide and use the standard Python development workflow.
-
-## Security & Safety
-
-- **Read-only operations** - Only SELECT queries and CTEs (WITH clauses) are allowed
-- **SQL validation** - Configurable banned keywords and safety checks
-- **Query limits** - Row limits, timeouts, and byte processing limits
-- **Project isolation** - Access control via YAML configuration
-- **No credentials in code** - Uses Google Cloud authentication
-
 ## License
 
-MIT License - see [LICENSE](LICENSE) file for details.
-
-## Contributing
-
-We welcome contributions! Please follow these guidelines:
-
-- Use `ruff format` and `ruff check --fix` before committing
-- Add tests for new functionality with `pytest`
-- Follow the existing code patterns and conventions
-- Update documentation for any user-facing changes
-
-## Support
-
-- 📖 [Documentation](docs/index.md)
-- 🐛 [Issue Tracker](https://github.com/aicayzer/bigquery-mcp/issues)
-- 💬 [Discussions](https://github.com/aicayzer/bigquery-mcp/discussions)
+MIT License - see [LICENSE](LICENSE) file.
diff --git a/bigquery-mcp.code-workspace b/bigquery-mcp.code-workspace
deleted file mode 100644
index 443f5a5..0000000
--- a/bigquery-mcp.code-workspace
+++ /dev/null
@@ -1,7 +0,0 @@
-{
-	"folders": [
-		{
-			"path": "."
-		}
-	]
-}
\ No newline at end of file
diff --git a/config/config.dev.yaml b/config/config.dev.yaml
deleted file mode 100644
index 1c02e94..0000000
--- a/config/config.dev.yaml
+++ /dev/null
@@ -1,70 +0,0 @@
-# BigQuery MCP Development Server Configuration
-# This configuration is specifically for development and testing
-
-server:
-  name: "BigQuery Development MCP"
-  version: "1.1.0-dev"
-
-# BigQuery configuration
-bigquery:
-  # Default project for billing and unqualified table references
-  billing_project: "your-billing-project"
-
-  # BigQuery location/region
-  location: "US"
-
-  # Optional: Path to service account JSON key file
-  # If not provided, uses Application Default Credentials
-  service_account_path: ""
-
-# Project access control
-# Configure your development projects here
-projects:
-  - project_id: "sandbox-dev-466110"
-    project_name: "Sandbox Development"
-    description: "Development and testing data"
-    datasets: ["dev_*", "test_*", "staging_*"]
-
-  - project_id: "sandbox-main-466110"
-    project_name: "Sandbox Main"
-    description: "Main testing data"
-    datasets: ["main_*", "prod_*", "*"]
-
-# Query execution limits (more permissive for development)
-limits:
-  # Default number of rows to return if not specified
-  default_limit: 50
-
-  # Maximum rows that can be requested
-  max_limit: 1000
-
-  # Query timeout in seconds
-  max_query_timeout: 300
-
-  # Maximum bytes processed per query (0 = unlimited)
-  max_bytes_processed: 10737418240  # 10GB
-
-  # Require explicit LIMIT clause in queries
-  require_limit: false
-
-# Security settings (relaxed for development)
-security:
-  # Banned keywords (case-insensitive)
-  banned_sql_keywords: ["DROP", "DELETE", "UPDATE", "INSERT", "TRUNCATE", "ALTER", "CREATE"]
-
-  # Allow only SELECT and WITH statements
-  select_only: true
-
-  # Require explicit LIMIT clause in queries
-  require_explicit_limits: false
-
-# Development-specific features
-development:
-  # Enable development tools
-  tools_enabled: true
-
-  # Enable verbose logging
-  verbose_logging: true
-
-  # Allow experimental features
-  experimental_features: true
diff --git a/config/config.yaml.example b/config/config.yaml.example
index d8feb4f..10efa78 100644
--- a/config/config.yaml.example
+++ b/config/config.yaml.example
@@ -2,7 +2,6 @@
 
 server:
   name: "BigQuery Development Server"
-  version: "1.1.0"
 
 # BigQuery configuration
 bigquery:
diff --git a/docker-compose.dev.yml b/docker-compose.dev.yml
index fd43aa6..e6b2cf8 100644
--- a/docker-compose.dev.yml
+++ b/docker-compose.dev.yml
@@ -1,24 +1,50 @@
 services:
+  # BigQuery MCP Server - Development Configuration
   bigquery-mcp-dev:
     build:
       context: .
       dockerfile: Dockerfile
     image: bigquery-mcp-dev:latest
-    container_name: bigquery-mcp-dev-server
+    container_name: bigquery-mcp-dev
+    command: [
+      "python", "src/server.py",
+      "--project", "cayzer-xyz:*",
+      "--billing-project", "cayzer-xyz",
+      "--log-level", "DEBUG",
+      "--log-queries", "true",
+      "--log-results", "true",
+      "--compact-format", "false"
+    ]
     volumes:
-      - ./config:/app/config:ro
       - ./logs:/app/logs
       - ~/.config/gcloud:/home/mcpuser/.config/gcloud:ro
+      # Optional: Mount config for complex development setups
+      - ./config:/app/config:ro
     environment:
-      - LOG_LEVEL=${LOG_LEVEL:-DEBUG}
-      - COMPACT_FORMAT=${COMPACT_FORMAT:-false}
-      # BIGQUERY_BILLING_PROJECT is set in config.dev.yaml
+      - LOG_LEVEL=DEBUG
+      - COMPACT_FORMAT=false
+      # CLI arguments take precedence over environment variables
+      - BIGQUERY_BILLING_PROJECT=${BIGQUERY_BILLING_PROJECT:-}
       - MCP_SERVER_NAME=BigQuery Development MCP
-      - CONFIG_FILE=/app/config/config.dev.yaml
       - ENABLE_DEV_TOOLS=true
     stdin_open: true
     tty: false
     # Note: No ports exposed - this is for MCP usage, not HTTP
     labels:
       - "mcp.server.type=development"
-      - "mcp.server.version=1.1.0-dev"
+      - "mcp.server.version=1.1.1-dev"
+
+  # Alternative configurations for different development scenarios:
+
+  # Multi-project development:
+  # command: [
+  #   "python", "src/server.py",
+  #   "--project", "dev-project-1:test_*,demo_*",
+  #   "--project", "dev-project-2:staging_*",
+  #   "--billing-project", "dev-billing-project",
+  #   "--log-level", "DEBUG",
+  #   "--timeout", "120"
+  # ]
+
+  # Config file development:
+  # command: ["python", "src/server.py", "--config", "/app/config/config.dev.yaml"]
diff --git a/docker-compose.dev.yml.example b/docker-compose.dev.yml.example
index 7e65798..63c7224 100644
--- a/docker-compose.dev.yml.example
+++ b/docker-compose.dev.yml.example
@@ -1,24 +1,50 @@
 services:
+  # BigQuery MCP Server - Development Configuration
   bigquery-mcp-dev:
     build:
       context: .
       dockerfile: Dockerfile
     image: bigquery-mcp-dev:latest
-    container_name: bigquery-mcp-dev-server
+    container_name: bigquery-mcp-dev
+    command: [
+      "python", "src/server.py",
+      "--project", "your-dev-project:*",
+      "--billing-project", "your-dev-project",
+      "--log-level", "DEBUG",
+      "--log-queries", "true",
+      "--log-results", "true",
+      "--compact-format", "false"
+    ]
     volumes:
-      - ./config:/app/config:ro
       - ./logs:/app/logs
       - ~/.config/gcloud:/home/mcpuser/.config/gcloud:ro
+      # Optional: Mount config for complex development setups
+      - ./config:/app/config:ro
     environment:
-      - LOG_LEVEL=${LOG_LEVEL:-DEBUG}
-      - COMPACT_FORMAT=${COMPACT_FORMAT:-false}
-      # BIGQUERY_BILLING_PROJECT is set in config.dev.yaml
+      - LOG_LEVEL=DEBUG
+      - COMPACT_FORMAT=false
+      # CLI arguments take precedence over environment variables
+      - BIGQUERY_BILLING_PROJECT=${BIGQUERY_BILLING_PROJECT:-}
       - MCP_SERVER_NAME=BigQuery Development MCP
-      - CONFIG_FILE=/app/config/config.dev.yaml
       - ENABLE_DEV_TOOLS=true
     stdin_open: true
     tty: false
     # Note: No ports exposed - this is for MCP usage, not HTTP
     labels:
       - "mcp.server.type=development"
-      - "mcp.server.version=1.1.0-dev"
\ No newline at end of file
+      - "mcp.server.version=1.1.1-dev"
+
+  # Alternative configurations for different development scenarios:
+
+  # Multi-project development:
+  # command: [
+  #   "python", "src/server.py",
+  #   "--project", "dev-project-1:test_*,demo_*",
+  #   "--project", "dev-project-2:staging_*",
+  #   "--billing-project", "dev-billing-project",
+  #   "--log-level", "DEBUG",
+  #   "--timeout", "120"
+  # ]
+
+  # Config file development:
+  # command: ["python", "src/server.py", "--config", "/app/config/config.dev.yaml"]
\ No newline at end of file
diff --git a/docker-compose.yml b/docker-compose.yml
index 57f9d09..290e61e 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -1,31 +1,39 @@
 services:
-  # Example 1: Using CLI arguments (recommended)
-  bigquery-mcp-cli:
+  # BigQuery MCP Server - Single clean service supporting CLI-first architecture
+  bigquery-mcp:
     build: .
     image: bigquery-mcp:latest
-    container_name: bigquery-mcp-cli
-    command: ["python", "src/server.py", "sandbox-dev-466110:dev_*", "sandbox-main-466110:main_*"]
+    container_name: bigquery-mcp
+    command: ["python", "src/server.py", "--project", "cayzer-xyz:*", "--billing-project", "cayzer-xyz"]
     volumes:
       - ./logs:/app/logs
       - ~/.config/gcloud:/home/mcpuser/.config/gcloud:ro
-    environment:
-      - LOG_LEVEL=${LOG_LEVEL:-INFO}
-      - COMPACT_FORMAT=${COMPACT_FORMAT:-true}
-      - BIGQUERY_BILLING_PROJECT=${BIGQUERY_BILLING_PROJECT:-cayzer-xyz}
-    stdin_open: true
-    tty: false
-
-  # Example 2: Using config file (deprecated)
-  bigquery-mcp-config:
-    build: .
-    image: bigquery-mcp:latest
-    container_name: bigquery-mcp-config
-    volumes:
+      # Optional: Mount config file for complex setups
       - ./config:/app/config:ro
-      - ./logs:/app/logs
-      - ~/.config/gcloud:/home/mcpuser/.config/gcloud:ro
     environment:
       - LOG_LEVEL=${LOG_LEVEL:-INFO}
-      - COMPACT_FORMAT=${COMPACT_FORMAT:-true}
+      # Note: CLI arguments take precedence over environment variables
+      - COMPACT_FORMAT=${COMPACT_FORMAT:-false}
+      - BIGQUERY_BILLING_PROJECT=${BIGQUERY_BILLING_PROJECT:-}
     stdin_open: true
     tty: false
+
+  # Example configurations for different use cases:
+
+  # Simple usage (single project):
+  # command: ["python", "src/server.py", "--project", "my-project:*", "--billing-project", "my-project"]
+
+  # Enterprise usage (multiple projects):
+  # command: [
+  #   "python", "src/server.py",
+  #   "--project", "analytics-prod:user_*,session_*",
+  #   "--project", "logs-prod:application_*,system_*",
+  #   "--project", "ml-dev:training_*,models_*",
+  #   "--billing-project", "my-billing-project",
+  #   "--log-level", "INFO",
+  #   "--compact-format", "true",
+  #   "--timeout", "300"
+  # ]
+
+  # Config file fallback (if no --project specified):
+  # command: ["python", "src/server.py", "--config", "/app/config/config.yaml"]
diff --git a/docker-compose.yml.example b/docker-compose.yml.example
index 378c696..e989fe2 100644
--- a/docker-compose.yml.example
+++ b/docker-compose.yml.example
@@ -1,31 +1,61 @@
 services:
-  # Example 1: Using CLI arguments (recommended)
-  bigquery-mcp-cli:
+  # BigQuery MCP Server - Single clean service supporting CLI-first architecture
+  bigquery-mcp:
     build: .
     image: bigquery-mcp:latest
-    container_name: bigquery-mcp-cli
-    command: ["python", "src/server.py", "your-project-id:dataset_pattern", "another-project:pattern_*"]
+    container_name: bigquery-mcp
+    command: ["python", "src/server.py", "--project", "your-project-id:*", "--billing-project", "your-billing-project"]
     volumes:
       - ./logs:/app/logs
       - ~/.config/gcloud:/home/mcpuser/.config/gcloud:ro
+      # Optional: Mount config file for complex setups
+      - ./config:/app/config:ro
     environment:
       - LOG_LEVEL=${LOG_LEVEL:-INFO}
-      - COMPACT_FORMAT=${COMPACT_FORMAT:-true}
-      - BIGQUERY_BILLING_PROJECT=${BIGQUERY_BILLING_PROJECT:-your-billing-project}
+      # Note: CLI arguments take precedence over environment variables
+      - COMPACT_FORMAT=${COMPACT_FORMAT:-false}
+      - BIGQUERY_BILLING_PROJECT=${BIGQUERY_BILLING_PROJECT:-}
     stdin_open: true
     tty: false
+
+  # Example configurations for different use cases:
+
+  # Simple usage (single project):
+  # command: ["python", "src/server.py", "--project", "my-project:*", "--billing-project", "my-project"]
+
+  # Enterprise usage (multiple projects):
+  # command: [
+  #   "python", "src/server.py",
+  #   "--project", "analytics-prod:user_*,session_*",
+  #   "--project", "logs-prod:application_*,system_*",
+  #   "--project", "ml-dev:training_*,models_*",
+  #   "--billing-project", "my-billing-project",
+  #   "--log-level", "INFO",
+  #   "--compact-format", "true",
+  #   "--timeout", "300"
+  # ]
+
+  # Config file fallback (if no --project specified):
+  # command: ["python", "src/server.py", "--config", "/app/config/config.yaml"]
 
-  # Example 2: Using config file (deprecated)
-  bigquery-mcp-config:
-    build: .
-    image: bigquery-mcp:latest
-    container_name: bigquery-mcp-config
-    volumes:
-      - ./config:/app/config:ro
-      - ./logs:/app/logs
-      - ~/.config/gcloud:/home/mcpuser/.config/gcloud:ro
-    environment:
-      - LOG_LEVEL=${LOG_LEVEL:-INFO}
-      - COMPACT_FORMAT=${COMPACT_FORMAT:-true}
-    stdin_open: true
-    tty: false
\ No newline at end of file
+# For development with additional debugging:
+# Uncomment the following service for development usage
+# bigquery-mcp-dev:
+#   build: .
+#   image: bigquery-mcp-dev:latest
+#   container_name: bigquery-mcp-dev
+#   command: [
+#     "python", "src/server.py",
+#     "--project", "YOUR_DEV_PROJECT:*",
+#     "--billing-project", "YOUR_DEV_PROJECT",
+#     "--log-level", "DEBUG",
+#     "--log-queries", "true",
+#     "--log-results", "true"
+#   ]
+#   volumes:
+#     - ./logs:/app/logs
+#     - ~/.config/gcloud:/home/mcpuser/.config/gcloud:ro
+#   environment:
+#     - LOG_LEVEL=DEBUG
+#   stdin_open: true
+#   tty: false
\ No newline at end of file
diff --git a/docs/ai-setup.md b/docs/ai-setup.md
new file mode 100644
index 0000000..f6722ca
--- /dev/null
+++ b/docs/ai-setup.md
@@ -0,0 +1,131 @@
+# AI-Assisted BigQuery MCP Setup Guide
+
+## Overview
+
+This guide provides a ChatGPT prompt that will help you configure the BigQuery MCP Server for your specific needs. Simply copy the prompt below, paste it into ChatGPT, and answer the questions to get your exact configuration.
+
+## ChatGPT Setup Prompt
+
+Copy and paste this entire prompt into ChatGPT:
+
+---
+
+**PROMPT START**
+
+You are an expert assistant for configuring the BigQuery MCP Server. Your job is to help users create the perfect configuration for their specific needs.
+ +**CONTEXT:** +- BigQuery MCP Server v1.1.1 supports CLI-first architecture +- Users can configure via command-line arguments or config files +- Supports multiple BigQuery projects with dataset pattern matching +- Can be deployed via Docker or direct Python execution +- Integrates with Claude Desktop, Cursor IDE, and other MCP clients + +**YOUR TASK:** +Ask the user targeted questions to understand their setup preferences, then provide the exact configuration they need. + +**QUESTIONS TO ASK:** + +1. **Client Type**: What MCP client are you using? + - Claude Desktop + - Cursor IDE + - Other (ask them to specify) + +2. **Deployment Method**: How do you want to run the server? + - Docker (recommended) + - Direct Python execution + +3. **Project Setup**: What BigQuery projects do you need access to? + - Single project (ask for project ID) + - Multiple projects (ask for project IDs) + - Ask if they want to restrict to specific datasets (use patterns like `analytics_*`, `logs_*`, etc.) + +4. **Billing Project**: Which project should be used for billing? (usually the same as main project) + +5. **Configuration Style**: How do you prefer to configure? + - Command-line arguments (recommended for most users) + - Config file (good for complex setups) + +6. **Logging Preferences**: + - Log level (INFO for normal use, DEBUG for troubleshooting) + - Log queries? (true/false) + - Log results? (false recommended for security) + +7. **Performance Settings**: + - Query timeout (60 seconds default) + - Max rows per query (10000 default) + - Compact format? (false for detailed responses, true for concise) + +**RESPONSE FORMAT:** +After gathering the information, provide: + +1. **Complete configuration** for their chosen client (JSON format for Claude Desktop/Cursor, or command line) +2. **Step-by-step setup instructions** specific to their choices +3. **Testing commands** to verify the setup works +4. 
**Troubleshooting tips** for common issues + +**EXAMPLE PATTERNS:** +- Single project, all datasets: `"your-project:*"` +- Multiple specific datasets: `"your-project:analytics_*,logs_*,staging_*"` +- Multiple projects: `"--project", "project1:*", "--project", "project2:specific_*"` + +**IMPORTANT NOTES:** +- Always use absolute paths in configurations +- Include volume mounts for Google Cloud credentials +- Mention that `gcloud auth application-default login` is required +- For Claude Desktop, the config goes in `~/Library/Application Support/Claude/claude_desktop_config.json` on macOS or `%APPDATA%\Claude\claude_desktop_config.json` on Windows +- For Cursor, it goes in `~/.cursor/mcp.json` (global) or `.cursor/mcp.json` (per project) + +Start by asking: "Hi! I'll help you set up the BigQuery MCP Server. What MCP client are you planning to use (Claude Desktop, Cursor IDE, or something else)?" + +**PROMPT END** + +--- + +## Quick Start Examples + +If you just want to get started quickly, here are some common configurations: + +### Claude Desktop + Docker (Single Project) +```json +{ + "mcpServers": { + "bigquery": { + "command": "docker", + "args": [ + "run", "--rm", "-i", + "--volume", "/Users/YOUR_USERNAME/.config/gcloud:/home/mcpuser/.config/gcloud:ro", + "--volume", "/ABSOLUTE/PATH/TO/bigquery-mcp/logs:/app/logs", + "bigquery-mcp:latest", + "python", "src/server.py", + "--project", "your-project-id:*", + "--billing-project", "your-project-id" + ] + } + } +} +``` + +### Cursor IDE + Docker (Multiple Projects) +Add this entry under the `mcpServers` key in your Cursor MCP settings (`~/.cursor/mcp.json`): +```json +{ + "bigquery": { + "command": "docker", + "args": [ + "run", "--rm", "-i", + "--volume", "/Users/YOUR_USERNAME/.config/gcloud:/home/mcpuser/.config/gcloud:ro", + "--volume", "/ABSOLUTE/PATH/TO/bigquery-mcp/logs:/app/logs", + "bigquery-mcp:latest", + "python", "src/server.py", + "--project", "analytics-prod:user_*,session_*", + "--project", "logs-prod:application_*,system_*", + "--billing-project", "my-billing-project" + ] + } +} +``` + +## Manual Setup Guide + +If you prefer to configure manually, see the [setup guide](setup.md) for detailed
instructions. \ No newline at end of file diff --git a/docs/configuration.md b/docs/configuration.md index a465bd5..6b295f8 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -1,41 +1,101 @@ # Configuration Guide -This guide covers all configuration options for the BigQuery MCP Server. +This guide covers all configuration options for the BigQuery MCP Server v1.1.1 with CLI-first architecture. -## Command-Line Arguments (Recommended) +## Command-Line Arguments (Primary Interface) The preferred way to configure the server is using command-line arguments: ```bash -# Basic usage -python src/server.py sandbox-dev:dev_* sandbox-main:main_* +# Basic usage with new CLI-first format +python src/server.py --project "your-project-1:dev_*" --project "your-project-2:main_*" --billing-project "your-billing-project" + +# Single project with all datasets +python src/server.py --project "your-project:*" --billing-project "your-project" # Multiple patterns for same project -python src/server.py cayzer-xyz:demo_* cayzer-xyz:analytics_* +python src/server.py --project "your-project:demo_*,analytics_*" --billing-project "your-project" + +# Enterprise usage with multiple projects and settings +python src/server.py \ + --project "analytics-prod:user_*,session_*" \ + --project "logs-prod:application_*,system_*" \ + --project "ml-dev:training_*,models_*" \ + --billing-project "my-billing-project" \ + --log-level "INFO" \ + --compact-format "true" \ + --timeout 300 \ + --max-limit 50000 +``` + +## Complete CLI Arguments Reference + +### Core Arguments +- **`--project`**: Project access patterns (can be repeated). Format: `project_id:dataset_pattern[:table_pattern]` +- **`--billing-project`**: BigQuery billing project (overrides environment variable) +- **`--config`**: Path to config file (fallback when no projects specified) +- **`--location`**: BigQuery location (default: EU) + +### Logging Options +- **`--log-level`**: Logging level (DEBUG, INFO, WARNING, ERROR). 
Default: INFO +- **`--log-queries`**: Log queries for audit purposes (true/false). Default: true +- **`--log-results`**: Log query results; be careful with sensitive data (true/false). Default: false + +### Performance & Limits +- **`--timeout`**: Query timeout in seconds. Default: 60 +- **`--max-limit`**: Maximum rows that can be requested. Default: 10000 +- **`--max-bytes-processed`**: Maximum bytes processed for cost control. Default: 1073741824 (1GB) + +### Security Options +- **`--select-only`**: Allow only SELECT statements (true/false). Default: true +- **`--require-explicit-limits`**: Require explicit LIMIT clause in SELECT queries (true/false). Default: false +- **`--banned-keywords`**: Comma-separated list of banned SQL keywords. Default: CREATE,DELETE,DROP,TRUNCATE,ALTER,INSERT,UPDATE + +### Formatting +- **`--compact-format`**: Use compact response format (true/false). Default: false + +## Configuration Precedence + +**CLI Arguments > Config File > Environment Variables > Hardcoded Defaults** + +This means CLI arguments always take precedence over config file settings, config file settings take precedence over environment variables, and hardcoded defaults apply only when no other source sets a value.
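The precedence rule above amounts to a first-match lookup across four sources. The following is a minimal Python sketch under stated assumptions: settings arrive as plain dicts, and `resolve_setting`, `DEFAULTS`, and `ENV_VARS` are hypothetical names chosen for illustration. It is not the server's actual implementation.

```python
import os
from typing import Any, Mapping, Optional

# Hypothetical built-in defaults; values mirror the documented CLI defaults.
DEFAULTS: dict[str, Any] = {
    "log_level": "INFO",
    "timeout": 60,
    "max_limit": 10000,
    "compact_format": False,
}

# Hypothetical mapping from setting name to its environment variable.
ENV_VARS: dict[str, str] = {
    "log_level": "LOG_LEVEL",
    "compact_format": "COMPACT_FORMAT",
}


def _parse_env(name: str, raw: str) -> Any:
    """Coerce an environment string to the type of the setting's default."""
    default = DEFAULTS.get(name)
    # bool is a subclass of int, so check it first.
    if isinstance(default, bool):
        return raw.strip().lower() == "true"
    if isinstance(default, int):
        return int(raw)
    return raw


def resolve_setting(
    name: str,
    cli_args: Mapping[str, Any],
    config_file: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
) -> Any:
    """Return the first value found in: CLI > config file > env var > default."""
    environ = os.environ if environ is None else environ
    if cli_args.get(name) is not None:
        return cli_args[name]
    if config_file.get(name) is not None:
        return config_file[name]
    env_name = ENV_VARS.get(name)
    if env_name and env_name in environ:
        return _parse_env(name, environ[env_name])
    return DEFAULTS.get(name)
```

With this ordering, `--log-level DEBUG` on the command line wins even if the config file sets `WARNING` and `LOG_LEVEL=ERROR` is exported. A setting that appears in none of the sources, such as `timeout` here, falls back to its default of 60.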
+ +## Project Pattern Examples + +### Simple Patterns +```bash # All datasets in a project -python src/server.py your-project:* +--project "my-project:*" + +# Specific dataset patterns +--project "my-project:analytics_*,logs_*" -# With additional options -python src/server.py sandbox-dev:dev_* \ - --billing-project cayzer-xyz \ - --location EU +# Multiple projects +--project "project1:*" --project "project2:staging_*" ``` -### Command-Line Options +### Enterprise Patterns +```bash +# Complex multi-project setup +--project "analytics-prod:user_*,session_*,conversion_*" \ +--project "logs-prod:application_*,system_*,error_*" \ +--project "ml-dev:training_*,features_*,models_*" \ +--project "warehouse:daily_*,weekly_*,monthly_*" +``` -- **Project patterns**: `project_id:dataset_pattern` format -- **`--billing-project`**: Project ID for billing (overrides environment variable) -- **`--location`**: BigQuery location (default: EU) -- **`--config`**: Path to config file (fallback only) -- **`--version`**: Show version information +### Table Patterns (Future Enhancement) +```bash +# Table patterns will be supported in future versions +--project "project:dataset:table_pattern" +``` -## Configuration File (Deprecated) +## Configuration File (Fallback) -For backward compatibility, the server still supports YAML configuration files: +For backward compatibility and complex setups, the server still supports YAML configuration files: ```bash -cp config/config.yaml.example config/config.yaml +# Use config file when no --project arguments provided +python src/server.py --config config/config.yaml ``` ## Complete Configuration Reference @@ -45,31 +105,25 @@ cp config/config.yaml.example config/config.yaml ```yaml server: name: "BigQuery MCP Server" - version: "1.1.0" + version: "1.1.1" ``` -- **`name`**: Display name for the server (used in logs and responses) -- **`version`**: Server version (should match the installed version) - ### BigQuery Section ```yaml bigquery: + # Required: 
Project for billing billing_project: "your-billing-project" - location: "US" - service_account_path: "" + + # Optional: BigQuery location/region + location: "EU" + + # Optional: Service account path + service_account_path: "/path/to/service-account.json" ``` -- **`billing_project`**: Project ID used for billing BigQuery queries (required) -- **`location`**: BigQuery location/region (default: "US") - - Common values: "US", "EU", "asia-northeast1" -- **`service_account_path`**: Path to service account JSON file (optional) - - If not provided, uses Application Default Credentials - ### Projects Section -Define which BigQuery projects and datasets the server can access: - ```yaml projects: - project_id: "analytics-prod" @@ -80,43 +134,31 @@ projects: - project_id: "raw-data-lake" project_name: "Raw Data Lake" description: "Raw data ingestion layer" - datasets: ["*"] # Allow all datasets - - - project_id: "ml-features" - project_name: "ML Feature Store" - description: "Machine learning features and training data" - datasets: ["features_*", "training_*", "models_*"] + datasets: ["*"] # All datasets ``` -**Dataset Patterns:** -- `"*"` - Allow all datasets in the project -- `"dataset_name"` - Allow specific dataset -- `"prefix_*"` - Allow datasets starting with prefix -- `["dataset1", "dataset2"]` - Allow multiple specific datasets - ### Limits Section -Control query execution and resource usage: - ```yaml limits: + # Default rows returned if not specified default_limit: 20 - max_limit: 10000 + + # Maximum query execution time in seconds max_query_timeout: 60 + + # Maximum rows that can be requested + max_limit: 10000 + + # Maximum bytes processed (cost control) max_bytes_processed: 1073741824 # 1GB ``` -- **`default_limit`**: Default number of rows returned (if not specified in query) -- **`max_limit`**: Maximum rows that can be requested in a single query -- **`max_query_timeout`**: Maximum query execution time in seconds -- **`max_bytes_processed`**: Maximum bytes 
processed per query (for cost control) - ### Security Section -SQL safety and validation settings: - ```yaml security: + # Banned SQL keywords banned_sql_keywords: - "CREATE" - "DELETE" @@ -128,270 +170,103 @@ security: - "GRANT" - "REVOKE" - "MERGE" - - "CALL" - - "EXECUTE" - - "SCRIPT" - require_explicit_limits: false + + # Allow only SELECT statements select_only: true + + # Require explicit LIMIT clause + require_explicit_limits: false ``` -- **`banned_sql_keywords`**: SQL keywords that will cause query rejection -- **`require_explicit_limits`**: If true, all SELECT queries must include LIMIT clause -- **`select_only`**: If true, only SELECT statements and CTEs (WITH clauses) are allowed (recommended) - ### Formatting Section -Response formatting options: - ```yaml formatting: - compact_mode: false + # Use compact format by default + compact_format: false ``` -- **`compact_mode`**: Use compact response format (reduces token usage) - -*Note: Field descriptions are always included in schema responses.* - ### Logging Section -Logging and audit configuration: - ```yaml logging: + # Log queries for audit purposes log_queries: true + + # Log query results (be careful with sensitive data) log_results: false - max_query_log_length: 1000 ``` -- **`log_queries`**: Log SQL queries for audit purposes -- **`log_results`**: Log query results (be careful with sensitive data) -- **`max_query_log_length`**: Maximum length of logged SQL queries - ## Environment Variables -Environment variables override YAML configuration values: - -### Core Settings - -```bash -# BigQuery configuration -export BIGQUERY_BILLING_PROJECT=your-billing-project -export BIGQUERY_LOCATION=US -export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json - -# Server behavior -export LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR -export COMPACT_FORMAT=true # Override formatting.compact_mode -``` - -### Available Environment Variables - -| Variable | Description | Default | 
-|----------|-------------|---------| -| `BIGQUERY_BILLING_PROJECT` | Override billing project | From config | -| `BIGQUERY_LOCATION` | Override BigQuery location | From config | -| `GOOGLE_APPLICATION_CREDENTIALS` | Path to service account JSON | Auto-detect | -| `LOG_LEVEL` | Logging verbosity | INFO | -| `COMPACT_FORMAT` | Enable compact mode | From config | - -## Configuration Examples - -### Development Setup - -For local development with personal projects: - -```yaml -server: - name: "BigQuery MCP Development" - version: "1.1.0" - -bigquery: - billing_project: "my-dev-project" - location: "US" - -projects: - - project_id: "my-dev-project" - project_name: "Development Project" - description: "Personal development data" - datasets: ["*"] - -limits: - default_limit: 10 - max_limit: 1000 - max_query_timeout: 30 - max_bytes_processed: 104857600 # 100MB - -formatting: - compact_mode: true - -logging: - log_queries: true - log_results: true # OK for development -``` - -### Production Setup - -For production deployments with multiple projects: - -```yaml -server: - name: "BigQuery MCP Production" - version: "1.1.0" - -bigquery: - billing_project: "analytics-billing" - location: "US" - service_account_path: "/app/credentials/service-account.json" - -projects: - - project_id: "data-warehouse" - project_name: "Data Warehouse" - description: "Production data warehouse" - datasets: ["prod_*", "reporting_*"] - - - project_id: "analytics-sandbox" - project_name: "Analytics Sandbox" - description: "Analytics team sandbox" - datasets: ["sandbox_*", "experiments_*"] - -limits: - default_limit: 100 - max_limit: 10000 - max_query_timeout: 300 # 5 minutes - max_bytes_processed: 10737418240 # 10GB - -security: - banned_sql_keywords: - - "CREATE" - - "DELETE" - - "DROP" - - "INSERT" - - "UPDATE" - - "TRUNCATE" - require_explicit_limits: true - select_only: true - -formatting: - compact_mode: true - -logging: - log_queries: true # Query length limited to 500 chars -``` +The 
following environment variables are supported: -### Multi-Region Setup - -For organizations with data in multiple regions: - -```yaml -bigquery: - billing_project: "global-analytics" - location: "US" # Default location - -projects: - - project_id: "us-data-warehouse" - project_name: "US Data Warehouse" - description: "US region data" - datasets: ["us_*"] - - - project_id: "eu-data-warehouse" - project_name: "EU Data Warehouse" - description: "EU region data" - datasets: ["eu_*"] - - - project_id: "asia-data-warehouse" - project_name: "Asia Data Warehouse" - description: "Asia region data" - datasets: ["asia_*"] -``` +- **`BIGQUERY_BILLING_PROJECT`**: Default billing project +- **`GOOGLE_APPLICATION_CREDENTIALS`**: Path to service account JSON +- **`BIGQUERY_LOCATION`**: BigQuery location +- **`LOG_LEVEL`**: Logging level +- **`COMPACT_FORMAT`**: Use compact format (true/false) +- **`LOG_QUERIES`**: Log queries (true/false) +- **`LOG_RESULTS`**: Log results (true/false) ## Docker Configuration -### Environment File - -Create a `.env` file for Docker deployments: +### CLI-First Docker (Recommended) ```bash -# .env file -BIGQUERY_BILLING_PROJECT=your-billing-project -BIGQUERY_LOCATION=US -LOG_LEVEL=INFO -COMPACT_FORMAT=true +docker run --rm -i \ + --volume ~/.config/gcloud:/home/mcpuser/.config/gcloud:ro \ + --volume ./logs:/app/logs \ + bigquery-mcp:latest \ + python src/server.py \ + --project "your-project:*" \ + --billing-project "your-project" \ + --log-level "INFO" ``` ### Docker Compose -Use environment variables in `docker-compose.yml`: - ```yaml services: bigquery-mcp: build: . 
- environment: - - BIGQUERY_BILLING_PROJECT=${BIGQUERY_BILLING_PROJECT} - - BIGQUERY_LOCATION=${BIGQUERY_LOCATION:-US} - - LOG_LEVEL=${LOG_LEVEL:-INFO} - - COMPACT_FORMAT=${COMPACT_FORMAT:-true} + image: bigquery-mcp:latest + container_name: bigquery-mcp + command: [ + "python", "src/server.py", + "--project", "your-project:*", + "--billing-project", "your-project", + "--log-level", "INFO" + ] volumes: - - ./config:/app/config:ro + - ./logs:/app/logs - ~/.config/gcloud:/home/mcpuser/.config/gcloud:ro + stdin_open: true + tty: false ``` -## Validation - -### Configuration Validation - -The server validates configuration on startup: +## Migration from v1.1.0 -- Required fields must be present -- Project IDs must be valid -- Numeric limits must be positive -- Dataset patterns must be valid +If you're upgrading from v1.1.0: -### Testing Configuration - -Test your configuration: - -```bash -# Test configuration loading -python -c "from src.config import load_config; print('Config OK')" - -# Test BigQuery access -python -c "from src.client import BigQueryClient; client = BigQueryClient('config/config.yaml'); print('BigQuery OK')" -``` +1. **Update CLI usage**: Change from `python src/server.py project:pattern` to `python src/server.py --project "project:pattern"` +2. **Add new arguments**: Take advantage of new CLI options like `--log-level`, `--timeout`, etc. +3. **Update Docker configs**: Use new CLI-first Docker approach +4. **Check config files**: Ensure `log_results` attribute is present in config files ## Troubleshooting -### Common Configuration Issues +### Common Issues -#### Invalid Project Access -``` -Error: Project 'project-id' not found or access denied -``` -**Solution:** Check project ID spelling and IAM permissions +1. **Missing log_results attribute**: Update config files to include `log_results: false` in the logging section +2. **CLI argument parsing**: Ensure you're using `--project` flag instead of positional arguments +3. 
**Docker issues**: Use absolute paths for volume mounts and ensure gcloud auth is set up -#### Dataset Pattern Errors -``` -Error: No datasets match pattern 'invalid_*' -``` -**Solution:** Verify dataset names and patterns in your BigQuery project +### Debug Mode -#### Resource Limit Errors -``` -Error: Query exceeded maximum bytes processed +```bash +# Enable debug logging +python src/server.py --project "your-project:*" --billing-project "your-project" --log-level DEBUG --log-queries true --log-results true ``` -**Solution:** Increase `max_bytes_processed` or optimize your query - -### Configuration Best Practices - -1. **Use environment variables** for sensitive values (project IDs, paths) -2. **Set appropriate limits** based on your use case and costs -3. **Use specific dataset patterns** rather than `"*"` where possible -4. **Enable query logging** for audit purposes -5. **Disable result logging** in production for security -6. **Test configuration changes** in development first - -## Next Steps -- [Tools Reference](tools.md) - Learn about available MCP tools -- [Development Guide](development.md) - Set up development environment -- [Installation Guide](installation.md) - Deployment options \ No newline at end of file +This will provide detailed information about configuration loading, query execution, and error handling. \ No newline at end of file diff --git a/docs/installation.md b/docs/installation.md index c16f033..36a4e92 100644 --- a/docs/installation.md +++ b/docs/installation.md @@ -245,10 +245,10 @@ If you encounter issues: 1. Check the `logs/` directory in your project root for detailed error messages 2. Verify your configuration matches the [examples](configuration.md) 3. Test BigQuery access independently with `gcloud` or `bq` commands -4. Review the [development guide](development.md) for debugging tips +4. 
Review the setup guide for debugging tips ## Next Steps - [Configuration Guide](configuration.md) - Detailed configuration options - [Tools Reference](tools.md) - Available MCP tools and examples -- [Integration Examples](index.md#integration-examples) - Claude Desktop and Cursor setup \ No newline at end of file +- [Client Setup Guide](setup.md) - Claude Desktop and Cursor setup \ No newline at end of file diff --git a/docs/setup.md b/docs/setup.md index 1fdd95e..e3ada10 100644 --- a/docs/setup.md +++ b/docs/setup.md @@ -1,131 +1,278 @@ -# MCP Client Setup Guide +# BigQuery MCP Server v1.1.1 - Complete Setup Guide -This guide shows how to configure the BigQuery MCP server to work with various MCP clients. +## 🎯 Overview -## Prerequisites +The BigQuery MCP Server now uses a **CLI-first architecture** with comprehensive configuration options. This guide covers both Docker and non-Docker setups for simple and enterprise use cases. -1. **Docker Desktop** - Ensure Docker is installed and running -2. **Google Cloud Authentication** - Set up using one of these methods: - ```bash - # Option 1: Application Default Credentials (recommended) - gcloud auth application-default login +**Configuration Precedence**: CLI Arguments > Config File > Environment Variables > Defaults - # Option 2: Service Account JSON file - export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json - ``` -3. 
**BigQuery Access** - Ensure your account has the necessary permissions - -## Setup Steps - -### Step 1: Build the Docker Image +## 🚀 Quick Setup (Docker - Recommended) +### Step 1: Authenticate with Google Cloud ```bash -cd /path/to/bigquery-mcp -docker-compose build +# This gives you access to everything your account can access +gcloud auth application-default login ``` -### Step 2: Test the Container - +### Step 2: Build the Docker Image ```bash -# Test that the container starts correctly -docker-compose run --rm bigquery-mcp -``` +# Clone the repository if you haven't already +git clone https://github.com/aicayzer/bigquery-mcp.git +cd bigquery-mcp -You should see the BigQuery MCP server start. Press Ctrl+C to stop. - -### Step 3: Configure Your Client +# Build the image +docker build -t bigquery-mcp:latest . +``` -## Claude Desktop +### Step 3: Test the Server +```bash +# Simple test with one project +docker run --rm -i \ + --volume ~/.config/gcloud:/home/mcpuser/.config/gcloud:ro \ + --volume ./logs:/app/logs \ + bigquery-mcp:latest \ + python src/server.py \ + --project "YOUR_PROJECT_ID:*" \ + --billing-project "YOUR_PROJECT_ID" +``` -**Location**: Claude Desktop → Settings → Developer → Edit Config +## 🎯 Claude Desktop Configuration +### Option 1: Simple Setup (Single Project) ```json { "mcpServers": { "bigquery": { "command": "docker", "args": [ - "run", - "--rm", - "-i", - "--env", "BIGQUERY_BILLING_PROJECT=your-billing-project", - "--env", "LOG_LEVEL=INFO", - "--env", "COMPACT_FORMAT=true", - "--volume", "/path/to/bigquery-mcp/config:/app/config:ro", - "--volume", "/path/to/bigquery-mcp/logs:/app/logs", - "--volume", "/Users/your-username/.config/gcloud:/home/mcpuser/.config/gcloud:ro", - "bigquery-mcp:latest" + "run", "--rm", "-i", + "--volume", "/Users/YOUR_USERNAME/.config/gcloud:/home/mcpuser/.config/gcloud:ro", + "--volume", "/ABSOLUTE/PATH/TO/bigquery-mcp/logs:/app/logs", + "bigquery-mcp:latest", + "python", "src/server.py", + "--project", 
"your-project-id:*", + "--billing-project", "your-project-id", + "--log-level", "INFO", + "--compact-format", "true" ] } } } ``` -## Cursor IDE - -**Location**: Cursor Settings → Extensions → MCP Servers → Edit Configuration +### Option 2: Enterprise Setup (Multiple Projects) +```json +{ + "mcpServers": { + "bigquery": { + "command": "docker", + "args": [ + "run", "--rm", "-i", + "--volume", "/Users/YOUR_USERNAME/.config/gcloud:/home/mcpuser/.config/gcloud:ro", + "--volume", "/ABSOLUTE/PATH/TO/bigquery-mcp/logs:/app/logs", + "bigquery-mcp:latest", + "python", "src/server.py", + "--project", "analytics-prod:user_*,session_*", + "--project", "logs-prod:application_*,system_*", + "--project", "ml-dev:training_*,models_*", + "--billing-project", "my-billing-project", + "--log-level", "INFO", + "--compact-format", "true", + "--timeout", "300", + "--max-limit", "50000" + ] + } + } +} +``` +### Option 3: Config File Approach ```json { "mcpServers": { "bigquery": { "command": "docker", "args": [ - "run", - "--rm", - "-i", - "--env", "BIGQUERY_BILLING_PROJECT=your-billing-project", - "--env", "LOG_LEVEL=INFO", - "--env", "COMPACT_FORMAT=true", - "--volume", "/path/to/bigquery-mcp/config:/app/config:ro", - "--volume", "/path/to/bigquery-mcp/logs:/app/logs", - "--volume", "/Users/your-username/.config/gcloud:/home/mcpuser/.config/gcloud:ro", - "bigquery-mcp:latest" + "run", "--rm", "-i", + "--volume", "/Users/YOUR_USERNAME/.config/gcloud:/home/mcpuser/.config/gcloud:ro", + "--volume", "/ABSOLUTE/PATH/TO/bigquery-mcp/logs:/app/logs", + "--volume", "/ABSOLUTE/PATH/TO/bigquery-mcp/config:/app/config:ro", + "bigquery-mcp:latest", + "python", "src/server.py", + "--config", "/app/config/config.yaml" ] } } } ``` -## Alternative: Python Setup +## 🖥️ Non-Docker Setup (Advanced) + +### Prerequisites +- Python 3.11+ +- Google Cloud SDK +- Virtual environment (recommended) + +### Step 1: Install Dependencies +```bash +# Create virtual environment +python3 -m venv venv +source 
venv/bin/activate # On Windows: venv\Scripts\activate + +# Install dependencies +pip install -r requirements.txt +``` -If you prefer not to use Docker: +### Step 2: Authenticate +```bash +# Set up authentication +gcloud auth application-default login +export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account.json" # Optional +``` +### Step 3: Claude Desktop Configuration ```json { "mcpServers": { "bigquery": { "command": "python", - "args": ["/path/to/bigquery-mcp/src/server.py"], + "args": [ + "/ABSOLUTE/PATH/TO/bigquery-mcp/src/server.py", + "--project", "your-project:*", + "--billing-project", "your-project" + ], "env": { - "GOOGLE_APPLICATION_CREDENTIALS": "/path/to/credentials.json", - "BIGQUERY_BILLING_PROJECT": "your-project-id" + "GOOGLE_APPLICATION_CREDENTIALS": "/path/to/credentials.json" } } } } ``` -## Configuration Notes +## 🔧 CLI Arguments Reference + +### Core Arguments +- `--project` - Project patterns (can be repeated). Format: `project_id:dataset_pattern[:table_pattern]` +- `--billing-project` - BigQuery billing project +- `--config` - Path to configuration file (fallback if no projects specified) + +### Logging Options +- `--log-level` - Logging level (DEBUG, INFO, WARNING, ERROR). Default: INFO +- `--log-queries` - Log queries for audit (true/false). Default: true +- `--log-results` - Log query results (true/false). Default: false + +### Performance & Limits +- `--timeout` - Query timeout in seconds. Default: 60 +- `--max-limit` - Maximum rows that can be requested. Default: 10000 +- `--max-bytes-processed` - Maximum bytes processed (cost control). Default: 1073741824 (1GB) + +### Security Options +- `--select-only` - Allow only SELECT statements (true/false). Default: true +- `--require-explicit-limits` - Require explicit LIMIT clause (true/false). Default: false +- `--banned-keywords` - Comma-separated banned SQL keywords. 
Default: CREATE,DELETE,DROP,TRUNCATE,ALTER,INSERT,UPDATE + +### Formatting +- `--compact-format` - Use compact response format (true/false). Default: false + +## 📋 Pattern Examples + +### Simple Patterns +```bash +# All datasets in a project +--project "my-project:*" + +# Specific dataset patterns +--project "my-project:analytics_*,logs_*" + +# Multiple projects +--project "project1:*" --project "project2:staging_*" +``` + +### Enterprise Patterns +```bash +# Complex multi-project setup +--project "analytics-prod:user_*,session_*,conversion_*" \ +--project "logs-prod:application_*,system_*,error_*" \ +--project "ml-dev:training_*,features_*,models_*" \ +--project "warehouse:daily_*,weekly_*,monthly_*" +``` + +## 🔍 Troubleshooting + +### Common Issues + +#### 1. Authentication Errors +``` +Error: Permission denied +``` +**Solution**: +- Run `gcloud auth application-default login` +- Verify your service account has BigQuery permissions +- Check that billing is enabled on your project + +#### 2. Table Not Found +``` +Error: Table "table_name" must be qualified with a dataset +``` +**Solution**: +- Use fully qualified table names: `project.dataset.table` +- Verify the table exists in BigQuery console +- Check your dataset patterns include the target dataset + +#### 3. Docker Volume Issues +``` +Error: No such file or directory +``` +**Solution**: +- Use absolute paths in volume mounts +- Ensure the gcloud config directory exists: `~/.config/gcloud` +- Create logs directory: `mkdir -p logs` + +#### 4. 
Configuration Precedence Issues +``` +Error: CLI arguments not taking effect +``` +**Solution**: +- Remember: CLI > Config File > Environment > Defaults +- Use `--log-level DEBUG` to see configuration source +- Check for typos in argument names + +### Debug Mode +```bash +# Enable debug logging to see configuration details +--log-level DEBUG --log-queries true +``` + +### Verify Setup +```bash +# Start the server with debug logging to confirm it loads your configuration +docker run --rm -i \ + --volume ~/.config/gcloud:/home/mcpuser/.config/gcloud:ro \ + bigquery-mcp:latest \ + python src/server.py \ + --project "your-project:*" \ + --billing-project "your-project" \ + --log-level DEBUG +``` + +## 📚 Additional Resources -**Important**: Update the paths in the configuration: -- Replace `/path/to/bigquery-mcp` with your actual project path -- Replace `/Users/your-username/.config/gcloud` with your gcloud config path -- Replace `your-billing-project` with your BigQuery billing project ID +- [Configuration Documentation](configuration.md) +- [Tool Reference](tools.md) +- Docker setup instructions are included above +- [GitHub Repository](https://github.com/aicayzer/bigquery-mcp) -## Troubleshooting +## 🆘 Getting Help -### Container Won't Start -- Check Docker is running: `docker info` -- Verify the image exists: `docker images | grep bigquery-mcp` -- Check logs: `docker-compose logs bigquery-mcp` +If you encounter issues: -### Authentication Issues -- Verify gcloud auth: `gcloud auth list` -- Check credentials file exists: `ls -la ~/.config/gcloud/application_default_credentials.json` -- Ensure billing project is correct in config +1. **Check the logs**: `tail -f logs/bigquery_mcp.log` +2. **Enable debug mode**: `--log-level DEBUG` +3. **Verify authentication**: `gcloud auth list` +4. **Test BigQuery access**: `bq ls --projects` (should list the projects you can access) +5.
**Check Claude Desktop logs**: Look for MCP connection errors -### Path Issues -- Use absolute paths in the Docker volume mounts -- Ensure all paths exist and are readable -- Check file permissions on mounted directories \ No newline at end of file +For additional support, please open an issue on the GitHub repository with: +- Your configuration (with sensitive data removed) +- Error messages from logs +- Steps to reproduce the issue \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index 1263502..0c7f970 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -5,6 +5,7 @@ site_url: https://github.com/aicayzer/bigquery-mcp nav: - Installation: installation.md - Client Setup: setup.md + - AI Setup Assistant: ai-setup.md - Tools Reference: tools.md - Configuration: configuration.md diff --git a/site/404.html b/site/404.html index 88654bf..4bfc13c 100644 --- a/site/404.html +++ b/site/404.html @@ -233,6 +233,25 @@ +
+ AI Setup Assistant
@@ -361,6 +380,28 @@
+ AI Setup Assistant
diff --git a/site/configuration/index.html b/site/configuration/index.html index 7e75a63..7155517 100644 --- a/site/configuration/index.html +++ b/site/configuration/index.html @@ -242,6 +242,25 @@
+ AI Setup Assistant
@@ -372,6 +391,28 @@
+ AI Setup Assistant
@@ -445,106 +486,115 @@