From a25ec187a62b1bfefbed438f3ddd79529560d0e2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E2=80=9CRounak?= <“rounakpreet.d@deuexsolutions.com”> Date: Wed, 19 Nov 2025 16:29:21 +0530 Subject: [PATCH] Docs: Query Runner Documentation --- content/v1.10.x/collate-menu.md | 9 + .../query-runner/admin-configuration.md | 229 ++++++++++++ .../how-to-guides/query-runner/index.md | 38 ++ .../how-to-guides/query-runner/sql-studio.md | 340 ++++++++++++++++++ .../query-runner/user-authentication.md | 145 ++++++++ content/v1.11.x-SNAPSHOT/collate-menu.md | 9 + .../query-runner/admin-configuration.md | 229 ++++++++++++ .../how-to-guides/query-runner/index.md | 38 ++ .../how-to-guides/query-runner/sql-studio.md | 340 ++++++++++++++++++ .../query-runner/user-authentication.md | 145 ++++++++ 10 files changed, 1522 insertions(+) create mode 100644 content/v1.10.x/how-to-guides/query-runner/admin-configuration.md create mode 100644 content/v1.10.x/how-to-guides/query-runner/index.md create mode 100644 content/v1.10.x/how-to-guides/query-runner/sql-studio.md create mode 100644 content/v1.10.x/how-to-guides/query-runner/user-authentication.md create mode 100644 content/v1.11.x-SNAPSHOT/how-to-guides/query-runner/admin-configuration.md create mode 100644 content/v1.11.x-SNAPSHOT/how-to-guides/query-runner/index.md create mode 100644 content/v1.11.x-SNAPSHOT/how-to-guides/query-runner/sql-studio.md create mode 100644 content/v1.11.x-SNAPSHOT/how-to-guides/query-runner/user-authentication.md diff --git a/content/v1.10.x/collate-menu.md b/content/v1.10.x/collate-menu.md index bf1cd61b7..1ada30f27 100644 --- a/content/v1.10.x/collate-menu.md +++ b/content/v1.10.x/collate-menu.md @@ -923,6 +923,15 @@ site_menu: - category: How-to Guides / Guide for Data Users / OpenMetadata Browser Extension url: /how-to-guides/guide-for-data-users/browser-ext + - category: How-to Guides / Query Runner + url: /how-to-guides/query-runner + - category: How-to Guides / Query Runner / Admin Configuration + url: 
/how-to-guides/query-runner/admin-configuration + - category: How-to Guides / Query Runner / User Authentication + url: /how-to-guides/query-runner/user-authentication + - category: How-to Guides / Query Runner / SQL Studio + url: /how-to-guides/query-runner/sql-studio + - category: How-to Guides / Data Discovery url: /how-to-guides/data-discovery - category: How-to Guides / Data Discovery / How to Discover Assets of Interest diff --git a/content/v1.10.x/how-to-guides/query-runner/admin-configuration.md b/content/v1.10.x/how-to-guides/query-runner/admin-configuration.md new file mode 100644 index 000000000..422b4bc40 --- /dev/null +++ b/content/v1.10.x/how-to-guides/query-runner/admin-configuration.md @@ -0,0 +1,229 @@ +--- +title: Query Runner Admin Configuration | Collate Query Runner Guide +description: Configure Query Runner for each database service, choose authentication methods, and control user overrides. +slug: /how-to-guides/query-runner/admin-configuration +collate: true +--- + +# Admin Configuration + +Administrators must configure Query Runner for each database service before users can access it. This section describes the configuration process and authentication options for each supported service. + +## Accessing Admin Configuration + +1. Navigate to **Settings** → **Services** → **Database Services** +2. Select the database service you want to configure (e.g., `my-bigquery`, `production-snowflake`) +3. Click on the **Query Runner** tab +4. 
Click **Configure** to set up Query Runner + +## Configuration Settings + +All services share these common settings: + +| Setting | Description | Default | +| --- | --- | --- | +| **Authentication Type** | How users authenticate (CollateSSO, ExternalOAuth, Basic) | Varies by service | +| **Enabled** | Whether Query Runner is active for this service | `false` | +| **Max Result Size** | Maximum number of rows returned per query | `100` | +| **User Configurable Fields** | Fields users can override (role, database, schema, warehouse, dataset) | `[]` (empty) | + +## BigQuery Configuration + +BigQuery supports three authentication methods: + +## 2.1.1 CollateSSO (Recommended for Google Workspace) + +Use this when OpenMetadata Collate is configured with Google SSO. + +**Configuration**: +- **Auth Type**: Select `CollateSSO` +- **OAuth Credentials**: Auto-populated from system SSO settings +- **Scope**: `https://www.googleapis.com/auth/bigquery` (auto-populated) +- **User Configurable Fields**: Optionally allow users to override: +- `dataset` - Let users specify a default dataset + +**How It Works**: +- Users authenticate using their Google Workspace account (same as Collate login) +- OAuth tokens are automatically managed and refreshed +- No additional credentials needed + +**Prerequisites**: +- Google SSO must be configured in OpenMetadata Collate +- SSO client must have BigQuery API scope enabled + +## 2.1.2 ExternalOAuth (For External Google OAuth) + +Use this when you want users to authenticate with Google OAuth but Collate uses a different SSO provider. 
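The request behind this flow is a standard Google OAuth 2.0 authorization-code request against Google's documented authorization endpoint. A minimal sketch of how such a URL is assembled (illustrative only — the client ID and domain below are placeholders, and Collate builds this request internally):

```python
# Sketch of a Google OAuth 2.0 authorization-code request URL.
# The endpoint and parameter names are standard Google OAuth 2.0;
# the client_id and redirect domain are placeholder values.
from urllib.parse import urlencode

def build_google_auth_url(client_id: str, redirect_uri: str) -> str:
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",   # authorization-code grant
        "scope": "https://www.googleapis.com/auth/bigquery",
        "access_type": "offline",  # ask Google for a refresh token
    }
    return "https://accounts.google.com/o/oauth2/v2/auth?" + urlencode(params)

url = build_google_auth_url(
    "example-client-id.apps.googleusercontent.com",
    "https://your-collate-domain.com/api/v1/queryRunner/oauth/callback",
)
```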
+ +**Configuration**: +- **Auth Type**: Select `ExternalOAuth` +- **OAuth Client Credentials**: Admin must provide: +- **Client ID**: From Google Cloud Console OAuth 2.0 Client +- **Client Secret**: From Google Cloud Console OAuth 2.0 Client +- **Redirect URL**: `https://your-collate-domain.com/api/v1/queryRunner/oauth/callback` +- **Scope**: `https://www.googleapis.com/auth/bigquery` +- **User Configurable Fields**: Same as CollateSSO + +**Getting OAuth Credentials**: +1. Go to [Google Cloud Console](https://console.cloud.google.com/) → **APIs & Services** → **Credentials** +2. Create **OAuth 2.0 Client ID** (Web application type) +3. Add authorized redirect URI: `https://your-collate-domain.com/api/v1/queryRunner/oauth/callback` +4. Copy Client ID and Client Secret +5. Store credentials securely in **1Password** under `BigQuery Query Runner OAuth` (example) + +**1Password (Collate Internal)**: https://share.1password.com/s#m3iREgcUOxPqyNOG01bGCKxDNdLcV8niBA1iQ18S_gQ + +## 2.1.3 Basic Authentication (Service Account) + +Use this for non-interactive authentication using GCP Service Accounts. + +**Configuration**: +- **Auth Type**: Select `Basic` +- **User Credentials**: Each user provides their own service account credentials +- **User Configurable Fields**: Optionally allow users to override: +- `dataset` - Default dataset for queries + +**User Setup**: +Users will need to provide: +- Service Account JSON file downloaded from GCP Console +- Fields extracted from JSON: `private_key_id`, `private_key`, `client_email`, `client_id` + +**Getting Service Account Credentials**: +1. Go to [Google Cloud Console](https://console.cloud.google.com/) → **IAM & Admin** → **Service Accounts** +2. Create or select a service account +3. Grant BigQuery roles: `BigQuery Data Viewer` or `BigQuery User` +4. Create JSON key and download +5. 
Store credentials securely in **1Password** under `BigQuery Service Account - {username}` (example)

**Important Notes**:
- Service accounts need `bigquery.jobs.create` permission to execute queries
- Project ID is taken from the database service connection configuration
- Credentials are encrypted before storage

## Snowflake Configuration

Snowflake supports two authentication methods:

## 2.2.1 ExternalOAuth (Recommended)

Use this for OAuth-based authentication with Snowflake.

**Configuration**:
- **Auth Type**: Select `ExternalOAuth`
- **OAuth Client Credentials**: Admin must provide:
  - **Client ID**: From Snowflake OAuth integration
  - **Client Secret**: From Snowflake OAuth integration
  - **Redirect URL**: `https://your-collate-domain.com/api/v1/queryRunner/oauth/callback`
  - **Scope**: `session:role-any` (or specific role scope)
- **User Configurable Fields**: Optionally allow users to override:
  - `warehouse` - Compute warehouse to use
  - `database` - Default database
  - `schema` - Default schema
  - `role` - Snowflake role for access control

**Getting OAuth Credentials**:
1. In Snowflake, create a security integration:

```sql
-- SAMPLE
CREATE OR REPLACE SECURITY INTEGRATION OAUTH_SNOWFLAKE_INT
  TYPE = OAUTH
  OAUTH_CLIENT = CUSTOM
  OAUTH_CLIENT_TYPE = 'CONFIDENTIAL'
  OAUTH_REDIRECT_URI = 'https:///api/v1/queryRunner/oauth/callback' --'http://localhost:5050/callback' -- <== change if needed
  OAUTH_ALLOW_NON_TLS_REDIRECT_URI = TRUE -- dev/local only
  OAUTH_ISSUE_REFRESH_TOKENS = TRUE
  OAUTH_REFRESH_TOKEN_VALIDITY = 86400 -- 24h
  OAUTH_USE_SECONDARY_ROLES = 'IMPLICIT'
  ENABLED = TRUE;
```

2.
Retrieve Client ID and Client Secret:
`SELECT SYSTEM$SHOW_OAUTH_CLIENT_SECRETS('OAUTH_SNOWFLAKE_INT');`

**1Password (Collate Internal)**: https://share.1password.com/s#oY6eQL8891iFve3IeDn_iDhsFoK3aI1Cz9RAZZk2I4c

## 2.2.2 Basic Authentication (Username/Password or Key Pair)

Use this for username/password or key-pair authentication.

**Configuration**:
- **Auth Type**: Select `Basic`
- **User Credentials**: Each user provides their own credentials:
  - Username + Password, OR
  - Username + Private Key + Passphrase
- **User Configurable Fields**: Same as ExternalOAuth

**User Setup**:
Users will need to provide:
- **Username**: Snowflake username
- **Password** OR **Private Key + Passphrase** for key-pair authentication

**Getting Key Pair Credentials** (if using key-pair auth):
1. Generate RSA key pair:

```bash
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
```

2. Upload public key to Snowflake:

```sql
ALTER USER myusername SET RSA_PUBLIC_KEY='MIIBIjANBgkqhki...';
```

**Important Notes**:
- Account URL, warehouse, database, schema, and role are inherited from the service connection unless the user overrides them
- Credentials are encrypted before storage

## Trino Configuration

Trino supports OAuth authentication only:

## 2.3.1 ExternalOAuth (Starburst OAuth)

Use this for OAuth-based authentication with Trino/Starburst clusters.

**Configuration**:
- **Auth Type**: Select `ExternalOAuth`
- **Host Port**: Auto-populated from service connection (e.g., `ometa.galaxy.starburst.io:443`)
- **OAuth Client Credentials**: Admin must provide:
  - **Client ID**: From Azure AD or OAuth provider
  - **Client Secret**: From Azure AD or OAuth provider
  - **Redirect URL**: `https://your-collate-domain.com/api/v1/queryRunner/oauth/callback`
  - **Scope**: Typically `openid profile email` or custom scope

**Getting OAuth Credentials** (Azure AD example for Starburst):
1.
In Azure Portal, go to **Azure Active Directory** → **App registrations** +2. Create or select an application +3. Add redirect URI: `https://your-collate-domain.com/api/v1/queryRunner/oauth/callback` +4. Create client secret in **Certificates & secrets** +5. Copy Application (client) ID and Client Secret + +**1Password (Collate Internal)**: https://share.1password.com/s#vTbmCrSSZQMmn3K-Xp7tQn1xL7E_o8-SRh7GqKia9YA + +**Important Notes**: +- Host and port are automatically pulled from the database service connection +- OAuth endpoints are constructed as: `https://{hostPort}/oauth/v2/authorize` and `https://{hostPort}/oauth/v2/token` + +## Enabling Query Runner + +After configuring authentication: + +1. Check the **Enabled** checkbox +2. Set **Max Result Size** (1-100 rows, default: 100) +3. Select **User Configurable Fields** if you want users to override connection settings +4. Click **Save** + +Users will now see the service in SQL Studio and can connect to it. + +## 3. User Configuration + +Once an administrator has configured Query Runner for a database service, users can establish their own connections and begin querying. + +## Understanding Connection Status + +Your connection to a service can be in one of four states: + +| Status | Indicator | Meaning | Action Required | +| --- | --- | --- | --- | +| **Not Configured** | ⚪ Gray | Default state, no authentication attempted | Authenticate to connect | +| **Pending** | 🟡 Yellow | Authentication completed, test connection in progress | Wait for test connection to complete | +| **Connected** | 🟢 Green | Test connection successful, ready to execute queries | None - you can query | +| **Expired** | 🔴 Red | Tokens expired, connection needs re-authentication | Re-authenticate to reconnect | + +You can view your connection status in the SQL Studio sidebar next to the service name. 
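The status lifecycle above can be sketched as a small state machine. This is an illustrative model of the documented states and transitions only, not Collate's implementation:

```python
# Illustrative model of the four documented Query Runner connection states.
NOT_CONFIGURED, PENDING, CONNECTED, EXPIRED = (
    "Not Configured", "Pending", "Connected", "Expired",
)

TRANSITIONS = {
    (NOT_CONFIGURED, "authenticate"): PENDING,    # user authenticates
    (PENDING, "test_ok"): CONNECTED,              # test connection succeeds
    (CONNECTED, "token_expired"): EXPIRED,        # OAuth tokens lapse
    (EXPIRED, "reauthenticate"): PENDING,         # user reconnects
}

def next_status(status: str, event: str) -> str:
    # Events that are not defined for a state leave the status unchanged.
    return TRANSITIONS.get((status, event), status)
```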
diff --git a/content/v1.10.x/how-to-guides/query-runner/index.md b/content/v1.10.x/how-to-guides/query-runner/index.md new file mode 100644 index 000000000..e0d8ad936 --- /dev/null +++ b/content/v1.10.x/how-to-guides/query-runner/index.md @@ -0,0 +1,38 @@ +--- +title: Query Runner Overview | Collate Query Runner Guide +description: Learn what Query Runner is, supported services, and how it powers the SQL Studio experience in Collate. +slug: /how-to-guides/query-runner +collate: true +--- + +# Query Runner User Guide + +## 1. Introduction to Query Runner + +Query Runner is a powerful feature in OpenMetadata Collate that allows you to execute SQL queries directly against your configured database services from within the UI. This eliminates the need to switch between different database clients and provides a unified interface for data exploration and analysis. + +### Key Benefits + +- **Unified Interface**: Execute queries across multiple database services from a single interface +- **Secure Authentication**: Support for OAuth, SSO, and Basic Authentication with encrypted credential storage +- **Query Management**: Save, organize, and reuse frequently-used queries +- **Database Explorer**: Browse databases, schemas, and tables visually +- **Audit Trail**: All queries are logged for compliance and security + +### Supported Database Services + +Query Runner currently supports: +- **BigQuery** (Google Cloud Platform) +- **Snowflake** +- **Trino** (Starburst) + +### Query Runner Workflow + +``` +Admin Configuration → User Authentication → SQL Studio → Query Execution +``` + +1. **Admin** configures Query Runner for a database service and sets authentication method +2. **User** authenticates and establishes a connection to the service +3. **User** writes and executes queries in SQL Studio +4. 
**Results** are displayed with execution time and row counts

diff --git a/content/v1.10.x/how-to-guides/query-runner/sql-studio.md b/content/v1.10.x/how-to-guides/query-runner/sql-studio.md
new file mode 100644
index 000000000..f983e2534
--- /dev/null
+++ b/content/v1.10.x/how-to-guides/query-runner/sql-studio.md
@@ -0,0 +1,340 @@
---
title: SQL Studio | Collate Query Runner Guide
description: Learn how to write, run, and manage SQL queries in SQL Studio after connecting through Query Runner.
slug: /how-to-guides/query-runner/sql-studio
collate: true
---

# SQL Studio User Guide

Once connected to a database service through Query Runner, you can write and execute SQL queries in the SQL Studio interface.

## SQL Studio Layout

```
┌──────────────────────────────────────────────────────┐
│ [Service Selector ▼]          [Status: Connected 🟢] │
├────────────┬─────────────────────────────────────────┤
│            │ Query Tab 1   Query Tab 2   [+ New]     │
│  Saved     │ ┌──────────────────────────────────┐    │
│  Queries   │ │ SELECT * FROM users LIMIT 10     │    │
│            │ │                                  │    │
│  Database  │ │                                  │    │
│  Explorer  │ └──────────────────────────────────┘    │
│            │ [▶ Run] [💾 Save] [⚙ Settings]          │
│  • project │ ───────────────────────────────────     │
│   • dataset│ Results                                 │
│    • table │ ┌──────────────────────────────────┐    │
│            │ │ name   │ email                   │    │
│            │ │────────│─────────────────────────│    │
│            │ │ Alice  │ alice@example.com       │    │
│            │ │ Bob    │ bob@example.com         │    │
│            │ └──────────────────────────────────┘    │
│            │ 2 rows • 123ms                          │
└────────────┴─────────────────────────────────────────┘
```

## Writing Queries

### SQL Editor Features

The SQL Editor provides:

- **Syntax Highlighting**: SQL keywords, strings, and comments are color-coded
- **Auto-complete**: Press `Ctrl+Space` for suggestions (table names, column names, keywords)
- **Multi-line Editing**: Write complex queries across multiple lines
- **Line Numbers**: Easy reference for
debugging +- **Keyboard Shortcuts**: + - `Cmd/Ctrl + Enter`: Execute query + - `Cmd/Ctrl + S`: Save query + - `Tab`: Indent selection + - `Shift + Tab`: Unindent selection + +### Query Best Practices + +1. **Always use LIMIT**: Avoid fetching large datasets + + ```sql + SELECT * FROM large_table LIMIT 100; + ``` + +2. **Filter with WHERE**: Reduce data at the source + + ```sql + SELECT name, email FROM users WHERE created_at > '2024-01-01'; + ``` + +3. **Use fully qualified names**: Specify database/schema/table + + ```sql + -- BigQuery + SELECT * FROM `project.dataset.table` LIMIT 10; + + -- Snowflake + SELECT * FROM database.schema.table LIMIT 10; + + -- Trino + SELECT * FROM catalog.schema.table LIMIT 10; + ``` + +4. **Preview table structure**: Quickly see columns and sample data + + ```sql + SELECT * FROM table LIMIT 5; + ``` + +## Executing Queries + +1. Write your SQL query in the editor +2. Click the **Run** button (▶️) in the toolbar, OR press `Cmd/Ctrl + Enter` +3. Query execution starts: + - Status indicator shows "Running…" + - Execution time counter starts +4. Results appear in the **Results Panel** below: + - **Column headers**: Clickable for sorting (if supported) + - **Data rows**: Up to max result size (typically 100 rows) + - **Footer**: Shows row count and execution time (e.g., "45 rows • 234ms") + +**Query Execution Limits**: + +- **Max Result Size**: Set by admin (typically 100 rows) +- **Timeout**: Queries timeout after a configured duration (typically 5 minutes) +- **Permissions**: You can only query objects you have access to in the database + +**Handling Errors**: + +- Syntax errors: Red underline in editor + error message in results panel +- Permission errors: "Access denied" message with details +- Timeout errors: "Query timed out" message - optimize your query + +## Managing Query Tabs + +Work with multiple queries simultaneously using tabs: + +### Creating a New Tab + +1. Click **+ New Query** button in the tab bar +2. 
A new empty tab opens with a default name (e.g., "Query 1", "Query 2") + +### Switching Between Tabs + +1. Click on tab names to switch +2. Active tab is highlighted +3. Each tab maintains its own query text and results + +### Renaming a Tab + +1. Double-click on the tab name +2. Enter a new name (e.g., "User Analysis", "Revenue Report") +3. Press `Enter` to save + +### Closing a Tab + +1. Click the **×** icon on the tab +2. Unsaved changes are lost (queries are not auto-saved) +3. At least one tab must remain open + +**Note**: Query tabs are session-based and not persisted. To keep queries, use the **Save Query** feature. + +## Saving Queries + +Save frequently-used queries for quick access: + +### Save a Query + +1. Write your query in the editor +2. Click **Save Query** in the toolbar +3. Enter a meaningful name (e.g., "Daily Active Users", "Revenue by Region") +4. Click **Save** +5. Query appears in the **Saved Queries** section of the sidebar + +### Load a Saved Query + +1. Navigate to **Saved Queries** in the left sidebar +2. Click on the query name +3. Query text loads into the current editor tab +4. Execute or modify as needed + +### Edit a Saved Query + +1. Load the query into the editor +2. Make your changes +3. Click **Save Query** again +4. Choose **Update existing** to overwrite, or **Save as new** to create a copy + +### Delete a Saved Query + +1. Hover over the query in the **Saved Queries** sidebar +2. Click the **Delete** (🗑️) icon +3. Confirm deletion +4. Query is permanently removed + +**Note**: Saved queries are private to you. To share, copy the query text and send to your team. + +## Exploring Database Metadata + +The **Database Explorer** in the sidebar shows the structure of your database: + +### Hierarchy + +Depending on the service, you'll see: + +- **BigQuery**: Projects → Datasets → Tables +- **Snowflake**: Databases → Schemas → Tables +- **Trino**: Catalogs → Schemas → Tables + +### Browsing + +1. 
Click the **▶** icon next to a database/project/catalog to expand
2. Expand schemas/datasets to view tables
3. Click on a table name to:
   - View table metadata (columns, types)
   - Insert table name into editor at cursor position

### Using Table Names

1. Expand to the table you want to query
2. Double-click the table name
3. Fully-qualified table name is inserted into editor:
   - BigQuery: `project.dataset.table`
   - Snowflake: `database.schema.table`
   - Trino: `catalog.schema.table`
4. Build your query around it

**Example**:

```sql
-- Double-click on users table in explorer
-- This gets inserted:
SELECT * FROM `my-project.analytics.users` LIMIT 10;
```

## Query Results

### Results Display

Results appear in a table format with:

- **Column Headers**: Show column names from your SELECT statement
- **Data Rows**: Up to the configured max result size
- **Scrolling**: Vertical and horizontal scroll for large results
- **Footer**:
  - Row count (e.g., "45 rows" or "100 rows (limit reached)")
  - Execution time (e.g., "234ms")

### Interacting with Results

- **Sort**: Click column headers to sort (if supported)
- **Copy**: Select cells and copy to clipboard
- **Export**: (Future feature) Export to CSV, JSON, or Excel

### Result Limits

- Maximum rows returned is set by admin (typically 100)
- If your query returns more rows, results are truncated
- Footer indicates: "100 rows (limit reached)"
- Use `LIMIT` clause in your query to control result size

**Example**:

```sql
-- Returns first 10 rows
SELECT * FROM large_table LIMIT 10;

-- Returns rows 11-20 (pagination)
SELECT * FROM large_table LIMIT 10 OFFSET 10;
```

## Service-Specific Query Syntax

### BigQuery

**Fully Qualified Names**:

```sql
SELECT * FROM `project-id.dataset_name.table_name` LIMIT 10;
```

**Standard SQL**:

```sql
SELECT name, COUNT(*) as count
FROM `project.dataset.users`
WHERE created_at
> '2024-01-01'
GROUP BY name
ORDER BY count DESC
LIMIT 10;
```

**Cross-Project Queries**:

```sql
SELECT *
FROM `project-1.dataset.table1` t1
JOIN `project-2.dataset.table2` t2
  ON t1.id = t2.id
LIMIT 10;
```

**Common Functions**:

- `DATE()`, `TIMESTAMP()`: Date/time functions
- `ARRAY_AGG()`: Aggregate into array
- `STRUCT()`: Create structured data

### Snowflake

**Fully Qualified Names**:

```sql
SELECT * FROM database_name.schema_name.table_name LIMIT 10;
```

**Using Warehouse/Database/Schema**:

```sql
USE WAREHOUSE compute_wh;
USE DATABASE analytics;
USE SCHEMA public;
SELECT * FROM users LIMIT 10;
```

**Common Functions**:

- `DATEADD()`, `DATEDIFF()`: Date arithmetic
- `LISTAGG()`: Aggregate to comma-separated list
- `FLATTEN()`: Unnest arrays

**Role-Based Access**:

```sql
USE ROLE analyst_role;
SELECT * FROM sensitive_table LIMIT 10;
```

### Trino

**Fully Qualified Names**:

```sql
SELECT * FROM catalog_name.schema_name.table_name LIMIT 10;
```

**Cross-Catalog Queries** (Federated):

```sql
SELECT
  postgres.public.users.name,
  mysql.analytics.orders.total
FROM postgres.public.users
JOIN mysql.analytics.orders
  ON users.id = orders.user_id
LIMIT 10;
```

**Common Functions**:

- `date_format()`, `from_unixtime()`: Date functions
- `array_agg()`: Aggregate into array
- `regexp_extract()`: Regex extraction

diff --git a/content/v1.10.x/how-to-guides/query-runner/user-authentication.md b/content/v1.10.x/how-to-guides/query-runner/user-authentication.md
new file mode 100644
index 000000000..7e3301475
--- /dev/null
+++ b/content/v1.10.x/how-to-guides/query-runner/user-authentication.md
@@ -0,0 +1,145 @@
---
title: User Authentication | Collate Query Runner Guide
description: Walk through OAuth and basic authentication flows
users follow to connect Query Runner to database services.
slug: /how-to-guides/query-runner/user-authentication
collate: true
---

# User Authentication Flow

### Step 1: Access SQL Studio

1. Log in to OpenMetadata Collate
2. Navigate to **SQL Studio** from the main navigation menu
3. If this is your first time, you’ll see a landing page with available services

### Step 2: Select a Service

1. From the SQL Studio sidebar, click the **Service Selection** dropdown
2. Choose the database service you want to connect to
3. If you haven’t configured a connection yet, you’ll see a **Not Connected** status

### Step 3: Authenticate

The authentication process depends on the **authentication type** configured by your administrator:

### OAuth Authentication (CollateSSO or ExternalOAuth)

If your service is configured for OAuth:

1. Click the **Connect with OAuth** button
2. A popup window opens with the OAuth provider's login page
3. **Sign in** with your credentials:
   - **BigQuery CollateSSO**: Use your Google Workspace account (same as Collate login)
   - **BigQuery ExternalOAuth**: Use your Google account
   - **Snowflake OAuth**: Use your Snowflake credentials
   - **Trino OAuth**: Use Azure AD or the configured OAuth provider
4. **Grant permissions** when prompted (e.g., "Allow access to BigQuery")
5. The popup closes automatically, and you're redirected back to SQL Studio
6. Connection status changes to **Pending** 🟡 while the test connection runs
7. Once the test connection succeeds, status changes to **Connected** 🟢

**OAuth Token Expiration**:
- OAuth tokens typically expire after a few hours or days
- When expired, your status changes to **Expired** 🔴
- Click **Refresh Token** to renew without re-authenticating
- If refresh fails, click **Reconnect** to go through the OAuth flow again

### Basic Authentication

If your service is configured for Basic Auth:

### BigQuery Basic Auth

1.
Click **Configure Connection** or **Connect** +2. A modal opens requesting your GCP Service Account credentials +3. You can either: + - **Upload JSON**: Click **Upload Service Account JSON** and select your downloaded `.json` file + - **Manual Entry**: Enter fields individually: + - **Private Key ID**: From `private_key_id` field in JSON + - **Private Key**: From `private_key` field in JSON (full PEM-formatted key) + - **Client Email**: From `client_email` field in JSON + - **Client ID**: From `client_id` field in JSON +4. (Optional) If admin allowed, override **Dataset** +5. Click **Test Connection** to verify credentials +6. Connection status changes to **Pending** 🟡 while test is in progress +7. If test succeeds, click **Save** to complete setup +8. Connection status changes to **Connected** 🟢 + +**Where to Get Credentials**: +- Download service account JSON from [Google Cloud Console](https://console.cloud.google.com/) → **IAM & Admin** → **Service Accounts** +- Store credentials in your password manager (e.g., 1Password) +- Retrieve credentials: `op://vault/BigQuery-ServiceAccount-{yourname}/credentials.json` + +### Snowflake Basic Auth + +1. Click **Configure Connection** or **Connect** +2. A modal opens requesting your Snowflake credentials +3. Enter your **Username** (required) +4. Choose authentication method: + - **Password-based**: Enter **Password** + - **Key-pair based**: Enter **Private Key** and **Passphrase** +5. (Optional) If admin allowed, override connection settings: + - **Warehouse**: Compute warehouse to use (e.g., `COMPUTE_WH`) + - **Database**: Default database (e.g., `ANALYTICS`) + - **Schema**: Default schema (e.g., `PUBLIC`) + - **Role**: Role for access control (e.g., `ANALYST`) +6. Click **Test Connection** to verify credentials +7. Connection status changes to **Pending** 🟡 while test is in progress +8. If test succeeds, click **Save** to complete setup +9. 
Connection status changes to **Connected** 🟢

**Where to Get Credentials**:
- Username and password: Provided by your Snowflake administrator
- Key pair: Generate using `openssl` and upload the public key to Snowflake
- Store credentials in your password manager (e.g., 1Password)

### Test Connection Flow

After entering credentials or completing OAuth authentication, the **Test Connection** process is automatically triggered:

1. Connection status changes to **Pending** 🟡
2. Backend validates your credentials by:
   - Attempting to connect to the database service
   - Executing a simple test query (e.g., `SELECT 1`)
3. **If successful**:
   - Connection status changes to **Connected** 🟢
   - Last connection timestamp is recorded
   - You can now execute queries
4. **If failed**:
   - Connection status remains **Pending** 🟡
   - Review your credentials and try again

**Troubleshooting Test Connection Failures**:
- **Invalid credentials**: Double-check username, password, keys, etc.
- **Insufficient permissions**: Verify you have query permissions in the database (e.g., BigQuery `bigquery.jobs.create`, Snowflake `USAGE` on warehouse)
- **Network issues**: Ensure the backend can reach the database service (firewall rules, VPN)
- **Incorrect configuration**: Verify the service connection is properly configured in OpenMetadata

## 4. Using SQL Studio

Once connected, you can write and execute SQL queries in SQL Studio. For detailed instructions on using SQL Studio, see the [SQL Studio User Guide](/how-to-guides/query-runner/sql-studio).

## 5. Summary

Query Runner provides a seamless experience for querying database services directly from OpenMetadata Collate:

1. **Admin** configures Query Runner with an authentication method (CollateSSO, OAuth, Basic)
2. **User** authenticates and establishes a connection (OAuth flow or credential entry)
3. **User** writes and executes SQL queries in SQL Studio
4.
**Results** are displayed with execution time and row counts +5. **User** saves queries for reuse and explores database metadata + +### Quick Reference + +| Task | Steps | +| --- | --- | +| **Connect to service** | SQL Studio → Select service → Connect (OAuth or Basic) | +| **Execute query** | Write query → Click Run (or Cmd/Ctrl+Enter) | +| **Save query** | Click Save Query → Enter name → Save | +| **Load saved query** | Saved Queries sidebar → Click query name | +| **Explore databases** | Database Explorer sidebar → Expand hierarchy | +| **Refresh OAuth token** | Click Refresh Token button when status is Expired | +| **Disconnect** | Settings icon → Delete Connection | + +Enjoy exploring your data with Query Runner! diff --git a/content/v1.11.x-SNAPSHOT/collate-menu.md b/content/v1.11.x-SNAPSHOT/collate-menu.md index 784cd07ad..ac7bc628c 100644 --- a/content/v1.11.x-SNAPSHOT/collate-menu.md +++ b/content/v1.11.x-SNAPSHOT/collate-menu.md @@ -924,6 +924,15 @@ site_menu: - category: How-to Guides / Guide for Data Users / OpenMetadata Browser Extension url: /how-to-guides/guide-for-data-users/browser-ext + - category: How-to Guides / Query Runner + url: /how-to-guides/query-runner + - category: How-to Guides / Query Runner / Admin Configuration + url: /how-to-guides/query-runner/admin-configuration + - category: How-to Guides / Query Runner / User Authentication + url: /how-to-guides/query-runner/user-authentication + - category: How-to Guides / Query Runner / SQL Studio + url: /how-to-guides/query-runner/sql-studio + - category: How-to Guides / Data Discovery url: /how-to-guides/data-discovery - category: How-to Guides / Data Discovery / How to Discover Assets of Interest diff --git a/content/v1.11.x-SNAPSHOT/how-to-guides/query-runner/admin-configuration.md b/content/v1.11.x-SNAPSHOT/how-to-guides/query-runner/admin-configuration.md new file mode 100644 index 000000000..422b4bc40 --- /dev/null +++ 
b/content/v1.11.x-SNAPSHOT/how-to-guides/query-runner/admin-configuration.md @@ -0,0 +1,229 @@ +--- +title: Query Runner Admin Configuration | Collate Query Runner Guide +description: Configure Query Runner for each database service, choose authentication methods, and control user overrides. +slug: /how-to-guides/query-runner/admin-configuration +collate: true +--- + +# Admin Configuration + +Administrators must configure Query Runner for each database service before users can access it. This section describes the configuration process and authentication options for each supported service. + +## Accessing Admin Configuration + +1. Navigate to **Settings** → **Services** → **Database Services** +2. Select the database service you want to configure (e.g., `my-bigquery`, `production-snowflake`) +3. Click on the **Query Runner** tab +4. Click **Configure** to set up Query Runner + +## Configuration Settings + +All services share these common settings: + +| Setting | Description | Default | +| --- | --- | --- | +| **Authentication Type** | How users authenticate (CollateSSO, ExternalOAuth, Basic) | Varies by service | +| **Enabled** | Whether Query Runner is active for this service | `false` | +| **Max Result Size** | Maximum number of rows returned per query | `100` | +| **User Configurable Fields** | Fields users can override (role, database, schema, warehouse, dataset) | `[]` (empty) | + +## BigQuery Configuration + +BigQuery supports three authentication methods: + +## 2.1.1 CollateSSO (Recommended for Google Workspace) + +Use this when OpenMetadata Collate is configured with Google SSO. 
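Because this method relies on the SSO client having the BigQuery API scope, it can help to know that Google OAuth reports granted scopes as a single space-delimited string in the token response. A minimal, illustrative check (not part of Collate; the sample scope string is made up):

```python
# Check whether a Google OAuth granted-scopes string includes the
# BigQuery scope that CollateSSO needs. Illustrative sketch only.
BIGQUERY_SCOPE = "https://www.googleapis.com/auth/bigquery"

def has_bigquery_scope(granted_scopes: str) -> bool:
    # Google returns scopes as one space-delimited string
    return BIGQUERY_SCOPE in granted_scopes.split()

granted = "openid email profile https://www.googleapis.com/auth/bigquery"
```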
+ +**Configuration**: +- **Auth Type**: Select `CollateSSO` +- **OAuth Credentials**: Auto-populated from system SSO settings +- **Scope**: `https://www.googleapis.com/auth/bigquery` (auto-populated) +- **User Configurable Fields**: Optionally allow users to override: +  - `dataset` - Let users specify a default dataset + +**How It Works**: +- Users authenticate using their Google Workspace account (same as Collate login) +- OAuth tokens are automatically managed and refreshed +- No additional credentials needed + +**Prerequisites**: +- Google SSO must be configured in OpenMetadata Collate +- SSO client must have BigQuery API scope enabled + +## 2.1.2 ExternalOAuth (For External Google OAuth) + +Use this when you want users to authenticate with Google OAuth but Collate uses a different SSO provider. + +**Configuration**: +- **Auth Type**: Select `ExternalOAuth` +- **OAuth Client Credentials**: Admin must provide: +  - **Client ID**: From Google Cloud Console OAuth 2.0 Client +  - **Client Secret**: From Google Cloud Console OAuth 2.0 Client +  - **Redirect URL**: `https://your-collate-domain.com/api/v1/queryRunner/oauth/callback` +  - **Scope**: `https://www.googleapis.com/auth/bigquery` +- **User Configurable Fields**: Same as CollateSSO + +**Getting OAuth Credentials**: +1. Go to [Google Cloud Console](https://console.cloud.google.com/) → **APIs & Services** → **Credentials** +2. Create **OAuth 2.0 Client ID** (Web application type) +3. Add authorized redirect URI: `https://your-collate-domain.com/api/v1/queryRunner/oauth/callback` +4. Copy Client ID and Client Secret +5. Store credentials securely in **1Password** under `BigQuery Query Runner OAuth` (example) + +**1Password (Collate Internal)**: https://share.1password.com/s#m3iREgcUOxPqyNOG01bGCKxDNdLcV8niBA1iQ18S_gQ + +## 2.1.3 Basic Authentication (Service Account) + +Use this for non-interactive authentication using GCP Service Accounts.
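Users on this flow hand over four specific fields from their downloaded service-account key file. As a quick, hypothetical illustration of where those fields live in the JSON (dummy values):

```python
import json

# Dummy service-account key in the shape GCP Console produces.
sa = json.loads("""{
  "type": "service_account",
  "project_id": "my-project",
  "private_key_id": "abc123",
  "private_key": "-----BEGIN PRIVATE KEY-----\\nMIIB...\\n-----END PRIVATE KEY-----\\n",
  "client_email": "runner@my-project.iam.gserviceaccount.com",
  "client_id": "1234567890"
}""")

# The four fields Query Runner's Basic flow asks each user to provide.
needed = {k: sa[k] for k in ("private_key_id", "private_key", "client_email", "client_id")}
```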
+ +**Configuration**: +- **Auth Type**: Select `Basic` +- **User Credentials**: Each user provides their own service account credentials +- **User Configurable Fields**: Optionally allow users to override: +  - `dataset` - Default dataset for queries + +**User Setup**: +Users will need to provide: +- Service Account JSON file downloaded from GCP Console +- Fields extracted from JSON: `private_key_id`, `private_key`, `client_email`, `client_id` + +**Getting Service Account Credentials**: +1. Go to [Google Cloud Console](https://console.cloud.google.com/) → **IAM & Admin** → **Service Accounts** +2. Create or select a service account +3. Grant BigQuery roles: `BigQuery Data Viewer` or `BigQuery User` +4. Create JSON key and download +5. Store credentials securely in **1Password** under `BigQuery Service Account - {username}` (example) + +**Important Notes**: +- Service accounts need `bigquery.jobs.create` permission to execute queries +- Project ID is taken from the database service connection configuration +- Credentials are encrypted before storage + +## Snowflake Configuration + +Snowflake supports two authentication methods: + +## 2.2.1 ExternalOAuth (Recommended) + +Use this for OAuth-based authentication with Snowflake. + +**Configuration**: +- **Auth Type**: Select `ExternalOAuth` +- **OAuth Client Credentials**: Admin must provide: +  - **Client ID**: From Snowflake OAuth integration +  - **Client Secret**: From Snowflake OAuth integration +  - **Redirect URL**: `https://your-collate-domain.com/api/v1/queryRunner/oauth/callback` +  - **Scope**: `session:role-any` (or specific role scope) +- **User Configurable Fields**: Optionally allow users to override: +  - `warehouse` - Compute warehouse to use +  - `database` - Default database +  - `schema` - Default schema +  - `role` - Snowflake role for access control + +**Getting OAuth Credentials**: +1.
In Snowflake, create a security integration: + +```sql +-- SAMPLE +CREATE OR REPLACE SECURITY INTEGRATION OAUTH_SNOWFLAKE_INT +  TYPE = OAUTH +  OAUTH_CLIENT = CUSTOM +  OAUTH_CLIENT_TYPE = 'CONFIDENTIAL' +  OAUTH_REDIRECT_URI = 'https://your-collate-domain.com/api/v1/queryRunner/oauth/callback' -- or 'http://localhost:5050/callback' for local development +  OAUTH_ALLOW_NON_TLS_REDIRECT_URI = TRUE -- dev/local only +  OAUTH_ISSUE_REFRESH_TOKENS = TRUE +  OAUTH_REFRESH_TOKEN_VALIDITY = 86400 -- 24h +  OAUTH_USE_SECONDARY_ROLES = 'IMPLICIT' +  ENABLED = TRUE; +``` + +2. Retrieve the Client ID and Client Secret: +`SELECT SYSTEM$SHOW_OAUTH_CLIENT_SECRETS('OAUTH_SNOWFLAKE_INT');` + +**1Password (Collate Internal)**: https://share.1password.com/s#oY6eQL8891iFve3IeDn_iDhsFoK3aI1Cz9RAZZk2I4c + +## 2.2.2 Basic Authentication (Username/Password or Key Pair) + +Use this for username/password or key-pair authentication. + +**Configuration**: +- **Auth Type**: Select `Basic` +- **User Credentials**: Each user provides their own credentials: +  - Username + Password, OR +  - Username + Private Key + Passphrase +- **User Configurable Fields**: Same as ExternalOAuth + +**User Setup**: +Users will need to provide: +- **Username**: Snowflake username +- **Password** OR **Private Key + Passphrase** for key-pair authentication + +**Getting Key Pair Credentials** (if using key-pair auth): +1. Generate an RSA key pair: + +    ```bash +    openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt +    openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub +    ``` + +2. Upload the public key to Snowflake: + +    ```sql +    ALTER USER myusername SET RSA_PUBLIC_KEY='MIIBIjANBgkqhki...'; +    ``` + +**Important Notes**: +- Account URL, warehouse, database, schema, and role are inherited from the service connection unless the user overrides them +- Credentials are encrypted before storage + +## Trino Configuration + +Trino supports OAuth authentication only: + +## 2.3.1 ExternalOAuth (Starburst OAuth) + +Use this for OAuth-based authentication with Trino/Starburst clusters.
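For reference, Query Runner derives the OAuth endpoints from the service connection's host and port (the `https://{hostPort}/oauth/v2/...` pattern noted under Important Notes below). A minimal sketch of that construction, using the example host from this section:

```python
def trino_oauth_endpoints(host_port: str) -> dict:
    # Endpoint pattern: https://{hostPort}/oauth/v2/{authorize,token}
    base = f"https://{host_port}"
    return {
        "authorize": f"{base}/oauth/v2/authorize",
        "token": f"{base}/oauth/v2/token",
    }

# hostPort is auto-populated from the database service connection.
endpoints = trino_oauth_endpoints("ometa.galaxy.starburst.io:443")
```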
+ +**Configuration**: +- **Auth Type**: Select `ExternalOAuth` +- **Host Port**: Auto-populated from service connection (e.g., `ometa.galaxy.starburst.io:443`) +- **OAuth Client Credentials**: Admin must provide: +  - **Client ID**: From Azure AD or OAuth provider +  - **Client Secret**: From Azure AD or OAuth provider +  - **Redirect URL**: `https://your-collate-domain.com/api/v1/queryRunner/oauth/callback` +  - **Scope**: Typically `openid profile email` or custom scope + +**Getting OAuth Credentials** (Azure AD example for Starburst): +1. In Azure Portal, go to **Azure Active Directory** → **App registrations** +2. Create or select an application +3. Add redirect URI: `https://your-collate-domain.com/api/v1/queryRunner/oauth/callback` +4. Create client secret in **Certificates & secrets** +5. Copy Application (client) ID and Client Secret + +**1Password (Collate Internal)**: https://share.1password.com/s#vTbmCrSSZQMmn3K-Xp7tQn1xL7E_o8-SRh7GqKia9YA + +**Important Notes**: +- Host and port are automatically pulled from the database service connection +- OAuth endpoints are constructed as: `https://{hostPort}/oauth/v2/authorize` and `https://{hostPort}/oauth/v2/token` + +## Enabling Query Runner + +After configuring authentication: + +1. Check the **Enabled** checkbox +2. Set **Max Result Size** (1-100 rows, default: 100) +3. Select **User Configurable Fields** if you want users to override connection settings +4. Click **Save** + +Users will now see the service in SQL Studio and can connect to it. + +## 3. User Configuration + +Once an administrator has configured Query Runner for a database service, users can establish their own connections and begin querying.
+ +## Understanding Connection Status + +Your connection to a service can be in one of four states: + +| Status | Indicator | Meaning | Action Required | +| --- | --- | --- | --- | +| **Not Configured** | ⚪ Gray | Default state, no authentication attempted | Authenticate to connect | +| **Pending** | 🟡 Yellow | Authentication completed, test connection in progress | Wait for test connection to complete | +| **Connected** | 🟢 Green | Test connection successful, ready to execute queries | None - you can query | +| **Expired** | 🔴 Red | Tokens expired, connection needs re-authentication | Re-authenticate to reconnect | + +You can view your connection status in the SQL Studio sidebar next to the service name. diff --git a/content/v1.11.x-SNAPSHOT/how-to-guides/query-runner/index.md b/content/v1.11.x-SNAPSHOT/how-to-guides/query-runner/index.md new file mode 100644 index 000000000..e0d8ad936 --- /dev/null +++ b/content/v1.11.x-SNAPSHOT/how-to-guides/query-runner/index.md @@ -0,0 +1,38 @@ +--- +title: Query Runner Overview | Collate Query Runner Guide +description: Learn what Query Runner is, supported services, and how it powers the SQL Studio experience in Collate. +slug: /how-to-guides/query-runner +collate: true +--- + +# Query Runner User Guide + +## 1. Introduction to Query Runner + +Query Runner is a powerful feature in OpenMetadata Collate that allows you to execute SQL queries directly against your configured database services from within the UI. This eliminates the need to switch between different database clients and provides a unified interface for data exploration and analysis. 
+ +### Key Benefits + +- **Unified Interface**: Execute queries across multiple database services from a single interface +- **Secure Authentication**: Support for OAuth, SSO, and Basic Authentication with encrypted credential storage +- **Query Management**: Save, organize, and reuse frequently-used queries +- **Database Explorer**: Browse databases, schemas, and tables visually +- **Audit Trail**: All queries are logged for compliance and security + +### Supported Database Services + +Query Runner currently supports: +- **BigQuery** (Google Cloud Platform) +- **Snowflake** +- **Trino** (Starburst) + +### Query Runner Workflow + +``` +Admin Configuration → User Authentication → SQL Studio → Query Execution +``` + +1. **Admin** configures Query Runner for a database service and sets authentication method +2. **User** authenticates and establishes a connection to the service +3. **User** writes and executes queries in SQL Studio +4. **Results** are displayed with execution time and row counts diff --git a/content/v1.11.x-SNAPSHOT/how-to-guides/query-runner/sql-studio.md b/content/v1.11.x-SNAPSHOT/how-to-guides/query-runner/sql-studio.md new file mode 100644 index 000000000..f983e2534 --- /dev/null +++ b/content/v1.11.x-SNAPSHOT/how-to-guides/query-runner/sql-studio.md @@ -0,0 +1,340 @@ +--- +title: SQL Studio | Collate Query Runner Guide +description: Learn how to write, run, and manage SQL queries in SQL Studio after connecting through Query Runner. +slug: /how-to-guides/query-runner/sql-studio +collate: true +--- + +# SQL Studio User Guide + +Once connected to a database service through Query Runner, you can write and execute SQL queries in the SQL Studio interface. 
+ + +## SQL Studio Layout + +``` +┌──────────────────────────────────────────────────────┐ +│ [Service Selector ▼] [Status: Connected 🟢] │ +├────────────┬─────────────────────────────────────────┤ +│ │ Query Tab 1 Query Tab 2 [+ New] │ +│ Saved │ ┌──────────────────────────────────┐ │ +│ Queries │ │ SELECT * FROM users LIMIT 10 │ │ +│ │ │ │ │ +│ Database │ │ │ │ +│ Explorer │ └──────────────────────────────────┘ │ +│ │ [▶ Run] [💾 Save] [⚙ Settings] │ +│ • project │ ───────────────────────────────────── │ +│ • dataset│ Results │ +│ • table│ ┌──────────────────────────────────┐ │ +│ │ │ name │ email │ │ │ +│ │ │──────────│────────────────────│ │ │ +│ │ │ Alice │ alice@example.com │ │ │ +│ │ │ Bob │ bob@example.com │ │ │ +│ │ └──────────────────────────────────┘ │ +│ │ 2 rows • 123ms │ +└────────────┴─────────────────────────────────────────┘ +``` + +## Writing Queries + +### SQL Editor Features + +The SQL Editor provides: + +- **Syntax Highlighting**: SQL keywords, strings, and comments are color-coded +- **Auto-complete**: Press `Ctrl+Space` for suggestions (table names, column names, keywords) +- **Multi-line Editing**: Write complex queries across multiple lines +- **Line Numbers**: Easy reference for debugging +- **Keyboard Shortcuts**: + - `Cmd/Ctrl + Enter`: Execute query + - `Cmd/Ctrl + S`: Save query + - `Tab`: Indent selection + - `Shift + Tab`: Unindent selection + +### Query Best Practices + +1. **Always use LIMIT**: Avoid fetching large datasets + +    ```sql +    SELECT * FROM large_table LIMIT 100; +    ``` + +2. **Filter with WHERE**: Reduce data at the source + +    ```sql +    SELECT name, email FROM users WHERE created_at > '2024-01-01'; +    ``` + +3.
**Use fully qualified names**: Specify database/schema/table + + ```sql + -- BigQuery + SELECT * FROM `project.dataset.table` LIMIT 10; + + -- Snowflake + SELECT * FROM database.schema.table LIMIT 10; + + -- Trino + SELECT * FROM catalog.schema.table LIMIT 10; + ``` + +4. **Preview table structure**: Quickly see columns and sample data + + ```sql + SELECT * FROM table LIMIT 5; + ``` + +## Executing Queries + +1. Write your SQL query in the editor +2. Click the **Run** button (▶️) in the toolbar, OR press `Cmd/Ctrl + Enter` +3. Query execution starts: + - Status indicator shows "Running…" + - Execution time counter starts +4. Results appear in the **Results Panel** below: + - **Column headers**: Clickable for sorting (if supported) + - **Data rows**: Up to max result size (typically 100 rows) + - **Footer**: Shows row count and execution time (e.g., "45 rows • 234ms") + +**Query Execution Limits**: + +- **Max Result Size**: Set by admin (typically 100 rows) +- **Timeout**: Queries timeout after a configured duration (typically 5 minutes) +- **Permissions**: You can only query objects you have access to in the database + +**Handling Errors**: + +- Syntax errors: Red underline in editor + error message in results panel +- Permission errors: "Access denied" message with details +- Timeout errors: "Query timed out" message - optimize your query + +## Managing Query Tabs + +Work with multiple queries simultaneously using tabs: + +### Creating a New Tab + +1. Click **+ New Query** button in the tab bar +2. A new empty tab opens with a default name (e.g., "Query 1", "Query 2") + +### Switching Between Tabs + +1. Click on tab names to switch +2. Active tab is highlighted +3. Each tab maintains its own query text and results + +### Renaming a Tab + +1. Double-click on the tab name +2. Enter a new name (e.g., "User Analysis", "Revenue Report") +3. Press `Enter` to save + +### Closing a Tab + +1. Click the **×** icon on the tab +2. 
Unsaved changes are lost (queries are not auto-saved) +3. At least one tab must remain open + +**Note**: Query tabs are session-based and not persisted. To keep queries, use the **Save Query** feature. + +## Saving Queries + +Save frequently-used queries for quick access: + +### Save a Query + +1. Write your query in the editor +2. Click **Save Query** in the toolbar +3. Enter a meaningful name (e.g., "Daily Active Users", "Revenue by Region") +4. Click **Save** +5. Query appears in the **Saved Queries** section of the sidebar + +### Load a Saved Query + +1. Navigate to **Saved Queries** in the left sidebar +2. Click on the query name +3. Query text loads into the current editor tab +4. Execute or modify as needed + +### Edit a Saved Query + +1. Load the query into the editor +2. Make your changes +3. Click **Save Query** again +4. Choose **Update existing** to overwrite, or **Save as new** to create a copy + +### Delete a Saved Query + +1. Hover over the query in the **Saved Queries** sidebar +2. Click the **Delete** (🗑️) icon +3. Confirm deletion +4. Query is permanently removed + +**Note**: Saved queries are private to you. To share, copy the query text and send to your team. + +## Exploring Database Metadata + +The **Database Explorer** in the sidebar shows the structure of your database: + +### Hierarchy + +Depending on the service, you'll see: + +- **BigQuery**: Projects → Datasets → Tables +- **Snowflake**: Databases → Schemas → Tables +- **Trino**: Catalogs → Schemas → Tables + +### Browsing + +1. Click the **▶** icon next to a database/project/catalog to expand +2. Expand schemas/datasets to view tables +3. Click on a table name to: + - View table metadata (columns, types) + - Insert table name into editor at cursor position + +### Using Table Names + +1. Expand to the table you want to query +2. Double-click the table name +3. 
Fully-qualified table name is inserted into editor: +   - BigQuery: `project.dataset.table` +   - Snowflake: `database.schema.table` +   - Trino: `catalog.schema.table` +4. Build your query around it + +**Example**: + +```sql +-- Double-click on users table in explorer +-- This gets inserted: +SELECT * FROM `my-project.analytics.users` LIMIT 10; +``` + +## Query Results + +### Results Display + +Results appear in a table format with: + +- **Column Headers**: Show column names from your SELECT statement +- **Data Rows**: Up to the configured max result size +- **Scrolling**: Vertical and horizontal scroll for large results +- **Footer**: + - Row count (e.g., "45 rows" or "100 rows (limit reached)") + - Execution time (e.g., "234ms") + +### Interacting with Results + +- **Sort**: Click column headers to sort (if supported) +- **Copy**: Select cells and copy to clipboard +- **Export**: (Future feature) Export to CSV, JSON, or Excel + +### Result Limits + +- Maximum rows returned is set by admin (typically 100) +- If your query returns more rows, results are truncated +- Footer indicates: "100 rows (limit reached)" +- Use `LIMIT` clause in your query to control result size + +**Example**: + +```sql +-- Returns first 10 rows +SELECT * FROM large_table LIMIT 10; + +-- Returns rows 11-20 (pagination) +SELECT * FROM large_table LIMIT 10 OFFSET 10; +``` + +## Service-Specific Query Syntax + +### BigQuery + +**Fully Qualified Names**: + +```sql +SELECT * FROM `project-id.dataset_name.table_name` LIMIT 10; +``` + +**Standard SQL**: + +```sql +SELECT name, COUNT(*) as count +FROM `project.dataset.users` +WHERE created_at > '2024-01-01' +GROUP BY name +ORDER BY count DESC +LIMIT 10; +``` + +**Cross-Project Queries**: + +```sql +SELECT * +FROM `project-1.dataset.table1` t1 +JOIN `project-2.dataset.table2` t2 +  ON t1.id = t2.id +LIMIT 10; +``` + +**Common Functions**: + +- `DATE()`, `TIMESTAMP()`: Date/time functions +- 
`ARRAY_AGG()`: Aggregate into array +- `STRUCT()`: Create structured data + +### Snowflake + +**Fully Qualified Names**: + +```sql +SELECT * FROM database_name.schema_name.table_name LIMIT 10; +``` + +**Using Warehouse/Database/Schema**: + +```sql +USE WAREHOUSE compute_wh; +USE DATABASE analytics; +USE SCHEMA public; +SELECT * FROM users LIMIT 10; +``` + +**Common Functions**: + +- `DATEADD()`, `DATEDIFF()`: Date arithmetic +- `LISTAGG()`: Aggregate to comma-separated list +- `FLATTEN()`: Unnest arrays + +**Role-Based Access**: + +```sql +USE ROLE analyst_role; +SELECT * FROM sensitive_table LIMIT 10; +``` + +### Trino + +**Fully Qualified Names**: + +```sql +SELECT * FROM catalog_name.schema_name.table_name LIMIT 10; +``` + +**Cross-Catalog Queries** (Federated): + +```sql +SELECT +  postgres.public.users.name, +  mysql.analytics.orders.total +FROM postgres.public.users +JOIN mysql.analytics.orders +  ON users.id = orders.user_id +LIMIT 10; +``` + +**Common Functions**: + +- `date_format()`, `from_unixtime()`: Date functions +- `array_agg()`: Aggregate into array +- `regexp_extract()`: Regex extraction diff --git a/content/v1.11.x-SNAPSHOT/how-to-guides/query-runner/user-authentication.md b/content/v1.11.x-SNAPSHOT/how-to-guides/query-runner/user-authentication.md new file mode 100644 index 000000000..7e3301475 --- /dev/null +++ b/content/v1.11.x-SNAPSHOT/how-to-guides/query-runner/user-authentication.md @@ -0,0 +1,145 @@ +--- +title: User Authentication | Collate Query Runner Guide +description: Walk through OAuth and basic authentication flows users follow to connect Query Runner to database services. +slug: /how-to-guides/query-runner/user-authentication +collate: true +--- + +# User Authentication Flow + +### Step 1: Access SQL Studio + +1. Log in to OpenMetadata Collate +2. Navigate to **SQL Studio** from the main navigation menu +3. 
If this is your first time, you’ll see a landing page with available services + +### Step 2: Select a Service + +1. From the SQL Studio sidebar, click the **Service Selection** dropdown +2. Choose the database service you want to connect to +3. If you haven’t configured a connection yet, you’ll see a **Not Connected** status + +### Step 3: Authenticate + +The authentication process depends on the **authentication type** configured by your administrator: + +### OAuth Authentication (CollateSSO or ExternalOAuth) + +If your service is configured for OAuth: + +1. Click **Connect with OAuth** button +2. A popup window opens with the OAuth provider's login page +3. **Sign in** with your credentials: + - **BigQuery CollateSSO**: Use your Google Workspace account (same as Collate login) + - **BigQuery ExternalOAuth**: Use your Google account + - **Snowflake OAuth**: Use your Snowflake credentials + - **Trino OAuth**: Use Azure AD or configured OAuth provider +4. **Grant permissions** when prompted (e.g., "Allow access to BigQuery") +5. The popup closes automatically, and you're redirected back to SQL Studio +6. Connection status changes to **Pending** 🟡 while test connection is triggered +7. Once test connection succeeds, status changes to **Connected** 🟢 + +**OAuth Token Expiration**: +- OAuth tokens typically expire after a few hours or days +- When expired, your status changes to **Expired** 🔴 +- Click **Refresh Token** to renew without re-authenticating +- If refresh fails, click **Reconnect** to go through OAuth flow again + +### Basic Authentication + +If your service is configured for Basic Auth: + +### BigQuery Basic Auth + +1. Click **Configure Connection** or **Connect** +2. A modal opens requesting your GCP Service Account credentials +3. 
You can either: + - **Upload JSON**: Click **Upload Service Account JSON** and select your downloaded `.json` file + - **Manual Entry**: Enter fields individually: + - **Private Key ID**: From `private_key_id` field in JSON + - **Private Key**: From `private_key` field in JSON (full PEM-formatted key) + - **Client Email**: From `client_email` field in JSON + - **Client ID**: From `client_id` field in JSON +4. (Optional) If admin allowed, override **Dataset** +5. Click **Test Connection** to verify credentials +6. Connection status changes to **Pending** 🟡 while test is in progress +7. If test succeeds, click **Save** to complete setup +8. Connection status changes to **Connected** 🟢 + +**Where to Get Credentials**: +- Download service account JSON from [Google Cloud Console](https://console.cloud.google.com/) → **IAM & Admin** → **Service Accounts** +- Store credentials in your password manager (e.g., 1Password) +- Retrieve credentials: `op://vault/BigQuery-ServiceAccount-{yourname}/credentials.json` + +### Snowflake Basic Auth + +1. Click **Configure Connection** or **Connect** +2. A modal opens requesting your Snowflake credentials +3. Enter your **Username** (required) +4. Choose authentication method: + - **Password-based**: Enter **Password** + - **Key-pair based**: Enter **Private Key** and **Passphrase** +5. (Optional) If admin allowed, override connection settings: + - **Warehouse**: Compute warehouse to use (e.g., `COMPUTE_WH`) + - **Database**: Default database (e.g., `ANALYTICS`) + - **Schema**: Default schema (e.g., `PUBLIC`) + - **Role**: Role for access control (e.g., `ANALYST`) +6. Click **Test Connection** to verify credentials +7. Connection status changes to **Pending** 🟡 while test is in progress +8. If test succeeds, click **Save** to complete setup +9. 
Connection status changes to **Connected** 🟢 + +**Where to Get Credentials**: +- Username and password: Provided by your Snowflake administrator +- Key pair: Generate using `openssl` and upload the public key to Snowflake +- Store credentials in your password manager (e.g., 1Password) + +### Test Connection Flow + +After entering credentials or completing OAuth authentication, the **Test Connection** process is automatically triggered: + +1. Connection status changes to **Pending** 🟡 +2. Backend validates your credentials by: + - Attempting to connect to the database service + - Executing a simple test query (e.g., `SELECT 1`) +3. **If successful**: + - Connection status changes to **Connected** 🟢 + - Last connection timestamp is recorded + - You can now execute queries +4. **If failed**: + - Connection status remains **Pending** 🟡 + - Review your credentials and try again + +**Troubleshooting Test Connection Failures**: +- **Invalid credentials**: Double-check username, password, keys, etc. +- **Insufficient permissions**: Verify you have query permissions in the database (e.g., BigQuery `bigquery.jobs.create`, Snowflake `USAGE` on warehouse) +- **Network issues**: Ensure the backend can reach the database service (firewall rules, VPN) +- **Incorrect configuration**: Verify the service connection is properly configured in OpenMetadata + +## 4. Using SQL Studio + +Once connected, you can write and execute SQL queries in SQL Studio. For detailed instructions on using SQL Studio, see the [SQL Studio User Guide](/how-to-guides/query-runner/sql-studio). + +## 5. Summary + +Query Runner provides a seamless experience for querying database services directly from OpenMetadata Collate: + +1. **Admin** configures Query Runner with an authentication method (CollateSSO, OAuth, Basic) +2. **User** authenticates and establishes a connection (OAuth flow or credential entry) +3. **User** writes and executes SQL queries in SQL Studio +4. 
**Results** are displayed with execution time and row counts +5. **User** saves queries for reuse and explores database metadata + +### Quick Reference + +| Task | Steps | +| --- | --- | +| **Connect to service** | SQL Studio → Select service → Connect (OAuth or Basic) | +| **Execute query** | Write query → Click Run (or Cmd/Ctrl+Enter) | +| **Save query** | Click Save Query → Enter name → Save | +| **Load saved query** | Saved Queries sidebar → Click query name | +| **Explore databases** | Database Explorer sidebar → Expand hierarchy | +| **Refresh OAuth token** | Click Refresh Token button when status is Expired | +| **Disconnect** | Settings icon → Delete Connection | + +Enjoy exploring your data with Query Runner!