
feat: capture BigQuery query stats for mktplace metering#12463

Closed
gabrielbressan-tfy wants to merge 211 commits into mindsdb:main from Talentify:feat/bigquery-usage-metering


@gabrielbressan-tfy


Summary

  • Adds query_stats_registry.py, a thread-safe in-process registry that accumulates QueryJob stats (total_bytes_billed, cache_hit, project_id) keyed by a mktplace_query_id correlation ID passed via ctx.params
  • Modifies BigQueryHandler.native_query() to capture stats into the registry after each query execution (via new _record_query_stats helper)
  • Adds GET /api/sql/query_stats/<query_id> endpoint that pops and returns the stats JSON; returns {} for unknown ids
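A minimal sketch of the registry's accumulate/pop contract, assuming that stats from several BigQuery jobs under the same mktplace_query_id are summed and that the request counts as a cache hit only if every job was one. The names `QueryStatsRegistry`, `accumulate`, `pop` and `stats_endpoint_body` are illustrative and not necessarily what query_stats_registry.py exposes; TTL and size-cap eviction are left out here.

```python
import json
import threading
from typing import Any


class QueryStatsRegistry:
    """Thread-safe, in-process store of per-query BigQuery stats (illustrative)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: dict[str, dict[str, Any]] = {}

    def accumulate(self, query_id: str, bytes_billed: int, cache_hit: bool, project_id: str) -> None:
        with self._lock:
            entry = self._entries.setdefault(
                query_id,
                {"total_bytes_billed": 0, "cache_hit": True, "project_id": project_id},
            )
            # One mktplace query may run several BigQuery jobs: sum their bytes.
            entry["total_bytes_billed"] += bytes_billed
            # Assumed rule: a cache hit only if every job was served from cache.
            entry["cache_hit"] = entry["cache_hit"] and cache_hit

    def pop(self, query_id: str) -> dict[str, Any]:
        # Removing the entry means each id is reported at most once;
        # unknown ids (e.g. non-BigQuery queries) yield an empty dict.
        with self._lock:
            return self._entries.pop(query_id, {})


def stats_endpoint_body(registry: QueryStatsRegistry, query_id: str) -> str:
    """Body of GET /api/sql/query_stats/<query_id>: the popped stats as JSON."""
    return json.dumps(registry.pop(query_id))
```

Popping rather than reading keeps metering idempotent from the caller's side: a second GET for the same id returns `{}` instead of double-counting bytes.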

Why

mktplace needs to meter actual BigQuery bytes scanned for public datasources (Talentify pays GCP on-demand). Previously, QueryJob.total_bytes_billed was discarded immediately. This PR propagates it back to the mktplace caller without touching the result DataFrame pipeline or DuckDB path.

Depends on

Requires the companion PR in Talentify/mktplace (feat/bigquery-usage-metering) which generates the mktplace_query_id, calls query_with_id, and records metering. Deploy this PR first (mindsdb hot-reloads on file change).

Test plan

  • Registry unit: accumulate / pop / TTL eviction
  • Handler: mock QueryJob with total_bytes_billed=1234567, cache_hit=False; verify registry entry; mock cache_hit=True; verify total_bytes_billed=0
  • Live (mindsdb SDK): POST /api/sql/query with params={"mktplace_query_id":"<uuid>"}; GET /api/sql/query_stats/<uuid> → non-zero bytes; repeat (cache hit) → bytes=0
  • Non-BQ query: GET /api/sql/query_stats/<unused-uuid> → {}
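The handler case of the test plan can be mirrored with a stand-in for `QueryJob`. This sketch assumes a hypothetical `record_query_stats(job, params, store)` helper that reads `total_bytes_billed`, `cache_hit` and `project` from the job (all properties of `google.cloud.bigquery.QueryJob`; the first two may be `None`), records 0 billed bytes on a cache hit as the test plan expects, and writes into a plain dict standing in for the registry (overwriting rather than accumulating, for brevity). `types.SimpleNamespace` plays the mocked job.

```python
from types import SimpleNamespace
from typing import Any, Optional


def extract_query_stats(job: Any) -> dict[str, Any]:
    """Read metering fields from a QueryJob-like object (hypothetical helper)."""
    cache_hit = bool(job.cache_hit)  # QueryJob.cache_hit may be None
    billed = job.total_bytes_billed or 0  # None until job statistics exist
    return {
        # Assumption: cached results are recorded as 0 billed bytes.
        "total_bytes_billed": 0 if cache_hit else int(billed),
        "cache_hit": cache_hit,
        "project_id": job.project,
    }


def record_query_stats(job: Any, params: Optional[dict], store: dict[str, dict[str, Any]]) -> None:
    """Store stats under params['mktplace_query_id']; no id means nothing to meter."""
    query_id = (params or {}).get("mktplace_query_id")
    if query_id:
        store[query_id] = extract_query_stats(job)


if __name__ == "__main__":
    registry: dict[str, dict[str, Any]] = {}
    mock_job = SimpleNamespace(total_bytes_billed=1234567, cache_hit=False, project="demo-project")
    record_query_stats(mock_job, {"mktplace_query_id": "demo-id"}, registry)
    print(registry)
```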

🤖 Generated with Claude Code

gabrielsntr and others added 30 commits September 9, 2025 10:37
Changed to allow using the role from the Kubernetes environment (service account)
…agement and improved credential configuration
…w features

- Added support for automatic token refresh and delta query for incremental sync in MSGraphAPIOneDriveClient.
- Implemented large file handling with chunked downloads.
- Introduced new methods for fetching user and drive information.
- Enhanced MSOneDriveHandler to support both modern token injection and legacy code-based authentication.
- Added connection metadata and delta state management with new tables for better tracking and diagnostics.
- Improved error handling and logging throughout the integration.
- Extend supported file formats in S3Handler to include txt, pdf, md, doc, and docx.
- Implement reading capabilities for markdown and Word documents in FileReader.
- Update requirements to include python-docx for handling DOC and DOCX files.
- Introduced `get_metadata_limits` method in `VectorStoreHandler` and `S3VectorsHandler` to retrieve metadata constraints.
- Implemented detection and storage of vector database metadata limits in `KnowledgeBaseTable`.
- Updated `DocumentPreprocessor` to accommodate metadata limits and control timestamp inclusion based on limits.
- Enhanced `BasePreprocessingConfig` to include fields for maximum metadata keys and timestamp inclusion settings.
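For the chunked OneDrive downloads, one standard approach is to issue HTTP `Range` requests over fixed-size byte windows. The sketch below only computes the header values; whether `MSGraphAPIOneDriveClient` uses this exact scheme, and which chunk size it picks, are assumptions. `range_headers` is a hypothetical name.

```python
def range_headers(file_size: int, chunk_size: int) -> list[str]:
    """Inclusive HTTP Range header values that cover file_size bytes in chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    headers = []
    for start in range(0, file_size, chunk_size):
        end = min(start + chunk_size, file_size) - 1  # Range end offset is inclusive
        headers.append(f"bytes={start}-{end}")
    return headers
```

Each value goes into the `Range` header of one GET; the last chunk is simply shorter when the file size is not a multiple of the chunk size.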
gabrielbressan-tfy and others added 26 commits April 7, 2026 03:54
…e-ads-integration

Feature/sc 76870/ mindsdb google ads integration
…onector-do-shopify

Feature/sc 77168/improvements to the Shopify connector
…r-google-ads

Keyword Planner for Google Ads
* Add Meta Ad Library handler

* fix: correct assert, NOT_BETWEEN and coerce_to_list in the Meta Ad Library handler

- Replaces `assert self.session is not None` with an explicit RuntimeError in
  `_request()`: asserts are stripped when Python runs with the -O flag, which
  would leave an opaque AttributeError instead of a readable message
- Removes NOT_BETWEEN from the initial check in `_extract_delivery_date_bounds()`:
  the operator has no mapping to Meta API parameters and never set
  `condition.applied = True`, so the behavior was correct but accidental;
  it now falls into the else branch, `_coerce_date` returns None, and the
  condition is explicitly ignored, leaving it to MindsDB's local filtering
- Collapses the identical list/tuple branches in `_coerce_to_list()` into a
  single isinstance(value, (list, tuple))
- Adds test_check_connection_fails_when_no_search_scope: covers the Meta API
  400 path when neither search_page_ids nor search_terms is provided in
  connection_data

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds `method` (GET/POST) and `body` (JSON dict) connection args to the
multi_format_api handler. POST requests skip the HEAD pre-check and send
the body as JSON. Body is shallow-merged from connection and query levels,
with query-level keys taking precedence.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
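How the new `method` and `body` options might combine, as a sketch: `build_api_request` and its returned keys are hypothetical, and ignoring `body` on GET is an assumption, but the rules follow the commit message (GET by default, POST skips the HEAD pre-check, and query-level body keys win in a shallow merge).

```python
from typing import Any, Optional


def build_api_request(connection_args: dict, query_args: dict) -> dict[str, Any]:
    """Resolve method, JSON body and pre-check flag (hypothetical helper)."""
    method = str(connection_args.get("method") or "GET").upper()
    if method not in ("GET", "POST"):
        raise ValueError(f"unsupported method: {method}")
    body: Optional[dict] = None
    if method == "POST":
        # Shallow merge: top-level query keys override connection keys,
        # and nested dicts are replaced, not merged.
        body = {**(connection_args.get("body") or {}), **(query_args.get("body") or {})}
    return {
        "method": method,
        "json": body,
        "head_precheck": method == "GET",  # POST skips the HEAD request
    }
```

Because the merge is shallow, a query-level `{"filters": {...}}` replaces the whole connection-level `filters` object rather than combining the two.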
…-to-accept-post

feat: add POST method and body support to multi_format_api_handler
fix(meta-ad-library): default reached countries to US
API handlers consume WHERE params (url, start_date, etc.) to call
external APIs and mark them applied=True. When a consumed param name
collides with a column in the API response (e.g. url), SubSelectStep's
DuckDB re-evaluates the condition against the response value, producing
false negatives (0 rows).

Propagate applied column names via DataFrame.attrs from
APIResource.select() to SubSelectStepCall, which strips only those
conditions from WHERE. Non-consumed conditions are preserved for
double-filtering safety.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
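A sketch of the stripping rule: only conditions on columns the API handler consumed are removed, and everything else stays for DuckDB to re-check. A plain dict stands in for `DataFrame.attrs`; the `applied_columns` key and the `Condition` type are hypothetical, since the real SubSelectStepCall works on the parsed WHERE tree.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Condition:
    column: str
    op: str
    value: Any


def strip_applied_conditions(conditions: list[Condition], attrs: dict[str, Any]) -> list[Condition]:
    """Drop only the WHERE conditions on columns the API handler already consumed."""
    applied = set(attrs.get("applied_columns", ()))
    # Conditions on other columns survive so DuckDB still filters on them.
    return [c for c in conditions if c.column not in applied]
```

With `attrs = {"applied_columns": ["url"]}`, a `url = '...'` condition is no longer compared against the response's own `url` column, which is what produced the false 0-row results.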
Avoid reapplying WHERE in API integration subselects
Adds a thread-safe in-process registry that accumulates QueryJob stats
(total_bytes_billed, cache_hit, project_id) keyed by a caller-supplied
mktplace_query_id passed via ctx.params. The BigQuery handler populates
the registry after each native_query execution; a new GET endpoint
/api/sql/query_stats/<query_id> pops and returns the stats so mktplace
can record usage metering.

Registry auto-evicts TTL-expired entries (5 min) and caps at 10k to
prevent memory leaks from abandoned query ids.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
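A sketch of the eviction policy, assuming the TTL is measured from the first write for each id so that insertion order matches age. Defaults mirror the commit message (300 s, 10,000 entries); the class name, the injectable `clock`, lazy eviction on each call, and dropping the oldest entry when the cap is hit are assumptions, and only `total_bytes_billed` is tracked to keep it short.

```python
import threading
import time
from collections import OrderedDict
from typing import Any, Callable


class TTLStatsRegistry:
    """Per-query byte counts with TTL expiry and a size cap (illustrative)."""

    def __init__(self, ttl_seconds: float = 300.0, max_entries: int = 10_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._lock = threading.Lock()
        # query_id -> (created_at, stats); insertion order == creation order
        self._entries: "OrderedDict[str, tuple[float, dict[str, Any]]]" = OrderedDict()

    def _evict_expired(self, now: float) -> None:
        # Entries are ordered by age, so stop at the first one still alive.
        while self._entries:
            created_at, _ = next(iter(self._entries.values()))
            if now - created_at < self._ttl:
                break
            self._entries.popitem(last=False)

    def accumulate(self, query_id: str, bytes_billed: int) -> None:
        with self._lock:
            now = self._clock()
            self._evict_expired(now)
            if query_id in self._entries:
                self._entries[query_id][1]["total_bytes_billed"] += bytes_billed
                return
            # Cap reached: drop the oldest (likely abandoned) ids to make room.
            while len(self._entries) >= self._max:
                self._entries.popitem(last=False)
            self._entries[query_id] = (now, {"total_bytes_billed": bytes_billed})

    def pop(self, query_id: str) -> dict[str, Any]:
        with self._lock:
            self._evict_expired(self._clock())
            entry = self._entries.pop(query_id, None)
            return entry[1] if entry is not None else {}
```

Using `time.monotonic` rather than wall-clock time keeps the TTL check unaffected by system clock adjustments, and evicting lazily inside `accumulate`/`pop` avoids a background thread.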
@entelligence-ai-pr-reviews

Contributor

Automatic Review Skipped

Too many files for automatic review.

If you would still like a review, you can trigger one manually by commenting:

@entelligence review

@github-actions

github-actions Bot commented Jun 7, 2026



Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


0 out of 4 committers have signed the CLA.
@gabrielsntr
@gabrielbressan-tfy
@CarvalhoRod
@CarvalhoRodrigo
CarvalhoRodrigo does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@github-actions github-actions Bot locked and limited conversation to collaborators Jun 7, 2026


5 participants