feat: capture BigQuery query stats for mktplace metering #12463
gabrielbressan-tfy wants to merge 211 commits into
Changed to allow using the role from the Kubernetes environment (service account)
…onal Fix s3_handler.py
…-optional Fix duck aws key optional
…agement and improved credential configuration
…w features
- Added support for automatic token refresh and delta query for incremental sync in MSGraphAPIOneDriveClient.
- Implemented large file handling with chunked downloads.
- Introduced new methods for fetching user and drive information.
- Enhanced MSOneDriveHandler to support both modern token injection and legacy code-based authentication.
- Added connection metadata and delta state management with new tables for better tracking and diagnostics.
- Improved error handling and logging throughout the integration.
…and example usage
- Extend supported file formats in S3Handler to include txt, pdf, md, doc, and docx.
- Implement reading capabilities for markdown and Word documents in FileReader.
- Update requirements to include python-docx for handling DOC and DOCX files.
- Introduced `get_metadata_limits` method in `VectorStoreHandler` and `S3VectorsHandler` to retrieve metadata constraints.
- Implemented detection and storage of vector database metadata limits in `KnowledgeBaseTable`.
- Updated `DocumentPreprocessor` to accommodate metadata limits and control timestamp inclusion based on limits.
- Enhanced `BasePreprocessingConfig` to include fields for maximum metadata keys and timestamp inclusion settings.
…e-ads-integration Feature/sc 76870/ mindsdb google ads integration
…onector-do-shopify Feature/sc 77168/Shopify connector improvements
…r-google-ads keywords planner to google ads
* Add Meta Ad Library handler
* fix: correct assert, NOT_BETWEEN and coerce_to_list in the Meta Ad Library handler
- Replaces `assert self.session is not None` with an explicit RuntimeError in `_request()`: assert is stripped when Python runs with the -O flag, leaving an opaque AttributeError in place of a readable message
- Removes NOT_BETWEEN from the initial check in `_extract_delivery_date_bounds()`: the operator has no mapping to the Meta API parameters and never set `condition.applied = True`, which was correct but accidental behavior; it now falls into the else branch, gets None back from _coerce_date and is explicitly ignored, leaving it to MindsDB's local filtering
- Collapses the identical list/tuple branches in `_coerce_to_list()` into a single isinstance(value, (list, tuple))
- Adds test_check_connection_fails_when_no_search_scope: covers the Meta API 400 path when neither search_page_ids nor search_terms is provided in connection_data
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
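Two of the fixes in this commit can be illustrated with a minimal sketch. The class and function names below (`AdLibraryClient`, `coerce_to_list`) are hypothetical stand-ins, not the handler's actual code, and the handling of `None` and scalar values is an assumption.

```python
from typing import Any, List


def coerce_to_list(value: Any) -> List[Any]:
    """Normalise a filter value to a list (hypothetical stand-in for `_coerce_to_list`)."""
    if value is None:
        return []  # assumption: a missing value means "no items"
    # One isinstance check covers both list and tuple, instead of two identical branches.
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]  # assumption: a scalar becomes a one-element list


class AdLibraryClient:
    """Hypothetical client used only to show the guard in a request method."""

    def __init__(self) -> None:
        self.session = None  # would be set by a connect() call

    def request(self, path: str) -> Any:
        # An explicit check survives `python -O`, whereas an `assert` is removed
        # and the failure would surface later as an AttributeError on None.
        if self.session is None:
            raise RuntimeError("HTTP session is not initialised; connect first")
        return self.session.get(path)
```

Calling `AdLibraryClient().request("/ads")` before a session exists raises `RuntimeError` with a readable message, regardless of optimisation flags.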
Adds `method` (GET/POST) and `body` (JSON dict) connection args to the multi_format_api handler. POST requests skip the HEAD pre-check and send the body as JSON. Body is shallow-merged from connection and query levels, with query-level keys taking precedence. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
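The shallow merge described in this commit could look roughly like the sketch below. The function name `merge_request_body` and the sample keys are assumptions made for illustration; only the precedence rule (query-level keys win, no deep merge) comes from the commit message.

```python
from typing import Any, Dict, Mapping, Optional


def merge_request_body(
    connection_body: Optional[Mapping[str, Any]],
    query_body: Optional[Mapping[str, Any]],
) -> Dict[str, Any]:
    """Shallow-merge two JSON bodies; query-level keys take precedence."""
    merged = dict(connection_body or {})  # copy so the connection args stay untouched
    merged.update(query_body or {})       # top-level keys only: nested dicts are replaced whole
    return merged


# Hypothetical connection-level and query-level bodies.
conn_body = {"page_size": 50, "filters": {"country": "US"}}
query_body = {"filters": {"country": "BR"}, "cursor": "abc"}
body = merge_request_body(conn_body, query_body)
```

Because the merge is shallow, the query-level `filters` dict replaces the connection-level one entirely rather than being combined with it.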
…-to-accept-post feat: add POST method and body support to multi_format_api_handler
fix(meta-ad-library): default reached countries to US
API handlers consume WHERE params (url, start_date, etc.) to call external APIs and mark them applied=True. When a consumed param name collides with a column in the API response (e.g. url), SubSelectStep's DuckDB re-evaluates the condition against the response value, producing false negatives (0 rows). Propagate applied column names via DataFrame.attrs from APIResource.select() to SubSelectStepCall, which strips only those conditions from WHERE. Non-consumed conditions are preserved for double-filtering safety. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
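A minimal sketch of the stripping step, under stated assumptions: the `Condition` class, the `applied_columns` key, and the function name are hypothetical, and a plain dict stands in for the `DataFrame.attrs` mapping that carries the applied column names in the actual change.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping

APPLIED_COLUMNS_KEY = "applied_columns"  # hypothetical key name inside DataFrame.attrs


@dataclass
class Condition:
    column: str
    op: str
    value: Any


def strip_applied_conditions(
    conditions: List[Condition], attrs: Mapping[str, Any]
) -> List[Condition]:
    """Drop only the conditions whose column the API handler already consumed."""
    applied = set(attrs.get(APPLIED_COLUMNS_KEY, ()))
    # Conditions on other columns are kept so they are still filtered locally.
    return [c for c in conditions if c.column not in applied]


# `url` was consumed to call the API, so re-checking it against the
# response's own `url` column could wrongly filter out every row.
attrs = {APPLIED_COLUMNS_KEY: ["url"]}
where = [Condition("url", "=", "https://example.com"), Condition("status", "=", 200)]
remaining = strip_applied_conditions(where, attrs)
```

When no applied columns are reported, every condition is preserved, which keeps the double-filtering safety net for handlers that do not set the attribute.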
Avoid reapplying WHERE in API integration subselects
feat: improve s3 handler file listing
Adds a thread-safe in-process registry that accumulates QueryJob stats (total_bytes_billed, cache_hit, project_id) keyed by a caller-supplied mktplace_query_id passed via ctx.params. The BigQuery handler populates the registry after each native_query execution; a new GET endpoint /api/sql/query_stats/<query_id> pops and returns the stats so mktplace can record usage metering. Registry auto-evicts TTL-expired entries (5 min) and caps at 10k to prevent memory leaks from abandoned query ids. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
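As a rough illustration of the registry described above, the following sketch shows one way to combine a lock, TTL eviction, and a size cap. It is not the code in `query_stats_registry.py`: the class name, method names, the injectable `clock`, and the rule that the aggregated `cache_hit` is true only when every job was a cache hit are all assumptions.

```python
import threading
import time
from collections import OrderedDict
from typing import Any, Callable, Dict, Optional, Tuple


class QueryStatsRegistry:
    """In-process stats store keyed by a caller-supplied query id (sketch)."""

    def __init__(self, ttl_seconds: float = 300.0, max_entries: int = 10_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._lock = threading.Lock()
        # query_id -> (created_at, stats); insertion order equals creation order.
        self._entries: "OrderedDict[str, Tuple[float, Dict[str, Any]]]" = OrderedDict()

    def _evict(self, now: float) -> None:
        # Oldest entries sit at the front: drop them while expired or over the cap.
        while self._entries:
            created_at, _ = next(iter(self._entries.values()))
            if now - created_at < self._ttl and len(self._entries) <= self._max:
                break
            self._entries.popitem(last=False)

    def accumulate(self, query_id: str, total_bytes_billed: Optional[int],
                   cache_hit: bool, project_id: Optional[str]) -> None:
        now = self._clock()
        with self._lock:
            entry = self._entries.get(query_id)
            if entry is None:
                stats = {"total_bytes_billed": 0, "cache_hit": True, "project_id": project_id}
                self._entries[query_id] = (now, stats)
            else:
                stats = entry[1]
            stats["total_bytes_billed"] += int(total_bytes_billed or 0)
            # Assumption: report a cache hit only if every job under this id was cached.
            stats["cache_hit"] = stats["cache_hit"] and bool(cache_hit)
            self._evict(now)

    def pop(self, query_id: str) -> Dict[str, Any]:
        """Remove and return the stats for query_id, or {} if unknown or expired."""
        now = self._clock()
        with self._lock:
            self._evict(now)
            entry = self._entries.pop(query_id, None)
        return dict(entry[1]) if entry else {}
```

Popping on read means each id can be metered at most once, while the TTL and cap bound memory for ids that the caller never collects.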
Summary
- `query_stats_registry.py`: thread-safe in-process registry accumulating `QueryJob` stats (`total_bytes_billed`, `cache_hit`, `project_id`) keyed by a `mktplace_query_id` correlation id passed via `ctx.params`
- `BigQueryHandler.native_query()` now captures stats into the registry after each query execution (via the new `_record_query_stats` helper)
- `GET /api/sql/query_stats/<query_id>` endpoint that pops and returns the stats JSON; returns `{}` for unknown ids

Why
mktplace needs to meter actual BigQuery bytes scanned for public datasources (Talentify pays GCP on-demand). The `QueryJob.total_bytes_billed` value was previously discarded immediately. This PR propagates it back to the mktplace caller without touching the result DataFrame pipeline or DuckDB path.

Depends on
Requires the companion PR in `Talentify/mktplace` (`feat/bigquery-usage-metering`), which generates the `mktplace_query_id`, calls `query_with_id`, and records metering. Deploy this PR first (mindsdb hot-reloads on file change).

Test plan
- Registry: accumulate / pop / TTL eviction
- Mock `QueryJob` with `total_bytes_billed=1234567`, `cache_hit=False`; verify registry entry; mock `cache_hit=True`; verify `total_bytes_billed=0`
- Call `/api/sql/query` with `params={"mktplace_query_id":"<uuid>"}`; GET `/api/sql/query_stats/<uuid>` → non-zero bytes; repeat (cache hit) → `bytes=0`
- GET `/api/sql/query_stats/<unused-uuid>` → `{}`

🤖 Generated with Claude Code
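The mocked-`QueryJob` step of the test plan could be exercised along these lines. This is a sketch only: `extract_query_stats` is a hypothetical helper, not the PR's `_record_query_stats`, and reading the project id from a `project` attribute on the job is an assumption. `SimpleNamespace` objects stand in for real BigQuery jobs.

```python
from types import SimpleNamespace
from typing import Any, Dict


def extract_query_stats(job: Any) -> Dict[str, Any]:
    """Pull the metering fields off a QueryJob-like object (hypothetical helper)."""
    return {
        # The attribute can be None before a job finishes; treat that as 0 bytes.
        "total_bytes_billed": int(getattr(job, "total_bytes_billed", None) or 0),
        "cache_hit": bool(getattr(job, "cache_hit", False)),
        # Assumption: the job exposes its project id as `project`.
        "project_id": getattr(job, "project", None),
    }


# Stand-ins for the two cases in the test plan: a billed run and a cache hit.
billed_job = SimpleNamespace(total_bytes_billed=1234567, cache_hit=False, project="demo-project")
cached_job = SimpleNamespace(total_bytes_billed=0, cache_hit=True, project="demo-project")

billed_stats = extract_query_stats(billed_job)
cached_stats = extract_query_stats(cached_job)
```

Keeping the extraction in a small pure function makes it testable without a BigQuery client or network access.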