feat: update BigQueryClient methods #2273
Conversation
```python
if user_request is not None and identifier_value is not None:
    raise ValueError(
        f"Provide either a request object or '{identifier_name}', not both."
    )
```
IIRC, there are cases where we merge these two in the existing hand-written client. For example, load jobs can take a string as a destination but merge in the job config object to the final request:
python-bigquery/google/cloud/bigquery/client.py, lines 2577 to 2590 in ef2740a:

```python
def load_table_from_file(
    self,
    file_obj: IO[bytes],
    destination: Union[Table, TableReference, TableListItem, str],
    rewind: bool = False,
    size: Optional[int] = None,
    num_retries: int = _DEFAULT_NUM_RETRIES,
    job_id: Optional[str] = None,
    job_id_prefix: Optional[str] = None,
    location: Optional[str] = None,
    project: Optional[str] = None,
    job_config: Optional[LoadJobConfig] = None,
    timeout: ResumableTimeoutType = DEFAULT_TIMEOUT,
) -> job.LoadJob:
```
I actually haven't thought too much about how the non-query jobs fit into this design, though. I suppose the user needs to specify more than just an identifier for all of the job types, so this method wouldn't apply?
Note: In addition to query jobs, load jobs using jobs.insert REST API will need a bit of handwritten magic to support load from local data via "resumable media uploads" (https://cloud.google.com/bigquery/docs/reference/api-uploads). I imagine we're planning on providing a separate hand-written helper for this, similar to queries? Actually do we know if the GAPICs even support the resumable upload API? AFAIK, it's only used in BigQuery, Cloud Storage, and Google Drive APIs. CC @parthea
How to handle the query experience is being designed by someone else and is not fully fleshed out for the Python libraries.
I would like to defer this as out of scope for the alpha release.
```python
DatasetIdentifier = Union[str, dataset_reference.DatasetReference]


# TODO: This variable is here to simplify prototyping, etc.
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT")
```
Note: When using `google.auth.default()`, it will read this variable for you.
That upgrade is handled in this separate PR.
```python
self._clients: Dict[str, object] = {}
self._credentials = credentials
self._client_options = client_options
self.project = PROJECT_ID
```
Might want to upgrade this to a `@property` so that it can be included in the public documentation (and also to signal that it should be treated as read-only).
That upgrade is handled in this separate PR.
""" | ||
|
||
self._clients: Dict[str, object] = {} | ||
self._credentials = credentials |
We'll want to call `credentials, default_project_id = google.auth.default()` (https://googleapis.dev/python/google-auth/latest/reference/google.auth.html#google.auth.default) if the credentials aren't set, so that we don't repeat the auth flow for each sub-client construction.
That upgrade is handled in this separate PR.
```python
client_options:
    A dictionary of client options to pass to the underlying
    service clients.
```
The type above also allows for a single `client_options`.
For the alpha, I may leave out `client_options`; it will depend on schedule. If we keep it, I will ensure that this nuance is captured in the code.
```python
def __init__(
    self,
    *,
    credentials: Optional[auth_credentials.Credentials] = None,
```
In the past, `project` has been a pretty common input to the client constructor. This is especially important in environments like https://colab.research.google.com that don't have a project associated with the environment.

Likewise, `client_info` is another important one. That's how applications can amend the user-agent, which is very important for BQ to track usage attributable to open source "connectors" as well as partner company usage. I suppose we could guide such users to the raw clients, but if so, maybe `client_options` falls in the same bucket of "advanced" features and should be excluded?

Speaking of commonly used arguments: fun fact, the Kaggle team was (is?) using `_http` to provide a custom proxy to BigQuery for their special Kaggle free tier that included BQML, unlike the BQ Sandbox. It might be worth reaching out to them to come up with an alternative if they're still doing that. See: https://www.kaggle.com/discussions/general/50159 Note that 5 TB is 5x more than the BQ Sandbox's 1 TB. That post is from 8 years ago, though, so I don't know if it still applies.
Thank you for these insights into what customers commonly pass into the client constructor.

I will see what I can do to include `project` in the client constructor.

I may reserve `client_info` until the beta or release candidates due to schedule.
For those who are reading, this may provide some good context.

Right now, if you ran the GAPIC-generated code and wanted to call `get_dataset()`, this is what it would look like:
```python
# Create a client
from google.cloud.bigquery_v2 import bigquery_v2

client = bigquery_v2.DatasetServiceClient()  # default DatasetServiceClient

# Initialize request argument(s)
request = bigquery_v2.GetDatasetRequest(
    project_id="project_id_value",
    dataset_id="dataset_id_value",
)

# Make the request
# NOTE: due to the protobuf definition, there is no way to
# provide a "project_id_value.dataset_id_value" string to this method.
response = client.get_dataset(request=request)
```
As part of this alpha, we are trying to enable one basic transmogrification: allow a user to continue to be able to supply a "project_id_value.dataset_id_value" string to the method (if this proves useful and universally generatable, other convenience transformers will follow).

This is done by injecting several helper functions that can invisibly accept the string and create a `*Request` object for the user.
```python
# Create a client
from google.cloud.bigquery_v2 import bigquery_v2

bqclient = bigquery_v2.BigQueryClient()  # my new and improved hotness

# Initialize the string value
dataset_id = "project_id_value.dataset_id_value"

# Make the request
# Inside the centralized_client version of get_dataset, it accepts
# a "project_id_value.dataset_id_value" string and the helper functions
# break it apart and create a bigquery_v2.GetDatasetRequest that is passed
# to the dataset_service_client's version of get_dataset.
# This happens invisibly to the user.
response = bqclient.get_dataset(dataset_id)
```
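For illustration, here is a minimal, self-contained sketch of what such a helper chain might look like. The names here (`GetDatasetRequest`, `_dataset_request_from_id`, `get_dataset`) are stand-ins for the real generated classes and methods, and the real client would forward the built request to the underlying service client rather than returning it:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class GetDatasetRequest:
    # Stand-in for the generated bigquery_v2.GetDatasetRequest message.
    project_id: str
    dataset_id: str


def _dataset_request_from_id(dataset_id: str) -> GetDatasetRequest:
    # rsplit at the LAST dot so legacy "domain:project.dataset" paths keep
    # the colon-qualified project intact.
    project_id, dataset_part = dataset_id.rsplit(".", 1)
    return GetDatasetRequest(project_id=project_id, dataset_id=dataset_part)


def get_dataset(request: Union[GetDatasetRequest, str]) -> GetDatasetRequest:
    # Invisibly upgrade a plain "project.dataset" string to a request object.
    if isinstance(request, str):
        request = _dataset_request_from_id(request)
    # A real client would now call the service client with this request;
    # the sketch just returns it so the transformation is visible.
    return request
```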
Including `project` is handled in this separate PR.

For the moment, I have not made any attempt to also process `client_info`.
google/cloud/bigquery_v2/services/centralized_service/client.py
""" | ||
if isinstance(dataset_id, str): | ||
project_id, dataset_id_str = self._parse_dataset_path(dataset_id) | ||
return {"project_id": project_id, "dataset_id": dataset_id_str} |
Curious to see `snake_case` here. IIRC the REST API mostly uses camelCase, but per https://protobuf.dev/programming-guides/json/ both should be acceptable.
This is not information that is sent directly to the API; it's a dictionary shared between helper functions.

As noted above, we send a `*Request` object per the `DatasetServiceClient.get_dataset()` method signature. The helper functions accept a string and invisibly create a `*Request` object for the user.

I will include a note to this effect in the docstring of the helper funcs.
Done.
```python
if isinstance(dataset_id, str):
    project_id, dataset_id_str = self._parse_dataset_path(dataset_id)
    return {"project_id": project_id, "dataset_id": dataset_id_str}
elif isinstance(dataset_id, dataset_reference.DatasetReference):
```
In this case, can't we use the https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html#google.protobuf.json_format.MessageToDict or the proto-plus equivalent?
This is not intended to be sent to the API. It is internal use only.
```python
def _parse_dataset_id_to_dict(self, dataset_id: DatasetIdentifier) -> dict:
    """
    Helper to create a request dictionary from a project_id and dataset_id.
```
By request dictionary, do we mean the canonical JSON representation of the protobuf? https://protobuf.dev/programming-guides/json/
No. Internal use only. Passed between helper functions, will update the docstrings.
Co-authored-by: Tim Sweña (Swast) <[email protected]>
```python
    A tuple of (project_id, dataset_id).
    """
    if "." in dataset_path:
        # Use rsplit to handle legacy paths like `google.com:my-project.my_dataset`.
```
For my education, is `google.com:my-project.my_dataset` supposed to yield `['google.com:my-project', 'my_dataset']` or `['my-project', 'my_dataset']`?
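Assuming the implementation uses `str.rsplit(".", 1)` as the code comment suggests, the split happens at the last dot, so the colon-qualified legacy project stays intact:

```python
# str.rsplit with maxsplit=1 splits at the LAST dot, so the legacy
# "domain:project" prefix survives as a single project identifier.
path = "google.com:my-project.my_dataset"
project_id, dataset_id = path.rsplit(".", 1)
print(project_id)  # google.com:my-project
print(dataset_id)  # my_dataset
```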
```python
def list_datasets(
    self,
    request: Optional[Union[dataset.ListDatasetsRequest, dict]] = None,
    project_id: Optional[str] = None,
```
Why is `project_id` before the star sign and `request` after? Is there any special consideration about the order here?
Perhaps we can separate the changes in this file related to test setup into a new PR, and use this PR just for things related to the CRUD methods for datasets.

Overtaken by events. Superseded by another PR.

This PR is overtaken by events. Other PRs are going to be used to include this content. Closing.
This PR:

- accepts `project_id.dataset_id` convenience strings
- adds `*SERVICECLIENT` attributes (`*SERVICECLIENTs`) to allow discussion about the approach. The methods take in either a request object OR a convenience string as an argument and transmogrify the input so that the underlying service client attribute can be called directly, e.g. `get_dataset()`, `list_datasets()`

NOTES

- targets the `autogen` dev branch.