-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Distributed request to tables with Object Storage Engines #615
base: antalya
Are you sure you want to change the base?
Distributed request to tables with Object Storage Engines #615
Conversation
This is an automated comment for commit 293ac83 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page
Successful checks
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a first glance, it looks good. This is the first round of review, it includes just a few questions and minor suggestions.
Once these are addressed, I'll dive a bit deeper in the AST creation logic and it should be good to go :).
@@ -247,6 +247,10 @@ class StorageObjectStorage::Configuration | |||
|
|||
virtual void update(ObjectStoragePtr object_storage, ContextPtr local_context); | |||
|
|||
virtual void setFunctionArgs(ASTs & /* args */) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, this function needs to be either renamed or documented. Usually, when I read a function named void setSomething(Argument argument)
, I assume the members of the current instance will be modified.
This function is doing the opposite, it is taking the members of the existing instance and populating the argumennt.
Maybe you could construct the arguments object and return it?
ASTPtr getTableFunctionArguments()
{
auto arguments = std::make_shared<ASTExpressionList>();
// populate it
return arguments;
}
and then, upon call site:
auto function_arguments = configuration->getTableFunctionArguments();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there, couple more changes or discussions and it is good to be merged
@@ -153,7 +160,7 @@ void StorageObjectStorageCluster::updateQueryForDistributedEngineIfNeeded(ASTPtr | |||
queryToString(query)); | |||
} | |||
|
|||
configuration->setFunctionArgs(arguments->children); | |||
configuration->getTableFunctionArguments(arguments->children); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #615 (comment), I suggested the arguments to be created inside getTableFunctionArguments
instead of populating a parameter. Any specific reason you did it this way?
If there is one, I am fine with this approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to avoid copying after function call. children
is a vector, not a pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I follow.
On line 155, you do: auto arguments = std::make_shared<ASTExpressionList>();
Then you are passing children as a reference on 167 and populating it inside the getTableFunctionArguments
function.
On line 169, you read the values.
What I am suggesting is that you create the arguments list inside the getTableFunctionArguments
function, populate and simply return it. It'll be a shared_ptr, there is no deep copy in this process. Then, the rest is the same.
What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes not in arguments
but in arguments->children
children
has type ASTs
https://github.com/ClickHouse/ClickHouse/blob/master/src/Parsers/IAST.h#L37
It's a variant of vector, not a single pointer
https://github.com/ClickHouse/ClickHouse/blob/master/src/Parsers/IAST_fwd.h#L14
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed on slack, let's make the function return it as suggested in the original comment
void StorageAzureConfiguration::getTableFunctionArguments(ASTs & args) const | ||
{ | ||
if (!args.empty()) | ||
{ /// Just check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just remove this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, if you implement this function in the way I suggested in #615 (comment), perhaps this whole if statement is no longer necessary?
args.push_back(std::make_shared<ASTLiteral>(auth_settings[S3AuthSetting::session_token].value)); | ||
if (format != "auto") | ||
args.push_back(std::make_shared<ASTLiteral>(format)); | ||
if (!compression_method.empty()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok not to add some other arguments like structure
and headers
as the docs suggest? https://clickhouse.com/docs/en/sql-reference/table-functions/s3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this fields are already added in addStructureAndFormatToArgsIfNeeded
methods.
https://github.com/ClickHouse/ClickHouse/blob/master/src/Storages/ObjectStorage/S3/Configuration.cpp#L399
These field required for table functions too, when access parameters need to add only for table engine case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: perhaps add a comment saying this is already added somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I rename method to addPathAndAccessKeysToArgs
, to consistency with addStructureAndFormatToArgsIfNeeded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I posted two more comments which are not mandatory, although I am curious about #615 (comment).
In any case, it can be merged as it is. LGTM
c7b1ad3
to
3fafe6f
Compare
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Distributed object storage table engines
Documentation entry for user-facing changes
Before ClickHouse can made distributed table function requests with option
object_storage_cluster
.This PR expand this to table engined.
Modify your CI run:
NOTE: If your merge the PR with modified CI you MUST KNOW what you are doing
NOTE: Checked options will be applied if set before CI RunConfig/PrepareRunConfig step
Include tests (required builds will be added automatically):
Exclude tests:
Extra options:
Only specified batches in multi-batch jobs: