Add a config flag tpcds.use-varchar-type in Presto - Java TPC-DS Connector #24406
Thanks for the doc! Unfortunately, there's a problem with the RST formatting. The table was initially formatted as a simple table, and in simple tables "the first column cells cannot contain multiple lines".
See https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#tables for details.
As a result, adding a second row causes the table to build looking like this:
In a local doc build I recoded the table as a grid table (formatting in the same link above) and a new doc build looked like this:
If it helps, here's a screenshot of the grid table format I used to build the second screenshot in Visual Studio:
If you redo the table in grid table formatting like I did, it should be fine.
ef3e428 to f2ef87b
Thanks @steveburnett for the detailed explanation.
LGTM! (docs)
Pull updated branch, new local doc build, looks good. Thanks!
@aditi-pandit @majetideepak Please take a look, thanks.
@pdabre12: Thanks for this code. I have a question about testing: would it be possible to run these tests https://github.com/prestodb/presto/blob/master/presto-tpcds/src/test/java/com/facebook/presto/tpcds/TestTpcds.java with the flag turned on and off to validate the correctness of this work?
f2ef87b to 7b7e54c
@aditi-pandit Added the test cases, PTAL.
New release note guidelines as of last week: PR #24354 automatically adds links to this PR to the release notes. Please remove the manual PR link in the following format from the release note entries for this PR.
I have updated the Release Notes Guidelines to remove the examples of manually adding the PR link.
LGTM! (docs)
Pull updated branch, new doc build, looks good.
The updated release note entry looks good, thank you!
@@ -64,6 +64,8 @@ public ConnectorHandleResolver getHandleResolver()
public Connector create(String catalogName, Map<String, String> config, ConnectorContext context)
{
int splitsPerNode = getSplitsPerNode(config);
// cast char columns to varchar only in a native cluster
boolean useVarcharType = context.getConnectorSystemConfig().isNativeExecution() && useVarcharType(config);
There is a different way to think about this.
The Java TPCDS connector can work with either char or varchar, but native execution requires useVarcharType. So the code should raise an error if isNativeExecution is true but useVarcharType is not set. We want to support the varchar type with the Java engine for compatibility tests against the native engine.
It would be good to change useVarcharType(config) to include the validation for native execution.
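The rule suggested in this comment can be sketched as follows. This is a minimal, self-contained illustration, not the connector's actual code: the class `VarcharTypeValidation`, the method `resolveUseVarcharType`, the default of `false` when the property is absent, and passing the native-execution state as a plain boolean are all assumptions. Only the property name `tpcds.use-varchar-type` comes from the PR.

```java
import java.util.Map;

// Hypothetical helper; names are illustrative and not taken from the Presto code base.
final class VarcharTypeValidation
{
    static final String USE_VARCHAR_TYPE = "tpcds.use-varchar-type";

    private VarcharTypeValidation() {}

    // Returns the effective flag value. The Java engine accepts either setting;
    // native execution is rejected unless the flag is explicitly true.
    // Assumption: a missing property is treated as false.
    static boolean resolveUseVarcharType(Map<String, String> config, boolean nativeExecution)
    {
        boolean useVarcharType = Boolean.parseBoolean(config.getOrDefault(USE_VARCHAR_TYPE, "false"));
        if (nativeExecution && !useVarcharType) {
            throw new IllegalArgumentException(
                    USE_VARCHAR_TYPE + " must be true when native execution is enabled");
        }
        return useVarcharType;
    }
}
```

With this shape, a Java cluster can run with the flag on or off, while a native cluster fails fast at connector creation instead of silently overriding the configured value.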
Made the change, thanks.
// The TPCDS config tpcds.use-varchar-type is set to true only for native execution, since char type is
// currently unsupported in Presto native. This function is only used by the Presto java TPCDS connector
// so the argument useVarcharType is false here.
columnTypes.add(getPrestoType(column.getType(), false));
This should use the tpcds.use-varchar-type config. Java should support both options.
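To illustrate what "supporting both options" could look like, here is a small stand-alone sketch of a type mapping driven by the flag rather than a hard-coded `false`. The enum `BaseType`, the class `ColumnTypeMapping`, and the method `toSqlType` are hypothetical stand-ins; they are not the real `getPrestoType` signature or the TPC-DS library's column types.

```java
// Hypothetical stand-in for the connector's type mapping; the enum and method
// names are illustrative, not the real Presto or TPC-DS library types.
final class ColumnTypeMapping
{
    enum BaseType { INTEGER, CHAR, VARCHAR }

    private ColumnTypeMapping() {}

    // Renders the SQL type name for a column. When useVarcharType is true,
    // CHAR(n) columns are exposed as VARCHAR(n); all other types are unchanged.
    static String toSqlType(BaseType base, int length, boolean useVarcharType)
    {
        switch (base) {
            case INTEGER:
                return "integer";
            case CHAR:
                return (useVarcharType ? "varchar" : "char") + "(" + length + ")";
            case VARCHAR:
                return "varchar(" + length + ")";
            default:
                throw new IllegalArgumentException("Unsupported type: " + base);
        }
    }
}
```

A caller would pass the configured flag value through, so the same code path serves both the char and the varchar setting.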
7b7e54c to 1e7a0d6
1e7a0d6 to 1cf3d33
Co-authored-by: Pramod Satya <[email protected]>
1cf3d33 to 5c41a45
@@ -161,10 +161,11 @@ public static DistributedQueryRunner createIcebergQueryRunner(
 OptionalInt nodeCount,
 Optional<BiFunction<Integer, URI, Process>> externalWorkerLauncher,
 Optional<Path> dataDirectory,
-boolean addStorageFormatToPath)
+boolean addStorageFormatToPath,
+Map<String, String> tpcdsProperties)
@pdabre12: Do you want to add a separate input parameter? Another option would be to add these to extraConnectorProperties and check them when initializing the plugin. See how it was done for the Iceberg catalog: https://github.com/prestodb/presto/blob/master/presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergQueryRunner.java#L211
I could do that, but would it make sense to have TPCDS connector specific properties in extraConnectorProperties? It seems that is used to pass down iceberg specific properties.
Well, I guess we would need to filter properties for individual connectors if we mixed them. Alright, let's keep tpcds separate for now. We might remove this property anyway.
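For context, the filtering that a mixed property map would require might look roughly like the sketch below. It assumes, purely for illustration, that each connector's properties can be recognized by a key prefix such as `tpcds.`; the class `ConnectorPropertyFilter`, the method `withPrefix`, and the example Iceberg key in the usage note are hypothetical and not part of the query runner.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper showing prefix-based filtering of a mixed property map.
final class ConnectorPropertyFilter
{
    private ConnectorPropertyFilter() {}

    // Returns only the entries whose key starts with the given prefix,
    // preserving the iteration order of the input map.
    static Map<String, String> withPrefix(Map<String, String> mixed, String prefix)
    {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : mixed.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```

For example, `withPrefix(props, "tpcds.")` applied to a map holding both `tpcds.use-varchar-type` and an Iceberg key would return only the TPC-DS entry. Passing a dedicated `tpcdsProperties` map, as the PR does, avoids this step entirely.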
@pdabre12: Since you added this config, we could remove all the varchar casting in the table creations here: https://github.com/prestodb/presto/blob/master/presto-native-execution/src/test/java/com/facebook/presto/nativeworker/AbstractTestNativeTpcdsQueries.java#L85.
Can you try that?
Since the Java runners in this file are used to validate against native workers, should we be setting tpcds.use-varchar-type for them as well?
Description
Add a config flag tpcds.use-varchar-type in Presto - Java TPC-DS Connector
Motivation and Context
Resolves: #24362
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.