Add a config flag tpcds.use-varchar-type in Presto - Java TPC-DS Connector #24406

Open
wants to merge 1 commit into
base: master

Conversation

pdabre12
Contributor

@pdabre12 pdabre12 commented Jan 21, 2025

Description

Add a config flag tpcds.use-varchar-type in Presto - Java TPC-DS Connector

Motivation and Context

Resolves: #24362

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

TPC-DS Connector Changes
* Add config property ``tpcds.use-varchar-type`` to allow toggling of char columns to varchar columns.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Jan 21, 2025
@prestodb-ci prestodb-ci requested review from a team, czentgr and psnv03 and removed request for a team January 21, 2025 17:11
Contributor

@steveburnett steveburnett left a comment

Thanks for the doc! Unfortunately there's an RST formatting problem: in simple tables, which is how the table was initially formatted, "the first column cells cannot contain multiple lines".

See https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#tables for details.

As a result, adding a second row causes the table to build looking like this:
[Screenshot 2025-01-21 at 12 41 20 PM]

In a local doc build I recoded the table as a grid table (formatting in the same link above) and a new doc build looked like this:
[Screenshot 2025-01-21 at 12 49 05 PM]

If it helps, here's a screenshot of the grid table format I used to build the second screenshot in Visual Studio:
[Screenshot 2025-01-21 at 12 49 10 PM]

If you redo the table in grid table formatting like I did, it should be fine.

@pdabre12 pdabre12 force-pushed the tpcds-use-varchar-config branch 2 times, most recently from ef3e428 to f2ef87b Compare January 21, 2025 19:27
@pdabre12
Contributor Author

Thanks @steveburnett for the detailed explanation.
I changed the table to match the format used in other connector .rst files, e.g. tpch.rst. It renders correctly for me locally.
Please take a look.

@pdabre12 pdabre12 requested a review from steveburnett January 21, 2025 19:33
steveburnett
steveburnett previously approved these changes Jan 21, 2025
Contributor

@steveburnett steveburnett left a comment

LGTM! (docs)

Pull updated branch, new local doc build, looks good. Thanks!

@pdabre12 pdabre12 marked this pull request as ready for review January 21, 2025 19:58
@pdabre12 pdabre12 requested review from elharo and a team as code owners January 21, 2025 19:58
@pdabre12 pdabre12 requested a review from presto-oss January 21, 2025 19:58
@pdabre12
Contributor Author

@aditi-pandit @majetideepak Please take a look, thanks.

Contributor

@aditi-pandit aditi-pandit left a comment

@pdabre12: Thanks for this code. I have a question about testing. Would it be possible to run these tests https://github.com/prestodb/presto/blob/master/presto-tpcds/src/test/java/com/facebook/presto/tpcds/TestTpcds.java with the flag turned on and off to validate the correctness of this work?

@pdabre12
Contributor Author

@aditi-pandit Added the test cases, PTAL.
Thanks.

@pdabre12 pdabre12 requested a review from aditi-pandit January 31, 2025 23:31
@steveburnett
Contributor

New release note guidelines as of last week: PR #24354 automatically adds links to this PR to the release notes. Please remove the manual PR link in the following format from the release note entries for this PR.

:pr:`12345`

I have updated the Release Notes Guidelines to remove the examples of manually adding the PR link.

@pdabre12 pdabre12 requested a review from steveburnett February 3, 2025 17:15
steveburnett
steveburnett previously approved these changes Feb 3, 2025
Contributor

@steveburnett steveburnett left a comment

LGTM! (docs)

Pull updated branch, new doc build, looks good.

@steveburnett
Contributor

The updated release note entry looks good, thank you!

@@ -64,6 +64,8 @@ public ConnectorHandleResolver getHandleResolver()
    public Connector create(String catalogName, Map<String, String> config, ConnectorContext context)
    {
        int splitsPerNode = getSplitsPerNode(config);
        // cast char columns to varchar only in a native cluster
        boolean useVarcharType = context.getConnectorSystemConfig().isNativeExecution() && useVarcharType(config);
Contributor

There is a different way to think about this.

The Java TPCDS connector could work with either char or varchar, but Native requires useVarcharType. So the code should raise an error if isNativeExecution is true but useVarcharType is not set. We want to support varchar types with the Java engine for compatibility tests against the native engine.

It would be good to combine useVarcharType(config) with this validation for native execution.

Contributor Author

Made the change, thanks.
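
For readers following the thread, the validation requested above could look roughly like the sketch below. It is not the code merged in this PR: the class name, the method signature that takes a plain boolean for native execution, the exception type, and the default of false are all assumptions made for illustration. Only the property name tpcds.use-varchar-type comes from the PR.

```java
import java.util.Map;

// Hypothetical helper; the class and method shapes are illustrative, not taken from the PR.
final class TpcdsVarcharConfigSketch
{
    static final String USE_VARCHAR_TYPE = "tpcds.use-varchar-type";

    private TpcdsVarcharConfigSketch() {}

    // Reads the flag from the catalog properties (default false is an assumption)
    // and rejects the combination "native execution without varchar columns".
    static boolean useVarcharType(Map<String, String> config, boolean nativeExecution)
    {
        boolean useVarchar = Boolean.parseBoolean(config.getOrDefault(USE_VARCHAR_TYPE, "false"));
        if (nativeExecution && !useVarchar) {
            throw new IllegalArgumentException(
                    USE_VARCHAR_TYPE + " must be true when native execution is enabled");
        }
        return useVarchar;
    }

    public static void main(String[] args)
    {
        // Native cluster with the flag set: accepted.
        System.out.println(useVarcharType(Map.of(USE_VARCHAR_TYPE, "true"), true));
        // Java cluster without the flag: accepted, char columns are kept.
        System.out.println(useVarcharType(Map.of(), false));
    }
}
```

With this shape the Java engine accepts both settings, while a native cluster fails fast at connector creation instead of silently overriding the configured value.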

// The TPCDS config tpcds.use-varchar-type is set to true only for native execution, since char type is
// currently unsupported in Presto native. This function is only used by the Presto java TPCDS connector
// so the argument useVarcharType is false here.
columnTypes.add(getPrestoType(column.getType(), false));
Contributor

This should use the tpcds.use-varchar-type config. Java should support both options.
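
As a rough illustration of what "supporting both options" means for the type mapping, the sketch below passes the flag through instead of hard-coding false. The enum and the string-based type names are simplified stand-ins, not the connector's real column types or the actual getPrestoType signature, and mapping char(n) to varchar(n) with the same length is an assumption.

```java
// Simplified stand-ins for the TPC-DS column type and the Presto type; both are hypothetical.
final class TpcdsTypeMappingSketch
{
    enum Base { INTEGER, CHAR, VARCHAR }

    private TpcdsTypeMappingSketch() {}

    // Maps a column type to a Presto type name; char becomes varchar only when the flag is set.
    static String toPrestoType(Base base, int length, boolean useVarcharType)
    {
        switch (base) {
            case INTEGER:
                return "integer";
            case CHAR:
                return (useVarcharType ? "varchar" : "char") + "(" + length + ")";
            case VARCHAR:
                return "varchar(" + length + ")";
            default:
                throw new IllegalArgumentException("Unsupported type: " + base);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(toPrestoType(Base.CHAR, 16, false));
        System.out.println(toPrestoType(Base.CHAR, 16, true));
    }
}
```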

@pdabre12 pdabre12 requested a review from a team as a code owner February 12, 2025 22:08
@@ -161,10 +161,11 @@ public static DistributedQueryRunner createIcebergQueryRunner(
             OptionalInt nodeCount,
             Optional<BiFunction<Integer, URI, Process>> externalWorkerLauncher,
             Optional<Path> dataDirectory,
-            boolean addStorageFormatToPath)
+            boolean addStorageFormatToPath,
+            Map<String, String> tpcdsProperties)
Contributor

@pdabre12: Do you want to add a separate input parameter? Another option could be to add these to extraConnectorProperties and check them when initializing the plugin. See how it was done for the Iceberg catalog: https://github.com/prestodb/presto/blob/master/presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergQueryRunner.java#L211

Contributor Author

I could do that, but would it make sense to have TPCDS connector-specific properties in extraConnectorProperties? It seems to be used to pass down Iceberg-specific properties.

Contributor

Well, I guess we would need to filter properties for individual connectors if we mixed them. Alright, let's keep tpcds separate for now. We might remove this property anyway.
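
To make the trade-off concrete: if TPCDS and Iceberg properties were mixed in one map, each connector would need a filtering step along the lines of the sketch below. This is only an illustration of the alternative that was not chosen. The helper name and the prefix-based convention are assumptions, and the Iceberg key and value are used purely as sample data.

```java
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper showing per-connector filtering of a mixed property map.
final class ConnectorPropertyFilterSketch
{
    private ConnectorPropertyFilterSketch() {}

    // Keeps only the entries whose key starts with the given connector prefix.
    static Map<String, String> withPrefix(Map<String, String> mixed, String prefix)
    {
        return mixed.entrySet().stream()
                .filter(entry -> entry.getKey().startsWith(prefix))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args)
    {
        Map<String, String> mixed = Map.of(
                "tpcds.use-varchar-type", "true",
                "iceberg.catalog.type", "HADOOP"); // sample entry for illustration only
        System.out.println(withPrefix(mixed, "tpcds."));
    }
}
```

Keeping a dedicated tpcdsProperties parameter, as the thread settles on, avoids this step at the cost of one more argument on the query runner factory.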

Contributor

@aditi-pandit aditi-pandit left a comment

@pdabre12: Since you are adding this config, we could remove all the varchar casting in the table creations here: https://github.com/prestodb/presto/blob/master/presto-native-execution/src/test/java/com/facebook/presto/nativeworker/AbstractTestNativeTpcdsQueries.java#L85.

Can you try that?

Contributor

Since the Java runners in this file are used to validate against native workers, should we be setting tpcds.use-varchar-type for them as well?

Labels
from:IBM PR from IBM
Development

Successfully merging this pull request may close these issues.

Add a config flag to use varchar type in Presto - Java TPC-DS Connector
4 participants