
Reduce usage of DataSourceAnalysis in interfaces #17724

Merged: 19 commits, merged on Feb 19, 2025

kgyrtkirk
Member

  • Change the signatures of TimelineServerView#getTimeline and SegmentManager#getTimeline to take a TableDataSource
  • analysis.getBaseTableDataSource() now throws instead of returning an Optional
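A minimal sketch of the accessor change, assuming heavily simplified, hypothetical stand-ins for Druid's types: `DataSource`, `TableDataSource`, `QueryDataSource` and `Analysis` below are not the real classes, union data sources are not modelled, and `IllegalStateException` is an assumed exception type. It contrasts the old Optional-returning shape with the new throwing shape.

```java
import java.util.Optional;

public class BaseTableAccessorSketch {
  // Hypothetical stand-ins for Druid's DataSource hierarchy (not the real classes).
  interface DataSource {}

  record TableDataSource(String name) implements DataSource {}

  record QueryDataSource(String description) implements DataSource {}

  // Hypothetical, simplified analysis that only tracks the base data source.
  static final class Analysis {
    private final DataSource base;

    Analysis(DataSource base) {
      this.base = base;
    }

    // Before: callers had to unwrap an Optional and decide what "empty" means.
    Optional<TableDataSource> getBaseTableDataSourceOptional() {
      return base instanceof TableDataSource table ? Optional.of(table) : Optional.empty();
    }

    // After: return the table directly and fail loudly when there is none.
    // IllegalStateException is an assumption; Druid may use its own exception type.
    TableDataSource getBaseTableDataSource() {
      if (base instanceof TableDataSource table) {
        return table;
      }
      throw new IllegalStateException("Base data source is not a table: " + base);
    }
  }

  public static void main(String[] args) {
    Analysis tableQuery = new Analysis(new TableDataSource("wikipedia"));
    Analysis subQuery = new Analysis(new QueryDataSource("inner query"));

    System.out.println(tableQuery.getBaseTableDataSource().name()); // wikipedia
    System.out.println(subQuery.getBaseTableDataSourceOptional().isPresent()); // false
    try {
      subQuery.getBaseTableDataSource();
    } catch (IllegalStateException e) {
      System.out.println("threw: " + e.getMessage());
    }
  }
}
```

With the throwing shape, a caller that reaches a non-table base fails immediately instead of silently carrying an empty Optional forward.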

@github-actions bot added the Area - Batch Ingestion, Area - Querying, Area - Ingestion, and Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) labels on Feb 13, 2025
Contributor

@kfaraz left a comment

Changes look good; I had a couple of questions.

   * datasource is a {@link UnionDataSource} of {@link TableDataSource}.
   */
- public Optional<TableDataSource> getBaseTableDataSource()
+ public TableDataSource getBaseTableDataSource()
Contributor

Maybe we should also add a hasBaseTableDataSource() method; otherwise, some callers will need to do an instanceof check.

Member Author

I've removed the instanceof check from Queries, since I think it was also aiming to throw an exception, which could just as well be the one this method now produces.
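The reviewer's hasBaseTableDataSource() suggestion can be sketched as follows. The method is only proposed in this thread, so its name, the simplified types, and both callers are hypothetical; the sketch does not claim that the merged code added such a method. It contrasts a caller repeating the instanceof test with a caller asking the analysis first.

```java
public class HasBaseTableSketch {
  // Hypothetical stand-ins for Druid's DataSource hierarchy (not the real classes).
  interface DataSource {}

  record TableDataSource(String name) implements DataSource {}

  record InlineDataSource(int rowCount) implements DataSource {}

  static final class Analysis {
    private final DataSource base;

    Analysis(DataSource base) {
      this.base = base;
    }

    // Hypothetical predicate proposed in review: ask before calling the throwing accessor.
    boolean hasBaseTableDataSource() {
      return base instanceof TableDataSource;
    }

    // Throwing accessor, as described in this PR (exception type is an assumption).
    TableDataSource getBaseTableDataSource() {
      if (base instanceof TableDataSource table) {
        return table;
      }
      throw new IllegalStateException("Base data source is not a table: " + base);
    }

    DataSource getBaseDataSource() {
      return base;
    }
  }

  // Caller without the predicate: has to repeat the instanceof check itself.
  static String describeWithInstanceof(Analysis analysis) {
    if (analysis.getBaseDataSource() instanceof TableDataSource table) {
      return "table " + table.name();
    }
    return "not a table";
  }

  // Caller with the predicate: the type test lives in one place inside Analysis.
  static String describeWithPredicate(Analysis analysis) {
    return analysis.hasBaseTableDataSource()
           ? "table " + analysis.getBaseTableDataSource().name()
           : "not a table";
  }

  public static void main(String[] args) {
    Analysis table = new Analysis(new TableDataSource("wikipedia"));
    Analysis inline = new Analysis(new InlineDataSource(3));
    System.out.println(describeWithInstanceof(table) + " / " + describeWithPredicate(table)); // table wikipedia / table wikipedia
    System.out.println(describeWithInstanceof(inline) + " / " + describeWithPredicate(inline)); // not a table / not a table
  }
}
```

Both callers behave identically; the predicate only moves the type test out of each call site, which is the trade-off being weighed against simply letting the throwing accessor fail.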

@@ -182,7 +182,7 @@ public <T> QueryRunner<T> getQueryRunnerForSegments(final Query<T> query, final
final DataSourceAnalysis analysis = dataSourceFromQuery.getAnalysis();

// Sanity check: make sure the query is based on the table we're meant to handle.
- if (!analysis.getBaseTableDataSource().filter(ds -> dataSource.equals(ds.getName())).isPresent()) {
+ if (!analysis.getBaseTableDataSource().getName().equals(dataSource)) {
Contributor

Suggested change
- if (!analysis.getBaseTableDataSource().getName().equals(dataSource)) {
+ if (!analysis.hasBaseTableDataSource() || !analysis.getBaseTableDataSource().getName().equals(dataSource)) {

Member Author

I don't think that would make much of a difference:

  • Previously, getBaseTableDataSource returned an Optional, and if it was empty, the code raised the error below. The new getBaseTableDataSource no longer returns an Optional and performs that check internally, so that case is now handled there.
  • If the name didn't match, the filter made the Optional empty and the code raised the next error; that case is now handled by the equals check.
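To make the argument concrete, here is a sketch comparing the old and new forms of the sanity check on simplified, hypothetical types: both forms reject exactly the same inputs, and the only difference is which exception a non-table base produces. The types, helper names, and exception classes are assumptions, not Druid's.

```java
import java.util.Optional;

public class SanityCheckSketch {
  // Hypothetical stand-ins for Druid's DataSource hierarchy (not the real classes).
  interface DataSource {}

  record TableDataSource(String name) implements DataSource {}

  record QueryDataSource(String description) implements DataSource {}

  // Old accessor shape: empty when the base is not a table.
  static Optional<TableDataSource> baseTableOptional(DataSource base) {
    return base instanceof TableDataSource table ? Optional.of(table) : Optional.empty();
  }

  // New accessor shape: throws when the base is not a table.
  static TableDataSource baseTable(DataSource base) {
    if (base instanceof TableDataSource table) {
      return table;
    }
    throw new IllegalStateException("Base data source is not a table");
  }

  // Old check: a non-table base and a name mismatch both become "empty" and hit the same error.
  static void oldCheck(DataSource base, String expected) {
    if (!baseTableOptional(base).filter(ds -> expected.equals(ds.name())).isPresent()) {
      throw new IllegalArgumentException("Unexpected data source");
    }
  }

  // New check: a non-table base fails inside baseTable(); a mismatch fails on equals().
  static void newCheck(DataSource base, String expected) {
    if (!baseTable(base).name().equals(expected)) {
      throw new IllegalArgumentException("Unexpected data source");
    }
  }

  // Returns the simple name of the exception thrown by the check, or "ok" if none.
  static String outcome(Runnable check) {
    try {
      check.run();
      return "ok";
    } catch (RuntimeException e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    DataSource[] bases = {new TableDataSource("foo"), new TableDataSource("bar"), new QueryDataSource("sub")};
    for (DataSource base : bases) {
      System.out.println(base + ": old=" + outcome(() -> oldCheck(base, "foo"))
                         + ", new=" + outcome(() -> newCheck(base, "foo")));
    }
  }
}
```

Running main reports `ok` for the `foo` table under both checks, an `IllegalArgumentException` under both checks for the mismatched `bar` table, and, for the subquery base, an `IllegalArgumentException` from the old check versus an `IllegalStateException` from the new one.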

final Task task = runningItem.getTask();
final TableDataSource queryTable = query.getDataSourceAnalysis().getBaseTableDataSource();
Contributor

The original code also checked whether getBaseTableDataSource() was present.
If it was absent, the original code was a no-op, but now we will throw an exception.
Is this okay?

Member Author

I think this runner was tasked with running that particular query in this case. I don't think it should back out silently with a no-op if it doesn't like the task; that could lead to missing results.

The part submitting these tasks should already decide whether this is a good idea, and since there are no test failures, I think it does the right thing.

Can you imagine a valid use case for it?

Contributor

No, I am not sure myself. But I guess if there are no test failures, we should be good.

Contributor

@kfaraz left a comment

LGTM!

@github-actions bot added the GHA label on Feb 17, 2025
@kgyrtkirk merged commit 11be3a9 into apache:master on Feb 19, 2025
75 checks passed

@kgyrtkirk
Member Author
Thank you for reviewing the changes @kfaraz!

GWphua pushed a commit to GWphua/druid that referenced this pull request Feb 20, 2025
2 participants