Contributor

@mdayakar mdayakar commented Sep 4, 2025

HIVE-29181: SELECT query on VIEW with IS NOT NULL operator producing unexpected result

What changes were proposed in this pull request?

When data is selected from a view with the filter condition ((1008753865)/(0)) IS NOT NULL, which evaluates to FALSE, no rows should be returned, but the query returns rows. Selecting from a table with the same filter condition works correctly and returns no rows.

This happens because HiveFilterProjectTransposeRule, the first rule in the list, is applied to the view query and the filter condition is eventually removed, so rows are returned. For the table query, ReduceExpressionsRule.FilterReduceExpressionsRule is applied and the plan is changed to HiveValues(tuples=[[]]), so no rows are fetched.

To fix the issue, ReduceExpressionsRule.FilterReduceExpressionsRule can be added before HiveFilterProjectTransposeRule, so that it is also applied to the view query and the query produces the correct output.
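
As a rough illustration of why an empty result is expected, the sketch below models the filter condition in plain Python. It assumes Hive's default behavior where dividing by zero yields NULL; `hive_divide`, `is_not_null`, and the sample row are hypothetical names made up for this example and are not part of Hive or Calcite. It only models the constant folding, not how the planner applies its rules.

```python
def hive_divide(a, b):
    """Toy model of Hive's `/`: NULL (None) on NULL input or a zero divisor (assumed)."""
    if a is None or b is None or b == 0:
        return None
    return a / b


def is_not_null(x):
    """Toy model of SQL `IS NOT NULL`, which always returns TRUE or FALSE."""
    return x is not None


# The filter has no column references, so it can be folded to a constant once.
predicate = is_not_null(hive_divide(1008753865, 0))

# Hypothetical table content, mirroring a single inserted row.
rows = [(0.1,)]

# A filter that folds to FALSE keeps nothing, which corresponds to the
# empty HiveValues(tuples=[[]]) plan described for the table query.
kept = [row for row in rows if predicate]

print(predicate, kept)
```

Under these assumptions the predicate is the same constant for every row, so the table query and the view query should both return nothing regardless of which relation is scanned.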

Why are the changes needed?

To fix the issue mentioned in HIVE-29181

Does this PR introduce any user-facing change?

No

How was this patch tested?

Using q file tests
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=view_with_where_exp.q -pl itests/qtest -Pitests

github-actions bot commented Sep 15, 2025

@check-spelling-bot Report

🔴 Please review

See the files view or the action log for details.

Unrecognized words (3)

bucketedtables
languagemanual
teradatabinaryserde

Previously acknowledged words that are now absent: aarry bytecode cwiki HIVEFETCHOUTPUTSERDE timestamplocal yyyy
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands

... in a clone of the git@github.com:mdayakar/hive.git repository
on the HIVE-29181_SelectViewIssue branch:

update_files() {
perl -e '
my @expect_files=qw('".github/actions/spelling/expect.txt"');
@ARGV=@expect_files;
my @stale=qw('"$patch_remove"');
my $re=join "|", @stale;
my $suffix=".".time();
my $previous="";
sub maybe_unlink { unlink($_[0]) if $_[0]; }
while (<>) {
if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print;
}; maybe_unlink($previous);'
perl -e '
my $new_expect_file=".github/actions/spelling/expect.txt";
use File::Path qw(make_path);
use File::Basename qw(dirname);
make_path (dirname($new_expect_file));
open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
my @add=qw('"$patch_add"');
my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
@words = sort {lc($a)."-".$a cmp lc($b)."-".$b} keys %items;
open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
close FILE;
system("git", "add", $new_expect_file);
'
}

comment_json=$(mktemp)
curl -L -s -S \
-H "Content-Type: application/json" \
"https://api.github.com/repos/apache/hive/issues/comments/3291011158" > "$comment_json"
comment_body=$(mktemp)
jq -r ".body // empty" "$comment_json" > $comment_body
rm $comment_json

patch_remove=$(perl -ne 'next unless s{^</summary>(.*)</details>$}{$1}; print' < "$comment_body")

patch_add=$(perl -e '$/=undef; $_=<>; if (m{Unrecognized words[^<]*</summary>\n*```\n*([^<]*)```\n*</details>$}m) { print "$1" } elsif (m{Unrecognized words[^<]*\n\n((?:\w.*\n)+)\n}m) { print "$1" };' < "$comment_body")

update_files
rm $comment_body
git add -u
If the flagged items do not appear to be text

If items relate to a ...

  • well-formed pattern.

    If you can write a pattern that would match it,
    try adding it to the patterns.txt file.

    Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.

    Note that patterns can't match multiline strings.

  • binary file.

    Please add a file path to the excludes.txt file matching the containing file.

    File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.

    ^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).

Contributor

Could we simplify the test case, please? E.g., by removing unnecessary columns, using simpler constants. The following test case reproduces the issue:

CREATE TABLE IF NOT EXISTS t0(col DOUBLE);

CREATE VIEW v0 AS (SELECT ALL (t0.col) AS col FROM t0);

INSERT INTO t0(col) VALUES(0.1);

explain cbo SELECT t0.col from t0 WHERE ((1000)/(0)) IS NOT NULL;
SELECT t0.col from t0 WHERE ((1000)/(0)) IS NOT NULL;

explain cbo SELECT v0.col FROM v0 WHERE ((1000)/(0)) IS NOT NULL;
SELECT v0.col FROM v0 WHERE ((1000)/(0)) IS NOT NULL;

Member

I am also curious to know whether the datatype (i.e., DOUBLE) is important for reproducing the problem. If not, I would probably pick something more straightforward, like INT or STRING.

Reducer 3 llap
File Output Operator [FS_16]
- Select Operator [SEL_15] (rows=24 width=100)
+ Select Operator [SEL_15] (rows=21 width=100)
Contributor

I guess the row counts are just estimates, not the real row counts of the executed query?

Member

zabetak commented Sep 17, 2025

The description mentions that the filter condition is removed at some point. Who does the removal and why? It feels wrong to remove a filter that is always false so I want to understand a bit better what happens.

TableScan
alias: src
- filterExpr: (((value < 'val_50') or (key > '2')) and (((key > '20') and (key < '4')) or ((key > '4') and (key < '400')))) (type: boolean)
+ filterExpr: (((value < 'val_50') or key is not null) and (((key > '20') and (key < '4')) or ((key > '4') and (key < '400')))) (type: boolean)
Member

Is the simplification of key > '2' to key is not null valid?
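
One way to probe this question is a brute-force check of both filter expressions under SQL three-valued logic. The sketch below is only that: the helper names (`and3`, `or3`, `gt`, `lt`, `old_filter`, `new_filter`) and the sample keys and values are hypothetical, and it assumes Python's string ordering matches Hive's for these ASCII literals. It compares which rows each filter keeps (a row is kept only when the predicate is TRUE); it is not a proof and says nothing about how the planner derives the rewrite.

```python
from itertools import product

# Three-valued logic: True, False, or None (SQL NULL / unknown).
def and3(a, b):
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def gt(a, b):
    return None if a is None else a > b

def lt(a, b):
    return None if a is None else a < b

def key_ranges(key):
    # ((key > '20') and (key < '4')) or ((key > '4') and (key < '400'))
    return or3(and3(gt(key, "20"), lt(key, "4")),
               and3(gt(key, "4"), lt(key, "400")))

def old_filter(key, value):
    # ((value < 'val_50') or (key > '2')) and <key ranges>
    return and3(or3(lt(value, "val_50"), gt(key, "2")), key_ranges(key))

def new_filter(key, value):
    # ((value < 'val_50') or key is not null) and <key ranges>
    return and3(or3(lt(value, "val_50"), key is not None), key_ranges(key))

# Hypothetical sample inputs, including NULLs and boundary strings.
keys = [None, "", "1", "2", "20", "200", "3", "4", "40", "400", "5", "9"]
values = [None, "val_1", "val_50", "val_9"]

# Rows on which the two filters disagree about keeping the row.
mismatches = [(k, v) for k, v in product(keys, values)
              if (old_filter(k, v) is True) != (new_filter(k, v) is True)]
kept = [(k, v) for k, v in product(keys, values) if new_filter(k, v) is True]

print("mismatches:", mismatches)
```

Taken in isolation the two conditions differ (for key = '1', key > '2' is false while key is not null is true), so any equivalence can only come from the second conjunct, which in this model already forces key > '2' whenever it is true.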

Comment on lines +5309 to +5310
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Member

Rather minor but it seems that now some filter operators cannot be merged together.
