HIVE-29181: SELECT query on VIEW with IS NOT NULL operator producing unexpected result #6060
Could we simplify the test case, please? E.g., by removing unnecessary columns, using simpler constants. The following test case reproduces the issue:
CREATE TABLE IF NOT EXISTS t0(col DOUBLE);
CREATE VIEW v0 AS (SELECT ALL (t0.col) AS col FROM t0);
INSERT INTO t0(col) VALUES(0.1);
explain cbo SELECT t0.col from t0 WHERE ((1000)/(0)) IS NOT NULL;
SELECT t0.col from t0 WHERE ((1000)/(0)) IS NOT NULL;
explain cbo SELECT v0.col FROM v0 WHERE ((1000)/(0)) IS NOT NULL;
SELECT v0.col FROM v0 WHERE ((1000)/(0)) IS NOT NULL;
I am also curious to know whether the datatype (i.e., DOUBLE) is important for reproducing the problem. If not, I would probably pick something more straightforward like INT or STRING.
  Reducer 3 llap
  File Output Operator [FS_16]
- Select Operator [SEL_15] (rows=24 width=100)
+ Select Operator [SEL_15] (rows=21 width=100)
I guess the row counts are just estimates, not the real row counts of the executed query?
The description mentions that the filter condition is removed at some point. Who does the removal and why? It feels wrong to remove a filter that is always …
  TableScan
  alias: src
- filterExpr: (((value < 'val_50') or (key > '2')) and (((key > '20') and (key < '4')) or ((key > '4') and (key < '400')))) (type: boolean)
+ filterExpr: (((value < 'val_50') or key is not null) and (((key > '20') and (key < '4')) or ((key > '4') and (key < '400')))) (type: boolean)
Is the simplification of key > '2' to key is not null valid?
Filter Operator
predicate: UDFToDouble(_col0) is not null (type: boolean)
Rather minor but it seems that now some filter operators cannot be merged together.



HIVE-29181: SELECT query on VIEW with IS NOT NULL operator producing unexpected result
What changes were proposed in this pull request?
When data is selected from a view with the filter condition `((1008753865)/(0)) IS NOT NULL`, which evaluates to FALSE, no rows should be returned, but the query currently returns rows. Selecting from a table with the same filter condition works correctly and returns no rows.

This happens because `HiveFilterProjectTransposeRule`, the first rule in the list, is applied to the view query and the filter condition ends up being removed, so the rows are returned. For the table query, `ReduceExpressionsRule.FilterReduceExpressionsRule` is applied and the plan is changed to `HiveValues(tuples=[[]])`, so no rows are fetched.

To fix the issue, `ReduceExpressionsRule.FilterReduceExpressionsRule` can be added before `HiveFilterProjectTransposeRule`, so that `ReduceExpressionsRule.FilterReduceExpressionsRule` is also applied to the view query and produces the correct output.
Why are the changes needed?
To fix the issue mentioned in HIVE-29181
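As a rough illustration of why the filter in the issue should reject every row, the sketch below models the predicate in plain Python. It assumes Hive's default behaviour of returning NULL for division by zero; `hive_divide` and `is_not_null` are hypothetical helper names used only here, not Hive APIs, and the single row `0.1` mirrors the simplified reproduction discussed in the review.

```python
from typing import Optional


def hive_divide(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Toy model of Hive's `/` operator (assumption: division by zero yields NULL)."""
    if a is None or b is None or b == 0:
        return None  # None stands in for SQL NULL
    return a / b


def is_not_null(value: Optional[float]) -> bool:
    """Toy model of `IS NOT NULL`, which always yields TRUE or FALSE, never NULL."""
    return value is not None


# The predicate does not reference any column, so it is a constant.
predicate = is_not_null(hive_divide(1000, 0))

# Applying the constant predicate to the table contents keeps no rows,
# whether the rows are read from the table or through a view on it.
rows = [0.1]
result = [r for r in rows if predicate]

print(predicate, result)
```

Because the predicate is constant and false, a planner may replace the whole query with an empty result, which is what the `HiveValues(tuples=[[]])` plan for the table query expresses.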
Does this PR introduce any user-facing change?
No
How was this patch tested?
Using q file tests
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=view_with_where_exp.q -pl itests/qtest -Pitests