-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-29231: Fix flaky TestEmbeddedHiveMetaStore.testDatabase by enforcing deterministic database lookup #6107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…cing deterministic database lookup The test org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testDatabase(standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java) passed using normal maven-test, but showed Non-deterministic behavior under NonDex(https://github.com/TestingResearchIllinois/NonDex) and thus failed. NonDex is a tool for detecting hidden assumptions in code by exploring non-deterministic behaviors of specifications that allow multiple valid implementations. Some of the error messages are: ``` Build Failure observed: java.lang.AssertionError: first database is not testdb1 in list: [] ``` Steps to reproduce: ``` mvn -pl standalone-metastore/metastore-server edu.illinois:nondex-maven-plugin:2.2.1:nondex -Dtest=org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore#testDatabase ``` Root cause: The test asserted database existence using getDatabases(".*") which relies on cached database lists that may return stale results when nondex randomizes test execution. This led to nondeterministic test outcomes where databases existed but were not found in the cached list. Fix: Replace getDatabases(".*") with direct getDatabase(name) calls in TestHiveMetaStore class's testDatabase method. This ensures consistent behavior regardless of test shuffling. Additionally, I believe this improve the harness of tests without altering and damaging the core database operations being tested(add, find, drop). Confirm Fixed: re-run ``` mvn -pl standalone-metastore/metastore-server edu.illinois:nondex-maven-plugin:2.2.1:nondex -Dtest=org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore#testDatabase ``` And the result is Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, BUILD SUCCESS
|
|
||
assertTrue("first database is not " + TEST_DB1_NAME, dbs.contains(TEST_DB1_NAME)); | ||
assertTrue("second database is not " + TEST_DB2_NAME, dbs.contains(TEST_DB2_NAME)); | ||
Database verifyDb1 = client.getDatabase(TEST_DB1_NAME); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't follow why this change is necessary. client.getDatabases(".*") should return list containing both TEST_DB1_NAME & TEST_DB2_NAME
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about the same. The ticket description says:
The test asserted database existence using getDatabases(".*") which relies on cached database lists that may return stale results when nondex randomizes test execution. This led to nondeterministic test outcomes where databases existed but were not found in the cached list.
It is interesting. As I see, getDatabase uses cache as well:
@Override public List<String> getDatabases(String catName, String pattern) throws MetaException {
if (!sharedCache.isDatabaseCachePrewarmed() || (canUseEvents && rawStore.isActiveTransaction())) {
return rawStore.getDatabases(catName, pattern);
}
return sharedCache.listCachedDatabases(catName, pattern);
}
@Override public Database getDatabase(String catName, String dbName) throws NoSuchObjectException {
// in case of event based cache update, cache will be updated during commit. So within active transaction, read
// directly from rawStore to avoid reading stale data as the data updated during same transaction will not be
// updated in the cache.
if (!sharedCache.isDatabaseCachePrewarmed() || (canUseEvents && rawStore.isActiveTransaction())) {
return rawStore.getDatabase(catName, dbName);
}
dbName = dbName.toLowerCase();
Database db = sharedCache
.getDatabaseFromCache(StringUtils.normalizeIdentifier(catName), StringUtils.normalizeIdentifier(dbName));
if (db == null) {
throw new NoSuchObjectException();
}
return db;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, seems cache is not the cause. Nondex appears to be changing the pattern matching behavior. We are still investigating why Nondex breaks this test.
What changes were proposed in this pull request?
HIVE-29231
This PR fixes a nondeterministic behavior in
TestEmbeddedHiveMetaStore.testDatabase
Previously, the test used
getDatabases(".*")
to verify database existence. However, this method relies on cached lists which may become stale when test execution order is randomized by NonDex, which is a tool for detecting hidden assumptions in code by exploring non-deterministic behaviors of specifications that allow multiple valid implementations.The fix is to replace getDatabases(".*") with direct getDatabase(name) calls in the test. This ensures consistent behavior regardless of test shuffling and avoids relying on potentially stale cached results.
Why are the changes needed?
The test was failing nondeterministically under NonDex with errors like:
Root cause: Cached database lists may not reflect the actual state when randomized execution occurs.
Does this PR introduce any user-facing change?
No. I believe this improve the harness of tests without altering and damaging the core database operations being tested(add, find, drop).
How was this patch tested?
Failure observed with running NonDex:
Under environment: