Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix failure ITs and syntax error in CI workflow #3655

Merged

Conversation

zane-neo
Copy link
Collaborator

@zane-neo zane-neo commented Mar 14, 2025

Description

fix failure ITs and syntax error in CI workflow

Related Issues

In this PR: #3243, the CI checks on Linux and Windows are passing but docker is failing and if check carefully, it's caused by missing tool: CatIndexTool. This PR is to fix the failure ITs in docker env.
In this PR: https://github.com/opensearch-project/ml-commons/actions/runs/13847649530/job/38749255141?pr=3652 the CI workflow has syntax issue, this PR also fix this.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@zane-neo zane-neo temporarily deployed to ml-commons-cicd-env March 14, 2025 03:22 — with GitHub Actions Inactive
@zane-neo zane-neo temporarily deployed to ml-commons-cicd-env March 14, 2025 03:22 — with GitHub Actions Inactive
@zane-neo zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 03:22 — with GitHub Actions Failure
@zane-neo zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 05:31 — with GitHub Actions Failure
@zane-neo zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 07:02 — with GitHub Actions Failure
@zane-neo zane-neo force-pushed the fix-failure-ITs-ListIndexTool branch from c807b63 to 3917652 Compare March 14, 2025 07:04
@zane-neo zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 07:05 — with GitHub Actions Failure
@zane-neo zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 07:05 — with GitHub Actions Failure
Signed-off-by: zane-neo <[email protected]>
@zane-neo zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 09:26 — with GitHub Actions Failure
@zane-neo zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 09:26 — with GitHub Actions Failure
@zane-neo zane-neo temporarily deployed to ml-commons-cicd-env March 14, 2025 15:43 — with GitHub Actions Inactive
@mingshl
Copy link
Collaborator

mingshl commented Mar 14, 2025

rerunning flaky test

> Task :opensearch-ml-plugin:test


REPRODUCE WITH: gradlew ':opensearch-ml-plugin:test' --tests "org.opensearch.ml.action.prediction.PredictionITTests.testPredictionWithDataFrame_BatchRCF" -Dtests.seed=D8DD9FE5DB7C334E -Dtests.security.manager=false -Dtests.locale=en-MY -Dtests.timezone=Pacific/Samoa -Druntime.java=23
PredictionITTests > testPredictionWithDataFrame_BatchRCF FAILED
    java.util.ConcurrentModificationException
        at __randomizedtesting.SeedInfo.seed([D8DD9FE5DB7C334E:DBE8E67137087E05]:0)
        at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1793)
        at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:807)
        at java.base/java.util.stream.ReferencePipeline$7$1FlatMap.accept(ReferencePipeline.java:294)
        at java.base/java.util.WeakHashMap$ValueSpliterator.forEachRemaining(WeakHashMap.java:1224)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:570)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:560)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:265)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:636)
        at org.apache.logging.log4j.core.LoggerContext.updateLoggers(LoggerContext.java:776)
        at org.apache.logging.log4j.core.LoggerContext.updateLoggers(LoggerContext.java:766)
        at org.opensearch.common.logging.Loggers.removeAppender(Loggers.java:176)
        at org.opensearch.test.OpenSearchTestCase.removeHeaderWarningAppender(OpenSearchTestCase.java:411)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
        at org.opensearch.test.OpenSearchTestClusterRule$1.evaluate(OpenSearchTestClusterRule.java:369)
        at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:258)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
        at java.base/java.lang.Thread.run(Thread.java:1575)

  1> [2025-03-13T22:59:44,358][INFO ][o.o.t.OpenSearchTestClusterRule] [testPredictionWithoutDataset_KMeans] [PredictionITTests#testPredictionWithoutDataset_KMeans]: setting up test

  1> [2025-03-13T22:59:44,386][INFO ][o.o.t.InternalTestCluster] [testPredictionWithoutDataset_KMeans] Setup InternalTestCluster [SUITE-TEST_WORKER_VM=[6]-CLUSTER_SEED=[-8582208545748883846]-HASH=[255888FC19C]-cluster] with seed [88E5DECC5250E67A] using [3] dedicated cluster-managers, [2] (data) nodes and [1] coord only nodes (min_cluster_manager_nodes are [auto-managed])
Suite: Test class org.opensearch.ml.action.prediction.PredictionITTests
  1> [2025-03-13T22:59:44,396][INFO ][o.o.n.Node               ] [testPredictionWithoutDataset_KMeans] version[3.0.0-alpha1-SNAPSHOT], pid[2368], build[unknown/127501789334d6deb19d206bf76d8475a9e27c54/2025-03-13T18:56:26.378517163Z], OS[Windows Server 2022/10.0/amd64], JVM[Azul Systems, Inc./OpenJDK 64-Bit Server VM/23.0.2/23.0.2+7]
  1> [2025-03-13T22:59:44,402][INFO ][o.o.n.Node               ] [testPredictionWithoutDataset_KMeans] JVM home [C:\hostedtoolcache\windows\jdk\23.0.2\x64]
  1> [2025-03-13T22:59:44,402][DEPRECATION][o.o.d.n.Node             ] [testPredictionWithoutDataset_KMeans] no-jdk distributions that do not bundle a JDK are deprecated and will be removed in a future release
  1> [2025-03-13T22:59:44,402][INFO ][o.o.n.Node               ] [testPredictionWithoutDataset_KMeans] JVM arguments [-Dgradle.dist.lib=C:\Users\runneradmin\.gradle\wrapper\dists\gradle-8.11.1-bin\bpt9gzteqjrbo1mjrsomdt32c\gradle-8.11.1\lib, -Dgradle.user.home=C:\Users\runneradmin\.gradle, -Dgradle.worker.jar=C:\Users\runneradmin\.gradle/caches/8.11.1/workerMain/gradle-worker.jar, -Dio.netty.noKeySetOptimization=true, -Dio.netty.noUnsafe=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Djava.awt.headless=true, -Djava.locale.providers=SPI,CLDR, -Djava.security.manager=allow, -Djna.nosys=true, -Dopensearch.scripting.update.ctx_in_params=false, -Dopensearch.search.rewrite_sort=true, -Dopensearch.transport.cname_in_publish_address=true, -Dorg.gradle.internal.worker.tmpdir=D:\a\ml-commons\ml-commons\plugin\build\tmp\test\work, -Dtests.artifact=opensearch-ml-plugin, -Dtests.gradle=true, -Dtests.logger.level=WARN, -Dtests.security.manager=false, -Dtests.seed=D8DD9FE5DB7C334E, -Dtests.task=:opensearch-ml-plugin:test, -Dtests.timeoutSuite=1800000!, -XX:+HeapDumpOnOutOfMemoryError, -esa, -javaagent:D:\a\ml-commons\ml-commons\plugin\build\tmp\.cache\expanded\zip_ff7bf8f04f99dc306508d1c81e47a68b\jacocoagent.jar=destfile=../../jacoco/test.exec,append=true,inclnolocationclasses=false,dumponexit=true,output=file,jmx=false, -XX:HeapDumpPat

@ylwu-amzn
Copy link
Collaborator

From this PR, several places still using CatIndexTool, that will cause CI failed.
Why the PR #3243 was merged without the CI passing? In the future, let's ensure that all tests pass before merging to maintain code quality and prevent regressions.

@dhrubo-os
Copy link
Collaborator

From this PR, several places still using CatIndexTool, that will cause CI failed. Why the PR #3243 was merged without the CI passing? In the future, let's ensure that all tests pass before merging to maintain code quality and prevent regressions.

+1

@mingshl
Copy link
Collaborator

mingshl commented Mar 14, 2025

From this PR, several places still using CatIndexTool, that will cause CI failed. Why the PR #3243 was merged without the CI passing? In the future, let's ensure that all tests pass before merging to maintain code quality and prevent regressions.

this PR is intended to change the existing test using CatIndexTool to ListIndexTool. @zane-neo did you scan through all the codes using CatIndexTool?

@mingshl
Copy link
Collaborator

mingshl commented Mar 15, 2025

local reproduce the integ test run success, approved

@zane-neo
Copy link
Collaborator Author

From this PR, several places still using CatIndexTool, that will cause CI failed. Why the PR #3243 was merged without the CI passing? In the future, let's ensure that all tests pass before merging to maintain code quality and prevent regressions.

Like I mentioned in the description, in that PR, CI checks on Linux and Windows are passing but docker is failing, usually this case I consider it's flaky tests(I assume others too), so merging the PR. We need to check on this as well as it might give others false signal in future PRs.

@zane-neo
Copy link
Collaborator Author

From this PR, several places still using CatIndexTool, that will cause CI failed. Why the PR #3243 was merged without the CI passing? In the future, let's ensure that all tests pass before merging to maintain code quality and prevent regressions.

this PR is intended to change the existing test using CatIndexTool to ListIndexTool. @zane-neo did you scan through all the codes using CatIndexTool?

Yes, you can see in that PR I changed several places of the CatIndexTool in tests, linux and windows checks passed as well, seems linux & windows check are different with dockers.

@zane-neo zane-neo merged commit c8d1988 into opensearch-project:main Mar 15, 2025
8 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants