fix failure ITs and syntax error in CI workflow #3655

zane-neo · 2025-03-14T02:42:11Z

Description

fix failure ITs and syntax error in CI workflow

Related Issues

In this PR: #3243, the CI checks on Linux and Windows are passing but docker is failing and if check carefully, it's caused by missing tool: CatIndexTool. This PR is to fix the failure ITs in docker env.
In this PR: https://github.com/opensearch-project/ml-commons/actions/runs/13847649530/job/38749255141?pr=3652 the CI workflow has syntax issue, this PR also fix this.

Check List

New functionality includes testing.
New functionality has been documented.
API changes companion pull request created.
Commits are signed per the DCO using --signoff.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

.github/workflows/CI-workflow.yml

Signed-off-by: zane-neo <[email protected]>

mingshl · 2025-03-14T15:43:26Z

rerunning flaky test

> Task :opensearch-ml-plugin:test


REPRODUCE WITH: gradlew ':opensearch-ml-plugin:test' --tests "org.opensearch.ml.action.prediction.PredictionITTests.testPredictionWithDataFrame_BatchRCF" -Dtests.seed=D8DD9FE5DB7C334E -Dtests.security.manager=false -Dtests.locale=en-MY -Dtests.timezone=Pacific/Samoa -Druntime.java=23
PredictionITTests > testPredictionWithDataFrame_BatchRCF FAILED
    java.util.ConcurrentModificationException
        at __randomizedtesting.SeedInfo.seed([D8DD9FE5DB7C334E:DBE8E67137087E05]:0)
        at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1793)
        at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:807)
        at java.base/java.util.stream.ReferencePipeline$7$1FlatMap.accept(ReferencePipeline.java:294)
        at java.base/java.util.WeakHashMap$ValueSpliterator.forEachRemaining(WeakHashMap.java:1224)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:570)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:560)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:265)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:636)
        at org.apache.logging.log4j.core.LoggerContext.updateLoggers(LoggerContext.java:776)
        at org.apache.logging.log4j.core.LoggerContext.updateLoggers(LoggerContext.java:766)
        at org.opensearch.common.logging.Loggers.removeAppender(Loggers.java:176)
        at org.opensearch.test.OpenSearchTestCase.removeHeaderWarningAppender(OpenSearchTestCase.java:411)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
        at org.opensearch.test.OpenSearchTestClusterRule$1.evaluate(OpenSearchTestClusterRule.java:369)
        at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:258)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
        at java.base/java.lang.Thread.run(Thread.java:1575)

  1> [2025-03-13T22:59:44,358][INFO ][o.o.t.OpenSearchTestClusterRule] [testPredictionWithoutDataset_KMeans] [PredictionITTests#testPredictionWithoutDataset_KMeans]: setting up test

  1> [2025-03-13T22:59:44,386][INFO ][o.o.t.InternalTestCluster] [testPredictionWithoutDataset_KMeans] Setup InternalTestCluster [SUITE-TEST_WORKER_VM=[6]-CLUSTER_SEED=[-8582208545748883846]-HASH=[255888FC19C]-cluster] with seed [88E5DECC5250E67A] using [3] dedicated cluster-managers, [2] (data) nodes and [1] coord only nodes (min_cluster_manager_nodes are [auto-managed])
Suite: Test class org.opensearch.ml.action.prediction.PredictionITTests
  1> [2025-03-13T22:59:44,396][INFO ][o.o.n.Node               ] [testPredictionWithoutDataset_KMeans] version[3.0.0-alpha1-SNAPSHOT], pid[2368], build[unknown/127501789334d6deb19d206bf76d8475a9e27c54/2025-03-13T18:56:26.378517163Z], OS[Windows Server 2022/10.0/amd64], JVM[Azul Systems, Inc./OpenJDK 64-Bit Server VM/23.0.2/23.0.2+7]
  1> [2025-03-13T22:59:44,402][INFO ][o.o.n.Node               ] [testPredictionWithoutDataset_KMeans] JVM home [C:\hostedtoolcache\windows\jdk\23.0.2\x64]
  1> [2025-03-13T22:59:44,402][DEPRECATION][o.o.d.n.Node             ] [testPredictionWithoutDataset_KMeans] no-jdk distributions that do not bundle a JDK are deprecated and will be removed in a future release
  1> [2025-03-13T22:59:44,402][INFO ][o.o.n.Node               ] [testPredictionWithoutDataset_KMeans] JVM arguments [-Dgradle.dist.lib=C:\Users\runneradmin\.gradle\wrapper\dists\gradle-8.11.1-bin\bpt9gzteqjrbo1mjrsomdt32c\gradle-8.11.1\lib, -Dgradle.user.home=C:\Users\runneradmin\.gradle, -Dgradle.worker.jar=C:\Users\runneradmin\.gradle/caches/8.11.1/workerMain/gradle-worker.jar, -Dio.netty.noKeySetOptimization=true, -Dio.netty.noUnsafe=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Djava.awt.headless=true, -Djava.locale.providers=SPI,CLDR, -Djava.security.manager=allow, -Djna.nosys=true, -Dopensearch.scripting.update.ctx_in_params=false, -Dopensearch.search.rewrite_sort=true, -Dopensearch.transport.cname_in_publish_address=true, -Dorg.gradle.internal.worker.tmpdir=D:\a\ml-commons\ml-commons\plugin\build\tmp\test\work, -Dtests.artifact=opensearch-ml-plugin, -Dtests.gradle=true, -Dtests.logger.level=WARN, -Dtests.security.manager=false, -Dtests.seed=D8DD9FE5DB7C334E, -Dtests.task=:opensearch-ml-plugin:test, -Dtests.timeoutSuite=1800000!, -XX:+HeapDumpOnOutOfMemoryError, -esa, -javaagent:D:\a\ml-commons\ml-commons\plugin\build\tmp\.cache\expanded\zip_ff7bf8f04f99dc306508d1c81e47a68b\jacocoagent.jar=destfile=../../jacoco/test.exec,append=true,inclnolocationclasses=false,dumponexit=true,output=file,jmx=false, -XX:HeapDumpPat

ylwu-amzn · 2025-03-14T15:50:31Z

From this PR, several places still using CatIndexTool, that will cause CI failed.
Why the PR #3243 was merged without the CI passing? In the future, let's ensure that all tests pass before merging to maintain code quality and prevent regressions.

dhrubo-os · 2025-03-14T15:55:08Z

From this PR, several places still using CatIndexTool, that will cause CI failed. Why the PR #3243 was merged without the CI passing? In the future, let's ensure that all tests pass before merging to maintain code quality and prevent regressions.

+1

mingshl · 2025-03-14T16:08:00Z

From this PR, several places still using CatIndexTool, that will cause CI failed. Why the PR #3243 was merged without the CI passing? In the future, let's ensure that all tests pass before merging to maintain code quality and prevent regressions.

this PR is intended to change the existing test using CatIndexTool to ListIndexTool. @zane-neo did you scan through all the codes using CatIndexTool?

mingshl · 2025-03-15T01:01:36Z

local reproduce the integ test run success, approved

zane-neo · 2025-03-15T01:32:25Z

From this PR, several places still using CatIndexTool, that will cause CI failed. Why the PR #3243 was merged without the CI passing? In the future, let's ensure that all tests pass before merging to maintain code quality and prevent regressions.

Like I mentioned in the description, in that PR, CI checks on Linux and Windows are passing but docker is failing, usually this case I consider it's flaky tests(I assume others too), so merging the PR. We need to check on this as well as it might give others false signal in future PRs.

zane-neo · 2025-03-15T01:33:54Z

From this PR, several places still using CatIndexTool, that will cause CI failed. Why the PR #3243 was merged without the CI passing? In the future, let's ensure that all tests pass before merging to maintain code quality and prevent regressions.

this PR is intended to change the existing test using CatIndexTool to ListIndexTool. @zane-neo did you scan through all the codes using CatIndexTool?

Yes, you can see in that PR I changed several places of the CatIndexTool in tests, linux and windows checks passed as well, seems linux & windows check are different with dockers.

zane-neo requested review from b4sjoo, dhrubo-os, mingshl, jngz-es, model-collapse, rbhavna, ylwu-amzn, Zhangxunmt, austintlee, HenryL27 and xinyual as code owners March 14, 2025 02:42

opensearch-trigger-bot bot added the infra label Mar 14, 2025

zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 02:43 — with GitHub Actions Error

zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 02:43 — with GitHub Actions Failure

zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 02:43 — with GitHub Actions Error

peterzhuamazon reviewed Mar 14, 2025

View reviewed changes

.github/workflows/CI-workflow.yml Outdated Show resolved Hide resolved

zane-neo temporarily deployed to ml-commons-cicd-env March 14, 2025 03:22 — with GitHub Actions Inactive

zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 03:22 — with GitHub Actions Failure

zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 03:22 — with GitHub Actions Error

zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 05:31 — with GitHub Actions Error

zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 05:31 — with GitHub Actions Failure

zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 07:02 — with GitHub Actions Error

zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 07:02 — with GitHub Actions Failure

fix failure ITs and syntax error in CI workflow

3917652

Signed-off-by: zane-neo <[email protected]>

zane-neo force-pushed the fix-failure-ITs-ListIndexTool branch from c807b63 to 3917652 Compare March 14, 2025 07:04

zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 07:05 — with GitHub Actions Failure

zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 07:05 — with GitHub Actions Error

zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 07:05 — with GitHub Actions Failure

remove extra &&

f8d4f2e

Signed-off-by: zane-neo <[email protected]>

zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 09:26 — with GitHub Actions Error

zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 09:26 — with GitHub Actions Failure

zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 09:26 — with GitHub Actions Error

zane-neo temporarily deployed to ml-commons-cicd-env March 14, 2025 15:43 — with GitHub Actions Inactive

peterzhuamazon mentioned this pull request Mar 14, 2025

[RELEASE] Release version 3.0.0 opensearch-project/opensearch-build#3747

Open

66 tasks

zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 17:24 — with GitHub Actions Failure

zane-neo had a problem deploying to ml-commons-cicd-env March 14, 2025 18:55 — with GitHub Actions Failure

dhrubo-os approved these changes Mar 14, 2025

View reviewed changes

mingshl approved these changes Mar 15, 2025

View reviewed changes

zane-neo merged commit c8d1988 into opensearch-project:main Mar 15, 2025
8 of 11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix failure ITs and syntax error in CI workflow #3655

fix failure ITs and syntax error in CI workflow #3655

zane-neo commented Mar 14, 2025 •

edited

Loading

mingshl commented Mar 14, 2025

ylwu-amzn commented Mar 14, 2025

dhrubo-os commented Mar 14, 2025

mingshl commented Mar 14, 2025

mingshl commented Mar 15, 2025

zane-neo commented Mar 15, 2025

zane-neo commented Mar 15, 2025

fix failure ITs and syntax error in CI workflow #3655

fix failure ITs and syntax error in CI workflow #3655

Conversation

zane-neo commented Mar 14, 2025 • edited Loading

Description

Related Issues

Check List

mingshl commented Mar 14, 2025

ylwu-amzn commented Mar 14, 2025

dhrubo-os commented Mar 14, 2025

mingshl commented Mar 14, 2025

mingshl commented Mar 15, 2025

zane-neo commented Mar 15, 2025

zane-neo commented Mar 15, 2025

zane-neo commented Mar 14, 2025 •

edited

Loading