The PostCommit Python Arm job is flaky #30760
Comments
@tvalentyn do we have a good owner for this?
I actually can't find a single green run since this test suite was created (back in September).
You may be right, thanks for the correction, @ahmedabu98
cc: @damccorm - do you remember if this suite never worked, or is the above error an artifact of the GHA migration? We can reclassify this as part of the ARM backlog work.
Looks like it went flaky and then permared around then.
Ahh, my apologies, I was looking at it through a …
So by removing … I get the test to move along, but it's still failing on my fork due to a permissions issue with the Healthcare API.
@volatilemolotov could you put up a PR to make that change? It definitely seems like it is getting further. @svetakvsundhar do you know what scope is missing? Given that the normal Python postcommit isn't failing, it might just be an issue with your service account specifically.
Sure, here it is: …
Thanks - merged, let's see what the result on master is.
+1, it could be a service-account-specific issue. I'd want to see a couple more runs of this to see if it's actually an issue. If so, a thought might be to add …
Great, thanks @volatilemolotov. Looks like we're still flaky - https://github.com/apache/beam/actions/runs/8843342204/job/24283441647 - but that's an improvement, and it looks like a test flake instead of an infra issue.
Permared now.
Reopening since the workflow is still flaky |
Fixed by #32530
Reopening since the workflow is still flaky |
This is failing because of Dataflow issues, not because of Beam. Dataflow is requesting ARM machines in regions where there are none, which fails the job. I reopened an internal bug (ID 352725422).
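For reference, a minimal sketch (not the actual postcommit configuration) of how a Python pipeline can pin the worker machine type and a region explicitly so Dataflow does not request ARM machines where none are available. The project, bucket, and region below are placeholders/assumptions, not values taken from the failing workflow.

```python
# Minimal sketch: pin an ARM (T2A) worker machine type and a region assumed to
# offer T2A VMs, so Dataflow does not schedule workers in a region without ARM
# capacity. Project, bucket, and region are illustrative only.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-gcp-project",            # hypothetical project
    region="us-central1",                # assumed to offer T2A (ARM) machines
    temp_location="gs://my-bucket/tmp",  # hypothetical bucket
    machine_type="t2a-standard-1",       # ARM worker machine type
)

with beam.Pipeline(options=options) as p:
    _ = p | beam.Create([1, 2, 3]) | beam.Map(lambda x: x * x)
```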
The following tests appear to be consistently failing:
The errors are similar. @tvalentyn @damccorm any ideas?
Python Postcommits are broken for the same reason: https://github.com/apache/beam/actions/workflows/beam_PostCommit_Python.yml |
I'm guessing this is from https://github.com/apache/beam/pull/33658/files, where we updated the numpy version to 2.x. See https://stackoverflow.com/questions/40845304/runtimewarning-numpy-dtype-size-changed-may-indicate-binary-incompatibility. It is coming from unpickling the model here: …
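For context, a minimal sketch (not the Beam test code itself) of the failure mode: a model pickled under numpy 1.x can trigger the "numpy.dtype size changed" binary-incompatibility warning when loaded under numpy 2.x. A defensive load that surfaces the mismatch with an actionable message might look like this; the model path is a placeholder.

```python
# Minimal sketch of guarding the unpickling step; the model path is a
# placeholder, not the actual path used by the Beam test.
import pickle
import warnings

import numpy as np

MODEL_PATH = "svm_mnist_model.pkl"  # hypothetical path


def load_model(path):
    """Load a pickled model, turning the numpy ABI warning into a clear error."""
    with warnings.catch_warnings():
        # Promote the binary-incompatibility warning to an error so the test
        # fails with an actionable message instead of a confusing traceback.
        warnings.filterwarnings(
            "error", message="numpy.dtype size changed", category=RuntimeWarning)
        try:
            with open(path, "rb") as f:
                return pickle.load(f)
        except (RuntimeWarning, ValueError) as e:
            raise RuntimeError(
                f"Model at {path} was likely pickled under numpy 1.x; the "
                f"installed numpy is {np.__version__}. Consider retraining and "
                f"re-pickling it under the current numpy version.") from e
```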
Where are the instructions to retrain the model? If the training is cheap, model training should be part of the tests so we don't have to maintain a pickled artifact in the future.
I do not think we have the original script used to train these models, but https://dmkothari.github.io/Machine-Learning-Projects/SVM_with_MNIST.html should be a simple enough basis for updating these tests. @Amar3tto
I agree - this seems like the right fix. I imagine retraining should be reasonably cheap.
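For illustration, a minimal retraining sketch along the lines of the linked SVM-with-MNIST example. The dataset (scikit-learn's small digits set rather than full MNIST), hyperparameters, and output path are assumptions for the sketch, not the values used by the Beam test.

```python
# Minimal sketch: train a small SVM on scikit-learn's digits dataset and
# re-pickle it under the currently installed numpy/scikit-learn versions.
# Dataset, hyperparameters, and output path are illustrative only.
import pickle

from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42)

model = svm.SVC(gamma=0.001)  # hyperparameters chosen arbitrarily
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))

with open("svm_mnist_model.pkl", "wb") as f:  # placeholder output path
    pickle.dump(model, f)
```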
I tried retraining the model, but the error …
I think we should either disable this test for Py39 and Py310, or create a Python venv with numpy 1.x to run it. A sketch of the first option follows below.
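A minimal sketch of the skip option, gating the test on the affected Python versions (or, alternatively, on numpy 2.x being installed). The test and function names are hypothetical, not the actual Beam test identifiers.

```python
# Minimal sketch: skip the model-loading test on Py39/Py310 or when numpy 2.x
# is installed. Test name and module are hypothetical.
import sys

import numpy as np
import pytest

NUMPY_2 = int(np.__version__.split(".")[0]) >= 2
PY39_OR_310 = sys.version_info[:2] in ((3, 9), (3, 10))


@pytest.mark.skipif(
    NUMPY_2 or PY39_OR_310,
    reason="Pickled model is incompatible with numpy 2.x on these versions.")
def test_sklearn_mnist_classification():
    ...  # existing test body
```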
The PostCommit Python Arm job is failing over 50% of the time.
Please visit https://github.com/apache/beam/actions/workflows/beam_PostCommit_Python_Arm.yml?query=is%3Afailure+branch%3Amaster to see the logs.