Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(statemachine): do not raise from state.run #1115

Merged
merged 8 commits into from
Feb 19, 2025

Conversation

emizzle
Copy link
Contributor

@emizzle emizzle commented Feb 14, 2025

Prevents the state machine's run function from raising exceptions, ensuring there are no exceptions leaking out, and that each state can handle exceptions on an individual basis, instead of handling them all the same at the statemachine level.

- re-raise CancelledError
- don't return State on CatchableError
- expect the Proofs_InvalidProof custom error instead of checking a string
This was swallowing a KeyError in one of the tests (fixed in the previous commit)
Copy link
Member

@AuHau AuHau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you encounter some problem related to this? Not fan of the verbosity, but I guess overall is an improvement 😅

Just to double-check, I see lots of swallowing of CancelledErrors - that is because the states runs are asyncSpawned right?

An idea - maybe as to elevate some of the added verbosity, we could allow CancelledError in the raises and it could be left to propagate to the Machine.scheduler() where it is already now discarded anyhow?

Comment on lines 49 to 50
ended.cancelSoon()
return some State(PurchaseFailed())
else:
failed.cancelSoon()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these be then cancelled in the CatchableError branch as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good point and I honestly don't know the answer. My initial assumption was that once the reference to the run proc was GC'd (during an exception), that the futures (ended and failed) would be GC'd, but i spent a bit of time looking at the values in the heap (using -d:nimTypeNames and dumpNumberOfInstances()) and honestly I can't tell if those futures are GC'd or not. So I've asked Eugene: https://discord.com/channels/613988663034118151/616299964242460682/1341328004923658251

Copy link
Member

@markspanbroek markspanbroek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks @emizzle! It is indeed much nicer to let states handle their own errors.


return some State(SalePayout())
except CancelledError as e:
trace "SalePreparing.run onCleanUp was cancelled", error = e.msgDetail
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
trace "SalePreparing.run onCleanUp was cancelled", error = e.msgDetail
trace "SaleProving.run onCleanup was cancelled", error = e.msgDetail

Same below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes, sorry mark, I missed this one. IMO, onCleanup should be removed entirely from that trace log. Can do in a future PR.

@emizzle
Copy link
Contributor Author

emizzle commented Feb 18, 2025

@AuHau

Did you encounter some problem related to this? Not fan of the verbosity, but I guess overall is an improvement 😅

Yes, overall, I have come back to fixing this at least 3 times because I've spent so much time debugging it. When there's CancelledErrors leaking out, it becomes very difficult to reason about what is happening from the log output, especially when they're propagating the stack invisibly. Changing to async: (raises: []) provides a bit of debugging relief, and allows the log output to be much more easily followed.

Just to double-check, I see lots of swallowing of CancelledErrors - that is because the states runs are asyncSpawned right?

Yes, that's correct. If they are not swallowed, the asyncSpawn causes a Defect

An idea - maybe as to elevate some of the added verbosity, we could allow CancelledError in the raises and it could be left to propagate to the Machine.scheduler() where it is already now discarded anyhow?

Yes, that's a good idea, which as you said is how we have it now, but in my mind, it's easier to reason about run not raising anything at all. When CancelledErrors are raised, debugging becomes a pain in the ass (see above).

@emizzle emizzle added this pull request to the merge queue Feb 19, 2025
Merged via the queue into master with commit 87590f4 Feb 19, 2025
19 checks passed
@emizzle emizzle deleted the fix/statemachine/propagate-cancelled-errors branch February 19, 2025 01:28
2-towns pushed a commit that referenced this pull request Feb 24, 2025
* fix(statemachine): do not raise from state.run

* fix rebase

* fix exception handling in SaleProvingSimulated.prove

- re-raise CancelledError
- don't return State on CatchableError
- expect the Proofs_InvalidProof custom error instead of checking a string

* asyncSpawn salesagent.onCancelled

This was swallowing a KeyError in one of the tests (fixed in the previous commit)

* remove error handling states in asyncstatemachine

* revert unneeded changes

* formatting

* PR feedback, logging updates
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants