Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sentry Integration #903

Merged
merged 14 commits into from
Mar 19, 2025
Merged

Sentry Integration #903

merged 14 commits into from
Mar 19, 2025

Conversation

josephjclark
Copy link
Collaborator

@josephjclark josephjclark commented Mar 11, 2025

Short Description

This PR adds basic Sentry integration for:

  • General server errors (these should be incredibly rare?)
  • Errors on Worker-Lightning events (this is the key bit)

Fixes #603

Implementation Details

I've had a horrible time with this.

Apparently Sentry is fine if you want to catch exceptions at the top level of your koa web app. But its documentation is dreadful if you want to do anything outside of its standard lanes.

I'm basically worried that when we trap errors, they don't include any useful diagnostic information.

After hours of trial and error I've managed to get to this for an error on run:start

image

image

image

See the OG entry report

One specific problem I have is that you can't just pass an Error into the capture function and expect a decent report. Oh no. They also don't give any documentation (that I can find anyway) on what a nodejs error should look like to get good reporting.

Events refactor

I've refactored the sendEvent utility function which is directly responsible for comms with lightning.

Now IT is directly responsible for reporting errors from lightning to sentry, so this stuff happens at a nice low level.

I've got unit tests around this too (took ages to realise I had to "sleep" to wait for async events to process - another bit of awful Sentry docs)

TODO

  • Sourcemapping (or disable stack traces)
  • Can we do anything about OOM? Likely a new issue
  • Testing (how can we mock sentry and ensure we call it it in the right place? I just wanna test the hooks)
  • I am concerned that the scope stuff isn't right. If the worker is running two runs concurrently, is it going to report the correct state? I presume that each koa route will have its own scope (no docs say this anywhere), but I don't know if that works for web sockets. Writing this out I think I just need to force create a new scope for each execute block

Error Data

It would be nice to append the payload to every errror event. So that if a run:start or log is rejected, we can see why.

But we can't really do that as these payloads are likely to be large or sensitive. I don't want to just send them to sentry.

A good future step wold be to manually curate each payload and:
a) log non-sensitive stuff
b) For anything sensitive like state or a log message, maybe report a schema rather than the actual data

QA Notes

List any considerations/cases/advice for testing/QA here.

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our Responsible AI Policy

@josephjclark josephjclark marked this pull request as draft March 11, 2025 17:01
@josephjclark

This comment was marked as resolved.

@josephjclark
Copy link
Collaborator Author

At the moment, we're not sending sentry sourcemaps, so the stack traces will all report wrong positions.

We can set up a plugin which I believe will upload sourcemaps to sentry's servers. We would need to do this at build time in CI.

Thinking about it, I don't actually have a problem with seeing the minified source in errors. Just show me the compiled JS< it's fine, It's not even minified. So why are we seeing completely wrong positions? I think it's trying to be smart.

I think this is something to do with our typescript loader and how it reports source positions. In production, we should be running the js directly and so get more accurate error positions

So I'm gonna skip sourcemapping for now. I'm really not keen on it and I don't think its necessary.

@josephjclark josephjclark marked this pull request as ready for review March 14, 2025 15:50
@josephjclark josephjclark changed the base branch from main to release/next March 14, 2025 15:52
@josephjclark

This comment was marked as resolved.

@josephjclark
Copy link
Collaborator Author

Hey @doc-han, do you think you can give this a review sometime tomorrow? Don't look at my history 😂 Just take a look at where and how sentry is integrated and if you have any thoughts, I'd love to hear them

@josephjclark josephjclark merged commit ce5022a into release/next Mar 19, 2025
9 checks passed
@josephjclark josephjclark deleted the sentry branch March 19, 2025 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Worker: Sentry Integration
2 participants