Sentry Integration #903

josephjclark · 2025-03-11T16:58:58Z

Short Description

This PR adds basic Sentry integration for:

General server errors (these should be incredibly rare?)
Errors on Worker-Lightning events (this is the key bit)

Fixes #603

Implementation Details

I've had a horrible time with this.

Apparently Sentry is fine if you want to catch exceptions at the top level of your koa web app. But its documentation is dreadful if you want to do anything outside of its standard lanes.

I'm basically worried that when we trap errors, they don't include any useful diagnostic information.

After hours of trial and error I've managed to get to this for an error on run:start

See the OG entry report

One specific problem I have is that you can't just pass an Error into the capture function and expect a decent report. Oh no. They also don't give any documentation (that I can find anyway) on what a nodejs error should look like to get good reporting.

Events refactor

I've refactored the sendEvent utility function which is directly responsible for comms with lightning.

Now IT is directly responsible for reporting errors from lightning to sentry, so this stuff happens at a nice low level.

I've got unit tests around this too (took ages to realise I had to "sleep" to wait for async events to process - another bit of awful Sentry docs)

TODO

~~Sourcemapping (or disable stack traces)~~
Can we do anything about OOM? Likely a new issue
Testing (how can we mock sentry and ensure we call it it in the right place? I just wanna test the hooks)
I am concerned that the scope stuff isn't right. If the worker is running two runs concurrently, is it going to report the correct state? I presume that each koa route will have its own scope (no docs say this anywhere), but I don't know if that works for web sockets. Writing this out I think I just need to force create a new scope for each execute block

Error Data

It would be nice to append the payload to every errror event. So that if a run:start or log is rejected, we can see why.

But we can't really do that as these payloads are likely to be large or sensitive. I don't want to just send them to sentry.

A good future step wold be to manually curate each payload and:
a) log non-sensitive stuff
b) For anything sensitive like state or a log message, maybe report a schema rather than the actual data

QA Notes

List any considerations/cases/advice for testing/QA here.

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

You can read more details in our Responsible AI Policy

josephjclark · 2025-03-12T19:01:30Z

At the moment, we're not sending sentry sourcemaps, so the stack traces will all report wrong positions.

We can set up a plugin which I believe will upload sourcemaps to sentry's servers. We would need to do this at build time in CI.

Thinking about it, I don't actually have a problem with seeing the minified source in errors. Just show me the compiled JS< it's fine, It's not even minified. So why are we seeing completely wrong positions? I think it's trying to be smart.

I think this is something to do with our typescript loader and how it reports source positions. In production, we should be running the js directly and so get more accurate error positions

So I'm gonna skip sourcemapping for now. I'm really not keen on it and I don't think its necessary.

josephjclark · 2025-03-18T15:36:04Z

Hey @doc-han, do you think you can give this a review sometime tomorrow? Don't look at my history 😂 Just take a look at where and how sentry is integrated and if you have any thoughts, I'd love to hear them

josephjclark added 3 commits March 11, 2025 13:57

worker: plug sentry in

bc39bda

worker: better sentry tracking on events

12874ba

tidy up

9fc6876

josephjclark marked this pull request as draft March 11, 2025 17:01

This comment was marked as resolved.

Sign in to view

josephjclark added 2 commits March 12, 2025 17:52

worker: refactor send-event stuff and add unit tests around sentry

f192a6c

remove unused tests

e585c96

fix tests

204a620

josephjclark marked this pull request as ready for review March 14, 2025 15:50

josephjclark changed the base branch from main to release/next March 14, 2025 15:52

josephjclark added 4 commits March 14, 2025 17:03

hook sentry up to credential and data clip errors

049d92e

remove get-with-reply

f7a14a5

add isolation scope

0e2b5be

engine: tweak test

6d84db9

This comment was marked as resolved.

Sign in to view

josephjclark added 3 commits March 18, 2025 09:23

engine: fix flaky test

1ba8c63

tests: fix test

c6f5894

actually fix the test this time

d12d4fd

josephjclark assigned doc-han Mar 18, 2025

josephjclark unassigned doc-han Mar 19, 2025

changeset

1f7d393

josephjclark merged commit ce5022a into release/next Mar 19, 2025
9 checks passed

josephjclark deleted the sentry branch March 19, 2025 15:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sentry Integration #903

Sentry Integration #903

josephjclark commented Mar 11, 2025 •

edited

Loading

This comment was marked as resolved.

josephjclark commented Mar 12, 2025

This comment was marked as resolved.

josephjclark commented Mar 18, 2025

Sentry Integration #903

Sentry Integration #903

Conversation

josephjclark commented Mar 11, 2025 • edited Loading

Short Description

Implementation Details

Events refactor

TODO

Error Data

QA Notes

AI Usage

This comment was marked as resolved.

josephjclark commented Mar 12, 2025

This comment was marked as resolved.

josephjclark commented Mar 18, 2025

josephjclark commented Mar 11, 2025 •

edited

Loading