Skip to content

Conversation

MadLittleMods
Copy link
Contributor

@MadLittleMods MadLittleMods commented Aug 27, 2025

Remove sentinel context where we log

Prerequisite for #18868

We shouldn't ever be logging something from the server with the sentinel logging context. We want to know which server generated the logs (c.f. #18868).

Logging with the sentinel context was previously used as a crutch and get out of jail free card in some tricky scenarios. This PR aims to fix all of the cases where we see the sentinel context being used for a log. The sentinel context is now only used to ensure we're not leaking contexts when yielding control back to the reactor which would erroneously be picked up by other requests and tasks going on in the reactor.

Testing strategy

  1. Run Synapse normally and with daemonize: true: poetry run synapse_homeserver --config-path homeserver.yaml
  2. Execute some requests
  3. Shutdown the server
  4. Look for any bad log entries in your homeserver logs:
    • Expected logging context sentinel but found main
    • Expected logging context main was lost
    • Expected previous context
    • utime went backwards!/stime went backwards!
    • Called stop on logcontext POST-0 without recording a start rusage
  5. Look for any logs coming from the sentinel context

Dev notes

Whenever we yield to the Twisted reactor (event loop), we need to set the sentinel log context so log contexts don't leak and apply to the next task.


Synapse log context docs: docs/log_contexts.md



Expected logging context sentinel but found main
Expected logging context main was lost
backwards

This is part of tracing though:

There was no active span when trying to log. Did you forget to start one or did a context slip?

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

```
2025-08-26 18:40:27,996 - my.synapse.linux.server - synapse.app.homeserver - 187 - WARNING - main - Starting daemon.
2025-08-26 18:40:27,996 - my.synapse.linux.server - synapse.app.homeserver - 181 - WARNING - atexit - Stopping daemon.
```
Running a normal server (`daemonize: false`):

```
poetry run synapse_homeserver --config-path homeserver.yaml
```

Bad logs being seen:

```
PreserveLoggingContext: Expected logging context sentinel but found main
```

```
LoggingContext: Expected logging context main was lost
```
Resulting in bad logs being seen:

```
PreserveLoggingContext: Expected logging context sentinel but found main
```

```
LoggingContext: Expected logging context main was lost
```
Comment on lines -188 to -191
# make sure that we run the reactor with the sentinel log context,
# otherwise other PreserveLoggingContext instances will get confused
# and complain when they see the logcontext arbitrarily swapping
# between the sentinel and `run` logcontexts.
Copy link
Contributor Author

@MadLittleMods MadLittleMods Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was added in 067b00d#diff-6e21d6a61b2f6b6f2d4ce961991ba7f27e83605f927eaa4b19d2c46a975a96c1R460-R463 (matrix-org/synapse#2027)

I can't tell exactly what's it's referring to but the one spot I was seeing Expected logging context sentinel but found main around run_as_background_process(...) usage has been fixed (#18870 (comment)).

Reproduction:

  1. poetry run synapse_homeserver --config-path homeserver.yaml
  2. Ctrl + C to stop the server
  3. Notice LoggingContext: Expected logging context main was lost in the logs

Comment on lines -193 to -195
# We also need to drop the logcontext before forking if we're daemonizing,
# otherwise the cputime metrics get confused about the per-thread resource usage
# appearing to go backwards.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was added in matrix-org/synapse#5609 and is accurately describing a real problem.

But we can be way more precise about what do to here. We only need to stop the current log context right before forking the process and start it again right after. This can be accomplished with the PreserveLoggingContext being moved strictly around the os.fork() call (search for os.fork() below).


Previously, this caused the sentinel context to be used for a whole bunch of logs (basically anything outside of a request). We now get the main context used 💪

# Note that we return a Deferred here so that it can be used in a
# looping_call and other places that expect a Deferred.
return defer.ensureDeferred(run())
return make_deferred_yieldable(defer.ensureDeferred(run()))
Copy link
Contributor Author

@MadLittleMods MadLittleMods Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As self-described in the doc comment here, previously, this didn't follow the Synapse log context rules which caused a whole bunch of downstream havoc. Previously, this seemed to be masked because we had everything run in the sentinel context.

By downstream havoc, I mean Expected logging context sentinel but found main and Expected logging context main was lost type warnings which means we're leaking and losing contexts which are being erroneously used by the next thing the reactor does.

Even though it was called out explicitly for not following the Synapse log context rules, I don't think that has any specific benefit and makes sense to update it in any case.


Reproduction:

  1. poetry run synapse_homeserver --config-path homeserver.yaml
  2. Ctrl + C to stop the server
  3. Notice LoggingContext: Expected logging context main was lost in the logs

Copy link
Contributor Author

@MadLittleMods MadLittleMods Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This distinction actually seems to be important because (TODO)

It makes things get stuck:

  1. poetry run synapse_homeserver --config-path homeserver.yaml
  2. This works fine: curl http://localhost:8008/_matrix/client/versions
  3. But this endpoint just never responds for example: curl 'http://localhost:8008/_matrix/client/v3/rooms/!FlgJYGQKAIvAscfBhq:my.synapse.linux.server/state/m.room.create/' -H 'Authorization: Bearer syt_bG9jYWwtZ3V5_mfWAFyxPnCnTRQjLiAyx_11rr8F'

Reproduction instructions:

 1. `poetry run synapse_homeserver --config-path homeserver.yaml`
 1. `curl http://localhost:8008/_matrix/client/versions`
 1. Stop Synapse (`Ctrl + c`)

Notice the bad log:

```
synapse.logging.context - WARNING - sentinel - LoggingContext: Expected logging context main was lost
```
…text rules"

This reverts commit 675d94a.

Things get stuck with this, see #18870 (comment)
Reproduction:

 1. `poetry run synapse_homeserver --config-path homeserver.yaml`
 1. Ctrl + C to stop the server
 1. Notice `LoggingContext: Expected logging context main was lost` in the logs
# We also need to drop the logcontext before forking if we're daemonizing,
# otherwise the cputime metrics get confused about the per-thread resource usage
# appearing to go backwards.
with PreserveLoggingContext():
Copy link
Contributor Author

@MadLittleMods MadLittleMods Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to a quick pairing session with @erikjohnston, the key insight was that removing PreserveLoggingContext here was causing problems down the line 😇

I removed PreserveLoggingContext here in order to maintain the main logging context further into Synapse starting up and running. But this also meant leaking the main logging context into the Twisted reactor event loop which causes it to be applied to the next task. This appeared like the desired result because the next task was start but also meant it got messed up later down the line (I assume this is what was happening).

I was on the right line of thinking by removing PreserveLoggingContext here since a) we do have some logs here that should have the correct context and b) has some bad interactions with daemonize_process(...) as I was finding. And while we don't need PreserveLoggingContext here specifically, instead, we only need it around the run_command() which is reactor.run() by default. See c1996b6

To get the main context further into the process, we probably need to re-establish the logging context where we register_start/start. Still need to explore this thread ⏩

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant