Refactor cache metrics to be homeserver-scoped #18604

Open
wants to merge 28 commits into
base: develop

@MadLittleMods MadLittleMods commented Jun 27, 2025

Refactor cache metrics to be homeserver-scoped (add server_name label to cache metrics).

Part of #18592

This can be reviewed commit by commit to skip over some of the bulk refactor, but there are some fixes further down the commit list, and I'd prefer to keep the history rather than clean it all up in a rebase.

Testing strategy

See behavior of previous metrics listener

  1. Add the metrics listener in your homeserver.yaml
    listeners:
      - port: 9323
        type: metrics
        bind_addresses: ['127.0.0.1']
  2. Start the homeserver: poetry run synapse_homeserver --config-path homeserver.yaml
  3. Fetch http://localhost:9323/metrics
  4. Observe response includes the cache metrics (synapse_util_caches_cache_size, synapse_util_caches_cache_hits, synapse_util_caches_cache_evicted_size, etc)

See behavior of the http metrics resource

  1. Add the metrics resource to a new or existing http listener in your homeserver.yaml
    listeners:
      - port: 9322
        type: http
        bind_addresses: ['127.0.0.1']
        resources:
          - names: [metrics]
            compress: false
  2. Start the homeserver: poetry run synapse_homeserver --config-path homeserver.yaml
  3. Fetch http://localhost:9322/_synapse/metrics (it's just a GET request, so you can even do it in the browser)
  4. Observe response includes the cache metrics (synapse_util_caches_cache_size, synapse_util_caches_cache_hits, synapse_util_caches_cache_evicted_size, etc): example, example from develop
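
The "observe" step in both procedures above can also be checked mechanically. The following is a minimal sketch, not part of the PR: it parses Prometheus text-format output and lists the cache metric samples that lack a given label. The helper name `samples_missing_label`, the sample payload, and the hostnames are all hypothetical, and the label name `server_name` is taken from the PR description, so it is an assumption about what the listener actually emits.

```python
import re
from typing import Iterable, List

# Matches one sample line of the Prometheus text exposition format, e.g.
#   synapse_util_caches_cache_size{name="foo",server_name="hs1.example"} 12.0
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def samples_missing_label(
    text: str, metric_prefixes: Iterable[str], label: str
) -> List[str]:
    """Return the sample lines of the selected metrics that lack `label`."""
    prefixes = tuple(metric_prefixes)
    missing = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels = match.group(1), match.group(2) or ""
        if not name.startswith(prefixes):
            continue
        labels = dict(_LABEL_RE.findall(raw_labels))
        if label not in labels:
            missing.append(line)
    return missing


# Invented payload for illustration only; real output will differ.
SAMPLE = """\
# HELP synapse_util_caches_cache_size Cache size
# TYPE synapse_util_caches_cache_size gauge
synapse_util_caches_cache_size{name="get_user",server_name="hs1.example"} 3.0
synapse_util_caches_cache_hits{name="get_user"} 10.0
python_gc_objects_collected_total{generation="0"} 42.0
"""

missing = samples_missing_label(
    SAMPLE, ["synapse_util_caches_cache_"], "server_name"
)
print(missing)
```

To run this against a live homeserver, the body of the response from step 3 (fetched with `urllib.request.urlopen`, for example) could be passed in place of `SAMPLE`. An empty result would suggest every matching cache metric carries the label.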

Dev notes

LruCache/@cached, CacheMetric

register_cache(


ExpiringCache(

ResponseCache(

StreamChangeCache(

TTLCache(
	WellKnownResolver( -> MatrixFederationAgent(

LruCache(
	DeferredCache( -> DeferredCacheDescriptor( -> _CachedFunctionDescriptor( -> cached(
	AsyncLruCache(
	DictionaryCache(

Todo

  • Update @cached
  • Ensure scripts-dev/mypy_synapse_plugin.py works correctly with cached functions
    • This was more relevant when I was thinking I needed to change @cached more but should be fine with how we've done it.

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

```
synapse/replication/tcp/streams/_base.py:568: error: Cannot determine type of "_device_list_id_gen"  [has-type]
synapse/storage/databases/main/event_push_actions.py:256: error: Cannot determine type of "server_name" in base class "ReceiptsWorkerStore"  [misc]
synapse/storage/databases/main/event_push_actions.py:256: error: Cannot determine type of "server_name" in base class "EventsWorkerStore"  [misc]
synapse/storage/databases/main/event_push_actions.py:256: error: Cannot determine type of "_instance_name" in base class "ReceiptsWorkerStore"  [misc]
synapse/storage/databases/main/metrics.py:64: error: Cannot determine type of "server_name" in base class "ReceiptsWorkerStore"  [misc]
synapse/storage/databases/main/metrics.py:64: error: Cannot determine type of "server_name" in base class "EventsWorkerStore"  [misc]
synapse/storage/databases/main/metrics.py:64: error: Cannot determine type of "_instance_name" in base class "ReceiptsWorkerStore"  [misc]
synapse/storage/databases/main/push_rule.py:118: error: Cannot determine type of "_instance_name" in base class "ReceiptsWorkerStore"  [misc]
synapse/storage/databases/main/push_rule.py:118: error: Cannot determine type of "server_name" in base class "ReceiptsWorkerStore"  [misc]
synapse/storage/databases/main/push_rule.py:118: error: Cannot determine type of "server_name" in base class "EventsWorkerStore"  [misc]
synapse/storage/databases/main/account_data.py:60: error: Cannot determine type of "_instance_name" in base class "ReceiptsWorkerStore"  [misc]
synapse/storage/databases/main/account_data.py:60: error: Cannot determine type of "server_name" in base class "ReceiptsWorkerStore"  [misc]
synapse/storage/databases/main/account_data.py:60: error: Cannot determine type of "server_name" in base class "EventsWorkerStore"  [misc]
synapse/storage/databases/main/__init__.py:114: error: Cannot determine type of "server_name" in base class "PresenceStore"  [misc]
synapse/storage/databases/main/__init__.py:114: error: Cannot determine type of "server_name" in base class "ReceiptsWorkerStore"  [misc]
synapse/storage/databases/main/__init__.py:114: error: Cannot determine type of "server_name" in base class "ClientIpWorkerStore"  [misc]
synapse/storage/databases/main/__init__.py:114: error: Cannot determine type of "server_name" in base class "DeviceInboxWorkerStore"  [misc]
synapse/storage/databases/main/__init__.py:114: error: Cannot determine type of "server_name" in base class "EventsWorkerStore"  [misc]
synapse/storage/databases/main/__init__.py:114: error: Cannot determine type of "_instance_name" in base class "ReceiptsWorkerStore"  [misc]
synapse/storage/databases/main/__init__.py:114: error: Cannot determine type of "_instance_name" in base class "DeviceInboxWorkerStore"  [misc]
synapse/app/generic_worker.py:117: error: Cannot determine type of "_instance_name" in base class "DeviceInboxWorkerStore"  [misc]
synapse/app/generic_worker.py:117: error: Cannot determine type of "_instance_name" in base class "ReceiptsWorkerStore"  [misc]
Found 22 errors in 7 files (checked 937 source files)
```
@@ -31,6 +31,7 @@
from prometheus_client.core import Gauge

from synapse.config.cache import add_resizable_cache
from synapse.metrics import INSTANCE_LABEL_NAME

The metrics being refactored to be homeserver scoped are in this file.

The rest of the changes are to support that change and supply the server_name to the instance label.

Comment on lines +157 to +162
class HasServerName(Protocol):
    server_name: str
    """
    The homeserver name that this cache is associated with (used to label the metric)
    (`hs.hostname`).
    """

@MadLittleMods MadLittleMods Jul 1, 2025

This pattern is copied from Measure

class HasClock(Protocol):
    clock: Clock

(The Measure pattern is also updated in #18601)
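
As a rough illustration of why a `Protocol` is convenient here, the sketch below shows structural typing at work: any object with a `server_name: str` attribute is accepted, without inheriting from the protocol class. Only `HasServerName` and its attribute come from the diff above. `record_cache_hit`, `FakeStore`, the `cache_hits` dict (a stand-in for a labelled Prometheus metric), and the hostnames are hypothetical names made up for this example, not Synapse APIs.

```python
from typing import Dict, Protocol, Tuple


class HasServerName(Protocol):
    server_name: str


# Stand-in for a labelled metric: hit counts keyed by (cache_name, server_name).
cache_hits: Dict[Tuple[str, str], int] = {}


def record_cache_hit(owner: HasServerName, cache_name: str) -> None:
    """Record one hit, labelled with the owner's homeserver name."""
    key = (cache_name, owner.server_name)
    cache_hits[key] = cache_hits.get(key, 0) + 1


class FakeStore:
    """Hypothetical owner; it never inherits from HasServerName."""

    def __init__(self, server_name: str) -> None:
        self.server_name = server_name


store_a = FakeStore("hs1.example")
store_b = FakeStore("hs2.example")
record_cache_hit(store_a, "get_user")
record_cache_hit(store_a, "get_user")
record_cache_hit(store_b, "get_user")
print(cache_hits)
```

Because the counts are keyed by homeserver name, two homeservers sharing one process keep separate numbers for a cache with the same name, which is the point of scoping the metrics per homeserver.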

@MadLittleMods MadLittleMods marked this pull request as ready for review July 1, 2025 00:59
@MadLittleMods MadLittleMods requested a review from a team as a code owner July 1, 2025 00:59
Conflicts:
	synapse/http/federation/matrix_federation_agent.py
	synapse/http/federation/well_known_resolver.py
	synapse/storage/_base.py
	synapse/storage/controllers/state.py