Skip to content

thermal-service: Separate interface from implementation#777

Open
kurtjd wants to merge 1 commit intoOpenDevicePartnership:feature/relay-splitfrom
kurtjd:thermal-interface
Open

thermal-service: Separate interface from implementation#777
kurtjd wants to merge 1 commit intoOpenDevicePartnership:feature/relay-splitfrom
kurtjd:thermal-interface

Conversation

@kurtjd
Copy link
Copy Markdown
Contributor

@kurtjd kurtjd commented Apr 7, 2026

This PR overhauls the thermal-service a bit. Essentially, the thermal interface has been split from the implementation into a trait, IPC has been removed and replaced with direct async calls, and a relay service handler wrapper has been introduced.

Instead of one monolithic "thermal service", there are really 3 services here: sensor service, fan service, and thermal service, each with their own interfaces. The thermal service now mainly exists as the relay handler, since it can keep track of the different spawned sensor and fan services by ID, therefore knowing which sensors and fans to talk to when a request comes from the host.

Also split into a thermal service relay handler type which is intended to wrap any thermal service implementation.

There are some things I will likely want to tweak in the future based off some work Robert has been doing such as better use of some of the event sender/receiver traits and use of the Lockable trait (I heavily use interior mutability with direct dep on embassy Mutex, not sure if I can somehow make this generic over the Lockable trait?).

With the removal of IPC, need to also think about how to handle heterogeneous collections of sensors and fans. Currently this design just assumes the user will need to handle that on their own, by introducing enum dispatch or something behind the scenes, but will investigate if there are ways to make that easier (such as using the enum_dispatch crate and marking these traits).

This is of course a large breaking change so will announce on Zulip after merge.

Resolves #756
Resolves #781

@kurtjd kurtjd self-assigned this Apr 7, 2026
@kurtjd kurtjd added the enhancement New feature or request label Apr 7, 2026
Copilot AI review requested due to automatic review settings April 7, 2026 17:43
@github-project-automation github-project-automation bot moved this to In progress in ODP Backlog Apr 7, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR restructures the thermal subsystem by splitting the public interface into a dedicated crate, moving MCTP relay/serialization into a separate relay crate, and refactoring the thermal-service implementation to use direct async calls instead of IPC.

Changes:

  • Introduces thermal-service-interface traits for sensor/fan/thermal services and updates thermal-service to implement them.
  • Adds thermal-service-relay implementing MCTP request/response types + a relay handler wrapper around a ThermalService.
  • Refactors thermal-service sensor/fan services to remove IPC channels and use async method calls + event senders.

Reviewed changes

Copilot reviewed 22 out of 25 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
uart-service/Cargo.toml Switches dependency from legacy thermal messages crate to the new interface crate.
thermal-service/src/utils.rs Minor doc/comment cleanup; removes Celsius/Kelvin conversion helpers from this crate.
thermal-service/src/sensor.rs Replaces IPC-based sensor device/service with an async trait-based sensor service + event broadcasting.
thermal-service/src/fan.rs Replaces IPC-based fan device/service with an async trait-based fan service + event broadcasting.
thermal-service/src/lib.rs Turns thermal-service into a registry/handle over sensor+fan service slices implementing interface traits.
thermal-service/src/mptf.rs Removes in-crate MPTF message handling (moved to relay crate).
thermal-service/src/context.rs Removes IPC/event-queue-based context layer (no longer needed).
thermal-service/src/mock/* Updates mocks to implement new interface driver traits; adds helper configs; removes old convenience wrappers.
thermal-service/Cargo.toml Replaces thermal-service-messages with thermal-service-interface + thermal-service-relay.
thermal-service-interface/* New crate defining ThermalService, SensorService, FanService, driver traits, and event/error types.
thermal-service-relay/src/* New crate providing MPTF serialization + ThermalServiceRelayHandler<T> wrapper.
thermal-service-relay/Cargo.toml Defines relay crate deps/features.
embedded-service/src/relay/mod.rs Relaxes RelayServiceHandler::process_request future bound (drops Send).
embedded-service/src/event.rs Adds Sender/Receiver trait impls for embassy_sync::channel sender/receiver types.
examples/std/src/bin/thermal.rs Updates example to spawn separate sensor/fan services, consume events, and access via thermal service handle.
examples/std/Cargo.toml / Cargo.lock Updates example deps to new interface/relay crates.
Cargo.toml / Cargo.lock Adds new workspace members and dependency mappings for interface/relay crates.
Comments suppressed due to low confidence (1)

thermal-service-relay/Cargo.toml:14

  • This crate declares a dependency on time-alarm-service-messages, but there are no references to it in thermal-service-relay sources. With workspace warnings = deny, an unused dependency can break builds depending on lint settings, and it increases the dependency surface unnecessarily.

Remove this dependency if it’s not needed, or add the missing usage if it is required.

@kurtjd kurtjd moved this from In progress to In review in ODP Backlog Apr 7, 2026
@kurtjd kurtjd force-pushed the thermal-interface branch from f4c5a92 to 36a10a6 Compare April 7, 2026 19:25
Copilot AI review requested due to automatic review settings April 7, 2026 19:27
@kurtjd kurtjd force-pushed the thermal-interface branch from 36a10a6 to c88f3c3 Compare April 7, 2026 19:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 25 changed files in this pull request and generated 1 comment.

@kurtjd kurtjd force-pushed the thermal-interface branch from c88f3c3 to a60d7e6 Compare April 8, 2026 20:49
@kurtjd kurtjd marked this pull request as ready for review April 8, 2026 20:54
@kurtjd kurtjd requested a review from a team as a code owner April 8, 2026 20:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 24 changed files in this pull request and generated 4 comments.

@kurtjd kurtjd force-pushed the thermal-interface branch from a60d7e6 to bf19365 Compare April 8, 2026 21:09
driver: Mutex::new(init_params.driver),
state: Mutex::new(State::default()),
en_signal: Signal::new(),
event_senders: Mutex::new(init_params.event_senders),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to put event_senders behind a mutex? It is only called by broadcast_event().

Copy link
Copy Markdown
Contributor Author

@kurtjd kurtjd Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so basically ServiceInner has to have all its field use interior mutability because none of the methods can take &mut self because both Service and Runner need to hold references to ServiceInner simultaneously in the current runnable service design.

If I instead make ServiceInner methods act on &mut self and make Service and Runner hold Mutex instead, this is no good because the Runner will end up holding the lock indefinitely as it runs a never ending async method.

broadcast_event() uses the send() method on event_senders which expects an &mut self so the field needs to be behind a Mutex.

This heavy use of interior mutability is something I want to think more about in the future, but don't see an immediately obvious way to mitigate it. @williampMSFT do you have any ideas on how we can make ServiceInner types work with &mut in the current design, or is this a limitation?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have basically three cases for state mutation:

  1. If you have state that needs to be modified on both your worker task and through your control handle, I think you're kinda stuck using interior mutability / mutexes.

  2. If you have state that only needs to be modified on your worker task but can't be modified through your control handle, you can have that state be a member variable of the Runner struct rather than containing it in the ServiceInner struct to avoid needing to mutex it. You could potentially use a Channel or Signal or something to have a function on the control handle signal the runner to change its state to sort of promote something from case (1) to case (2), but Signal and Channel use interior mutability themselves, so you're not actually getting away from using interior mutability, you're just spelling it differently.

  3. If you have state that needs to be modified on your control handle but not on your worker task, then if you're willing to not have the control handle be cloneable, you could maybe do something where the Resources type contains two Options, one for shared state and one for control-handle-only state, and then have your control handle take a mutable borrow on the control-handle-only state. However, in the case where you're writing a service that needs a relay service to send messages across espi/i2c/whatever, that relay service will need to either take exclusive ownership over your control handle (since it can't be both cloneable and contain mutable references), or you'll need to use an external mutex to synchronize access to the control handle between the relay handler and whatever else needs to call stuff on the control handle. This means introducing another StaticCell to contain the control handle, so the ergonomics aren't great, and you're still stuck using a mutex, so it's not clear to me that this buys you anything.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see because the different async tasks holding references, so we have to utilize interior mutability. I wonder if there is a way around this. Let me ponder on this for a bit. Let's not the block the PR for it.

Copy link
Copy Markdown
Contributor

@williampMSFT williampMSFT Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this specific case, it looks like this only gets used by your worker task, so this might actually be an instance of (2) - move this state to be on the runner?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, in this specific case I don't think you even need to put it in the resources, you can just have it be a stack variable on the worker thread?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm following your other comments which gave me some ideas, but I don't think I follow what you mean making it a stack variable. The event_senders list needs to be passed in by the user but the ServiceRunner::run() method only takes self, so we'd need to store the list somewhere in one of these structs (like the Runner struct which I'm working on doing).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, sorry, I think I was looking at the wrong file when I wrote this - I was talking about the state variable in sensor.rs, which appears to only be used in async fn check_thresholds(&self, temp: DegreesCelsius), which appears to only be called by async fn handle_sampling(&self). In that case, state can be a stack variable in handle_sampling and you can get rid of the mutex around it. Will take another look at event_senders, it's possible there's nothing to be done for that one

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Right now I'm trying to split some of ServiceInner up into fields and methods that might only explicitly belong with Runner and explicitly only with Service and keep ServiceInner only for what needs to be common between them. Hopefully should figure out a way to minimize some of this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, took another look at both fan.rs and sensor.rs, and in both of them, it looks like the event_senders is used exclusively by the worker thread, so you can make it either member state of the Runner do that snippet a couple comments back where you split the Resources into two different Options and have the runner take a mutable borrow against the runner-only piece. You're right that any state that needs to be passed in by the caller of new() can't be a stack variable; it only works for sensor.rs -> state because that has a Default::default() implementation that's the only way it gets constructed

Copilot AI review requested due to automatic review settings April 9, 2026 17:47
@kurtjd kurtjd force-pushed the thermal-interface branch from bf19365 to 9af9a62 Compare April 9, 2026 17:47
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 23 changed files in this pull request and generated 4 comments.

@kurtjd kurtjd force-pushed the thermal-interface branch from 9af9a62 to 90886e2 Compare April 9, 2026 20:42
Copilot AI review requested due to automatic review settings April 9, 2026 20:43
@kurtjd kurtjd force-pushed the thermal-interface branch from 90886e2 to fb6baf5 Compare April 9, 2026 20:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 23 changed files in this pull request and generated 1 comment.

@kurtjd kurtjd force-pushed the thermal-interface branch from fb6baf5 to 2fca3ff Compare April 9, 2026 21:01
Copilot AI review requested due to automatic review settings April 9, 2026 21:06
@kurtjd kurtjd force-pushed the thermal-interface branch from 2fca3ff to e514842 Compare April 9, 2026 21:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 23 changed files in this pull request and generated 4 comments.

@kurtjd kurtjd force-pushed the thermal-interface branch from e514842 to 667132a Compare April 9, 2026 21:22
/// Fan error.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[cfg_attr(feature = "defmt", derive(defmt::Format))]
pub enum Error {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably add a non_exhaustive here. Otherwise, adding a new variant in the future will be a breaking change.

/// Fan event.
#[derive(Debug, PartialEq, Clone, Copy)]
#[cfg_attr(feature = "defmt", derive(defmt::Format))]
pub enum Event {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

/// Fan is on and running at its minimum speed.
Min,
/// Fan is ramping up or down along a curve in response to a temperature change.
Ramping,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be Ramping(upsize) or something similar where the usize carries the target set point?

Copy link
Copy Markdown
Contributor Author

@kurtjd kurtjd Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I might be misunderstanding what you're saying, but Ramping state here means the fan is setting its RPM according to a curve based on temperature readings. It doesn't necessarily represent a target RPM that we are trying to hit.

Ramping might be a confusing name, I really just chose this name since it was used in some of the earlier MPTF docs. A better name might be CurveResponse or something.

Though that is an additional feature I want to add at some point, where instead of immediately setting the RPM, we set a target RPM and slowly ramp up or down to it to prevent jerkiness. But that is a separate idea from here.

}

/// Fan service interface trait.
pub trait FanService: Copy {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why Copy? Everything takes a reference to self. It's not immediately clear why that's required.

/// Sensor error.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[cfg_attr(feature = "defmt", derive(defmt::Format))]
pub enum Error {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non_exhaustive?

Comment on lines +12 to +13
/// Sensor encountered a hardware failure.
Hardware,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Carry a kind internally, like embedded-Hal traits?

/// though internally we still use Celsius for ease of use.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[cfg_attr(feature = "defmt", derive(defmt::Format))]
pub struct DeciKelvin(pub u32);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not f32? I don't think anyone is trying to run an EC without an FPU. Could be wrong though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is the host-to-EC interface expects integer decikelvin units. This is just how the interface is. Internally we use f32 Celsisus.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Comment on lines +32 to +47
/// The critical temperature threshold of a sensor.
pub const CRT_TEMP: uuid::Bytes = uuid::uuid!("218246e7-baf6-45f1-aa13-07e4845256b8").to_bytes_le();
/// The prochot temperature threshold of a sensor.
pub const PROC_HOT_TEMP: uuid::Bytes = uuid::uuid!("22dc52d2-fd0b-47ab-95b8-26552f9831a5").to_bytes_le();
/// The temperature threshold at which a fan should turn on and begin running at its minimum RPM.
pub const FAN_MIN_TEMP: uuid::Bytes = uuid::uuid!("ba17b567-c368-48d5-bc6f-a312a41583c1").to_bytes_le();
/// The temperature threshold at which a fan should start ramping up.
pub const FAN_RAMP_TEMP: uuid::Bytes = uuid::uuid!("3a62688c-d95b-4d2d-bacc-90d7a5816bcd").to_bytes_le();
/// The temperature threshold at which a fan should be at max speed.
pub const FAN_MAX_TEMP: uuid::Bytes = uuid::uuid!("dcb758b1-f0fd-4ec7-b2c0-ef1e2a547b76").to_bytes_le();
/// The minimum RPM a fan is capable of running at reliably.
pub const FAN_MIN_RPM: uuid::Bytes = uuid::uuid!("db261c77-934b-45e2-9742-256c62badb7a").to_bytes_le();
/// The maximum RPM a fan is capable of running at reliably.
pub const FAN_MAX_RPM: uuid::Bytes = uuid::uuid!("5cf839df-8be7-42b9-9ac5-3403ca2c8a6a").to_bytes_le();
/// The current RPM of a fan.
pub const FAN_CURRENT_RPM: uuid::Bytes = uuid::uuid!("adf95492-0776-4ffc-84f3-b6c8b5269683").to_bytes_le();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we need an MPTF crate which defines these and provides no_std helpers.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could be a good idea. Because these are used by the ec-test-app. Was planning to have ec-test-app use thermal-service-relay as a dep, but might be better to instead split these into a separate crate that can be sued here and by ec-test-app.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

5 participants