thermal-service: Separate interface from implementation#777
thermal-service: Separate interface from implementation#777kurtjd wants to merge 1 commit intoOpenDevicePartnership:feature/relay-splitfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR restructures the thermal subsystem by splitting the public interface into a dedicated crate, moving MCTP relay/serialization into a separate relay crate, and refactoring the thermal-service implementation to use direct async calls instead of IPC.
Changes:
- Introduces
thermal-service-interfacetraits for sensor/fan/thermal services and updates thermal-service to implement them. - Adds
thermal-service-relayimplementing MCTP request/response types + a relay handler wrapper around aThermalService. - Refactors thermal-service sensor/fan services to remove IPC channels and use async method calls + event senders.
Reviewed changes
Copilot reviewed 22 out of 25 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
uart-service/Cargo.toml |
Switches dependency from legacy thermal messages crate to the new interface crate. |
thermal-service/src/utils.rs |
Minor doc/comment cleanup; removes Celsius/Kelvin conversion helpers from this crate. |
thermal-service/src/sensor.rs |
Replaces IPC-based sensor device/service with an async trait-based sensor service + event broadcasting. |
thermal-service/src/fan.rs |
Replaces IPC-based fan device/service with an async trait-based fan service + event broadcasting. |
thermal-service/src/lib.rs |
Turns thermal-service into a registry/handle over sensor+fan service slices implementing interface traits. |
thermal-service/src/mptf.rs |
Removes in-crate MPTF message handling (moved to relay crate). |
thermal-service/src/context.rs |
Removes IPC/event-queue-based context layer (no longer needed). |
thermal-service/src/mock/* |
Updates mocks to implement new interface driver traits; adds helper configs; removes old convenience wrappers. |
thermal-service/Cargo.toml |
Replaces thermal-service-messages with thermal-service-interface + thermal-service-relay. |
thermal-service-interface/* |
New crate defining ThermalService, SensorService, FanService, driver traits, and event/error types. |
thermal-service-relay/src/* |
New crate providing MPTF serialization + ThermalServiceRelayHandler<T> wrapper. |
thermal-service-relay/Cargo.toml |
Defines relay crate deps/features. |
embedded-service/src/relay/mod.rs |
Relaxes RelayServiceHandler::process_request future bound (drops Send). |
embedded-service/src/event.rs |
Adds Sender/Receiver trait impls for embassy_sync::channel sender/receiver types. |
examples/std/src/bin/thermal.rs |
Updates example to spawn separate sensor/fan services, consume events, and access via thermal service handle. |
examples/std/Cargo.toml / Cargo.lock |
Updates example deps to new interface/relay crates. |
Cargo.toml / Cargo.lock |
Adds new workspace members and dependency mappings for interface/relay crates. |
Comments suppressed due to low confidence (1)
thermal-service-relay/Cargo.toml:14
- This crate declares a dependency on
time-alarm-service-messages, but there are no references to it inthermal-service-relaysources. With workspacewarnings = deny, an unused dependency can break builds depending on lint settings, and it increases the dependency surface unnecessarily.
Remove this dependency if it’s not needed, or add the missing usage if it is required.
f4c5a92 to
36a10a6
Compare
36a10a6 to
c88f3c3
Compare
c88f3c3 to
a60d7e6
Compare
a60d7e6 to
bf19365
Compare
| driver: Mutex::new(init_params.driver), | ||
| state: Mutex::new(State::default()), | ||
| en_signal: Signal::new(), | ||
| event_senders: Mutex::new(init_params.event_senders), |
There was a problem hiding this comment.
Do we need to put event_senders behind a mutex? It is only called by broadcast_event().
There was a problem hiding this comment.
Yeah, so basically ServiceInner has to have all its field use interior mutability because none of the methods can take &mut self because both Service and Runner need to hold references to ServiceInner simultaneously in the current runnable service design.
If I instead make ServiceInner methods act on &mut self and make Service and Runner hold Mutex instead, this is no good because the Runner will end up holding the lock indefinitely as it runs a never ending async method.
broadcast_event() uses the send() method on event_senders which expects an &mut self so the field needs to be behind a Mutex.
This heavy use of interior mutability is something I want to think more about in the future, but don't see an immediately obvious way to mitigate it. @williampMSFT do you have any ideas on how we can make ServiceInner types work with &mut in the current design, or is this a limitation?
There was a problem hiding this comment.
I think we have basically three cases for state mutation:
-
If you have state that needs to be modified on both your worker task and through your control handle, I think you're kinda stuck using interior mutability / mutexes.
-
If you have state that only needs to be modified on your worker task but can't be modified through your control handle, you can have that state be a member variable of the Runner struct rather than containing it in the ServiceInner struct to avoid needing to mutex it. You could potentially use a Channel or Signal or something to have a function on the control handle signal the runner to change its state to sort of promote something from case (1) to case (2), but Signal and Channel use interior mutability themselves, so you're not actually getting away from using interior mutability, you're just spelling it differently.
-
If you have state that needs to be modified on your control handle but not on your worker task, then if you're willing to not have the control handle be cloneable, you could maybe do something where the Resources type contains two Options, one for shared state and one for control-handle-only state, and then have your control handle take a mutable borrow on the control-handle-only state. However, in the case where you're writing a service that needs a relay service to send messages across espi/i2c/whatever, that relay service will need to either take exclusive ownership over your control handle (since it can't be both cloneable and contain mutable references), or you'll need to use an external mutex to synchronize access to the control handle between the relay handler and whatever else needs to call stuff on the control handle. This means introducing another StaticCell to contain the control handle, so the ergonomics aren't great, and you're still stuck using a mutex, so it's not clear to me that this buys you anything.
There was a problem hiding this comment.
I see because the different async tasks holding references, so we have to utilize interior mutability. I wonder if there is a way around this. Let me ponder on this for a bit. Let's not the block the PR for it.
There was a problem hiding this comment.
In this specific case, it looks like this only gets used by your worker task, so this might actually be an instance of (2) - move this state to be on the runner?
There was a problem hiding this comment.
Actually, in this specific case I don't think you even need to put it in the resources, you can just have it be a stack variable on the worker thread?
There was a problem hiding this comment.
I think I'm following your other comments which gave me some ideas, but I don't think I follow what you mean making it a stack variable. The event_senders list needs to be passed in by the user but the ServiceRunner::run() method only takes self, so we'd need to store the list somewhere in one of these structs (like the Runner struct which I'm working on doing).
There was a problem hiding this comment.
Whoops, sorry, I think I was looking at the wrong file when I wrote this - I was talking about the state variable in sensor.rs, which appears to only be used in async fn check_thresholds(&self, temp: DegreesCelsius), which appears to only be called by async fn handle_sampling(&self). In that case, state can be a stack variable in handle_sampling and you can get rid of the mutex around it. Will take another look at event_senders, it's possible there's nothing to be done for that one
There was a problem hiding this comment.
Got it. Right now I'm trying to split some of ServiceInner up into fields and methods that might only explicitly belong with Runner and explicitly only with Service and keep ServiceInner only for what needs to be common between them. Hopefully should figure out a way to minimize some of this.
There was a problem hiding this comment.
Yeah, took another look at both fan.rs and sensor.rs, and in both of them, it looks like the event_senders is used exclusively by the worker thread, so you can make it either member state of the Runner do that snippet a couple comments back where you split the Resources into two different Options and have the runner take a mutable borrow against the runner-only piece. You're right that any state that needs to be passed in by the caller of new() can't be a stack variable; it only works for sensor.rs -> state because that has a Default::default() implementation that's the only way it gets constructed
bf19365 to
9af9a62
Compare
9af9a62 to
90886e2
Compare
90886e2 to
fb6baf5
Compare
fb6baf5 to
2fca3ff
Compare
2fca3ff to
e514842
Compare
e514842 to
667132a
Compare
| /// Fan error. | ||
| #[derive(Debug, Clone, Copy, PartialEq, Eq)] | ||
| #[cfg_attr(feature = "defmt", derive(defmt::Format))] | ||
| pub enum Error { |
There was a problem hiding this comment.
You should probably add a non_exhaustive here. Otherwise, adding a new variant in the future will be a breaking change.
| /// Fan event. | ||
| #[derive(Debug, PartialEq, Clone, Copy)] | ||
| #[cfg_attr(feature = "defmt", derive(defmt::Format))] | ||
| pub enum Event { |
| /// Fan is on and running at its minimum speed. | ||
| Min, | ||
| /// Fan is ramping up or down along a curve in response to a temperature change. | ||
| Ramping, |
There was a problem hiding this comment.
Should this be Ramping(upsize) or something similar where the usize carries the target set point?
There was a problem hiding this comment.
Hm I might be misunderstanding what you're saying, but Ramping state here means the fan is setting its RPM according to a curve based on temperature readings. It doesn't necessarily represent a target RPM that we are trying to hit.
Ramping might be a confusing name, I really just chose this name since it was used in some of the earlier MPTF docs. A better name might be CurveResponse or something.
Though that is an additional feature I want to add at some point, where instead of immediately setting the RPM, we set a target RPM and slowly ramp up or down to it to prevent jerkiness. But that is a separate idea from here.
| } | ||
|
|
||
| /// Fan service interface trait. | ||
| pub trait FanService: Copy { |
There was a problem hiding this comment.
Why Copy? Everything takes a reference to self. It's not immediately clear why that's required.
| /// Sensor error. | ||
| #[derive(Debug, Clone, Copy, PartialEq, Eq)] | ||
| #[cfg_attr(feature = "defmt", derive(defmt::Format))] | ||
| pub enum Error { |
| /// Sensor encountered a hardware failure. | ||
| Hardware, |
There was a problem hiding this comment.
Carry a kind internally, like embedded-Hal traits?
| /// though internally we still use Celsius for ease of use. | ||
| #[derive(Debug, Clone, Copy, PartialEq, Eq)] | ||
| #[cfg_attr(feature = "defmt", derive(defmt::Format))] | ||
| pub struct DeciKelvin(pub u32); |
There was a problem hiding this comment.
Why not f32? I don't think anyone is trying to run an EC without an FPU. Could be wrong though.
There was a problem hiding this comment.
My understanding is the host-to-EC interface expects integer decikelvin units. This is just how the interface is. Internally we use f32 Celsisus.
| /// The critical temperature threshold of a sensor. | ||
| pub const CRT_TEMP: uuid::Bytes = uuid::uuid!("218246e7-baf6-45f1-aa13-07e4845256b8").to_bytes_le(); | ||
| /// The prochot temperature threshold of a sensor. | ||
| pub const PROC_HOT_TEMP: uuid::Bytes = uuid::uuid!("22dc52d2-fd0b-47ab-95b8-26552f9831a5").to_bytes_le(); | ||
| /// The temperature threshold at which a fan should turn on and begin running at its minimum RPM. | ||
| pub const FAN_MIN_TEMP: uuid::Bytes = uuid::uuid!("ba17b567-c368-48d5-bc6f-a312a41583c1").to_bytes_le(); | ||
| /// The temperature threshold at which a fan should start ramping up. | ||
| pub const FAN_RAMP_TEMP: uuid::Bytes = uuid::uuid!("3a62688c-d95b-4d2d-bacc-90d7a5816bcd").to_bytes_le(); | ||
| /// The temperature threshold at which a fan should be at max speed. | ||
| pub const FAN_MAX_TEMP: uuid::Bytes = uuid::uuid!("dcb758b1-f0fd-4ec7-b2c0-ef1e2a547b76").to_bytes_le(); | ||
| /// The minimum RPM a fan is capable of running at reliably. | ||
| pub const FAN_MIN_RPM: uuid::Bytes = uuid::uuid!("db261c77-934b-45e2-9742-256c62badb7a").to_bytes_le(); | ||
| /// The maximum RPM a fan is capable of running at reliably. | ||
| pub const FAN_MAX_RPM: uuid::Bytes = uuid::uuid!("5cf839df-8be7-42b9-9ac5-3403ca2c8a6a").to_bytes_le(); | ||
| /// The current RPM of a fan. | ||
| pub const FAN_CURRENT_RPM: uuid::Bytes = uuid::uuid!("adf95492-0776-4ffc-84f3-b6c8b5269683").to_bytes_le(); |
There was a problem hiding this comment.
Looks like we need an MPTF crate which defines these and provides no_std helpers.
There was a problem hiding this comment.
That could be a good idea. Because these are used by the ec-test-app. Was planning to have ec-test-app use thermal-service-relay as a dep, but might be better to instead split these into a separate crate that can be sued here and by ec-test-app.

This PR overhauls the thermal-service a bit. Essentially, the thermal interface has been split from the implementation into a trait, IPC has been removed and replaced with direct async calls, and a relay service handler wrapper has been introduced.
Instead of one monolithic "thermal service", there are really 3 services here: sensor service, fan service, and thermal service, each with their own interfaces. The thermal service now mainly exists as the relay handler, since it can keep track of the different spawned sensor and fan services by ID, therefore knowing which sensors and fans to talk to when a request comes from the host.
Also split into a thermal service relay handler type which is intended to wrap any thermal service implementation.
There are some things I will likely want to tweak in the future based off some work Robert has been doing such as better use of some of the event sender/receiver traits and use of the Lockable trait (I heavily use interior mutability with direct dep on embassy Mutex, not sure if I can somehow make this generic over the Lockable trait?).
With the removal of IPC, need to also think about how to handle heterogeneous collections of sensors and fans. Currently this design just assumes the user will need to handle that on their own, by introducing enum dispatch or something behind the scenes, but will investigate if there are ways to make that easier (such as using the
enum_dispatchcrate and marking these traits).This is of course a large breaking change so will announce on Zulip after merge.
Resolves #756
Resolves #781