feat: Add persistent device database for offline/degraded mode operation #548

Open
maxfield-allison wants to merge 6 commits into wez:main from maxfield-allison:feature/lan-fallback-device-database

@maxfield-allison

Summary

This PR implements the persistent device database foundation we discussed in #76 and #537. It enables govee2mqtt to gracefully degrade to LAN-only mode when Govee APIs are unavailable, rather than crashing.

Problem

When all Govee APIs fail (authentication error, rate limiting, abnormal activity detection, or network issues) and the SQLite cache is empty or cleared, govee2mqtt crashes with the ISSUE_76_EXPLANATION error. This prevents LAN control even though LAN devices are reachable on the local network.

Solution

A persistent JSON device database that:

  • Stores device metadata (id, sku, name, room) learned from Platform and Undoc APIs
  • Survives cache clears and container restarts (separate from SQLite cache)
  • Enables graceful degradation to LAN-only mode
  • Preserves device names for Home Assistant entity stability
  • Human-editable JSON format as you requested

Implementation Details

Following the architecture you outlined in #537:

Storage (src/device_database.rs)

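use serde::{Deserialize, Serialize};
use std::collections::BTreeMap;

#[derive(Serialize, Deserialize)]
pub struct DeviceDatabase {
    /// Version of the file format.
    pub version: u32,
    /// Persisted devices; a BTreeMap keeps the serialized keys sorted.
    pub devices: BTreeMap<String, PersistedDevice>,
}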
  • Location: /data/devices.json (HA addon) or ~/.cache/govee2mqtt/devices.json
  • Atomic writes: temp file + rename pattern for crash safety
  • User override fields: user_name and user_room for future editing capability
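
The "temp file + rename" pattern can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: the function name `write_atomically` and the `.tmp` suffix are hypothetical, and a later commit in this PR switches to `NamedTempFile` from the `tempfile` crate instead.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `contents` to `path` so that readers see either the old file or the
/// complete new file, never a partially written one.
/// (Hypothetical helper for illustration only.)
pub fn write_atomically(path: &Path, contents: &[u8]) -> io::Result<()> {
    // Build a sibling temp path ("devices.json" -> "devices.json.tmp") so the
    // final rename stays on the same filesystem.
    let mut tmp_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp_path = path.with_file_name(tmp_name);

    {
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(contents)?;
        // Flush data to disk before the rename makes it visible.
        tmp.sync_all()?;
    }

    // Replace the destination in a single step.
    fs::rename(&tmp_path, path)
}
```

If the process dies before the rename, the previous `devices.json` is still intact; at worst a stale `.tmp` file is left behind.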

Startup Flow (src/commands/serve.rs)

  1. Detect startup mode: Fresh, Upgrade, or Normal
  2. Load existing device database
  3. Attempt API discovery
  4. On API failure with cached data → continue with warnings
  5. Populate in-memory state from database in degraded mode
  6. Start LAN discovery
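
The description above does not spell out how the three startup modes are told apart. One plausible reading, sketched here purely as an assumption, keys the decision on which files already exist: a device database means Normal, a SQLite cache without a database means Upgrade, and neither means Fresh. The function name and its two boolean parameters are hypothetical.

```rust
/// The three startup modes named in step 1.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StartupMode {
    Fresh,
    Upgrade,
    Normal,
}

/// Assumed mapping from on-disk state to startup mode (not taken from the PR).
pub fn detect_startup_mode(database_exists: bool, sqlite_cache_exists: bool) -> StartupMode {
    match (database_exists, sqlite_cache_exists) {
        // A device database is already present: regular start.
        (true, _) => StartupMode::Normal,
        // Only the older SQLite cache exists: first run after upgrading.
        (false, true) => StartupMode::Upgrade,
        // Nothing on disk yet.
        (false, false) => StartupMode::Fresh,
    }
}
```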

Fallback Logic

  • API succeeds → database updated with latest metadata
  • API fails + database exists → use cached data, log warnings
  • API fails + no database → fail with ISSUE_76 (unchanged behavior for fresh installs)
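
The three cases above reduce to a small decision function. The sketch below is illustrative only: `DiscoveryPlan` and `plan_after_discovery` are hypothetical names, and treating an empty database the same as a missing one is an assumption.

```rust
/// What to do once API discovery has finished (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum DiscoveryPlan {
    /// API succeeded: refresh the database with the latest metadata.
    UpdateDatabase,
    /// API failed but devices are persisted: continue in degraded mode.
    UseCachedDevices { count: usize },
    /// API failed and nothing is persisted: fail as before.
    FailStartup,
}

pub fn plan_after_discovery(api_succeeded: bool, cached_devices: usize) -> DiscoveryPlan {
    match (api_succeeded, cached_devices) {
        (true, _) => DiscoveryPlan::UpdateDatabase,
        // Assumption: an empty database is handled like a missing one.
        (false, 0) => DiscoveryPlan::FailStartup,
        (false, count) => DiscoveryPlan::UseCachedDevices { count },
    }
}
```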

Changes

| File | Changes |
| --- | --- |
| src/device_database.rs | New - Device database implementation |
| src/commands/serve.rs | Startup integration, fallback logic |
| src/service/device.rs | cached_name/cached_room fields |
| src/service/state.rs | Device database handle in state |
| src/main.rs | Module declaration |
| Cargo.toml | Dependencies (chrono, dirs, tempfile) |

Testing

Tested on a Docker Swarm deployment with 7 Govee devices:

| Scenario | Result |
| --- | --- |
| Valid credentials | ✅ Database populated, all devices work |
| Invalid API key + SQLite cache exists | ✅ Cache returns stale data, service runs |
| Invalid credentials + NO SQLite cache | Loads from devices.json, LAN control works |
| Fresh install + no credentials | ✅ Fails as expected (no data to use) |

Key Test Output (Scenario 3)

Platform API discovery failed: status 401 Unauthorized
Continuing with 7 cached devices from persistent database
Undoc API discovery failed: Incorrect user name or password
Continuing with 7 cached devices from persistent database
Populating device state from persistent database...
Loaded 7 devices into memory from persistent database
Running in degraded mode - device metadata may be stale
LAN-capable devices should still be controllable
Starting LAN discovery
...
Using LAN API to set Frame Front Accent power state

Non-Breaking

  • ✅ All existing tests pass (cargo test)
  • ✅ Existing SQLite cache behavior unchanged
  • ✅ Database is purely an additive resilience layer
  • ✅ Fresh installs without API access still fail appropriately

Future Work (Not in this PR)

This is intentionally a focused foundational PR. Future enhancements could include:

  1. Entity ID stability - Lock entity names on first registration
  2. Web UI management - View/edit device database from the web interface
  3. Per-device API preferences - Override which API to use per device
  4. Explicit LAN IP configuration - Manual IP entry for VLAN/discovery issues

Related


Happy to address any feedback or make adjustments to the approach!

This implements the device database foundation that Wez requested for
resolving Issue wez#76 (LAN-only fallback when APIs are unavailable).

## Problem Solved

When all Govee APIs fail (authentication error, rate limiting, network
issues) and the SQLite cache is empty/cleared, govee2mqtt would crash
with the ISSUE_76_EXPLANATION error. This prevented LAN control even
though LAN devices were reachable.

## Solution

A persistent JSON device database that:
- Stores device metadata (id, sku, name, room) learned from APIs
- Survives cache clears and container restarts
- Enables graceful degradation to LAN-only mode
- Preserves device names for Home Assistant entity stability

## Changes

### New: src/device_database.rs
- JSON storage at /data/devices.json (HA addon) or ~/.cache/govee2mqtt/
- Atomic writes (temp file + rename) for crash safety
- StartupMode detection: Fresh, Upgrade, or Normal
- User override fields for future editing capability

### Modified: src/commands/serve.rs
- Load device database on startup
- Fallback to cached devices when APIs fail (with warnings)
- Populate in-memory state from database in degraded mode
- Update database on successful API/LAN discovery

### Modified: src/service/device.rs
- Added cached_name/cached_room fields
- name() and room_name() fall back to cached values
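
A minimal sketch of that fallback order, assuming a simplified `Device` struct. The real struct in src/service/device.rs has more fields; `api_name`, `api_room`, and the generated last-resort name are assumptions made for illustration.

```rust
/// Simplified stand-in for the service's device type (hypothetical fields).
pub struct Device {
    pub id: String,
    pub sku: String,
    /// Name reported by the Govee APIs during this run, if any.
    pub api_name: Option<String>,
    /// Name restored from the persistent device database.
    pub cached_name: Option<String>,
    pub api_room: Option<String>,
    pub cached_room: Option<String>,
}

impl Device {
    /// Prefer live API data, then the cached value, then a generated name.
    pub fn name(&self) -> String {
        self.api_name
            .clone()
            .or_else(|| self.cached_name.clone())
            // Assumption: the last-resort name is built from sku and id.
            .unwrap_or_else(|| format!("{}_{}", self.sku, self.id))
    }

    /// Rooms have no generated fallback, so the result stays optional.
    pub fn room_name(&self) -> Option<&str> {
        self.api_room.as_deref().or(self.cached_room.as_deref())
    }
}
```

With this ordering, a device keeps the name it had in Home Assistant even when the APIs are unreachable, because the cached value fills in before any generated name is used.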

## Tested Scenarios

1. Valid credentials: Database populated, all devices work
2. Invalid creds + SQLite cache: Cache returns stale data
3. Invalid creds + NO cache: Loads from devices.json, LAN works
4. Fresh install + no creds: Fails as expected (no data)

## Non-Breaking

- All existing tests pass
- Existing SQLite cache behavior unchanged
- Database is a purely additive resilience layer
@inventor7777

inventor7777 commented Dec 16, 2025

Whoa that was fast!! Looking forward to per-device LAN API only mode :)

@maxfield-allison

maxfield-allison commented Dec 16, 2025

Author

> Whoa that was fast!! Looking forward to per-device LAN API only mode :)

I really just want to never again worry about getting locked out for abnormal activity if I have connectivity issues or bounce my services multiple times while messing around lol

@maxfield-allison

Author

@inventor7777 not to say I don't also want enhanced features, lol, I'm sure I'll contribute more as time permits. (mister thumbs down... XD)

@inventor7777

inventor7777 commented Dec 19, 2025

OOOPS I'm sorry i didn't mean to do that 😂😂
(GitHub sometimes behaves weirdly on mobile)

@wez wez left a comment

Owner

Thank you again for this! Sorry for the delay in responding; it's a busy time of year!

Comment thread src/commands/serve.rs Outdated
Comment thread src/device_database.rs Outdated
Comment thread src/device_database.rs Outdated
Comment thread src/device_database.rs Outdated
Comment thread src/device_database.rs Outdated
Comment thread src/commands/serve.rs Outdated
Comment thread src/commands/serve.rs Outdated
Comment thread src/commands/serve.rs Outdated
Comment thread src/service/device.rs Outdated
Comment thread src/commands/serve.rs Outdated
@maxfield-allison

Author

> Thank you again for this! Sorry for the delay in responding; it's a busy time of year!

I completely understand. We hope for side project time but life always seems to find a way to make it difficult!

@maxfield-allison

maxfield-allison commented Dec 27, 2025

Author

One more commit incoming shortly to address the remaining items from review. Or I can close and push one clean if you prefer, apologies.

- Avoid unnecessary clone in device iteration (use reference)
- Use defensive slicing with .get().unwrap_or() to prevent panics
- Fix TOCTOU race condition in database load
- Use NamedTempFile for robust atomic writes
- Rename cached_name/cached_room to name/room (database is source of truth)
- Restructure discovery flow: always load database first
- Remove ISSUE_76 crash - fresh installs use generated names
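
Two of the items above, the TOCTOU fix and the defensive slicing, can be sketched with the standard library. These are illustrations of the techniques rather than the code in the commit; the function names `read_database_text` and `id_suffix` are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Read the database file without a separate "does it exist?" check.
/// Checking first and reading second leaves a window in which the file can
/// vanish; reading directly and mapping NotFound to `None` closes it.
pub fn read_database_text(path: &Path) -> io::Result<Option<String>> {
    match fs::read_to_string(path) {
        Ok(text) => Ok(Some(text)),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(err) => Err(err),
    }
}

/// Return the last `n` bytes of `id` without panicking.
/// `str::get` yields `None` when the range is out of bounds or splits a
/// UTF-8 character, in which case the whole id is returned instead.
pub fn id_suffix(id: &str, n: usize) -> &str {
    let start = id.len().saturating_sub(n);
    id.get(start..).unwrap_or(id)
}
```

Plain indexing such as `&id[start..]` would panic in the cases where `get` returns `None`, which is what the defensive form avoids.
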
maxfield-allison force-pushed the feature/lan-fallback-device-database branch from e755006 to a3838fd on December 27, 2025 14:55
@maxfield-allison

Author

need to correct database saves on LAN status query

maxfield-allison and others added 3 commits December 27, 2025 09:55
The save was being called on every LAN status response, causing multiple
writes per minute. Database saves should only happen after API enumeration,
not on every status update.
This was referenced Dec 27, 2025
@AlgoClaw

Not sure if this helps, but there is similar functionality in my fork (AlgoClaw/govee2mqtt) after creating a feature request in this issue (which I closed shortly after).

You can map the internal directory /JSONs as a persistent volume which can store saved scene commands for LAN control.

For example, if you save H61A8_final.json in /JSONs, the Govee2MQTT bridge will use the scenes defined in the JSON file.

[Image: Screenshot_20251231_122128]

What's cool too is that you can keep old/deleted scenes that Govee removed.

I am a total noob when it comes to PRs and git stuff. So, I do not know how to help there.

@inventor7777

Nice thinking.

that's another pain point I have with the current implementation - it doesn't pull everything (I have hundreds of DIYs and it only pulls half or less)

@mercertom

When can we get this? I can't use adaptive lighting until this is fixed. My service just insta-pings over the daily or minute or second or hour limit, and then my lights are stuck at some arbitrary brightness.
