Hi — while refactoring the orynq-ai-auditability ability for #249, we hit a durability gap in the file-storage API that probably affects every community ability doing JSON persistence.
The gap
`check_if_file_exists`, `read_file`, `write_file`, and `delete_file` are the only storage primitives documented in `docs/OpenHome_SDK_Reference.md` §8. There is no atomic-replace primitive:
- No `replace_file(src, dst)` / rename equivalent
- No `atomic_write_file(path, contents)` helper
- No fsync-after-write guarantee
The docs prescribe delete-then-write or `mode="w"` for overwrites. Both have a crash window: if the device loses power, the app is killed, or the write fails partway through, the file ends up missing, empty, or truncated, and the ability boots with zero state on the next run.
For abilities that maintain append-only audit chains, monotonic counters, or any state where "lose half of it" is worse than "have the old version," that window is a correctness issue, not a performance one.
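To make the two crash windows concrete, here is a small simulation using plain Python file I/O in place of the SDK's `write_file`/`delete_file` (an assumption: the real primitives may buffer or sync differently). The hypothetical `SimulatedCrash` exception stands in for the process being killed; it does not model power loss, where unsynced data still in the page cache can also vanish.

```python
import json
import os
import tempfile


class SimulatedCrash(Exception):
    """Stands in for the app being killed mid-write."""


def overwrite_with_mode_w(path: str, state: dict, crash: bool) -> None:
    # open(..., "w") truncates the file to zero bytes immediately.
    with open(path, "w", encoding="utf-8") as f:
        if crash:
            raise SimulatedCrash()  # killed after truncation, before the write
        json.dump(state, f)


def delete_then_write(path: str, state: dict, crash: bool) -> None:
    os.remove(path)  # the old version is gone from this point on
    if crash:
        raise SimulatedCrash()  # killed before the new version exists
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f)


def run(writer) -> str:
    path = os.path.join(tempfile.mkdtemp(), "state.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"counter": 41}, f)  # last good state
    try:
        writer(path, {"counter": 42}, crash=True)
    except SimulatedCrash:
        pass
    if not os.path.exists(path):
        return "missing"
    return "empty" if os.path.getsize(path) == 0 else "intact"


mode_w_result = run(overwrite_with_mode_w)    # "empty"
delete_write_result = run(delete_then_write)  # "missing"
```

In both cases the last good state (`{"counter": 41}`) is gone, which is exactly the "lose all of it" outcome described above.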
What we did as a workaround (PR #249 commit e404687)
Forward-journal pattern on top of the four existing primitives:
1. Write new contents to `<path>.tmp`
2. Read back and verify (size check + JSON parse)
3. `delete_file(<path>)`
4. `write_file(<path>, contents)`
5. `delete_file(<path>.tmp)`

On load: if `<path>` is missing/corrupt and `<path>.tmp` parses, recover from `.tmp` and promote on next write.
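A minimal sketch of the forward-journal steps above. `LocalStorage` is a hypothetical local-disk stand-in for the SDK's four primitives, assuming synchronous calls, text contents, and a `write_file` that overwrites; the real SDK signatures may differ. `save_state` and `load_state` are illustrative names, not the code in commit e404687.

```python
import json
import os
import tempfile


class LocalStorage:
    """Hypothetical stand-in for the SDK's four storage primitives."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _abs(self, path: str) -> str:
        return os.path.join(self.root, path)

    def check_if_file_exists(self, path: str) -> bool:
        return os.path.exists(self._abs(path))

    def read_file(self, path: str) -> str:
        with open(self._abs(path), encoding="utf-8") as f:
            return f.read()

    def write_file(self, path: str, contents: str) -> None:
        with open(self._abs(path), "w", encoding="utf-8") as f:
            f.write(contents)

    def delete_file(self, path: str) -> None:
        os.remove(self._abs(path))


def save_state(storage: LocalStorage, path: str, state: dict) -> None:
    contents = json.dumps(state)
    tmp = path + ".tmp"
    storage.write_file(tmp, contents)        # 1. write the forward journal
    readback = storage.read_file(tmp)        # 2. read back and verify...
    if len(readback) != len(contents):       #    ...size check
        raise OSError(f"journal {tmp} has unexpected size")
    json.loads(readback)                     #    ...JSON parse (raises if corrupt)
    if storage.check_if_file_exists(path):
        storage.delete_file(path)            # 3. remove the real file
    storage.write_file(path, contents)       # 4. rewrite the real file
    storage.delete_file(tmp)                 # 5. drop the journal


def _try_load(storage: LocalStorage, path: str):
    # Parsed JSON, or None if the file is missing or does not parse.
    if not storage.check_if_file_exists(path):
        return None
    try:
        return json.loads(storage.read_file(path))
    except ValueError:
        return None


def load_state(storage: LocalStorage, path: str):
    state = _try_load(storage, path)
    if state is None:
        # Real file missing/corrupt: recover from the journal. The next
        # save_state call rewrites the real file, which promotes it.
        state = _try_load(storage, path + ".tmp")
    return state
```

If the app dies between steps 3 and 4, only `<path>.tmp` is left on disk; `load_state` returns the journaled state, and the next `save_state` rewrites the real file and removes the journal.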
This closes the window for our ability, but:
- Steps 3–4 still have a brief crash window where the real file doesn't exist on disk. Only a true rename is fully atomic.
- Every ability author doing JSON persistence now has to re-implement this pattern.
- Most won't — they'll follow the docs' "delete-then-write" prescription and eat the tearing risk silently.
Proposed fix
A single primitive in the SDK:
```python
sdk.atomic_write_file(path: str, contents: str | bytes, in_ability_directory: bool = True) -> None
```
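Implementation is the standard write-to-tmp + `fsync` + `os.replace` sequence: ~5 lines plus error handling. `os.replace` is preferable to `os.rename` here because `os.rename` raises on Windows when the destination already exists, whereas `os.replace` overwrites atomically on POSIX and, in practice, on NTFS. On POSIX the parent directory should also be fsynced so the rename itself survives power loss.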
That one helper closes the class of bug across every ability without each author having to know POSIX filesystem semantics. The existing four primitives can stay; this is additive.
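A minimal standard-library sketch of what the proposed helper could look like. It follows the proposed signature but drops `in_ability_directory`, since resolving the ability directory is SDK-specific; the temp-file naming and the directory fsync are our assumptions about what "plus error handling" would cover, not an agreed design.

```python
import os
import tempfile
from typing import Union


def atomic_write_file(path: str, contents: Union[str, bytes]) -> None:
    """Replace `path` so readers see the old or the new file, never a partial one."""
    data = contents.encode("utf-8") if isinstance(contents, str) else contents
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory: os.replace is only atomic
    # within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".atomic-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # new bytes are durable before the swap
        os.replace(tmp_path, path)  # atomic swap; also overwrites on Windows
    except BaseException:
        # Any failure leaves the original file untouched; just drop the temp file.
        try:
            os.remove(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # POSIX: fsync the parent directory so the rename itself survives power loss.
    # os.O_DIRECTORY is absent on Windows, where this step is skipped.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

One side effect worth noting: `tempfile.mkstemp` creates files with owner-only permissions, so the replaced file inherits those rather than the original file's mode; an SDK version might want to copy the old mode across.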
Alternatively (cheaper)
If atomic rename isn't trivial given the SDK's deployment model (sandboxed FS, remote storage, etc.), a clearly flagged `write_file_with_backup` that does `copy real -> .bak; write real; delete .bak` would at least give ability authors a recoverable path. Our forward-journal pattern is close to this shape but journals forward instead of backward. Either direction is strictly better than the current docs-prescribed flow.
Scope
Happy to open a follow-up PR with the implementation if the core team agrees on the shape. We're already carrying the workaround in `community/orynq-ai-auditability/` and can migrate it back to the SDK-provided primitive once it lands.