[NS 1] Add sub-module mounts support + codegen #5167
Co-authored-by: Jason Larabie <jason@clockworklabs.io>
Signed-off-by: Alessandro Asoni <alessandro@clockworklabs.io>
gefjon left a comment
I'm really unhappy with the codegen changes in both TypeScript and C++ here. The TypeScript changes seem very complicated and finicky, and the C++ changes are an obviously brittle special case. I don't understand why we don't just move to generating a single file for the whole module, if generating individual files and importing across them is causing so many problems.
| r#"// THIS FILE IS AUTOMATICALLY GENERATED BY SPACETIMEDB. EDITS TO THIS FILE | ||
| // WILL NOT BE SAVED. MODIFY TABLES IN YOUR MODULE SOURCE CODE INSTEAD. | ||
|
|
||
| // This was generated using spacetimedb codegen. |
| // (1) its `namespace` field is a C++ keyword; (2) its `module` field creates a | ||
| // circular include chain through RawModuleDefV10 → RawModuleDefV10Section. | ||
| // We break both with a forward declaration and shared_ptr. | ||
| if name.to_string() == "RawModuleMountV10" { |
This feels brittle and gross. I'm really not a fan. For one thing, we're assigning special semantics to the name RawModuleMountV10 but not doing any work to reserve that name or ensure users don't provide it. For another thing, everything else about this.
|
|
||
| SPACETIMEDB_INTERNAL_PRODUCT_TYPE(RawModuleMountV10) { | ||
| std::string namespace_; // renamed: 'namespace' is a C++ keyword | ||
| std::shared_ptr<SpacetimeDB::Internal::RawModuleDefV10> module; // shared_ptr breaks infinite-size recursion |
How is it possible that the type definition works in Rust, but you have to manually insert this shared_ptr in the C++ definition?
The Rust definition has the RawModuleMountV10 in a Vec, which I assume we'd generate as std::vector, so additional indirection shouldn't be necessary.
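To make the point about indirection concrete, here is a minimal Rust sketch. `ModuleDef` and `Mount` are hypothetical stand-ins, not the real `RawModuleDefV10` and `RawModuleMountV10` definitions. It shows that a type which contains itself through a `Vec` has a finite size, because the `Vec` keeps its elements on the heap.

```rust
// Hypothetical stand-ins for RawModuleDefV10 / RawModuleMountV10;
// the real definitions have many more fields.
#[derive(Debug, Default)]
struct ModuleDef {
    // Vec stores its elements behind a heap pointer, so ModuleDef has a
    // fixed size even though Mount contains a ModuleDef by value.
    mounts: Vec<Mount>,
}

#[derive(Debug)]
struct Mount {
    namespace: String,
    module: ModuleDef,
}

/// Count how many levels of mounts are nested below `def`.
fn mount_depth(def: &ModuleDef) -> usize {
    def.mounts
        .iter()
        .map(|m| 1 + mount_depth(&m.module))
        .max()
        .unwrap_or(0)
}
```

Replacing `Vec<Mount>` with a bare `Mount` field would make the type infinitely sized and the compiler would reject it, which is the situation the generated C++ works around with `shared_ptr`.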
| /// BFS to find all type refs reachable from `start` in the type-dependency graph. | ||
| fn reachable_from( | ||
| typespace: &spacetimedb_schema::type_for_generate::TypespaceForGenerate, | ||
| start: AlgebraicTypeRef, | ||
| ) -> BTreeSet<AlgebraicTypeRef> { | ||
| let mut visited = BTreeSet::new(); | ||
| let mut stack = vec![start]; | ||
| while let Some(r) = stack.pop() { | ||
| if !visited.insert(r) { | ||
| continue; | ||
| } | ||
| if let Some(def) = typespace.get(r) { | ||
| for neighbor in direct_refs_of_def(def) { | ||
| stack.push(neighbor); | ||
| } | ||
| } | ||
| } | ||
| visited | ||
| } | ||
|
|
||
| /// Get all strongly connected components within the provided ModuleDef types. | ||
| /// Used to compute circular dependencies within the provided ModuleDef. | ||
| fn algebraic_type_scc(module: &ModuleDef) -> BTreeSet<AlgebraicTypeRef> { | ||
| let Some(at_ref) = iter_types(module) | ||
| .find(|ty| type_ref_name(module, ty.ty) == "AlgebraicType") | ||
| .map(|ty| ty.ty) | ||
| else { | ||
| return BTreeSet::new(); | ||
| }; | ||
|
|
||
| let typespace = module.typespace_for_generate(); | ||
| let from_at = reachable_from(typespace, at_ref); | ||
| from_at | ||
| .iter() | ||
| .filter(|&&r| reachable_from(typespace, r).contains(&at_ref)) | ||
| .copied() | ||
| .collect() | ||
| } |
If we're going to have to compute SCCs to do codegen (which I really hope we don't), then I think we should at least be doing it in the schema crate in a way that's well-tested and reused across all our codegen languages, rather than ad-hoc here. And we should be doing Tarjan's, Kosaraju's, or the third one (I forget what it's called, path-based or something) once for the module, rather than running it for every type.
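For reference, a single-pass computation of all components could look roughly like the sketch below. It is an illustration of Tarjan's algorithm over plain adjacency lists, not the PR's code and not an existing API in the schema crate. The node indices stand in for `AlgebraicTypeRef`s, and the names `Tarjan` and `strongly_connected_components` are made up for this example.

```rust
/// State for Tarjan's strongly-connected-components algorithm.
/// `graph[v]` lists the nodes that `v` refers to; node indices are
/// hypothetical stand-ins for `AlgebraicTypeRef`s.
struct Tarjan<'a> {
    graph: &'a [Vec<usize>],
    next_index: usize,
    index: Vec<Option<usize>>,
    lowlink: Vec<usize>,
    on_stack: Vec<bool>,
    stack: Vec<usize>,
    components: Vec<Vec<usize>>,
}

impl<'a> Tarjan<'a> {
    fn visit(&mut self, v: usize) {
        let graph = self.graph;
        self.index[v] = Some(self.next_index);
        self.lowlink[v] = self.next_index;
        self.next_index += 1;
        self.stack.push(v);
        self.on_stack[v] = true;

        for &w in &graph[v] {
            let w_index = self.index[w];
            match w_index {
                // Unvisited neighbor: recurse, then inherit its lowlink.
                None => {
                    self.visit(w);
                    self.lowlink[v] = self.lowlink[v].min(self.lowlink[w]);
                }
                // Neighbor on the stack: it belongs to the current component.
                Some(i) if self.on_stack[w] => {
                    self.lowlink[v] = self.lowlink[v].min(i);
                }
                // Neighbor already assigned to a finished component.
                Some(_) => {}
            }
        }

        // `v` is the root of a component: pop the stack down to `v`.
        if Some(self.lowlink[v]) == self.index[v] {
            let mut component = Vec::new();
            loop {
                let w = self.stack.pop().expect("v is still on the stack");
                self.on_stack[w] = false;
                component.push(w);
                if w == v {
                    break;
                }
            }
            component.sort_unstable();
            self.components.push(component);
        }
    }
}

/// Compute every strongly connected component in one pass over the graph.
/// A component is emitted only after all components it refers to.
fn strongly_connected_components(graph: &[Vec<usize>]) -> Vec<Vec<usize>> {
    let n = graph.len();
    let mut t = Tarjan {
        graph,
        next_index: 0,
        index: vec![None; n],
        lowlink: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        components: Vec::new(),
    };
    for v in 0..n {
        if t.index[v].is_none() {
            t.visit(v);
        }
    }
    t.components
}
```

Because each component comes out after the components it depends on, a code generator could in principle emit definitions in the returned order. The recursive form is used here for brevity; a very deep type graph would call for an explicit stack.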
| /// Converts an `AlgebraicTypeUse` to a TypeScript type expression for use in explicit type aliases. | ||
| /// Used when generating `export type Foo = { ... }` for recursive types. |
Why do recursive types require generating type aliases? Once you have the SCCs, can't you just put all of the defs from each component together?
| /// Check that no two modules in the mount tree claim the same lifecycle reducer. | ||
| /// | ||
| /// The host assigns exactly one reducer per lifecycle slot; if both the consumer | ||
| /// and a mounted submodule (or two sibling mounts) declare `__init__` (etc.), the | ||
| /// module must be rejected at publish time. |
I'm surprised that mounts are allowed to claim lifecycle reducers at all, if we're not going to allow multiple hooks on the same lifecycle event.
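The check described in the doc comment above can be sketched as follows. Every name here (`Lifecycle`, `Module`, `collect_claims`, `conflicting_lifecycle_claims`) is hypothetical and chosen for illustration, and the dotted path format for nested mounts is an assumption rather than something the PR specifies.

```rust
use std::collections::BTreeMap;

/// Hypothetical lifecycle slots; each may be claimed by at most one module.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Lifecycle {
    Init,
    OnConnect,
    OnDisconnect,
}

/// Hypothetical, heavily reduced module definition.
struct Module {
    lifecycle_reducers: Vec<Lifecycle>,
    /// (namespace, mounted module) pairs.
    mounts: Vec<(String, Module)>,
}

/// Walk the mount tree and record which module path claims which slot.
fn collect_claims(module: &Module, path: &str, claims: &mut BTreeMap<Lifecycle, Vec<String>>) {
    for &slot in &module.lifecycle_reducers {
        claims.entry(slot).or_default().push(path.to_string());
    }
    for (namespace, sub) in &module.mounts {
        // Assumed path format: nested namespaces joined with '.'.
        let sub_path = if path.is_empty() {
            namespace.clone()
        } else {
            format!("{path}.{namespace}")
        };
        collect_claims(sub, &sub_path, claims);
    }
}

/// Return only the slots claimed by more than one module in the tree.
/// The root module is reported under the empty path.
fn conflicting_lifecycle_claims(root: &Module) -> BTreeMap<Lifecycle, Vec<String>> {
    let mut claims = BTreeMap::new();
    collect_claims(root, "", &mut claims);
    claims.retain(|_, owners| owners.len() > 1);
    claims
}
```

A non-empty result would correspond to rejecting the module at publish time. If mounts were not allowed to claim lifecycle reducers at all, the same walk could instead reject any claim found below the root.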
| if mount.namespace.len() > 63 { | ||
| return Err(ValidationErrors::from(ValidationError::NamespaceTooLong { |
Why are we limiting the length of namespace names?
it was part of the proposal
| }) | ||
| } | ||
|
|
||
| fn validate_mount(mount: RawModuleMountV10) -> Result<(String, ModuleDef)> { |
This could use a doc comment describing what about the mount it validates.
| })); | ||
| } | ||
|
|
||
| Ok((mount.namespace, validate(mount.module)?)) |
We should combine errors from validating the mounted module with the above two checks.
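One way to read this suggestion is to collect every problem before returning, instead of stopping at the first failed check. The sketch below uses a hypothetical `MountError` enum and a reduced `Mount` struct whose `problems` field stands in for whatever validating the mounted module would report. Only the length check visible in the quoted hunk is reproduced; the other check is not shown in the diff and is left out.

```rust
/// Hypothetical error type; the real `ValidationError` has different variants.
#[derive(Debug, PartialEq)]
enum MountError {
    NamespaceTooLong { namespace: String },
    InvalidModule(String),
}

/// Hypothetical stand-in for a mount. `problems` plays the role of the
/// errors that validating the mounted module would produce.
struct Mount {
    namespace: String,
    problems: Vec<String>,
}

/// Validate a mount, accumulating all errors rather than returning early.
fn validate_mount(mount: Mount) -> Result<String, Vec<MountError>> {
    let mut errors = Vec::new();

    // Length limit taken from the quoted hunk (`mount.namespace.len() > 63`).
    if mount.namespace.len() > 63 {
        errors.push(MountError::NamespaceTooLong {
            namespace: mount.namespace.clone(),
        });
    }

    // Keep going: errors from the mounted module are added to the same list.
    errors.extend(mount.problems.into_iter().map(MountError::InvalidModule));

    if errors.is_empty() {
        Ok(mount.namespace)
    } else {
        Err(errors)
    }
}
```

With this shape, a mount that has both an over-long namespace and an invalid module reports both problems in a single publish attempt.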
| // Flatten mounted modules into the root table/reducer/procedure lists and typespace. | ||
| // Each mount's typespace is appended to the merged types with all AlgebraicTypeRef | ||
| // indices shifted by the current length, keeping internal references valid. | ||
| let root_anon_view_count = views.values().filter(|v| v.is_anonymous).count() as u32; | ||
| let root_non_anon_view_count = views.values().filter(|v| !v.is_anonymous).count() as u32; | ||
| let mut flat_tables: Vec<RawTableDefV9> = to_raw(tables); | ||
| let mut flat_reducers: Vec<RawReducerDefV9> = reducers.into_iter().map(|(_, def)| def.into()).collect(); | ||
| let mut flat_misc: Vec<RawMiscModuleExportV9> = column_defaults | ||
| .into_iter() | ||
| .chain(procedures.into_iter().map(|(_, def)| def.into())) | ||
| .chain(views.into_iter().map(|(_, def)| def.into())) | ||
| .collect(); | ||
| let mut merged_types = typespace.types; | ||
| let mut anon_view_offset = root_anon_view_count; | ||
| let mut view_offset = root_non_anon_view_count; | ||
| collect_v9_mounts( | ||
| &mounts, | ||
| "", | ||
| &mut flat_tables, | ||
| &mut flat_reducers, | ||
| &mut flat_misc, | ||
| &mut merged_types, | ||
| &mut anon_view_offset, | ||
| &mut view_offset, | ||
| ); |
This shouldn't be necessary. Notice that, for all other features that post-date the V9 module def format, this method just ignores them. I don't see why mounts would be any different.
yeah true, I can remove this
Description of Changes
Add a new recursive "mounts" field to the module def for adding submodules. Handle code generation for each language.
Handle identifiers with "/" or "." in the name to handle namespaced reducers (e.g. lib/reducer) and namespaced tables (lib.table).
Handle code generation for the recursive type. This needed some special handling in code generation for TypeScript and C++.
TypeScript codegen in particular is quite complex, as it tries to handle circular dependencies generically. C++, on the other hand, is a lot simpler because it hard-codes special handling of the V10 definition, but it doesn't solve circular dependencies in general.
I would advise against solving circular dependencies in a generic way for C++. However, we could consider modifying the TypeScript codegen to just have special handling for the V10 recursive definition, which would simplify the code quite a lot. I went down the rabbit hole of handling this generically and came out on the other side, but if there is a strong opinion in favor of keeping the codegen code simple, I am happy to revisit and align with the C++ approach.
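As an illustration of the namespaced identifier forms mentioned in this description (`lib/reducer` for reducers, `lib.table` for tables), here is a small sketch of splitting such a name into its namespace and local part. The helper `split_namespaced` is hypothetical and is not the PR's `Identifier` handling. Treating everything before the last separator as the namespace, so that nested namespaces stay intact, is an assumption.

```rust
/// Split a namespaced identifier at its last `separator`.
/// Returns `(namespace, local_name)`; the namespace is `None` when the
/// identifier contains no separator.
fn split_namespaced(name: &str, separator: char) -> (Option<&str>, &str) {
    match name.rsplit_once(separator) {
        Some((namespace, local)) => (Some(namespace), local),
        None => (None, name),
    }
}
```

Under these assumptions, `split_namespaced("lib/reducer", '/')` yields the namespace `lib` and the local name `reducer`, while an unqualified name comes back unchanged with no namespace.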
API and ABI breaking changes
The change is purely additive, and newer host versions will accept older module defs. However, older host versions will not accept new module defs.
Expected complexity level and risk
5 - While this specific PR is maybe a 4, the overall namespace change is definitely 5. This is a pretty significant change. It's a large diff which touches the module def and changes code that hasn't been touched in a long time (e.g. Identifier).
Testing
Beyond the Rust tests defined in this PR, the following tests were done on the full PR sequence once the entire namespace feature was implemented for TypeScript:
Feature Test Checklist
Module:
Client
CLI
Migration
Commit Log