Is your feature request related to a problem? Please describe.
MuPT still does not provide a native way to save and reload a Primitive representation. This is a real workflow bottleneck for systems that take meaningful effort to build, especially SAAMR-compliant systems used in mupt-examples.
A common workflow is: build locally -> transfer to HPC -> run simulation -> retrieve trajectory -> analyze later
Today, if the original Python session is gone, the MuPT representation is gone as well. That means users lose the tree-like relational hierarchy of Primitive objects and their children, including the information needed to recover MuPT-native analysis pathways such as primitive_to_mdanalysis().
This issue is intended as a concrete implementation-oriented continuation of #11, which discussed canonical forms and serialization at a broader level. The goal here is to define and implement a practical, versioned serialization/deserialization path for Primitive-based representations.
Describe the solution you'd like
Add a versioned serialization/deserialization protocol for Primitive hierarchies.
The main requirement is faithful preservation of the relational structure of a MuPT representation, including:
parent/child hierarchy
child handles / labels
connector information
internal and external connector mappings
topology / graph connectivity
geometry / shape information
element assignments
relevant metadata
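One way to keep connector mappings and other cross-references intact on disk is to refer to nodes by a stable path built from child handles rather than by Python object identity. The sketch below assumes a plain-dict tree with hypothetical `handle`, `children`, and `connections` keys; none of these names come from MuPT itself.

```python
from typing import Any, Dict, Iterator, List, Tuple

Node = Dict[str, Any]


def iter_paths(node: Node, prefix: str = "") -> Iterator[Tuple[str, Node]]:
    """Yield (path, node) pairs; a path joins child handles with '/'."""
    path = f"{prefix}/{node['handle']}" if prefix else node["handle"]
    yield path, node
    for child in node.get("children", []):
        yield from iter_paths(child, path)


def resolve_connections(root: Node) -> List[Tuple[Node, Node]]:
    """Turn path pairs stored on the root back into pairs of node records."""
    index = dict(iter_paths(root))
    resolved = []
    for a, b in root.get("connections", []):
        missing = [p for p in (a, b) if p not in index]
        if missing:
            # A dangling path means the file no longer describes a consistent tree.
            raise KeyError(f"dangling reference(s): {missing}")
        resolved.append((index[a], index[b]))
    return resolved


# Hypothetical two-monomer chain; each monomer exposes one atom-level child.
tree = {
    "handle": "chain",
    "children": [
        {"handle": "mono_0", "children": [{"handle": "C1"}]},
        {"handle": "mono_1", "children": [{"handle": "C1"}]},
    ],
    "connections": [["chain/mono_0/C1", "chain/mono_1/C1"]],
}
left, right = resolve_connections(tree)[0]
```

Paths of this kind are only stable if sibling handles are unique, which is an assumption of the sketch.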
Initial support can focus on SAAMR-compliant systems, but the design should ideally extend to more general Primitive trees.
A simple public API would be useful, for example:
save(primitive, path)
load(path) -> Primitive
The on-disk format should be versioned so future schema evolution is explicit and old files can be handled gracefully.
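A minimal sketch of what a versioned envelope could look like, assuming JSON on disk and a tree that has already been converted to plain dicts. The real `save`/`load` would accept and return a `Primitive`; the format tag, version numbers, and the v1-to-v2 migration here are all hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Union

FORMAT_NAME = "mupt-primitive"  # hypothetical format tag
SCHEMA_VERSION = 2              # hypothetical current schema version

Tree = Dict[str, Any]


def _v1_to_v2(node: Tree) -> Tree:
    """Hypothetical migration: v1 called the handle 'name', v2 calls it 'label'."""
    return {
        "label": node["name"],
        "children": [_v1_to_v2(child) for child in node.get("children", [])],
    }


# Maps a schema version to the function that upgrades it by one version.
MIGRATIONS: Dict[int, Callable[[Tree], Tree]] = {1: _v1_to_v2}


def save(tree: Tree, path: Union[str, Path]) -> None:
    """Write the tree inside an envelope that records format and version."""
    envelope = {"format": FORMAT_NAME, "schema_version": SCHEMA_VERSION, "root": tree}
    Path(path).write_text(json.dumps(envelope, indent=2), encoding="utf-8")


def load(path: Union[str, Path]) -> Tree:
    """Read an envelope, upgrading old versions and rejecting unknown ones."""
    envelope = json.loads(Path(path).read_text(encoding="utf-8"))
    if envelope.get("format") != FORMAT_NAME:
        raise ValueError(f"{path}: not a {FORMAT_NAME} file")
    version, root = envelope.get("schema_version"), envelope.get("root")
    # Upgrade old files one version at a time until they match the current schema.
    while version in MIGRATIONS and version < SCHEMA_VERSION:
        root = MIGRATIONS[version](root)
        version += 1
    if version != SCHEMA_VERSION:
        raise ValueError(f"{path}: unsupported schema version {version!r}")
    return root
```

Keeping the version in the envelope rather than in each node means a reader can decide how to handle a file before touching its contents.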
A schema-driven implementation is likely the safest path. In particular, a Pydantic-based schema layer is worth serious consideration, not as a replacement for MuPT core runtime classes, but as a typed serialization boundary:
MuPT core runtime classes (Primitive, Connector, TopologicalStructure, shapes, registries) remain unchanged
dedicated serialization DTOs define the persisted structure
MuPT objects are converted to DTOs on save, and DTOs are converted back to MuPT objects on load
This approach is attractive because MuPT needs a rigorously typed, recursive, versionable schema for tree-structured molecular representations. Pydantic is well suited to nested validation, recursive models, explicit serialization rules, and JSON Schema generation. JSON Schema would also give maintainers a concrete contract for .mupt contents.
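The DTO boundary can be illustrated without any dependency. The sketch below uses standard-library dataclasses as a stand-in for Pydantic models, so validation is written by hand and no JSON Schema is generated. The field names (`label`, `element`, `connectors`, `children`, `anchor`) are hypothetical and cover only part of what a real schema would need; topology and shapes are omitted.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ConnectorDTO:
    label: str
    anchor: str  # hypothetical: handle of the child this connector attaches to


@dataclass
class PrimitiveDTO:
    label: str
    element: Optional[str] = None
    connectors: List[ConnectorDTO] = field(default_factory=list)
    children: List["PrimitiveDTO"] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "PrimitiveDTO":
        """Rebuild the recursive DTO tree from decoded JSON, validating as it goes."""
        if not isinstance(data.get("label"), str):
            raise ValueError("every primitive record needs a string 'label'")
        return cls(
            label=data["label"],
            element=data.get("element"),
            connectors=[ConnectorDTO(**c) for c in data.get("connectors", [])],
            children=[cls.from_dict(c) for c in data.get("children", [])],
        )


# Round trip: DTO tree -> JSON text -> DTO tree.
carbon = PrimitiveDTO(
    label="C1", element="C", connectors=[ConnectorDTO(label="c0", anchor="C1")]
)
chain = PrimitiveDTO(label="chain", children=[carbon])
restored = PrimitiveDTO.from_dict(json.loads(json.dumps(asdict(chain))))
```

Runtime objects would be converted to and from these records in separate functions, so the persisted shape can change without touching the runtime classes.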
YAML may also be worth considering as the human-readable wire format if inspectability is a priority, but the schema should remain independent of the encoding choice. JSON would also be acceptable. The key requirement is preserving the full Primitive tree and its relational cross-references in a stable, versioned form.
Describe alternatives you've considered
pickle / dill: easy to prototype, but fragile, unsafe for shared files, and not appropriate for a stable interchange format
direct ad hoc JSON/YAML dumps of runtime objects: possible, but likely brittle without an explicit schema layer
making Primitive and related runtime classes themselves Pydantic models: likely too invasive, since MuPT core objects carry runtime behavior (NodeMixin, UniqueRegistry, networkx graphs, mutable geometry) that should not be conflated with the persisted representation
relying only on downstream exported files: preserves simulation artifacts, but not the MuPT-native hierarchy and relationships
Additional context
Primitive is centered in mupt/mupr/primitives.py
primitive_to_mdanalysis() already exists in mupt/interfaces/mdanalysis/exporters.py, so reloadable MuPT representations would immediately enable downstream analysis workflows
mupt-examples is the main user-facing surface where this pain is visible
preserving the tree-like hierarchy of primitives and children is the core design constraint
a likely implementation detail is that some currently freeform fields, especially metadata: dict[Hashable, Any], may need tighter serialization rules for v1
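As an illustration of what tighter rules for `metadata` might mean in v1, the sketch below accepts only string keys and JSON-compatible values, and reports the location of anything else. The function name and the exact rule set are assumptions, not a proposal for the final policy.

```python
from typing import Any, Dict, Hashable, Mapping

JSON_SCALARS = (str, int, float, bool, type(None))


def _normalize(value: Any, path: str) -> Any:
    """Return a JSON-compatible copy of value, or raise TypeError naming the path."""
    if isinstance(value, JSON_SCALARS):
        return value
    if isinstance(value, (list, tuple)):
        # Tuples are stored as lists, so they come back as lists on load.
        return [_normalize(item, f"{path}[{i}]") for i, item in enumerate(value)]
    if isinstance(value, Mapping):
        out = {}
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError(f"{path}: key {key!r} is not a string")
            out[key] = _normalize(item, f"{path}.{key}")
        return out
    raise TypeError(f"{path}: {type(value).__name__} is not serializable in this sketch")


def normalize_metadata(metadata: Mapping[Hashable, Any]) -> Dict[str, Any]:
    """Validate a freeform metadata mapping before it is written to disk."""
    return _normalize(metadata, "metadata")


clean = normalize_metadata({"source": "builder", "box": (1.0, 2.0), "tags": {"dp": 10}})
```

Rejecting non-string keys at save time is stricter than `dict[Hashable, Any]` allows at runtime, which is the trade-off this item points at.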
Acceptance criteria
Add a versioned save/load path for Primitive-based representations
Preserve hierarchy, handles/labels, topology, connectors, and relevant metadata through round-trip serialization
Support SAAMR-compliant systems as the initial target
Use an explicit schema layer rather than relying on direct pickling of runtime objects
Fail clearly on unsupported objects or schema versions
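For the last criterion, one option is a small exception hierarchy so callers can tell an unreadable file from an unserializable object. All class and function names below are hypothetical, as is the set of supported versions.

```python
from typing import Any, Dict, Iterable


class SerializationError(Exception):
    """Base class for save/load failures (hypothetical name)."""


class UnsupportedSchemaVersionError(SerializationError):
    def __init__(self, found: Any, supported: Iterable[int]) -> None:
        self.found = found
        self.supported = tuple(sorted(supported))
        super().__init__(
            f"file uses schema version {found!r}; "
            f"this build reads versions {list(self.supported)}"
        )


class UnsupportedObjectError(SerializationError):
    def __init__(self, obj: Any, location: str) -> None:
        self.location = location
        super().__init__(f"cannot serialize {type(obj).__name__} at {location}")


SUPPORTED_VERSIONS = frozenset({1})  # hypothetical


def check_envelope(envelope: Dict[str, Any]) -> Any:
    """Return the payload if the envelope's version is readable, else raise."""
    version = envelope.get("schema_version")
    # bool is a subclass of int, so exclude it explicitly.
    is_int = isinstance(version, int) and not isinstance(version, bool)
    if not is_int or version not in SUPPORTED_VERSIONS:
        raise UnsupportedSchemaVersionError(version, SUPPORTED_VERSIONS)
    return envelope["root"]
```

A shared base class lets a caller catch every serialization failure with one `except` clause while still distinguishing the cases when needed.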