
Implement versioned serialization for Primitive hierarchies #48

@joelaforet


Is your feature request related to a problem? Please describe.

MuPT still does not provide a native way to save and reload a Primitive representation. This is a real workflow bottleneck for systems that take meaningful effort to build, especially SAAMR-compliant systems used in mupt-examples.

A common workflow is:
build locally -> transfer to HPC -> run simulation -> retrieve trajectory -> analyze later

Today, if the original Python session is gone, the MuPT representation is gone as well. That means users lose the tree-like relational hierarchy of Primitive objects and their children, including the information needed to recover MuPT-native analysis pathways such as primitive_to_mdanalysis().

This issue is intended as a concrete implementation-oriented continuation of #11, which discussed canonical forms and serialization at a broader level. The goal here is to define and implement a practical, versioned serialization/deserialization path for Primitive-based representations.

Describe the solution you'd like

Add a versioned serialization/deserialization protocol for Primitive hierarchies.

The main requirement is faithful preservation of the relational structure of a MuPT representation, including:

  • parent/child hierarchy
  • child handles / labels
  • connector information
  • internal and external connector mappings
  • topology / graph connectivity
  • geometry / shape information
  • element assignments
  • relevant metadata

Initial support can focus on SAAMR-compliant systems, but the design should ideally extend to more general Primitive trees.

A simple public API would be useful, for example:

  • save(primitive, path)
  • load(path) -> Primitive

The on-disk format should be versioned so future schema evolution is explicit and old files can be handled gracefully.
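
One way to make the version explicit is to wrap the persisted content in a small envelope that records a format name and schema version, and to check both before reading anything else. The following is only a minimal sketch under stated assumptions: the names `FORMAT_NAME`, `SCHEMA_VERSION`, `write_envelope`, `read_envelope`, and `UnsupportedSchemaVersion` are hypothetical and not part of MuPT, and JSON is used purely for illustration.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union

FORMAT_NAME = "mupt-primitive"   # hypothetical format identifier
SCHEMA_VERSION = 1               # hypothetical current schema version
SUPPORTED_VERSIONS = {1}         # versions this reader knows how to load


class UnsupportedSchemaVersion(ValueError):
    """Raised when a file declares a schema version this reader cannot handle."""


def write_envelope(payload: Dict[str, Any], path: Union[str, Path]) -> None:
    """Write the payload wrapped in a format/version envelope."""
    document = {
        "format": FORMAT_NAME,
        "schema_version": SCHEMA_VERSION,
        "payload": payload,
    }
    Path(path).write_text(json.dumps(document, indent=2), encoding="utf-8")


def read_envelope(path: Union[str, Path]) -> Dict[str, Any]:
    """Read a file, check format and version, and return the payload."""
    document = json.loads(Path(path).read_text(encoding="utf-8"))
    if document.get("format") != FORMAT_NAME:
        raise ValueError(f"{path}: not a {FORMAT_NAME} file")
    version = document.get("schema_version")
    if version not in SUPPORTED_VERSIONS:
        raise UnsupportedSchemaVersion(
            f"{path}: schema version {version!r} is not supported "
            f"(supported: {sorted(SUPPORTED_VERSIONS)})"
        )
    return document["payload"]
```

A future reader could dispatch on `schema_version` to per-version migration functions rather than rejecting older files outright.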

A schema-driven implementation is likely the safest path. In particular, a Pydantic-based schema layer is worth serious consideration, not as a replacement for MuPT core runtime classes, but as a typed serialization boundary:

  • MuPT runtime objects (Primitive, Connector, TopologicalStructure, shapes, registries) remain unchanged
  • dedicated serialization DTOs define the persisted structure
  • MuPT objects are converted to DTOs on save, and DTOs are converted back to MuPT objects on load

This approach is attractive because MuPT needs a rigorously typed, recursive, versionable schema for tree-structured molecular representations. Pydantic is well suited to nested validation, recursive models, explicit serialization rules, and JSON Schema generation. JSON Schema would also give maintainers a concrete contract for .mupt contents.
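
To illustrate the DTO boundary, here is a rough sketch of a recursive serialization model. It deliberately uses standard-library dataclasses as a stand-in so that it runs without third-party dependencies; a Pydantic model would play the same role and add validation and JSON Schema generation. The class names `PrimitiveDTO` and `ConnectorDTO` and all field names are hypothetical and do not mirror the actual attributes of MuPT's `Primitive` or `Connector`.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ConnectorDTO:
    # Hypothetical fields; the real Connector carries more information.
    label: str
    anchor: str


@dataclass
class PrimitiveDTO:
    # Hypothetical persisted view of one node in the Primitive tree.
    label: str
    element: Optional[str] = None
    connectors: List[ConnectorDTO] = field(default_factory=list)
    children: List["PrimitiveDTO"] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        # asdict recurses into nested dataclasses, lists, and dicts.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "PrimitiveDTO":
        # Rebuild nested DTOs explicitly so children become PrimitiveDTO again.
        return cls(
            label=data["label"],
            element=data.get("element"),
            connectors=[ConnectorDTO(**c) for c in data.get("connectors", [])],
            children=[cls.from_dict(c) for c in data.get("children", [])],
            metadata=dict(data.get("metadata", {})),
        )
```

In this arrangement the runtime classes never need to know about the persisted form; separate converter functions (not shown) would translate between `Primitive` objects and DTOs on save and load.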

YAML may also be worth considering as the human-readable wire format if inspectability is a priority, but the schema should remain independent of the encoding choice. JSON would also be acceptable. The key requirement is preserving the full Primitive tree and its relational cross-references in a stable, versioned form.
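
Relational cross-references are easier to persist when every node has a stable identifier that other records can point to. As a hedged illustration, the sketch below flattens a tree into a node table with integer ids and parent ids, then rebuilds it. Nodes are plain dicts with `label` and `children` keys standing in for Primitive objects, and `flatten_tree` and `rebuild_tree` are hypothetical helper names.

```python
from typing import Any, Dict, List, Optional


def flatten_tree(root: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Assign ids in depth-first pre-order and record each node's parent id."""
    records: List[Dict[str, Any]] = []

    def visit(node: Dict[str, Any], parent_id: Optional[int]) -> None:
        node_id = len(records)
        records.append({"id": node_id, "parent": parent_id, "label": node["label"]})
        for child in node.get("children", []):
            visit(child, node_id)

    visit(root, None)
    return records


def rebuild_tree(records: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Rebuild the nested tree; record order preserves sibling order."""
    nodes = {r["id"]: {"label": r["label"], "children": []} for r in records}
    root = None
    for r in records:
        if r["parent"] is None:
            root = nodes[r["id"]]
        else:
            nodes[r["parent"]]["children"].append(nodes[r["id"]])
    return root
```

With ids in place, a connection between two sibling primitives could be stored as a pair of ids, for example `{"from": 1, "to": 2}`, independent of whether the file is encoded as JSON or YAML.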

Describe alternatives you've considered

  • pickle / dill: easy to prototype, but fragile, unsafe for shared files, and not appropriate for a stable interchange format
  • direct ad hoc JSON/YAML dumps of runtime objects: possible, but likely brittle without an explicit schema layer
  • making Primitive and related runtime classes themselves Pydantic models: likely too invasive, since MuPT core objects carry runtime behavior (NodeMixin, UniqueRegistry, networkx graphs, mutable geometry) that should not be conflated with the persisted representation
  • relying only on downstream exported files: preserves simulation artifacts, but not the MuPT-native hierarchy and relationships
  • folding this entirely into the canonicalization work from #11 (Canonical forms and serialization for Primitives): likely too broad; serialization is valuable on its own and can move first

Additional context

  • Primitive is centered in mupt/mupr/primitives.py
  • primitive_to_mdanalysis() already exists in mupt/interfaces/mdanalysis/exporters.py, so reloadable MuPT representations would immediately enable downstream analysis workflows
  • mupt-examples is the main user-facing surface where this pain is visible
  • #11 (Canonical forms and serialization for Primitives) should be treated as prior discussion/context; this issue is meant to request an actionable implementation path
  • preserving the tree-like hierarchy of primitives and children is the core design constraint
  • a likely implementation detail is that some currently freeform fields, especially metadata: dict[Hashable, Any], may need tighter serialization rules for v1
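
As an example of what a tighter metadata rule could look like for v1, the sketch below accepts only string keys and JSON-representable values, and reports where an offending value sits. This is one possible policy, not a decision made in this issue; `check_json_safe` is a hypothetical helper name.

```python
from typing import Any

# Scalar types that map directly onto JSON values.
JSON_SCALARS = (str, int, float, bool, type(None))


def check_json_safe(value: Any, where: str = "metadata") -> None:
    """Raise TypeError naming the first value that cannot be stored as JSON."""
    if isinstance(value, JSON_SCALARS):
        return
    if isinstance(value, list):
        for index, item in enumerate(value):
            check_json_safe(item, f"{where}[{index}]")
        return
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError(f"{where}: key {key!r} is not a string")
            check_json_safe(item, f"{where}[{key!r}]")
        return
    # Tuples, sets, arrays, and custom objects are rejected here because
    # they would not come back from JSON as the same type.
    raise TypeError(f"{where}: unsupported type {type(value).__name__}")
```

Running such a check at save time turns a silent loss of information into an immediate, located error.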

Acceptance criteria

  • Add a versioned save/load path for Primitive-based representations
  • Preserve hierarchy, handles/labels, topology, connectors, and relevant metadata through round-trip serialization
  • Support SAAMR-compliant systems as the initial target
  • Use an explicit schema layer rather than relying on direct pickling of runtime objects
  • Fail with a clear, specific error on unsupported objects or schema versions
  • Add tests covering build -> save -> load -> analyze workflows
  • Reference #11 (Canonical forms and serialization for Primitives) as prior context in implementation/PR discussion
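
For the round-trip tests, a structural comparison that reports where two hierarchies diverge tends to be more useful than a bare equality check. The helper below is a sketch only: it compares plain dict nodes with `label` and `children` keys as stand-ins for Primitive objects, and `first_difference` is a hypothetical name. A real test would compare connectors, topology, and metadata as well.

```python
from typing import Any, Dict, Optional


def first_difference(
    expected: Dict[str, Any], actual: Dict[str, Any], path: str = "root"
) -> Optional[str]:
    """Return a description of the first structural mismatch, or None if equal."""
    if expected["label"] != actual["label"]:
        return f"{path}: label {expected['label']!r} != {actual['label']!r}"
    expected_children = expected.get("children", [])
    actual_children = actual.get("children", [])
    if len(expected_children) != len(actual_children):
        return (
            f"{path}: {len(expected_children)} children != "
            f"{len(actual_children)} children"
        )
    # Compare children pairwise, extending the path with the child index.
    for index, (exp, act) in enumerate(zip(expected_children, actual_children)):
        difference = first_difference(exp, act, f"{path}/{index}")
        if difference is not None:
            return difference
    return None
```

A test could then assert `first_difference(original, reloaded) is None` and show the returned message on failure.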

Labels

  • enhancement: New feature or request
  • priority:medium
  • representation: Pertaining to how MuPT represents molecular systems internally
