Offline Visualization of Inference Traces #47

Open
archbilesherman wants to merge 5 commits into AI2Science:main from archbilesherman:offline-viz

@archbilesherman archbilesherman commented Mar 5, 2026

Summary

This PR implements the backend reader layer for issue #41, which focuses on lightweight offline visualization of VizFold/OpenFold inference traces. The goal is to let visualization tools inspect saved inference traces without rerunning inference or depending on a heavy live backend.

The main contribution is a working ArchiveReader for standardized Zarr-style trace archives, alongside the existing LegacyTxtReader path for older text-based trace dumps.

What the code does

LegacyTxtReader

The LegacyTxtReader supports the older VizFold text-dump format. This is useful because some existing trace outputs are still stored as plain text files rather than standardized archives. Keeping this reader lets the offline visualization stack keep supporting those existing outputs while the archive format from issue #39 stabilizes.

ArchiveReader

The new ArchiveReader supports Zarr-based inference trace archives. It can:

  • open .zarr archives
  • open zipped Zarr archives by extracting them to a temporary directory
  • read metadata from archive attributes / metadata groups
  • discover available attention types
  • list available layers
  • list available heads
  • selectively load one attention head
  • load all attention heads for a layer
  • apply top_k filtering to attention connections
  • load single representations
  • load pair representations
  • load structure data
  • convert stored coordinates into basic PDB text as a lightweight visualization fallback
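As an illustration of the top_k filtering item above, here is a minimal sketch of keeping only the k strongest connections in one attention map. It is not the ArchiveReader code: the function name `top_k_connections`, the `(query, key, weight)` tuple layout, and the nested-list input are assumptions made for the example. It uses only the standard library; with NumPy arrays a partial sort such as `np.argpartition` would be the more usual choice.

```python
import heapq
from typing import Sequence

# (query index, key index, attention weight); layout is an assumption for this sketch
Edge = tuple[int, int, float]


def top_k_connections(attn: Sequence[Sequence[float]], k: int) -> list[Edge]:
    """Return the k largest-weight entries of a 2D attention map, strongest first."""
    if k <= 0:
        return []
    # Flatten the matrix lazily into (row, column, weight) triples.
    edges = (
        (i, j, float(w))
        for i, row in enumerate(attn)
        for j, w in enumerate(row)
    )
    # nlargest keeps only k items in memory and returns them in descending order.
    return heapq.nlargest(k, edges, key=lambda e: e[2])


if __name__ == "__main__":
    demo = [[0.10, 0.70], [0.15, 0.05]]
    for query, key, weight in top_k_connections(demo, 2):
        print(query, key, weight)
```

Filtering on the backend like this means the frontend only receives the connections it will actually draw, which matters more as sequence length grows.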

This is meant to provide a stable backend interface for the frontend visualization work. The frontend should be able to call the reader interface without needing to know the exact internal archive layout.
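One way to picture such a layout-agnostic interface is a structural type that both readers could satisfy. The sketch below is hypothetical: `TraceReader`, its method names, `InMemoryReader`, and the `"demo"` attention type are invented for illustration and are not taken from this PR.

```python
from typing import Any, Protocol, runtime_checkable

Matrix = list[list[float]]


@runtime_checkable
class TraceReader(Protocol):
    """Hypothetical reader contract; method names are assumptions for this sketch."""

    def get_metadata(self) -> dict[str, Any]: ...
    def list_attention_types(self) -> list[str]: ...
    def list_layers(self, attention_type: str) -> list[int]: ...
    def list_heads(self, attention_type: str, layer: int) -> list[int]: ...
    def load_attention_head(self, attention_type: str, layer: int, head: int) -> Matrix: ...


class InMemoryReader:
    """Toy stand-in backed by a nested dict: {attention_type: {layer: {head: matrix}}}."""

    def __init__(self, traces: dict[str, dict[int, dict[int, Matrix]]]) -> None:
        self._traces = traces

    def get_metadata(self) -> dict[str, Any]:
        return {"source": "in-memory"}

    def list_attention_types(self) -> list[str]:
        return sorted(self._traces)

    def list_layers(self, attention_type: str) -> list[int]:
        return sorted(self._traces[attention_type])

    def list_heads(self, attention_type: str, layer: int) -> list[int]:
        return sorted(self._traces[attention_type][layer])

    def load_attention_head(self, attention_type: str, layer: int, head: int) -> Matrix:
        return self._traces[attention_type][layer][head]


def describe(reader: TraceReader) -> list[str]:
    """Frontend-style code: it only calls the interface, never touches storage."""
    lines = []
    for kind in reader.list_attention_types():
        for layer in reader.list_layers(kind):
            lines.append(f"{kind} layer {layer}: heads {reader.list_heads(kind, layer)}")
    return lines


if __name__ == "__main__":
    reader = InMemoryReader({"demo": {0: {0: [[1.0]], 1: [[1.0]]}}})
    print(describe(reader))
```

Because `describe` depends only on the protocol, swapping the toy reader for an archive-backed or text-backed one would not require frontend changes, which is the property the paragraph above is after.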

Why both readers exist

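There are currently two reader paths because they support two different stages of the project: LegacyTxtReader covers the existing text-dump outputs, while ArchiveReader targets the standardized Zarr archive format from issue #39.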

This lets issue #41 support existing data while also moving toward the scalable archive format that will be needed for larger protein traces.
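A caller that has to handle both kinds of input could choose a reader path from the file on disk. The following is a rough sketch under stated assumptions, not code from this PR: the helper names `detect_trace_kind` and `extract_zipped_archive` are hypothetical, legacy dumps are assumed to be single `.txt` files, and archives are assumed to be either `.zarr` directories or zip files. The extraction step mirrors the "extract to a temporary directory" behaviour described for zipped archives.

```python
import tempfile
import zipfile
from pathlib import Path


def detect_trace_kind(path) -> str:
    """Classify a trace path as 'zarr_dir', 'zarr_zip', or 'legacy_txt'."""
    p = Path(path)
    if p.is_dir() and p.suffix == ".zarr":
        return "zarr_dir"
    # is_zipfile checks the file's magic bytes rather than trusting the extension.
    if p.is_file() and zipfile.is_zipfile(p):
        return "zarr_zip"
    if p.is_file() and p.suffix == ".txt":
        return "legacy_txt"
    raise ValueError(f"unrecognized trace input: {p}")


def extract_zipped_archive(zip_path, dest_root=None) -> Path:
    """Unpack a zipped archive into a fresh temporary directory and return that directory.

    The caller is responsible for deleting the directory when done.
    Only use this on archives from a trusted source.
    """
    tmp_dir = Path(tempfile.mkdtemp(prefix="trace_", dir=dest_root))
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(tmp_dir)
    return tmp_dir
```

A dispatcher would then hand `"legacy_txt"` inputs to LegacyTxtReader and the two Zarr cases to ArchiveReader, extracting first when the input is zipped.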

How to run locally

python -m pip install streamlit plotly py3Dmol matplotlib pandas zarr numcodecs fsspec
python -m pytest tests/test_archive_reader_contract.py tests/test_legacy_txt_reader.py
python -m streamlit run webui/app.py

@archbilesherman archbilesherman marked this pull request as ready for review April 29, 2026 02:39
@archbilesherman archbilesherman changed the title Add offline inference trace framework for issue #41 Offline Visualization Apr 29, 2026
@archbilesherman archbilesherman changed the title Offline Visualization Offline Visualization of Inference Traces Apr 29, 2026
@archbilesherman (Author) commented

Updated the PR description with current backend functionality, test coverage, and remaining integration work. The reader now passes local tests and supports working Zarr archive loading, but still needs validation against a real standardized archive from the archive-writing pipeline.
