Implement persistent store for documents #75

jsparber · 2025-05-07T14:07:31Z

This adds persistent on-disk storage for documents and authors.

I haven't done much testing yet, but it should be mostly fine to merge.


Logically, it makes sense to map documents to a set of authors rather than the other way around. Additionally, this makes the implementation of the TopicLogMap trait much simpler, since there is no need to look at authors who aren't authors of a specific document.
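A minimal sketch of that mapping direction, assuming document IDs and author public keys can be modelled as plain `String`s (the real code uses p2panda types). `DocumentAuthors`, `add_author` and `authors_of` are hypothetical names for illustration, not the actual TopicLogMap implementation:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real document ID and public key types.
type DocumentId = String;
type PublicKey = String;

/// Maps each document to the set of authors who have written to it.
#[derive(Default)]
pub struct DocumentAuthors {
    inner: HashMap<DocumentId, HashSet<PublicKey>>,
}

impl DocumentAuthors {
    /// Records `author` as an author of `document`; returns `true` if it was new.
    pub fn add_author(&mut self, document: &str, author: &str) -> bool {
        self.inner
            .entry(document.to_string())
            .or_default()
            .insert(author.to_string())
    }

    /// Returns the authors of one document without scanning unrelated authors.
    pub fn authors_of(&self, document: &str) -> Vec<PublicKey> {
        let mut authors: Vec<PublicKey> = self
            .inner
            .get(document)
            .map(|set| set.iter().cloned().collect())
            .unwrap_or_default();
        authors.sort(); // deterministic order for callers
        authors
    }
}
```

Looking up the authors of a document is a single map lookup, which is what keeps the per-document query simple.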


jsparber · 2025-05-07T14:08:18Z

I'm seeing a couple of errors like the following. @adzialocha, any idea?

2025-04-26T14:24:31.998642Z ERROR aardvark_node::network: ingesting operation failed: critical storage failure: an error occurred with the sqlite database: error returned from database: (code: 1555) UNIQUE constraint failed: operations_v1.hash


jsparber · 2025-05-08T22:48:09Z

> I'm seeing a couple of the following errors @adzialocha any idea:
>
> 2025-04-26T14:24:31.998642Z ERROR aardvark_node::network: ingesting operation failed: critical storage failure: an error occurred with the sqlite database: error returned from database: (code: 1555) UNIQUE constraint failed: operations_v1.hash

I think I found out why I see this error. Loro probably creates the same snapshot twice, so we get two operations with the same hash. This also makes the documents get out of sync. I will look into it some more tomorrow.

adzialocha · 2025-05-09T09:21:50Z

> I'm seeing a couple of the following errors @adzialocha any idea:
>
> 2025-04-26T14:24:31.998642Z ERROR aardvark_node::network: ingesting operation failed: critical storage failure: an error occurred with the sqlite database: error returned from database: (code: 1555) UNIQUE constraint failed: operations_v1.hash
>
> I think i found out why I see this error. Loro creates probably twice the same snapshot so that we get two operations with the same hash. This also makes the documents get out of sync. I will look into it some more tomorrow.

Yes, I agree! It's probably the store inserting the same operation multiple times. I couldn't find the issue yet after a first scan, but it definitely smells like something is done too many times.


jsparber · 2025-05-09T09:31:10Z

> > I'm seeing a couple of the following errors @adzialocha any idea:
> >
> > 2025-04-26T14:24:31.998642Z ERROR aardvark_node::network: ingesting operation failed: critical storage failure: an error occurred with the sqlite database: error returned from database: (code: 1555) UNIQUE constraint failed: operations_v1.hash
> >
> > I think i found out why I see this error. Loro creates probably twice the same snapshot so that we get two operations with the same hash. This also makes the documents get out of sync. I will look into it some more tomorrow.
>
> Yes, I agree! It's probably the store inserting the same operation multiple times. I couldn't find the issue yet after a first scan, but it definitely smells like something is done too many times.

I found a workaround for now that makes Aardvark work (90ef375), but we're definitely doing something wrong.

Also, we should send the snapshot on update; essentially, we need to fix the TODO at https://github.com/p2panda/aardvark/blob/main/aardvark-doc/src/document.rs#L225


jsparber · 2025-05-22T16:30:02Z

I pushed the changes I had in jsparber/less_snapshots to this PR. I think this is ready to be merged after a quick review.

jsparber added 6 commits May 5, 2025 14:41

doc: Remove tokio dep

8364a43

The aardvark-doc crate doesn't need or use tokio, since aardvark-node exposes an async-std API.

chore: Bump p2panda dep to latest commit

3d06eef

node: Block document operation till node is running

d3b7d8a

This ensures that operations are blocked and queued until the node is running. They are then executed in call order.
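A single-threaded sketch of the queueing behaviour, assuming operations can be buffered in a FIFO queue until the node reports that it is running. `OperationGate`, `Operation` and `mark_running` are hypothetical; the actual node uses async-std and sends operations to the network rather than recording them in a `Vec`:

```rust
use std::collections::VecDeque;

/// Hypothetical operation type; the real node queues document operations.
#[derive(Debug, Clone, PartialEq)]
pub enum Operation {
    Subscribe(String),
    Publish(String, Vec<u8>),
}

/// Buffers operations until the node is running, then executes them in call order.
#[derive(Default)]
pub struct OperationGate {
    running: bool,
    pending: VecDeque<Operation>,
    executed: Vec<Operation>, // stands in for actually sending to the network
}

impl OperationGate {
    pub fn submit(&mut self, op: Operation) {
        if self.running {
            self.execute(op);
        } else {
            self.pending.push_back(op); // preserve call order
        }
    }

    /// Called once the node has started: drain the queue in FIFO order.
    pub fn mark_running(&mut self) {
        self.running = true;
        while let Some(op) = self.pending.pop_front() {
            self.execute(op);
        }
    }

    fn execute(&mut self, op: Operation) {
        self.executed.push(op);
    }

    pub fn executed(&self) -> &[Operation] {
        &self.executed
    }
}
```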

node: Move document_tx hashmap from DocumentStore to Network

784fe69

In a future commit, the DocumentStore will be written to an SQLite DB, but the document_tx hashmap won't be persistent, so it moves to Network.

node: Add unsubscribe for documents

0ccf693

This probably doesn't fully unsubscribe, since p2panda doesn't really implement it, but at least we no longer store any updates in the OperationStore. See p2panda/p2panda#639 for more details.

jsparber mentioned this pull request May 7, 2025

Draft: Make documents and user identity presistent between app starts #69

Closed

adzialocha self-requested a review May 7, 2025 15:19

jsparber added 21 commits May 8, 2025 10:58

doc: Add glib boxed types for identity

9646128

This exposes the PublicKey and PrivateKey structs via boxed glib types so they can be used as GObject properties.

app: Load and store identity

a1145b5

This uses oo7 to store the private key ("identity") in the keyring.

node: Add chrono dep

3995a21

This will be used to store timestamps in the SQLite DB.

node: Add sqlx crate

ea36fb1

Sqlx will be used for the persistent store of operations and other data.

node: Enable sqlite feature for p2panda_store

38d2f5c

This feature will be used in a future commit to make use of the sqlite operation store.

node: Use persistent p2panda's OperationStore

fd7eb7d

This uses the SQLite implementation of p2panda's OperationStore instead of the in-memory store. Note that if no DB path is set, an in-memory SQLite database is created.
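A sketch of how the database location could be chosen, assuming sqlx-style SQLite connection URLs (`sqlite::memory:` for an in-memory DB, `mode=rwc` to create the file if it is missing). The helper name `database_url` and the file name `aardvark.sqlite` are hypothetical:

```rust
use std::path::Path;

/// Builds an SQLite connection URL: file-backed when a data dir is given,
/// in-memory otherwise. Hypothetical helper; the file name is an assumption.
pub fn database_url(data_dir: Option<&Path>) -> String {
    match data_dir {
        // `mode=rwc` opens the file read-write and creates it if needed.
        Some(dir) => format!(
            "sqlite://{}?mode=rwc",
            dir.join("aardvark.sqlite").display()
        ),
        // No data dir: nothing is persisted across restarts.
        None => "sqlite::memory:".to_string(),
    }
}
```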

node: Remove default features for p2panda store

5492561

We no longer use the default features of the p2panda store, since we now use the SQLite OperationStore instead of the in-memory OperationStore.

node: Add struct that merges multiple DB migrations

68cde68

This allows combining our migrations with the migrations from p2panda's operation store.
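A std-only sketch of merging migration sets, assuming each migration is identified by a numeric version and that both sets end up in one ordered sequence, so versions must not collide. `Migration` here is a simplified stand-in (sqlx's own `Migration` type carries more fields, and the real struct presumably implements sqlx's `MigrationSource`); `merge_migrations` is hypothetical:

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a migration; not sqlx's `Migration` type.
#[derive(Debug, Clone, PartialEq)]
pub struct Migration {
    pub version: i64,
    pub description: String,
    pub sql: String,
}

/// Merges several migration sets into one list ordered by version.
/// Fails if two sources define the same version.
pub fn merge_migrations(sources: Vec<Vec<Migration>>) -> Result<Vec<Migration>, String> {
    let mut merged: BTreeMap<i64, Migration> = BTreeMap::new();
    for migration in sources.into_iter().flatten() {
        let version = migration.version;
        if merged.insert(version, migration).is_some() {
            return Err(format!("duplicate migration version {version}"));
        }
    }
    // BTreeMap iterates in ascending key order, i.e. by version.
    Ok(merged.into_values().collect())
}
```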

node: Make node shutdown async

93503a6

This allows the client code to decide whether to block on shutdown. This also makes shutdown consistent with the run method.

node: Serialize and deserialize documents for sqlx

3a7ef4e

This adds types and implements the traits needed to write documents and authors to, and read them from, the SQLite DB.

node: Implement sqlite DB for DocumentStore

f73ec2e

The DocumentStore persistently keeps track of:

- Document ID
- Document authors, with when each was last seen
- Last access to the document
- Name of the document
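A sketch of what one persisted entry could look like, using `std::time::SystemTime` in place of the chrono timestamps and `String` in place of the p2panda key and ID types. `StoredDocument` and `author_seen` are hypothetical names; last-seen is an `Option` because it may be unknown:

```rust
use std::collections::HashMap;
use std::time::SystemTime;

/// Hypothetical shape of one persisted DocumentStore entry.
#[derive(Debug, Clone)]
pub struct StoredDocument {
    pub id: String,
    /// Author public key -> when that author was last seen (None if unknown).
    pub authors: HashMap<String, Option<SystemTime>>,
    pub last_accessed: Option<SystemTime>,
    pub name: Option<String>,
}

impl StoredDocument {
    pub fn new(id: &str) -> Self {
        Self {
            id: id.to_string(),
            authors: HashMap::new(),
            last_accessed: None,
            name: None,
        }
    }

    /// Records that `author` was seen at `at`, keeping the most recent timestamp.
    pub fn author_seen(&mut self, author: &str, at: SystemTime) {
        let entry = self.authors.entry(author.to_string()).or_insert(None);
        if entry.map_or(true, |previous| at > previous) {
            *entry = Some(at);
        }
    }
}
```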

doc: Add name property for document

2dc758d

This extracts the name from the first line of the document.
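A minimal sketch of the name extraction, assuming the name is the trimmed first line and that an empty first line means the document has no name yet. `name_from_text` is a hypothetical helper:

```rust
/// Derives a document name from its first line, trimmed (hypothetical helper).
/// Returns `None` when the first line is missing or blank.
pub fn name_from_text(text: &str) -> Option<String> {
    text.lines()
        .next()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(str::to_string)
}
```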

doc: Add last-accessed property for a Document

ac34dba

This property keeps track of the last time the document was subscribed to. Subscription will be added in a future commit.

doc: Add subscribed property to Document

5a27f16

When the subscribed property is `true`, the Document is kept in sync with other peers. If it's `false`, no changes to the Document are written to the network or the local DB.
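A sketch of the gating described above, with plain `Vec`s standing in for the network and the local DB. `DocumentSync` and `on_local_change` are hypothetical:

```rust
/// Hypothetical simplification of where a local change ends up.
#[derive(Default)]
pub struct DocumentSync {
    pub subscribed: bool,
    pub sent_to_network: Vec<Vec<u8>>,
    pub written_to_db: Vec<Vec<u8>>,
}

impl DocumentSync {
    /// Forwards a local change only while the document is subscribed;
    /// otherwise it is neither gossiped nor persisted. Returns whether it was forwarded.
    pub fn on_local_change(&mut self, delta: Vec<u8>) -> bool {
        if !self.subscribed {
            return false;
        }
        self.written_to_db.push(delta.clone());
        self.sent_to_network.push(delta);
        true
    }
}
```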

doc: Subscribe to documents in tests

b271ef4

Since documents now have to be subscribed to, the tests were failing without this.

app: Unsubscribe from document when window is closed or document changes

176cbef

This makes sure that we unsubscribe from a document when the window is closed or a new document is opened.

doc: Add document list holding all documents known

565767e

doc: Enable persistent database by setting a data dir

935d11d

For now, the node only stores the OperationStore and doesn't reload operations.

doc: Set temp data dir for tests

ff2bd3a

Since a Service now requires a data dir, we need to set it in the tests as well.

doc: Add methods to construct an Authors object from state

39a64db

This allows loading the authors from the DB provided by the node.

doc: Make document authors settable

7650c52

This allows setting the authors from the state stored in the local DB.

jsparber added 4 commits May 8, 2025 12:33

doc: Allow loading the state of a document from DB

609c05f

This adds methods to create a document from an already known document loaded from the local DB.

doc: Load documents from local DB on service startup

23dd51d

This creates documents and authors for the existing state loaded from the local DB.

app: Add document list popover

80038a4

This adds a document list popover to the open button.

app: Use clone! macro for closures for AardvarkWindow

795e3e0

This makes sure that we don't create a reference cycle, so the window object gets disposed.

jsparber force-pushed the jsparber/operation_store branch from 33c8a91 to 795e3e0 Compare May 8, 2025 10:34

node: Workaround for duplicated snapshots

90ef375

We still want to gossip changes even if we can't store the snapshot. There is definitely something wrong but at least this way the document doesn't get out of sync.
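A sketch of the workaround's control flow, assuming storing and gossiping can be modelled as two injected closures and that a duplicate hash surfaces as an error from the store. `publish_snapshot` and `StoreError` are hypothetical; in the app the error comes from sqlx via p2panda's store:

```rust
/// Hypothetical storage error type.
#[derive(Debug)]
pub enum StoreError {
    DuplicateHash(String),
}

/// Tries to persist an operation, but gossips it regardless of the outcome.
/// Returns whether the local insert succeeded.
pub fn publish_snapshot<S, G>(hash: &str, bytes: &[u8], mut store: S, mut gossip: G) -> bool
where
    S: FnMut(&str, &[u8]) -> Result<(), StoreError>,
    G: FnMut(&[u8]),
{
    let stored = store(hash, bytes);
    // Broadcast even if storing failed, so peers don't get out of sync.
    gossip(bytes);
    match stored {
        Ok(()) => true,
        Err(err) => {
            eprintln!("storing operation {hash} failed: {err:?}");
            false
        }
    }
}
```

This only hides the symptom; the duplicate snapshot itself still needs to be found.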

jsparber added 12 commits May 18, 2025 13:19

node: Don't use ingest method provided by p2panda for new operations

16975ea

`p2panda_stream::operation::ingest_operation()` does additional checks we don't need since we can always trust our own operation sequence. This speeds up the operation creation significantly.

node: Ensure that only one operation is created at a time

38df0d7

We need to make sure that we create operations in sequence and not in parallel.
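A sketch of serialized operation creation using `std::sync::Mutex` (the node itself is async and would need an async-aware lock instead; that part is an assumption). Holding the lock from reading the log head until the new operation is recorded means concurrent callers can never pick the same sequence number. `LogHead`, `OperationCreator` and `create_concurrently` are hypothetical:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical per-author log head: the state needed to build the next operation.
#[derive(Default)]
struct LogHead {
    next_seq: u64,
    created: Vec<u64>,
}

/// Serializes operation creation behind a single lock.
#[derive(Clone, Default)]
pub struct OperationCreator {
    head: Arc<Mutex<LogHead>>,
}

impl OperationCreator {
    pub fn create_operation(&self) -> u64 {
        let mut head = self.head.lock().expect("log head mutex poisoned");
        let seq = head.next_seq;
        head.created.push(seq); // stands in for signing and storing the operation
        head.next_seq += 1;
        seq
    }

    pub fn created(&self) -> Vec<u64> {
        self.head.lock().expect("log head mutex poisoned").created.clone()
    }
}

/// Creates `per_thread` operations from each of `threads` threads concurrently.
pub fn create_concurrently(threads: usize, per_thread: usize) -> Vec<u64> {
    let creator = OperationCreator::default();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let creator = creator.clone();
            thread::spawn(move || {
                for _ in 0..per_thread {
                    creator.create_operation();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    creator.created()
}
```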

document: Create snapshot only when idle and every 5s

f042f0a

This reduces the number of snapshots we create.
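A sketch of the throttling rule from the commit title, assuming "idle" is a flag supplied by the caller (for example from an idle callback on the main loop) and that a snapshot is only needed when something changed. `SnapshotThrottle` is hypothetical; only the 5 s interval comes from the commit:

```rust
use std::time::{Duration, Instant};

/// Minimum time between two snapshots, taken from the commit title.
pub const SNAPSHOT_INTERVAL: Duration = Duration::from_secs(5);

/// Tracks when the last snapshot was taken and whether there are unsaved changes.
pub struct SnapshotThrottle {
    last_snapshot: Option<Instant>,
    dirty: bool,
}

impl SnapshotThrottle {
    pub fn new() -> Self {
        Self { last_snapshot: None, dirty: false }
    }

    pub fn mark_changed(&mut self) {
        self.dirty = true;
    }

    /// Returns `true` (and resets state) if a snapshot should be created now:
    /// there are changes, the app is idle, and the interval has elapsed.
    pub fn should_snapshot(&mut self, now: Instant, idle: bool) -> bool {
        let interval_elapsed = self
            .last_snapshot
            .map_or(true, |last| now.duration_since(last) >= SNAPSHOT_INTERVAL);
        if self.dirty && idle && interval_elapsed {
            self.last_snapshot = Some(now);
            self.dirty = false;
            true
        } else {
            false
        }
    }
}
```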

document: Bump loro dep to 1.5

6a75d0d

This fixes some issues when importing remote exports.

app: Fix crash when last seen of author is unknown

4cab2cc

doc: Consume remote messages from the main context

af41edb

This ensures that all signals and changes to the LoroDocument are done on the main thread.

doc: Ensure all changes to document are made in the main context

51aeee9

We need signals to be emitted in the main context where the main loop is running. The easiest way is to change the document only from the main context.
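A std-only sketch of the pattern: network threads only send messages, and the thread that owns the document applies them. In the app this role is played by the GLib main context; here an `mpsc` channel and a plain loop stand in for it. `RemoteChange`, `run_main_loop` and `demo` are hypothetical:

```rust
use std::sync::mpsc;
use std::thread;

/// A remote change as it might arrive from the network thread (hypothetical).
pub enum RemoteChange {
    Insert { pos: usize, text: String },
}

/// Applies changes on the thread that owns the document. Network threads only
/// send messages; they never touch the document directly.
pub fn run_main_loop(rx: mpsc::Receiver<RemoteChange>) -> String {
    let mut document = String::new(); // owned exclusively by this ("main") thread
    for change in rx {
        match change {
            // Byte offset, clamped to the end; assumes ASCII text for brevity.
            RemoteChange::Insert { pos, text } => {
                let pos = pos.min(document.len());
                document.insert_str(pos, &text);
            }
        }
    }
    document // the loop ends once every sender has been dropped
}

pub fn demo() -> String {
    let (tx, rx) = mpsc::channel();
    let network = thread::spawn(move || {
        tx.send(RemoteChange::Insert { pos: 0, text: "world".into() }).unwrap();
        tx.send(RemoteChange::Insert { pos: 0, text: "hello ".into() }).unwrap();
    });
    let document = run_main_loop(rx);
    network.join().unwrap();
    document
}
```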

doc: Propagate error on document insert_text/delete_range

1e2f70b

app: Write text changes immediately to the gtk::TextBuffer

e03442d

This ensures that the LoroDocument and the TextBuffer are never in an invalid state.

node: Don't use idle spawn for changes to authors

61f2237

Since all changes that are received from the node are now invoked on the main context we don't need to idle spawn changes to authors.

doc: Ensure authors are added when we receive an operation

b90edfb

The gossip system events don't provide information about all authors, only about the closest few. This ensures that we add new authors, though their connection state may be wrong.
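A sketch of adding unknown authors when an operation arrives, assuming authors are tracked in a map from public key to connection state. Authors added this way get an `Unknown` state because gossip may never report them; `Connection` and `on_operation` are hypothetical:

```rust
use std::collections::HashMap;

/// Connection state; only peers reported by gossip events are known to be online.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Connection {
    Online,
    Unknown,
}

/// Adds the operation's author if we haven't seen them yet.
/// Returns `true` if a new author was added.
pub fn on_operation(authors: &mut HashMap<String, Connection>, author: &str) -> bool {
    if authors.contains_key(author) {
        return false; // keep whatever state gossip already reported
    }
    authors.insert(author.to_string(), Connection::Unknown);
    true
}
```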

node: Log when a snapshot or delta is created

fd13adb
