Sync of ForkedStore #7

Open
drewmccormack opened this issue Dec 15, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@drewmccormack
Owner

Once a ForkedStore exists, it should be possible to sync it via any online document storage service that can guarantee serial updates (e.g. CloudKit).

It is not 100% clear yet how this should work. Probably it involves adding timestamps or count variables to the ForkedStore so we know what needs uploading, but the requirements need to be pinned down. (Perhaps the count/timestamp would be a peer id for the device, plus a simple integer count.)
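For the sake of discussion, such a stamp might look roughly like the following in Swift (VersionStamp and its properties are placeholder names, not part of Forked):

```swift
import Foundation

/// Placeholder sketch of a per-device version stamp: a stable peer id
/// for the device plus a simple monotonically increasing counter.
struct VersionStamp: Codable, Hashable, Comparable {
    let peerID: UUID   // stable identifier for this device
    var count: UInt64  // incremented on every local change

    mutating func increment() { count += 1 }

    // Order by count first, breaking ties by peer id so the order is total.
    static func < (lhs: VersionStamp, rhs: VersionStamp) -> Bool {
        (lhs.count, lhs.peerID.uuidString) < (rhs.count, rhs.peerID.uuidString)
    }
}
```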

  • For a basic syncing server, we can conglomerate a set of changes into a JSON bundle and upload that
  • The number of bundles will likely grow over time, so it would be good to have a background process that merges old bundles and reuploads them, reducing the number of bundles in the cloud system

Conglomerating bundles means that new devices don't have to download thousands of records, just a few. New uploads stay small and fast, and later get grouped into larger bundles for faster bootstrapping of new devices.

During the conglomeration stage, redundancy in the data could be removed, i.e. older changes to the same value. Note that the resulting bundle should be exactly equivalent to the smaller sets of changes it was produced from; it just removes redundancy and makes downloads more efficient by reducing the number of files.
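Roughly, the merge step could look something like this sketch (Change and ChangeBundle are just illustrative shapes, not a proposed format; it builds on the VersionStamp sketch above):

```swift
import Foundation

/// Illustrative shape of a single change and an uploadable bundle.
struct Change: Codable {
    let resourceID: String
    let stamp: VersionStamp   // see the stamp sketch above
    let payload: Data         // encoded value
}

struct ChangeBundle: Codable {
    var changes: [Change]
}

/// Merge several bundles into one, keeping only the newest change per
/// resource id. The result is equivalent to the inputs, minus redundancy.
func conglomerate(_ bundles: [ChangeBundle]) -> ChangeBundle {
    var newest: [String: Change] = [:]
    for change in bundles.flatMap(\.changes) {
        if let existing = newest[change.resourceID], existing.stamp >= change.stamp {
            continue
        }
        newest[change.resourceID] = change
    }
    return ChangeBundle(changes: Array(newest.values))
}
```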

drewmccormack added the enhancement label Dec 15, 2024
@drewmccormack
Owner Author

Thinking more about this, maybe it is better to just store the most recent value in the cloud for each ForkedResource in the store. As long as it is a locking server, we can guarantee this works.

It would probably be good not to store the ForkedStore as one big file, and also not as individual records for every resource, but instead to have some partitioning. E.g. the first two characters of the id form the "group", so you end up with around 500 files max.
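Something like this, purely for illustration (the naming scheme is hypothetical):

```swift
/// Map a resource id to the partition file its value would live in,
/// using the first two characters of the id as the group.
func partitionName(forResourceID id: String) -> String {
    let group = String(id.lowercased().prefix(2))
    return "partition-\(group).json"
}

// partitionName(forResourceID: "3FA85F64-5717-4562-B3FC-2C963F66AFA6")
// -> "partition-3f.json"
```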

@malhal
Contributor

malhal commented Dec 16, 2024

I was wondering how much change data is stored on the server, and whether it is ever deleted or just grows forever, like most sync frameworks do.

@drewmccormack
Owner Author

At the moment, no change data is stored in the cloud. The current CloudKitExchange simply stores one copy of the value in the cloud at a time. A client may keep several copies locally, for the various common ancestors, but in the cloud there is just one copy.

For this to work, it has to be a locking server, and CloudKit supports that. We can fetch a record, change it, and save it. If the record has changed since we last fetched it, the save fails.
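In CloudKit terms that is the .ifServerRecordUnchanged save policy. A rough sketch of the fetch-modify-save cycle (not the actual CloudKitExchange code, and the "value" field name is made up):

```swift
import CloudKit

/// Sketch of an optimistically locked save. If another device saved the
/// record after we fetched it, the save fails with .serverRecordChanged
/// and we can fetch again, merge, and retry.
func save(_ data: Data, to recordID: CKRecord.ID, in database: CKDatabase) {
    database.fetch(withRecordID: recordID) { record, _ in
        guard let record = record else { return } // handle fetch errors properly in real code
        record["value"] = data

        let operation = CKModifyRecordsOperation(recordsToSave: [record], recordIDsToDelete: nil)
        operation.savePolicy = .ifServerRecordUnchanged
        operation.modifyRecordsResultBlock = { result in
            if case .failure(let error) = result,
               (error as? CKError)?.code == .serverRecordChanged {
                // Another device won the race: fetch the latest record,
                // merge our change into it, and save again.
            }
        }
        database.add(operation)
    }
}
```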

For a store, this is up in the air. I actually think it would be better to just scale up the existing approach and keep one copy of each value in the cloud. It could be stored as one record per value, as now, or you could consider conglomerating based on id to reduce the total number of records. I think CloudKit struggles a bit when you get into the tens of thousands of records. It is much more efficient if you cluster the data in those records into, e.g., 500 records.

A long way to answer the question but, in short, there is no build-up of data in the cloud. It should be the same size as the values you store.
