Sync of ForkedStore #7

Open
drewmccormack opened this issue Dec 15, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@drewmccormack
Owner

Once a ForkedStore exists, it should be possible to sync it via any online document storage service that can guarantee serial updates (e.g. CloudKit).

It is not 100% clear yet how this should work. Probably it involves adding timestamps or count variables to the ForkedStore so we know what needs uploading, but the requirements need to be pinned down. (Perhaps the count/timestamp would be a peer id for the device, plus a simple integer count.)
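For the sake of discussion, such a stamp might look roughly like the following in Swift (VersionStamp and its properties are placeholder names, not part of Forked):

```swift
import Foundation

/// Placeholder sketch of a per-device version stamp: a stable peer id
/// for the device plus a simple monotonically increasing counter.
struct VersionStamp: Codable, Hashable, Comparable {
    let peerID: UUID   // stable identifier for this device
    var count: UInt64  // incremented on every local change

    mutating func increment() { count += 1 }

    // Order by count first, breaking ties by peer id so the order is total.
    static func < (lhs: VersionStamp, rhs: VersionStamp) -> Bool {
        (lhs.count, lhs.peerID.uuidString) < (rhs.count, rhs.peerID.uuidString)
    }
}
```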

  • For a basic syncing server, we can conglomerate a set of changes into a JSON bundle and upload that
  • The number of bundles will likely grow over time, so it would be good to have a background process that merges old bundles and reuploads them, reducing the number of bundles in the cloud system

Conglomerating bundles means that new devices don't have to download thousands of records, just a few. New uploads stay small and fast, and later get grouped into larger bundles for faster bootstrapping of new devices.

During the conglomeration stage, redundancy in the data could be removed, i.e. older changes to the same value. Note that the resulting bundle should be exactly equivalent to the smaller sets of changes it was produced from; it just removes redundancy and makes downloads more efficient by reducing the number of files.
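Roughly, the merge step could look something like this sketch (Change and ChangeBundle are just illustrative shapes, not a proposed format; it builds on the VersionStamp sketch above):

```swift
import Foundation

/// Illustrative shape of a single change and an uploadable bundle.
struct Change: Codable {
    let resourceID: String
    let stamp: VersionStamp   // see the stamp sketch above
    let payload: Data         // encoded value
}

struct ChangeBundle: Codable {
    var changes: [Change]
}

/// Merge several bundles into one, keeping only the newest change per
/// resource id. The result is equivalent to the inputs, minus redundancy.
func conglomerate(_ bundles: [ChangeBundle]) -> ChangeBundle {
    var newest: [String: Change] = [:]
    for change in bundles.flatMap(\.changes) {
        if let existing = newest[change.resourceID], existing.stamp >= change.stamp {
            continue
        }
        newest[change.resourceID] = change
    }
    return ChangeBundle(changes: Array(newest.values))
}
```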

drewmccormack added the enhancement label Dec 15, 2024
@drewmccormack
Owner Author

Thinking more about this, maybe it is better to just store the most recent value in the cloud for each ForkedResource in the store. As long as it is a locking server, we can guarantee this works.

It would probably be good not to store the ForkedStore as one big file, and also not as individual records for every resource, but instead to have some partitioning. E.g. the first two characters of the id form the "group", so you end up with around 500 files max.
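Something like this, purely for illustration (the naming scheme is hypothetical):

```swift
/// Map a resource id to the partition file its value would live in,
/// using the first two characters of the id as the group.
func partitionName(forResourceID id: String) -> String {
    let group = String(id.lowercased().prefix(2))
    return "partition-\(group).json"
}

// partitionName(forResourceID: "3FA85F64-5717-4562-B3FC-2C963F66AFA6")
// -> "partition-3f.json"
```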

@malhal
Contributor

malhal commented Dec 16, 2024

I was wondering how much change data is stored on the server, and whether it is ever deleted or just grows forever, like most sync frameworks do.

@drewmccormack
Owner Author

At the moment, no change data is stored in the cloud. The current CloudKitExchange simply stores one copy of the value in the cloud at a time. A client may keep several copies locally, for the various common ancestors, but in the cloud there is just one copy.

For this to work, it has to be a locking server, and CloudKit supports that. We can fetch a record, change it, and save it. If the record has changed since we last fetched it, the save fails.
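In CloudKit terms that is the .ifServerRecordUnchanged save policy. A rough sketch of the fetch-modify-save cycle (not the actual CloudKitExchange code, and the "value" field name is made up):

```swift
import CloudKit

/// Sketch of an optimistically locked save. If another device saved the
/// record after we fetched it, the save fails with .serverRecordChanged
/// and we can fetch again, merge, and retry.
func save(_ data: Data, to recordID: CKRecord.ID, in database: CKDatabase) {
    database.fetch(withRecordID: recordID) { record, _ in
        guard let record = record else { return } // handle fetch errors properly in real code
        record["value"] = data

        let operation = CKModifyRecordsOperation(recordsToSave: [record], recordIDsToDelete: nil)
        operation.savePolicy = .ifServerRecordUnchanged
        operation.modifyRecordsResultBlock = { result in
            if case .failure(let error) = result,
               (error as? CKError)?.code == .serverRecordChanged {
                // Another device won the race: fetch the latest record,
                // merge our change into it, and save again.
            }
        }
        database.add(operation)
    }
}
```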

For a store, this is up in the air. I actually think it would be better to just scale up the existing approach and keep one copy of each value in the cloud. It could be stored as one record per value, as now, or you could consider conglomerating based on id to reduce the total number of records. I think CloudKit struggles a bit when you get into the tens of thousands of records. It is much more efficient if you cluster the data in those records into, e.g., 500 records.

A long way to answer the question but, in short, there is no build-up of data in the cloud. It should be the same size as the values you store.
