[api] method POST is used when synchronization of data should be done by PUT with explicitly indicated ID. #114

gustawdaniel · 2019-01-27T04:19:14Z

I have a question about

https://crickapi.docs.apiary.io/#reference/crick-api-for-watson/frames/push-(not-yet-synchronized)-user's-frames

POST is proposed for synchronization, but POST is dedicated to creating resources.
When the same call has to both create and update resources, PUT with an explicit ID should be used.
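To illustrate the difference, here is a minimal sketch of the generic HTTP semantics, not of Crick's actual behavior. `InMemoryFrameStore`, its methods, and the frame fields are hypothetical names used only for this example.

```python
import uuid


class InMemoryFrameStore:
    """Hypothetical stand-in for the server's frame collection."""

    def __init__(self):
        self.frames = {}

    def post(self, body):
        # POST: the server picks the ID, so repeating the call duplicates the frame.
        frame_id = uuid.uuid4().hex
        self.frames[frame_id] = dict(body)
        return frame_id

    def put(self, frame_id, body):
        # PUT: the client names the ID, so the call creates or overwrites in place.
        created = frame_id not in self.frames
        self.frames[frame_id] = dict(body)
        return created


frame = {"project": "crick", "start": "2019-01-27T10:00:00Z"}

store = InMemoryFrameStore()
store.post(frame)
store.post(frame)               # retry after a timeout: a second copy appears
posted = len(store.frames)

store = InMemoryFrameStore()
store.put("a1b2c3", frame)
store.put("a1b2c3", frame)      # retry is harmless: same ID, same resource
put_count = len(store.frames)
```

With PUT a client can resend the same frame after a failed sync without creating duplicates, which is why it fits synchronization better.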
Full synchronization also requires answers to the following questions:

Let A be a client (for example Watson CLI)
Let B be a server (for example the app.crick.io API)

Question 1. Which participant of synchronization contains a source of truth?
a) both have the same level
b) client
c) server

If a) both have equal standing:
Question 2. Which behavior should be considered correct when two resources have different values but the same ID? The data model of a time frame does not contain a last modification time. Even if it did, a correct synchronization would also need a base: the previous common version. This is an open question, related to the missing documentation about synchronization.
Question 3. Should data deletion be allowed? If A has a resource but B does not, does synchronization mean that the resource should be added to B, or deleted from A? If we choose the adding strategy, how do we remove a resource? If we choose the deleting strategy, how do we add one?
These are not all the questions, but I do not have infinite time, so let's go to the next possibility.

If b) the client is the master (Watson CLI):
Question 4. Then it should start synchronization by getting the data from the server, compare it with the data in Watson, and then send POST only for the frames that do not exist on the server yet (not yet synchronized).
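The comparison step can be sketched as follows. This is only an illustration under the assumption that every frame is a dict with an `id` key; the function name is hypothetical and not part of Watson.

```python
def frames_missing_on_server(local_frames, server_frames):
    """Return the local frames whose id the server does not know yet."""
    server_ids = {frame["id"] for frame in server_frames}
    return [frame for frame in local_frames if frame["id"] not in server_ids]


local = [{"id": "f1", "project": "crick"}, {"id": "f2", "project": "watson"}]
remote = [{"id": "f1", "project": "crick"}]

to_post = frames_missing_on_server(local, remote)
```

Note that a comparison by ID alone finds only new frames. It detects neither frames that were edited nor frames that were deleted on one side.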

This question arises because we have an endpoint to get all frames

https://crickapi.docs.apiary.io/#reference/crick-api-for-watson/frames/retrieve-all-user's-frames

and one to POST the missing frames

https://crickapi.docs.apiary.io/#reference/crick-api-for-watson/frames/push-(not-yet-synchronized)-user's-frames

but it is an incomplete approach. What about updating frames that changed (PUT / PATCH) and removing frames that were deleted (DELETE)? And how does putting the synchronization logic into Watson CLI relate to the recommendation of @SpotlightKid from

jazzband/Watson#40

who wrote in 2015

To not bloat the Watson distribution with too many sync backends (and their dependencies), I propose to use a plugin framework to load backend implementations and to specify the API that they have to support.

Question 5. What is the scenario when two clients are connected to one backend, one with data and the second without? Should synchronization with the first create data on the server, and with the second remove it? Taking into account that only GET and POST are implemented, I suspect that the first synchronization creates the data on the server and the second copies it to the second client. But when I remove a frame from the first client and synchronize again, this frame will reappear on the client rather than be removed from the server. Should this be considered a bug?
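The suspected behavior can be reproduced with a small simulation. It assumes a sync that pulls missing frames and then pushes unknown ones, using only GET and POST; the `Server` and `Client` classes are hypothetical models, not the real Watson or Crick code.

```python
class Server:
    def __init__(self):
        self.frames = {}

    def get_frames(self):
        return dict(self.frames)

    def post_frames(self, frames):
        self.frames.update(frames)


class Client:
    def __init__(self):
        self.frames = {}

    def sync(self, server):
        # Pull: copy every server frame that is missing locally.
        for frame_id, frame in server.get_frames().items():
            self.frames.setdefault(frame_id, frame)
        # Push: send every local frame the server does not have.
        remote = server.get_frames()
        server.post_frames({i: f for i, f in self.frames.items() if i not in remote})


server, first, second = Server(), Client(), Client()
first.frames["f1"] = {"project": "crick"}
first.sync(server)        # f1 is created on the server
second.sync(server)       # f1 is copied to the second client
del first.frames["f1"]    # the user deletes the frame on the first client
first.sync(server)        # f1 comes back from the server
```

Without a DELETE request, or some record of the deletion, the first client cannot tell "deleted locally" apart from "not downloaded yet".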

Actually "Watson deleted frames do not sync with crick"

#111

If c) the server is the master and the CLI is the slave:
This is rather improbable, because synchronization would then mean that you can create data only on the server. But when I saw the issue

jazzband/Watson#171

I decided to add the next question.
Question 6. Who has the deciding voice on this topic? @jmaupetit wrote

We must re-consider our synchronization strategy which —at the time of writing— overrides local changes between two sync events.

It is related to my question about integration with external data sources that use their own identifiers.

jazzband/Watson#190

It is related to the unfinished discussion about the synchronization logic there

Syncing with server overrides local changes #171

And to the missing documentation there.

jazzband/Watson#165

I can send my proposals. What should I do?

Research synchronization protocols [today]
Propose a protocol [today]
Wait for answers to the questions [1 month]
Wrap everything together and publish a draft of the synchronization specification [1 week]
Wait for fixes and opinions from the community [1 month]
Learn Go + React; I know C, C++, Python, and Vue, so it will be easy [1 month]
Implement this specification [1 month]
Wait for the pull request to be accepted [1 month]

If everything goes well, we will have working synchronization by mid-2019, and many issues connected with it will be closed.

So let's start.

Research on synchronization:

https://en.wikipedia.org/wiki/Data_synchronization

We have

file synchronization
version control
distributed filesystems
mirroring

I propose version control.

The set reconciliation problem can be solved by

Wholesale transfer
Timestamp synchronization
Mathematical synchronization

I propose mathematical synchronization.

In the Error handling paragraph there is a sentence

The simplest approach is to have a single master instance that is the sole source of truth.

But I propose another approach: accept any modification and store a list of modifications. When two modifications overlap, merge them with the "mathematical synchronization" that I will describe later.

Proposed tools

Paxos https://en.wikipedia.org/wiki/Paxos_(computer_science) - in my opinion too complicated; "solving consensus in a network of unreliable processors" is not our case.
Raft https://en.wikipedia.org/wiki/Raft_(computer_science) - looks nice. It requires understanding the concepts of "Leader Election" and "Log Replication", but there is a nice tutorial. I recommend looking at it after reading the wiki article

http://thesecretlivesofdata.com/raft/

There is a PDF

raft.pdf

and finally a list of implementations

https://raft.github.io/#implementations

So pros:

has many implementations and is widely known
works in a distributed network of nodes

Question:
Should we consider Watson CLI a Raft node or a client?
Answer:
It could be a node only if it had a public address to which requests can be sent, but this is hard to achieve.
So Watson CLI should be a client in this model.

Cons:

it seems to be overengineered.
it needs a cluster of servers to work efficiently
we are rather looking for a simple solution like "storage everywhere", "server -> serverless"

I researched some solutions and finally ended up on Stack Overflow asking this question

https://stackoverflow.com/questions/54385016/simple-synchronization-protocol-for-array-of-objects

This is an initial draft of my proposal for solving the synchronization problem. In this model, a serverless lambda + a text file stored anywhere can be replaced by the crick backend and Postgres, but the vision of serverless (free today for a small number of requests) and static file storage (also free for personal users) is more attractive to me than a backend that must be hosted.

gustawdaniel · 2019-01-27T04:30:24Z

Link to draft

https://drive.google.com/file/d/1SB3VJyCdzU5Ggt3Cq9mE8wfLfpKUcxoq/view?usp=sharing

Related #113.

I propose to select one issue for synchronization topics. It can be this one or

Add documentation about using Crick with Watson jazzband/Watson#165

or

Syncing with server overrides local changes jazzband/Watson#171

or

Synchronization backends should be plugable jazzband/Watson#40

But I am not sure whether synchronization is more related to Watson or to Crick. Additionally, taking into account the number of issues connected with synchronization bugs, I propose adding a note to the documentation that it is an experimental feature and is not stable yet.

jmaupetit · 2019-01-28T20:03:08Z

@gustawdaniel thank you for this relevant analysis 🙏. You tackle many aspects of the project. I will need time to answer all of your questions about Watson & Crick. You are definitely pushing hard for this sync to work properly and it's a good thing. Thx.

gustawdaniel · 2019-02-04T12:08:37Z

Update `proposition 2 (simplified)`:

I adjusted my vision to the shape of the current solution.

Replace the lambda and flat file storage with the crick server.
Replace local logs with a collection of requests that must be sent to mirror local changes.
Introduce a flag for full and shallow synchronization: `sync` means shallow, `sync --full` means synchronization that also includes changes introduced before the last synchronization. Every client should remember when it last synced. Synchronization means:
a) send all requests from the local log to the current crick server (POST / PUT / DELETE)
b) get all time frames from the last sync to now (shallow), or all time frames from the beginning to now (full). The result can be paginated, and pagination should be supported by the client. After the GET, the client should update its last sync date and compare its local collection with the obtained collection. In this process the client should replace all of its frames from the considered period with the frames from the server: if it sent a time frame before, that frame should be on the server too, and if another client modified this frame, that modification should override this client's state.
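Steps a) and b) can be sketched as below. This is only an illustration under several assumptions: `FakeServer`, `Client`, and their methods are hypothetical, frames are dicts with an ISO 8601 `start` string in a single fixed format (so string comparison orders them), and the "considered period" is taken to mean frames whose `start` is at or after the last sync.

```python
from dataclasses import dataclass, field


@dataclass
class FakeServer:
    """Hypothetical in-memory server; frames map id -> frame dict."""
    frames: dict = field(default_factory=dict)

    def apply(self, method, frame_id, frame=None):
        if method in ("POST", "PUT"):
            self.frames[frame_id] = frame
        elif method == "DELETE":
            self.frames.pop(frame_id, None)

    def frames_since(self, since):
        return {i: f for i, f in self.frames.items() if f["start"] >= since}


@dataclass
class Client:
    frames: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # queued (method, id, frame)
    last_sync: str = ""                          # "" means never synced

    def sync(self, server, now, full=False):
        # Step a: replay every queued request against the server.
        for method, frame_id, frame in self.pending:
            server.apply(method, frame_id, frame)
        self.pending.clear()
        # Step b: fetch the period and replace the local frames from it.
        since = "" if full else self.last_sync
        self.frames = {i: f for i, f in self.frames.items() if f["start"] < since}
        self.frames.update(server.frames_since(since))
        self.last_sync = now


server = FakeServer()
a, b = Client(), Client()
a.frames["f1"] = {"start": "2019-02-01T09:00:00Z", "project": "crick"}
a.pending.append(("POST", "f1", a.frames["f1"]))
a.sync(server, now="2019-02-02T00:00:00Z")
b.sync(server, now="2019-02-02T00:00:00Z", full=True)
```

After both calls, client B holds the frame created on client A, and A's request log is empty.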

Bidirectional synchronization with flat file storages / external services

crick server should expose endpoints for webhooks from services like Toggl / Google Calendar ...
crick server should have built-in cron or endpoints for external cron to send data to these services / flat file storages

Pagination is implemented now

https://github.com/TailorDev/crick/blob/00c8e8dd4ff91d9f65facf494d35af278723ffca/apiary.apib

Possible other ways of pagination:

hal+json (HATEOAS) https://en.wikipedia.org/wiki/Hypertext_Application_Language
json-ld (JSON for Linking Data) https://json-ld.org/
Link header like in GitHub v3 api https://developer.github.com/v3/guides/traversing-with-pagination/
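As an illustration of the Link header variant, a client loop could look like the sketch below. The URLs and the `fetch_page` callable are hypothetical and do not describe Crick's current pagination; only the `<url>; rel="next"` header format follows the standard Link header syntax.

```python
import re

LINK_PATTERN = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(value):
    """Map each rel (for example "next") to its URL."""
    return {rel: url for url, rel in LINK_PATTERN.findall(value)}


def fetch_all(fetch_page, first_url):
    """Follow rel="next" links until the server stops sending one.

    fetch_page is a hypothetical callable returning (items, link_header).
    """
    items, url = [], first_url
    while url:
        page, link_header = fetch_page(url)
        items.extend(page)
        url = parse_link_header(link_header or "").get("next")
    return items


# Fake two-page response used instead of a real HTTP call.
pages = {
    "/frames?page=1": (["f1", "f2"], '</frames?page=2>; rel="next", </frames?page=2>; rel="last"'),
    "/frames?page=2": (["f3"], '</frames?page=1>; rel="prev"'),
}
all_frames = fetch_all(pages.__getitem__, "/frames?page=1")
```

The client never builds page URLs itself, so the server stays free to change its paging scheme.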

Drawbacks:

a central point of failure: the crick server
we need to maintain a VPS server
possibly invalid (or rather nonintuitive) behavior if synchronization is not frequent enough
Consider: client A creates a time frame and synchronizes it,
then client B synchronizes, gets this time frame, and edits it,
client A edits this time frame too.
Now the version that was sent last will be saved.
Additionally, if only shallow synchronization is applied, any client can
operate on its own unsynchronized data. We should describe this in the draft specification that is planned for next month.
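The scenario with clients A and B can be played through in a few lines. This is a sketch that assumes the server simply keeps whichever version arrives last; `push` and `pull` are hypothetical helpers, not Crick endpoints.

```python
server = {}


def push(client_frames, frame_id):
    # Hypothetical PUT: the server keeps whatever arrives last.
    server[frame_id] = dict(client_frames[frame_id])


def pull(client_frames):
    # Replace the whole local state with copies of the server frames.
    client_frames.clear()
    client_frames.update({i: dict(f) for i, f in server.items()})


a, b = {}, {}
a["f1"] = {"project": "crick", "tags": []}
push(a, "f1")                    # A creates the frame and synchronizes
pull(b)                          # B receives it
b["f1"]["tags"] = ["review"]     # B edits the frame
a["f1"]["project"] = "watson"    # A edits the same frame
push(b, "f1")
push(a, "f1")                    # A pushes last, so B's tag is lost
pull(b)
```

The result is consistent on every client, but B's edit disappears silently, which is the nonintuitive part.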

Advantages:

more flexible than serverless (built-in cron, fast communication with Postgres)
the number of users is small, so scaling does not matter
an approach based on what has already been built
easy to learn for new contributors (blockchain architecture / complicated versioning not needed)

Summary

I can implement this concept. I think I understand what goes wrong from the issues reported by other users. We need to replace the POST / GET interface with POST / PUT / DELETE and well-documented synchronization logic.

So the logic has three ingredients:

every client should store its pending requests,
the first step of synchronization is sending the create/update/delete requests that change the state on the server,
the second step is to get the time frames and replace the local state with the state from the server.

Any additional integrations like flat file storages / Toggl / Google Calendar / Zapier ...
should be realized by endpoints that make Crick send requests and by endpoints for receiving webhooks.

The next step after fixing synchronization could be scoped tokens, for example tokens with only the get scope, or only the create scope.

Questions

What about the questions from the issue description? I will try to answer them below:

Question 1. Which participant of synchronization contains a source of truth?

Answer: Crick Server

A client should always have zero or one Crick server that it synchronizes with. We should be able to export all data from the server and move it to another server.

I wrote that this is improbable. That was my mistake. The server holds the single source of truth, but it can be modified by requests from clients. So, whenever possible, any change of state on the client should be instantly mirrored on the server, and a local stack of pending requests should be created only if a connection is impossible.
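A minimal sketch of "mirror instantly, queue only when offline" could look like this. `RequestQueue`, `OfflineError`, and the `send` callable are hypothetical names; a real client would wrap its HTTP layer instead.

```python
class OfflineError(Exception):
    """Raised by the hypothetical transport when the server is unreachable."""


class RequestQueue:
    def __init__(self, send):
        self.send = send      # callable(method, frame_id, body)
        self.pending = []

    def submit(self, method, frame_id, body=None):
        # Keep ordering: never overtake requests that are already queued.
        self.pending.append((method, frame_id, body))
        return self.flush()

    def flush(self):
        while self.pending:
            try:
                self.send(*self.pending[0])
            except OfflineError:
                return False  # still offline, keep the rest queued
            self.pending.pop(0)
        return True


state = {"online": False}
sent = []


def send(method, frame_id, body):
    if not state["online"]:
        raise OfflineError()
    sent.append((method, frame_id))


queue = RequestQueue(send)
queue.submit("PUT", "f1", {"project": "crick"})  # offline: stays queued
queued_while_offline = len(queue.pending)
state["online"] = True
queue.submit("DELETE", "f2")                     # flushes both, in order
```

Sending through the queue even when online keeps the requests in the order the user made the changes.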

Question 2. The data model of time frame does not contain last modification time.

Answer: We can assume that the last modification time is not necessary. The valid version is the one that was sent to the server last.

We can also assume that a last modification time can fix the problem with shallow and full synchronization. I propose adding a last modification time to the data model on the server and the client, because it allows synchronizing selected frames more intelligently than selecting all frames, or all frames from the last synchronization to now.

Proposition: add last modification time property to frame
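With such a property, selecting the frames to synchronize becomes a simple filter. The sketch below assumes the property is called `updated_at`; that name and the `Frame` class are illustrative, not part of the current data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Frame:
    id: str
    project: str
    updated_at: datetime  # the proposed new property (name is an assumption)


def modified_since(frames, last_sync):
    """Select only the frames changed after the previous synchronization."""
    return [frame for frame in frames if frame.updated_at > last_sync]


last_sync = datetime(2019, 2, 1, tzinfo=timezone.utc)
frames = [
    Frame("f1", "crick", datetime(2019, 1, 20, tzinfo=timezone.utc)),
    Frame("f2", "watson", datetime(2019, 2, 3, tzinfo=timezone.utc)),
]
changed = modified_since(frames, last_sync)
```

An old frame that was edited recently is picked up here, while a filter on the frame's start time would miss it.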

Question 3. What about deletion?

Answer: By the second step of synchronization I now mean replacing all data in the client's local database with the server version. So if an element is deleted, all client time frames that can't be found on the server should be deleted too.

Question 4. Why only POST / GET in API?

Answer: I treat it as a result of unforeseen problems with synchronization. We should add PUT and DELETE. That's all.

Question 5. What is the scenario when two clients are connected to one backend, one has a time frame and the second does not?

Answer: It does not matter. Only the log of pending requests is important; during full synchronization the state of both clients should be replaced by the version from the server.

Question 6. Who has the deciding voice on this topic?

Answer: As I understand it, @jmaupetit.

What now?

You can treat it as a draft of the specification.

Now I think we can go to the fourth point of agenda: "Wait for fixes and opinions from the community".

I hope that fixes, proposals, improvements, and questions will come from @jmaupetit and the rest of the community. I think that in March I will learn Go and React, and in April I will find a moment to modify the Crick API and Watson and fix synchronization.

gustawdaniel · 2019-02-04T12:47:13Z

Not only frames but also projects should be synchronized, so projects should also have a last modified time. Probably other resources too, if they exist or are created in the future.

jmaupetit added improvement needs doc labels Jan 28, 2019
