Track graph update times in a separate modifications graph (Issue 413) #506

Merged
merged 6 commits into from
Aug 9, 2021

Conversation

lkitching
Contributor

No description provided.

@lkitching lkitching changed the title from "Issue 413" to "Track graph update times in a separate modifications graph (Issue 413)" Feb 25, 2021
@ricroberts
Contributor

ricroberts commented May 28, 2021

Some notes following meeting with Rick/Ric/Lee today

  • Whenever you make a change to a draftset through the API, any graph you touch gets its entry updated in the modified times graph
  • This includes some stasher changes that will conflict with the Cache Race work that @callum-oakley did.
  • There's a modified time on the draftset itself, but changes to draft graphs now get applied to the modified times graph
  • The modified times graph is draft-aware and is maintained while you make changes to a draft. When we publish, we merge and apply changes to the live modified times graph. The times in the live modified times graph aren't necessarily the publish-time of that graph but are guaranteed to be later than they were before if there was a change to that graph.
  • But note that the modified times in a draftset's modified times graph aren't kept up to date as live data changes. This means that someone querying the modified times graph to find out when graphs in the union-live graph set have changed will need to query both the live and the draftset's modified times. See issue #535, "Modified times aren't kept up to date in a draftset's modified times graph as live data changes" (outside the scope of this PR)
  • We should also add something to the swagger docs to explain what the modified times graph is (and include the above behaviour).
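
The point about querying both live and draftset modified times can be illustrated with a small Python model. This is only a sketch: the function name and the example graph URIs are hypothetical, and it assumes that a draft entry shadows the live entry for the same graph, which the notes above do not state explicitly.

```python
from datetime import datetime, timezone


def effective_modified_times(live_times, draft_times):
    """Combine per-graph modified times for a draftset viewed with union-with-live.

    Both arguments map a live graph URI to its dcterms:modified value.
    Assumption: an entry in the draft shadows the live entry for the same graph.
    """
    combined = dict(live_times)   # start from the live modified times
    combined.update(draft_times)  # draft entries take precedence
    return combined


# Hypothetical example data
live = {
    "http://example.org/graph/a": datetime(2021, 5, 1, tzinfo=timezone.utc),
    "http://example.org/graph/b": datetime(2021, 5, 2, tzinfo=timezone.utc),
}
draft = {
    "http://example.org/graph/b": datetime(2021, 5, 28, tzinfo=timezone.utc),
}

times = effective_modified_times(live, draft)
```

Graph `a` is untouched in the draftset, so its time can only be found in the live modified times graph, which is why a client reading the draft graph alone would miss it.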

Summary of dependent PRs:

To review:

@callum-oakley callum-oakley force-pushed the issue_500 branch 2 times, most recently from 28c50aa to e5820aa Compare July 29, 2021 14:32
Base automatically changed from issue_500 to modified-times-integration July 29, 2021 16:47
@callum-oakley
Contributor

callum-oakley commented Jul 29, 2021

Original branch pushed to issue_413-original. Muttnik tests fail against the rebase, but in expected (I think) ways (e.g. the tests don't expect the modified times graph to exist). So since this is the last branch, I'm going to merge it into modified-times-integration, and then make a branch of muttnik which accounts for the API changes in drafter.

lkitching and others added 6 commits July 30, 2021 17:29
Issue #413 - Track graph modification times within a public
system-managed graph. Updates made to graphs through the append data,
delete data, delete graph and SPARQL update routes update the
associated dcterms:modified timestamps within the modifications
graph.

Graphs modified within a draftset have their last-modified times
within a draft of the public modifications graph. Unlike other
graphs this graph contains only the changes made within the
draftset and does not clone the live graph if one exists.

Since only graphs with visible draftset changes have entries in the
draft modifications graph, the draft modifications graph can be
created and deleted as changes are made and reverted to graphs within
a draftset.

Publishing the modifications graph is also handled separately since
changes within the draft must be merged into the live graph instead
of being done via a COPY GRAPH like other user graphs within the
draftset.
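
The lifecycle described above can be modelled roughly as follows. This is an illustrative Python sketch rather than drafter's Clojure implementation: the class and function names are hypothetical, graphs are modelled as plain dictionaries, and the sketch does not model how the real publish step guarantees that live times move forward.

```python
from datetime import datetime, timezone


class DraftModifications:
    """Model of a draft modifications graph that only holds draftset changes."""

    def __init__(self):
        self.times = None  # None means the draft modifications graph does not exist

    def touch(self, live_graph, when):
        # Create the draft modifications graph on first use
        if self.times is None:
            self.times = {}
        self.times[live_graph] = when

    def revert(self, live_graph):
        # Drop the entry, and drop the whole graph once it is empty
        if self.times is None:
            return
        self.times.pop(live_graph, None)
        if not self.times:
            self.times = None


def publish(live_times, draft):
    """Merge draft entries into live instead of replacing the live graph."""
    merged = dict(live_times)
    if draft.times:
        merged.update(draft.times)
    return merged


t0 = datetime(2021, 2, 1, tzinfo=timezone.utc)
t1 = datetime(2021, 2, 25, tzinfo=timezone.utc)

draft = DraftModifications()
draft.touch("http://example.org/graph/a", t1)
live_after = publish(
    {"http://example.org/graph/a": t0, "http://example.org/graph/b": t0}, draft
)
```

A wholesale copy of the draft over the live graph would lose the entry for graph `b` here, which is the reason a merge is needed.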

Add a new drafter.feature.modified-times namespace for handling
changes to and publishing for the modifications graph within a
draftset.

Change how draft graph 'touch' operations are handled within the
draftset UPDATE query handler. Touch operations are now used to
collect all the draft graphs affected by an update within the
update plan. These are then translated into a sequence of updates
to be applied to the draft modifications graph. These are added to
the data updates used to apply the query within the draftset. Update
the spec of the update plan record built by the update queries.
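
As a rough illustration of that translation step, the Python sketch below collects the touched draft graphs from a plan and emits one SPARQL update per graph. The plan shape, the function names, the graph URIs and the exact query text are all assumptions made for this example and are not taken from drafter.

```python
DCTERMS_MODIFIED = "http://purl.org/dc/terms/modified"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def touched_graphs(operations):
    """Return the distinct graphs named by 'touch' operations, in first-seen order."""
    seen = []
    for op in operations:
        if op["type"] == "touch" and op["graph"] not in seen:
            seen.append(op["graph"])
    return seen


def modification_update(modifications_graph, graph, timestamp):
    """Build a SPARQL update replacing the dcterms:modified value for one graph."""
    subject = f"<{graph}> <{DCTERMS_MODIFIED}>"
    return (
        f"DELETE {{ GRAPH <{modifications_graph}> {{ {subject} ?old }} }}\n"
        f"INSERT {{ GRAPH <{modifications_graph}> {{ {subject} "
        f'"{timestamp}"^^<{XSD_DATETIME}> }} }}\n'
        f"WHERE {{ OPTIONAL {{ GRAPH <{modifications_graph}> {{ {subject} ?old }} }} }}"
    )


# Hypothetical update plan: two touches of the same draft graph and one of another
plan = [
    {"type": "touch", "graph": "http://example.org/draft/1"},
    {"type": "insert", "graph": "http://example.org/draft/1"},
    {"type": "touch", "graph": "http://example.org/draft/1"},
    {"type": "touch", "graph": "http://example.org/draft/2"},
]

updates = [
    modification_update(
        "http://example.org/draft/modifications", g, "2021-02-25T12:00:00Z"
    )
    for g in touched_graphs(plan)
]
```

Deduplicating the touched graphs first means each graph receives a single timestamp update however many operations in the query affect it.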

Add user-graph? method to graph manager which indicates whether a
live graph URI is a user graph. Currently all non-protected graphs
are considered to be user graphs.

Create ensure-protected-graph-draft method for the graph manager
which ensures a draft for a protected graph exists within a draftset.

Add draft-graph-deleted! function which creates the modifications
graph if necessary and updates the deleted graph modified timestamp.

Extract the publish operations into their own namespace to avoid a
circular dependency between draftset.operations and draftset.graphs.

Implement the publishing of draft modifications into the live graph
in the modified-times namespace. Update the publish process to
separate the user graphs from the draft modifications graph and
migrate them to live separately.

A number of tests within drafter and drafter-client assert the
contents of all the quads within the live or draftset endpoints.
These assertions are no longer valid due to the addition of the
modified times in the system modifications graph.

Change test-helper functions get-draftset-quads-through-api and
get-draftset-info-through-api to filter any system graphs from
their results and rename them to indicate they now only include
user graphs.

Rename get-draftset-quads to get-draftset-user-quads and update it
to only return quads in user graphs.
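
The filtering these helpers perform can be sketched in a few lines of Python. The names and URIs below are hypothetical; the only rule taken from the commit messages is that every non-protected graph currently counts as a user graph.

```python
def is_user_graph(graph_uri, protected_graphs):
    """A graph is a user graph when it is not one of the protected graphs."""
    return graph_uri not in protected_graphs


def user_quads(quads, protected_graphs):
    """Keep only quads whose graph (the fourth element) is a user graph."""
    return [q for q in quads if is_user_graph(q[3], protected_graphs)]


# Hypothetical URIs for illustration
MODIFICATIONS = "http://example.org/graph/modifications"
protected = {MODIFICATIONS}

quads = [
    ("http://example.org/s", "http://example.org/p", "o", "http://example.org/graph/a"),
    (
        "http://example.org/graph/a",
        "http://purl.org/dc/terms/modified",
        "2021-02-25T12:00:00Z",
        MODIFICATIONS,
    ),
]

filtered = user_quads(quads, protected)
```

With this filter in place, tests that assert the full contents of an endpoint no longer need to know about timestamps written by the system.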

Append and delete jobs no longer update state graph timestamps so
remove old tests. These now update the modifications graph for
draft graphs and the draftset timestamp tests have moved into the
modification times tests.

Add user-draftset-info-view which filters any system graphs from
the set of changed graphs. Update tests to convert the returned
draftset info into one which contains only user graph changes.

Fetch from /data endpoint in n-quads format in the reasoning tests
so the non-user graphs can be removed. Convert the data back into
CSV rows after graphs have been filtered and values converted to
strings.

Add get-user-quads function in the drafter-client tests which
removes any statements within non-user graphs returned when fetching
all draftset quads with the client. Add query-user-triples
function which returns all user triples via a SPARQL CONSTRUCT
against a given connection.

Move the three-argument version of draft-exists? into a new
draft-graph-exists-for? function within the draft-management-helpers
namespace. draft-exists? expects a draft graph URI while the new
draft-graph-exists-for? expects a live graph URI and a draftset to
search in.

Update the implementation to use the existing find-draftset-graph
function and remove custom query. Remove the identical function
within modified-times-test and its sparql query file.

Add drafter.generators namespace containing subject, predicate,
object, graph, triple and quad generators along with utility
functions for generating random collections. Move the existing
generators in the update query tests into this namespace and
add new generators for use in the modified times tests.
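
For readers unfamiliar with this style of test data generation, here is a loose Python analogue using the standard `random` module. It is not the drafter.generators code: the function names, URI scheme and size parameters are invented for the sketch.

```python
import random


def gen_uri(rng, kind):
    """Generate a random URI of the given kind (subject, predicate, ...)."""
    return f"http://example.org/{kind}/{rng.randrange(1000)}"


def gen_triple(rng):
    return (gen_uri(rng, "subject"), gen_uri(rng, "predicate"), gen_uri(rng, "object"))


def gen_quad(rng):
    # A quad is a triple plus the graph it belongs to
    return gen_triple(rng) + (gen_uri(rng, "graph"),)


def gen_collection(rng, gen, min_size=0, max_size=5):
    """Generate a list of between min_size and max_size items using gen."""
    return [gen(rng) for _ in range(rng.randint(min_size, max_size))]


# Seeding the generator makes a failing case reproducible
sample = gen_collection(random.Random(42), gen_quad, min_size=1, max_size=5)
```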

The new-draft-graph-statements and draft-graph-statements functions
in the draftset.graphs namespace are almost identical. Assume the
draftset is always given, remove draft-graph-statements and update
the other functions to use new-draft-graph-statements. Pass the
draftset-ref to the function in draftset.graphs instead of the
URI.

Issue #413 - Update the modified-state.sparql query used by stasher
to calculate dataset modified times to use the modifications graph(s)
instead of the state graph.

Remove the dcterms:modified timestamp from new draft graphs and
avoid migrating them to the live graph during publishing.

Update the test state-1.trig and associated test file to include live
and draft modifications graphs and remove the state graph timestamps.
Update other tests which use this data to account for the public
modifications graph when asserting the expected live contents.

Issue #413 - Add migration query to move live and draft modified
times from the state graph to live and draft versions of the
modifications graph.

The query:
  * Creates a new live modifications graph
  * Moves modification times for live graphs from the state graph.
    Live graphs without an existing timestamp are given one with
    the current time
  * Creates a draft modifications graph in each draftset with changes
  * Moves modification times for draft graphs from the state graph
    into the corresponding draft modification graphs. Draftset
    modification timestamps are also updated to the current time.
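
The migration steps can be modelled on in-memory data as below. This is a Python sketch of the described behaviour, not the migration query itself: the data shapes and names are hypothetical, and it assumes that only draftsets with changes have their timestamp updated.

```python
from datetime import datetime, timezone


def migrate(live_graph_times, draftset_graph_times, now):
    """Move modified times out of the state graph model.

    live_graph_times: live graph URI -> timestamp, or None if it has none.
    draftset_graph_times: draftset URI -> {draft graph URI -> timestamp}.
    """
    # Live graphs without a timestamp are given the current time
    live_mods = {
        g: (t if t is not None else now) for g, t in live_graph_times.items()
    }

    draft_mods = {}
    draftset_times = {}
    for draftset, graphs in draftset_graph_times.items():
        if graphs:  # only draftsets with changes get a modifications graph
            draft_mods[draftset] = dict(graphs)
            draftset_times[draftset] = now  # assumption: only these are updated
    return live_mods, draft_mods, draftset_times


now = datetime(2021, 8, 9, tzinfo=timezone.utc)
old = datetime(2021, 1, 1, tzinfo=timezone.utc)

live_mods, draft_mods, draftset_times = migrate(
    {"http://example.org/graph/a": old, "http://example.org/graph/b": None},
    {
        "http://example.org/draftset/1": {"http://example.org/draft/1": old},
        "http://example.org/draftset/2": {},
    },
    now,
)
```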
@callum-oakley callum-oakley merged commit dff69d0 into modified-times-integration Aug 9, 2021
@callum-oakley callum-oakley deleted the issue_413 branch August 9, 2021 14:25
3 participants