Proposal: Jelly extension for RDF Patch #11

Open
Ostrzyciel opened this issue Oct 31, 2024 · 5 comments
Labels
new protocol feature Discussion about a new feature in the Jelly protocol

Comments

@Ostrzyciel
Member

Ostrzyciel commented Oct 31, 2024

The proposal is to create a non-core extension of Jelly that would implement the RDF Patch protocol for communicating changes to RDF datasets in a transactional manner.

Scope:

  • Implement all features of RDF Patch, including transactions and prefixes.
  • Some new messages will need to be defined – corresponding to the different commands (e.g., TX, TC, PA, PD...).
  • For the A and D commands, the inner messages should be RDF triples or quads from the core Jelly.
  • The outer wrapping messages from Jelly core (RdfStreamRow and RdfStreamFrame) should not be reused/extended, to avoid spaghettifying the code and making changes harder in the future. New wrapping messages dedicated to RDF Patch should be defined.
  • Up for discussion:
    • Should the GRAPHS stream type be implemented? It may be problematic... I think just TRIPLES and QUADS will suffice.
    • What should the header look like? One option is a new message (PatchStreamOptions) that wraps RdfStreamOptions and adds a field for the RDF Patch header content.
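
To make the scope above more concrete, here is a minimal sketch of the data model in Python. It is not the .proto definition. The names `Op` and `PatchRow` are hypothetical, and the `H` and `TA` commands are taken from the RDF Patch spec rather than from the list above. The point is only that A and D rows carry a triple or quad, while the other commands live in a wrapper dedicated to patches.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Op(Enum):
    """RDF Patch commands, keyed by their text-format codes."""
    HEADER = "H"
    TX_BEGIN = "TX"
    TX_COMMIT = "TC"
    TX_ABORT = "TA"
    PREFIX_ADD = "PA"
    PREFIX_DELETE = "PD"
    ADD = "A"
    DELETE = "D"


# A statement is a tuple of 3 terms (triple) or 4 terms (quad).
Statement = Tuple[str, ...]


@dataclass(frozen=True)
class PatchRow:
    """Hypothetical patch-specific wrapper; it does not reuse RdfStreamRow."""
    op: Op
    statement: Optional[Statement] = None  # only meaningful for A and D

    def __post_init__(self) -> None:
        carries_statement = self.op in (Op.ADD, Op.DELETE)
        if carries_statement:
            if self.statement is None or len(self.statement) not in (3, 4):
                raise ValueError("A/D rows need a triple or a quad")
        elif self.statement is not None:
            raise ValueError(f"{self.op.value} rows carry no statement")


add_row = PatchRow(Op.ADD, ("ex:s", "ex:p", "ex:o"))
commit_row = PatchRow(Op.TX_COMMIT)
```

Keeping the statement optional on a single row type mirrors a Protobuf `oneof`, but that is a design assumption, not something decided in this issue.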

Implementation:

  • Existing related work: RDF Patch implementation with Thrift in Apache Jena
  • Create a new .proto file with the format specification.
  • Create a separate specification document for Jelly RDF Patch. The spec will use the serialization format spec as its basis in the same way that the gRPC streaming protocol spec does.
  • Implement this in Jelly-JVM. Possible module structure (TBD):
    • jelly-patch-core – depends only on jelly-core, library-agnostic implementation
    • jelly-patch-jena – depends on jelly-patch-core and jelly-jena, implementation fully integrated with Apache Jena
    • ... possible implementation for RDF4J, if this makes sense (not sure whether RDF4J has APIs for this). TBD

Not in scope:

  • Integrating Jelly RDF Patch with the gRPC streaming protocol spec. The protocol would probably have to implement something like this. If anyone is interested in that, we can open a separate issue.

Any suggestions, ideas, or expressions of interest are welcome.

@Ostrzyciel Ostrzyciel added the new protocol feature Discussion about a new feature in the Jelly protocol label Oct 31, 2024
@Ostrzyciel Ostrzyciel self-assigned this Feb 16, 2025
@Ostrzyciel
Member Author

I've started an implementation here: https://github.com/Jelly-RDF/jelly-protobuf/tree/rdf-patch

However, it turns out that implementing this cleanly in Jelly-JVM will require some major refactors in the core. I will start working on that, but it will take a while.

Ostrzyciel added a commit to Jelly-RDF/jelly-jvm that referenced this issue Feb 17, 2025
Related to: Jelly-RDF/jelly-protobuf#11

This introduces a few refactors around the ProtoEncoder to allow us to reuse its code in the core-patch module later. This includes:

- Allowing NodeEncoder to append to anything that can consume lookup entries, via a dedicated interface
- De-inlining protected methods in ProtoEncoder. I don't think the inlining was working anyway. The JVM is smart enough to do inlining by itself, and the inlines were messing with public/private code guarantees.
- Creating the core.internal package to group the messier internal classes together and keep the top-level package clean.
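
The first refactor can be illustrated with a small Python sketch (the real code is in Jelly-JVM and looks different). All names here (`LookupEntrySink`, `append_lookup_entry`, `ListSink`) are hypothetical. The node encoder depends only on a narrow interface, so a stream encoder and a patch encoder can each supply their own buffer. Eviction from the lookup table is left out.

```python
from typing import Dict, List, Protocol, Tuple


class LookupEntrySink(Protocol):
    """Anything that can consume lookup entries emitted by the node encoder."""

    def append_lookup_entry(self, entry_id: int, value: str) -> None: ...


class NodeEncoder:
    """Assigns IDs to names and reports each new entry to the sink."""

    def __init__(self, sink: LookupEntrySink) -> None:
        self._sink = sink
        self._ids: Dict[str, int] = {}

    def encode_name(self, name: str) -> int:
        if name not in self._ids:
            self._ids[name] = len(self._ids) + 1
            self._sink.append_lookup_entry(self._ids[name], name)
        return self._ids[name]


class ListSink:
    """Stand-in for a row buffer; a patch encoder would provide its own."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, str]] = []

    def append_lookup_entry(self, entry_id: int, value: str) -> None:
        self.entries.append((entry_id, value))


sink = ListSink()
encoder = NodeEncoder(sink)
first_id = encoder.encode_name("ex:subject")
second_id = encoder.encode_name("ex:subject")  # reused, no new entry
```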
@Ostrzyciel
Member Author

I've completed the refactors. After that, implementing the encoder was rather straightforward.

It compiles, but I have no idea if it works. I'll need to add tests for the encoder and add a complete decoder to make sure that this really makes sense.

Still unresolved is the issue of how to version the RDF Patch extension – should it be tied to Jelly versioning or separate? I'll create a task for this.

Ostrzyciel added a commit to Jelly-RDF/jelly-jvm that referenced this issue Mar 2, 2025
To support extensions like Jelly-Patch, it would be useful to factor out the common ProtoDecoder code to a trait that can be extended.

More info: Jelly-RDF/jelly-protobuf#11

This is analogous to very similar changes in ProtoEncoder here: #278
@Ostrzyciel
Member Author

I've just realized that we may need an analogue of logical stream types for Jelly-Patch as well. The core question is: is the stream as a whole a single large patch, or is each frame to be treated as a separate patch?

According to the RDF Patch spec, there is a specific layout that a patch should follow. A transaction must be entirely within one patch. Headers must appear only at the start of a patch. Both of these matter a lot if we consider a stream of patches vs a single large patch across multiple frames.

The main difference here is that this "logical type" would in fact influence correctness of the stream, whereas the logical stream types in Jelly-RDF are just informative, and not even required. So, I'd suggest using a different name than "logical type" for this.
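
A rough sketch of why this affects correctness: the check below validates one patch, given as a list of command codes. The function name and the `require_headers_first` flag are hypothetical, and rejecting nested transactions is an assumption. The same rows pass or fail depending on where the patch boundaries are drawn.

```python
from typing import Sequence


def validate_patch(ops: Sequence[str], require_headers_first: bool = True) -> None:
    """Raise ValueError if the command sequence breaks the patch layout.

    Checks that headers (H) come before any other row and that every
    transaction (TX ... TC/TA) opens and closes inside this patch.
    """
    in_tx = False
    seen_non_header = False
    for i, op in enumerate(ops):
        if op == "H":
            if require_headers_first and seen_non_header:
                raise ValueError(f"row {i}: header after the start of the patch")
            continue
        seen_non_header = True
        if op == "TX":
            if in_tx:
                raise ValueError(f"row {i}: transaction already open")
            in_tx = True
        elif op in ("TC", "TA"):
            if not in_tx:
                raise ValueError(f"row {i}: no open transaction to close")
            in_tx = False
    if in_tx:
        raise ValueError("transaction is not closed within the patch")


# Two frames of one stream.
frame_1 = ["H", "TX", "A", "A"]
frame_2 = ["D", "TC"]

# Valid when the whole stream is one patch...
validate_patch(frame_1 + frame_2)

# ...but invalid when each frame is a separate patch.
try:
    validate_patch(frame_1)
    frame_1_valid = True
except ValueError:
    frame_1_valid = False
```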

@Ostrzyciel
Member Author

I'd propose to have three types:

  • one frame = one patch
  • entire stream = one patch
  • multiple frames = one patch (punctuated)

Most interesting here are punctuated streams, where the stream contains multiple patches, each possibly spanning multiple frames. This would make sense if the patches were very large (e.g., corresponding to huge transactions). Above a few megabytes per message, Protobuf may start having performance issues, and then there is the hard limit of 2 GB per message due to the use of int32 math.
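
A sketch of how a consumer could reassemble patches under each of the three types. The enum names and the `PUNCT` end-of-patch marker are hypothetical. How punctuation is actually signalled (a dedicated row, a frame-level flag, or something else) is still open.

```python
from enum import Enum
from typing import List, Sequence


class PatchStreamType(Enum):
    FRAME = "one frame = one patch"
    FLAT = "entire stream = one patch"
    PUNCTUATED = "multiple frames = one patch"


PUNCT = "PUNCT"  # hypothetical marker row that closes the current patch


def group_into_patches(
    frames: Sequence[Sequence[str]], stream_type: PatchStreamType
) -> List[List[str]]:
    """Regroup rows from frames into patches according to the stream type."""
    if stream_type is PatchStreamType.FRAME:
        return [list(frame) for frame in frames]
    if stream_type is PatchStreamType.FLAT:
        return [[row for frame in frames for row in frame]]
    # PUNCTUATED: a patch ends only where a punctuation marker appears.
    patches: List[List[str]] = []
    current: List[str] = []
    for frame in frames:
        for row in frame:
            if row == PUNCT:
                patches.append(current)
                current = []
            else:
                current.append(row)
    if current:
        raise ValueError("stream ended inside an unfinished patch")
    return patches


# A large transaction split over two frames, followed by a small patch.
frames = [
    ["TX", "A", "A"],
    ["A", "TC", PUNCT],
    ["TX", "D", "TC", PUNCT],
]
patches = group_into_patches(frames, PatchStreamType.PUNCTUATED)
```

With this layout a producer can keep each frame small while still delivering a patch that is larger than what a single Protobuf message handles comfortably.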

@Ostrzyciel
Member Author

I wrote in the last draft that headers must only occur at the start of a patch. I'd relax this, as this really depends on what we are using the headers for. If your specific application requires headers at the start of a patch, then this is an additional requirement on top of the base spec.
