Proposal: support embedded protos for literal serialization #14
Comments
One thing came up when I discussed this with someone else – for this to work, the consumer would have to FIRST read the datatype field in [...]

I'm not sure if it's such a good idea, as this leaves you with the choice of either sacrificing performance or maintainability. One alternative I could think of is to place the binary blob in a subsequent [...]
Regarding the SOTA of this, I only found this:
https://protobuf.dev/programming-guides/proto3/#any
https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/any.proto

It's a somewhat similar idea, but it would require us to specify the nested message FQN with every literal... which is not very efficient, to say the least. It also suffers from the same field ordering issue that I outlined above. So, as far as I understand the protobuf-java code, the implementation simply reads the entire nested message into a byte buffer and then parses it from memory. Not great.

Note that the FQNs in Google's [...]
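To put a rough number on the per-literal cost of the `Any` approach, here is a minimal sketch that hand-encodes the `google.protobuf.Any` wire format (`type_url` as field 1, `value` as field 2) without using a Protobuf library. The message name `example.Point` and the 16-byte payload are made up for illustration.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_len_field(field_number: int, payload: bytes) -> bytes:
    """Encode a length-delimited (wire type 2) field: tag, length, payload."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_any(type_url: str, value: bytes) -> bytes:
    """Hand-rolled encoding of google.protobuf.Any (type_url = 1, value = 2)."""
    return encode_len_field(1, type_url.encode("utf-8")) + encode_len_field(2, value)


# Hypothetical message type; the zero bytes stand in for a serialized message.
type_url = "type.googleapis.com/example.Point"
payload = bytes(16)

encoded = encode_any(type_url, payload)
overhead = len(encoded) - len(payload)
print(f"payload: {len(payload)} B, Any: {len(encoded)} B, overhead: {overhead} B")
```

In this sketch the type URL alone is longer than the payload, which is the inefficiency described above when the FQN is repeated with every literal.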
Open question: how to handle compatibility negotiation in the gRPC pub/sub protocol? Maybe we could add an optional field to [...]
We should also gather information on the FQNs used by the community in the wild and how they should be interpreted. Something similar to this: #15

Some of these that are RDF-specific (e.g., CDTs) could live here in this repo. Others, such as tensors, could simply be references to external repos.
Regarding compactness of this approach – if we have these embedded protos as a [...]

We'd basically only need to add 3 more fields, corresponding to the 3 other wire types: VARINT, I32, I64. See: https://protobuf.dev/programming-guides/encoding/#structure

One thing to consider is whether we could make pluggable parsers for such fields. No idea, honestly.

For nested messages, this is not an issue, as we can simply skip writing the top-level message tag and start writing the contents of the message. This way we are not wasting any bytes.
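For reference, a rough sketch of how a single field looks on the wire for each of the four wire types from the encoding guide (VARINT = 0, I64 = 1, LEN = 2, I32 = 5). The field numbers 5 to 8 are placeholders, not a proposal, and I64/I32 are shown only with `double`/`float` values although they also carry fixed-width integers.

```python
import struct

# Wire type ids as listed in the Protobuf encoding guide.
VARINT, I64, LEN, I32 = 0, 1, 2, 5


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_field(field_number: int, wire_type: int, value) -> bytes:
    """Encode one field as tag + payload for the given wire type."""
    tag = encode_varint((field_number << 3) | wire_type)
    if wire_type == VARINT:
        return tag + encode_varint(value)
    if wire_type == I64:
        return tag + struct.pack("<d", value)  # 8 bytes, little-endian double
    if wire_type == I32:
        return tag + struct.pack("<f", value)  # 4 bytes, little-endian float
    if wire_type == LEN:
        return tag + encode_varint(len(value)) + value
    raise ValueError(f"unsupported wire type: {wire_type}")


# Placeholder field numbers; values chosen only to show the sizes.
for name, encoded in [
    ("VARINT", encode_field(5, VARINT, 150)),
    ("I64", encode_field(6, I64, 2.5)),
    ("I32", encode_field(7, I32, 2.5)),
    ("LEN", encode_field(8, LEN, b"2.5")),
]:
    print(f"{name}: {len(encoded)} bytes -> {encoded.hex()}")
```

The fixed-width types need no length prefix, so a `double` costs one tag byte plus 8 payload bytes regardless of its value.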
Literals in RDF can sometimes store quite a lot of data by themselves, which can bog down serialization, transmission, and parsing if not done efficiently. Examples of this are:

- `rdf:JSON` datatype for JSONs.
- `geo:wktLiteral` which typically contains lists of numerical data (potentially very long).
- `xsd:hexBinary` and `xsd:base64Binary` datatypes that inefficiently encode binary data as ASCII strings.
- [...] (`xsd:double`) possibly could be represented more efficiently as bytes.

In all of these cases, the data in question could be represented more efficiently if a binary format was used.

Note that this would not make sense in every possible use case, as Jelly uses the lexical space of literals for a reason. Many RDF libraries will simply refuse to work with the value space of the literal directly, or make it very hard, and expect the lexical space instead. Additionally, lexical<->value space conversions are already included in these libraries, and we don't have to reimplement them, which would surely introduce many bugs. This is the reason why Jelly currently doesn't use value encodings for numerical datatypes.
However, in some cases (e.g., when transmitting data from IoT sensors), such specialized encodings would make a lot of sense.
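As a quick illustration of the size differences in question, the sketch below compares lexical and binary sizes for some sample data. The 1024-byte payload and the numeric values are arbitrary, and Python's `repr` is used as a stand-in for an `xsd:double` lexical form (valid, but not the canonical one).

```python
import base64
import struct

raw = bytes(range(256)) * 4  # 1024 bytes of sample binary data

hex_lex = raw.hex().upper().encode("ascii")  # xsd:hexBinary lexical form
b64_lex = base64.b64encode(raw)              # xsd:base64Binary lexical form

print(f"raw bytes:        {len(raw)}")
print(f"xsd:hexBinary:    {len(hex_lex)}")
print(f"xsd:base64Binary: {len(b64_lex)}")

# For doubles the picture depends on the value: short decimals are already
# compact as text, while full-precision values are not.
for value in (0.5, 3.141592653589793):
    double_lex = repr(value).encode("ascii")  # shortest round-trip decimal
    double_bin = struct.pack("<d", value)     # IEEE 754 binary64, 8 bytes
    print(f"{value!r}: lexical {len(double_lex)} B, binary {len(double_bin)} B")
```

The short-decimal case is a reminder that a binary encoding is not automatically smaller, which fits the point that this only makes sense for some use cases.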
Scope:

- `RdfDatatypeEntry` would be extended with an optional field that would specify how to parse fields with this datatype. This information could be, for example, a string with the fully-qualified name of a Protocol Buffer that should be used to read the binary data.
- The `lex` field of `RdfLiteral` should be changed to be of type `bytes` instead of `string`.
- [...] `lex` field would be treated as UTF-8 (as normal).
- [...] `RdfLiteral`).

Implementation & performance:

- [...] `computeSerializedSize` just like Protobuf does before the actual serialization.
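A minimal consumer-side sketch of the scope described above, under some assumptions: the classes are plain-Python stand-ins for `RdfDatatypeEntry` and `RdfLiteral` (not the generated Protobuf classes), `proto_fqn` is a hypothetical name for the new optional field, `example.DoubleValue` is a made-up FQN, and falling back to UTF-8 when the field is unset is my reading of the proposal.

```python
import struct
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class DatatypeEntry:
    """Stand-in for RdfDatatypeEntry with the proposed optional field."""
    id: int
    value: str                       # datatype IRI
    proto_fqn: Optional[str] = None  # hypothetical field name


@dataclass
class Literal:
    """Stand-in for RdfLiteral with lex changed from string to bytes."""
    lex: bytes
    datatype: int


# Hypothetical registry of pluggable parsers, keyed by message FQN.
PARSERS: Dict[str, Callable[[bytes], object]] = {
    "example.DoubleValue": lambda b: struct.unpack("<d", b)[0],
}


def decode_literal(lit: Literal, datatypes: Dict[int, DatatypeEntry]) -> object:
    """Return the lexical form, or a parsed value if the datatype names a parser."""
    entry = datatypes[lit.datatype]
    if entry.proto_fqn is None:
        return lit.lex.decode("utf-8")  # assumed default: plain UTF-8 lexical form
    parser = PARSERS.get(entry.proto_fqn)
    if parser is None:
        raise ValueError(f"no parser registered for {entry.proto_fqn}")
    return parser(lit.lex)


datatypes = {
    1: DatatypeEntry(1, "http://www.w3.org/2001/XMLSchema#string"),
    2: DatatypeEntry(2, "http://www.w3.org/2001/XMLSchema#double", "example.DoubleValue"),
}
print(decode_literal(Literal(b"hello", 1), datatypes))
print(decode_literal(Literal(struct.pack("<d", 2.5), 2), datatypes))
```

Note that the consumer has to resolve the datatype entry before it can interpret `lex`, so the lookup table must already be populated when the literal is read.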