Proposal draft: packed lookup entries #41

Open
Ostrzyciel opened this issue Mar 19, 2025 · 0 comments
Labels
new protocol feature Discussion about a new feature in the Jelly protocol

Comments

@Ostrzyciel
Member

Currently, when multiple names/prefixes/datatypes must be defined in the stream before a single statement, each one is placed in a separate RdfStreamRow. For example (from Nanopub Registry):

rows {
  name {
    id: 0
    value: "sig"
  }
}
rows {
  name {
    id: 0
    value: "hasAlgorithm"
  }
}
# and here goes the quad

If the entries are added sequentially (and they often are), we could perhaps squash them into a single entry:

rows {
  name {
    id: 0
    value: "sig"
    value: "hasAlgorithm"
  }
}
# and here goes the quad

This would require changing the value field to repeated.

This would save 4 bytes per squashed entry: 2 for the tag and LEN of RdfStreamRow, and 2 for the tag and LEN of Rdf(Name|Prefix|Datatype)Entry (assuming each tag and each length fits in a single byte). Further savings could be achieved if we processed triples in minibatches (maybe introduce such an API in ProtoEncoder?), where we'd have to assume that the dictionaries are large enough to hold all entries needed for the minibatch. This should not be a problem for batches of, let's say, 10 statements.
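
To sanity-check the 4-byte figure, here is a minimal sketch that hand-encodes both layouts with the protobuf wire format and compares their sizes. The field numbers (`ROWS_FIELD`, `NAME_FIELD`, `VALUE_FIELD`) are assumptions made for illustration, not taken from the Jelly schema; any field number below 16 yields a 1-byte tag, so the result does not depend on the exact values. The `id: 0` field is omitted because proto3 does not serialize default values.

```python
def varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def len_field(field_number: int, payload: bytes) -> bytes:
    """Encode a length-delimited (wire type 2) field: tag, LEN, payload."""
    tag = varint((field_number << 3) | 2)
    return tag + varint(len(payload)) + payload


# Assumed field numbers (hypothetical, for illustration only).
ROWS_FIELD = 1   # rows in the enclosing frame
NAME_FIELD = 9   # name entry inside a row
VALUE_FIELD = 2  # value inside a name entry


def separate_rows(values: list[str]) -> bytes:
    """Current layout: one row and one name entry per value."""
    out = b""
    for v in values:
        entry = len_field(VALUE_FIELD, v.encode("utf-8"))
        out += len_field(ROWS_FIELD, len_field(NAME_FIELD, entry))
    return out


def packed_row(values: list[str]) -> bytes:
    """Proposed layout: one row, one name entry, repeated value."""
    entry = b"".join(len_field(VALUE_FIELD, v.encode("utf-8")) for v in values)
    return len_field(ROWS_FIELD, len_field(NAME_FIELD, entry))


names = ["sig", "hasAlgorithm"]
saved = len(separate_rows(names)) - len(packed_row(names))
print(saved)  # 4
```

The saving stays at 4 bytes per squashed entry only while every LEN fits in one byte, that is, while the packed entry and its row stay under 128 bytes.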

I'd have to run some scripts on the datasets in RiverBench to see what the savings would be in concrete terms. TODO: test it with different minibatch sizes, starting from 1 up to, let's say, 16.
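
A rough sketch of how such a sweep could be estimated before touching the real encoder. It assumes that all new name entries needed by a minibatch are emitted together at the start of the batch with sequential IDs, so that k entries collapse into one packed entry and save 4 bytes for each of the k - 1 squashed ones. The function name and the input list are hypothetical toy values, not measurements from RiverBench.

```python
def estimated_savings(new_entries_per_statement: list[int],
                      batch_size: int,
                      bytes_per_squash: int = 4) -> int:
    """Estimate bytes saved when each minibatch packs its new entries together."""
    total = 0
    for start in range(0, len(new_entries_per_statement), batch_size):
        batch = new_entries_per_statement[start:start + batch_size]
        entries = sum(batch)
        if entries > 1:
            # The first entry still pays for its row and entry headers.
            total += bytes_per_squash * (entries - 1)
    return total


# Toy input: number of new name entries required by each statement.
toy_stream = [2, 0, 1, 3]

for size in (1, 2, 4):
    print(size, estimated_savings(toy_stream, size))
```

With a minibatch size of 1 this reduces to the per-statement squashing described above; larger sizes merge entries across statements. A real measurement would also need to handle prefixes and datatypes separately and account for dictionary eviction.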

Ostrzyciel added the "new protocol feature" label (Discussion about a new feature in the Jelly protocol) on Mar 19, 2025