
Sub-par concurrent read performance with jena-iri #1470

Open
Aklakan opened this issue Aug 5, 2022 · 5 comments

@Aklakan
Contributor

Aklakan commented Aug 5, 2022

Version

4.6.0-SNAPSHOT

What happened?

I started again looking into the issues I had with Jena in Spark settings; related to https://issues.apache.org/jira/browse/JENA-2309

Right now I am investigating some long-standing performance issues where concurrent processing time does not scale directly with the number of cores. Concretely, I am comparing our Spark+Jena4-based tarql re-implementation with the original tarql (Jena 2).

One culprit is the jena-iri package, which uses synchronized singleton lexers that introduce locking overhead between the worker threads. A quick fix is to make those lexers thread-local, which reduces the overhead. On my notebook, in power-save and performance mode, I get these improvements:

jena-4.6.0-SNAPSHOT:
power save: 68 sec
performance: 21 sec

thread-local-fix:
power save: 54 sec
performance: 19 sec
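The thread-local fix can be sketched roughly as follows. This is an illustrative pattern, not the actual jena-iri internals; the Lexer class and parse method are stand-ins:

```java
// Sketch: replace a synchronized shared lexer with one lexer per thread.
// Lexer/parse are hypothetical stand-ins for the jena-iri lexer classes.
public class ThreadLocalLexerDemo {
    // Stand-in for an expensive-to-create, non-thread-safe lexer.
    static class Lexer {
        String parse(String iri) { return iri.trim(); }
    }

    // Before: one shared instance; every call synchronizes on the class,
    // so worker threads queue up behind each other.
    private static final Lexer SHARED = new Lexer();
    static synchronized String parseShared(String iri) {
        return SHARED.parse(iri);
    }

    // After: one instance per thread, no cross-thread locking.
    private static final ThreadLocal<Lexer> LOCAL =
            ThreadLocal.withInitial(Lexer::new);
    static String parseLocal(String iri) {
        return LOCAL.get().parse(iri);
    }

    public static void main(String[] args) {
        System.out.println(parseLocal(" http://example.org/a "));
    }
}
```

The trade-off is one lexer instance per worker thread, which is usually acceptable when the thread pool is bounded.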

Profiler output (relevant column is the number of waits):

A related issue I am currently investigating is that a lot of time is spent in the IRI parsing machinery, e.g. via E_IRI. For testing I changed it to return the argument as given, which reduced the total processing time (in performance mode) from 19 to 13 seconds - around 30% - time that is predominantly spent in the jena-iri lexers. I am not yet sure, however, whether anything can be optimized there without compromising functionality.

Are you interested in making a pull request?

Yes

@Aklakan Aklakan added the bug label Aug 5, 2022
@Aklakan Aklakan changed the title Concurrent read with jena-iri Parser Sub-par concurrent read performance with jena-iri Aug 5, 2022
@afs
Member

afs commented Aug 5, 2022

Are you calling jena-iri directly?

1/ (repeated from JENA-2309)
IRIx is an abstraction layer for replaceable IRI implementations.

One such IRI3986 implementation is https://github.com/afs/x4ld/tree/main/iri4ld .
Minimal object creation: one object records the results per parser call, and RFC3986.create is thread-safe.

Other implementations can be plugged in.

2/
The parser pipeline uses a cache to avoid duplicate work: with the cache, IRI processing goes from being the dominant cost to no longer being the primary cost when parsing on a single thread.

https://github.com/apache/jena/blob/main/jena-arq/src/main/java/org/apache/jena/riot/system/FactoryRDFCaching.java#L62
which incidentally has the benefit of reducing the memory footprint (IIRC by about a third). Maybe that works in E_IRI.
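A cache of that kind can be sketched as a small memoizing wrapper with an LRU eviction policy. The class below is illustrative only, not Jena's actual FactoryRDFCaching internals:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of a parse cache: repeated IRI strings hit the cache instead of
// the lexer, and cached results are shared objects, which also reduces
// memory footprint. Not Jena's actual implementation.
public class IriCacheDemo {
    private final int capacity;
    private final Map<String, String> cache;
    private final Function<String, String> parser;

    public IriCacheDemo(int capacity, Function<String, String> parser) {
        this.capacity = capacity;
        this.parser = parser;
        // Access-ordered LinkedHashMap gives a simple LRU policy:
        // the eldest entry is evicted once the capacity is exceeded.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> e) {
                return size() > IriCacheDemo.this.capacity;
            }
        };
    }

    // Only cache misses invoke the (expensive) parser.
    public String resolve(String iri) {
        return cache.computeIfAbsent(iri, parser);
    }

    public static void main(String[] args) {
        IriCacheDemo cache = new IriCacheDemo(1000, String::trim);
        System.out.println(cache.resolve(" http://example.org/a "));
    }
}
```

Note that a per-parser-run cache like this avoids the cross-thread contention problem entirely, since each run (and thread) can hold its own instance.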

FYI: tarql/tarql#99 upgrades tarql to Apache Jena 4.5.0

@Aklakan
Contributor Author

Aklakan commented Aug 6, 2022

Adding a cache to E_IRI/IRIx should be simple and I can check how much this improves.

How does the iri4ld implementation differ from jena's current default one functionality-wise?
In any case, having less (needless) synchronization between threads is always better.

FYI: tarql/tarql#99 upgrades tarql to Apache Jena 4.5.0

Good to know that it's possible to compare the performance of Spark-based tarql to the original tarql within Jena 4! :)
Especially because then the same IRI machinery is used.

In addition, I noticed that E_BNode also causes waits due to synchronization in a SecureRandom instance. This is probably better handled as a separate issue, but for now I just wanted to document it here.
My spark job's runtime (using a test mapping without iri()) jumps from ~4.5 to ~10 seconds only by adding a dummy bnode() call:

CONSTRUCT { <urn:example:s> <urn:example:p> ?a, ?b, ?c } # ... 16 columns in total
FROM <file:data.csv>
WHERE { BIND(bnode(?a) AS ?foobar) }

The same job with tarql/jena2 executes in somewhere between 50 and 60 sec; with bnode it tends more towards 60 sec - so in single-threaded processing the effect is less visible. It seems that threads competing for the bnode call are also a bottleneck.
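A common mitigation for SecureRandom contention is to give each thread its own instance; whether that fits E_BNode's correctness requirements is a separate question. A minimal sketch, with illustrative names:

```java
import java.security.SecureRandom;
import java.util.UUID;

// Sketch: a shared SecureRandom serializes callers internally; a
// ThreadLocal instance removes that cross-thread synchronization.
// BNodeLabelDemo/freshLabel are hypothetical names, not Jena's API.
public class BNodeLabelDemo {
    private static final ThreadLocal<SecureRandom> RNG =
            ThreadLocal.withInitial(SecureRandom::new);

    // Each call draws 128 random bits from the calling thread's own
    // generator and formats them as a UUID-style label.
    static String freshLabel() {
        SecureRandom r = RNG.get();
        return new UUID(r.nextLong(), r.nextLong()).toString();
    }

    public static void main(String[] args) {
        System.out.println(freshLabel());
    }
}
```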

@afs
Member

afs commented Aug 6, 2022

How does the iri4ld implementation differ from jena's current default one functionality-wise?

Javadoc has the operations described:
https://github.com/afs/x4ld/blob/main/iri4ld/src/main/java/org/seaborne/rfc3986/RFC3986.java

A Jena IRIProvider:
https://gist.github.com/afs/a0bf740d1bd1fde283eabeab8b4ddb67

It is a Java-coded parser for RFC 3986. The parser is a single file (IRI3986), written with efficiency in mind. No sub-parsers or tokenizers.

jena-iri is a general system for IRIs. It is complicated to build.

iri4ld is simple to build and provides the operations needed for linked data. Like jena-iri, it is independent of the Jena RDF codebase. iri4ld has less in the way of extras not used by Jena.

The parser is IRI3986.java - it covers all URIs (and, since it works on Java Unicode strings, effectively RFC 3987).

It has some additional scheme-specific rule support for the common schemes: it covers "http:", "https:", "did:", "file:", "urn:uuid:", "urn:", "uuid:" (which is not official) and "example:" (RFC 7595).

@afs
Member

afs commented Aug 6, 2022

The parsers generate blank nodes by allocating a UUID once at the start of a parser run, then xor'ing the label into the random number. Unlabelled blank nodes get a non-writable label (it contains a 0 byte) allocated from a counter.
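The allocation scheme described above can be sketched like this. It is a simplified illustration under assumptions (a label hash xor'ed into the low bits, a shifted counter for unlabelled nodes), not Jena's actual code:

```java
import java.util.UUID;

// Sketch: one random seed (a UUID) per parser run; labelled blank nodes
// xor a hash of the label into the seed, unlabelled ones draw from a
// counter. Simplified illustration only.
public class BlankNodeAllocDemo {
    private final long seedHi;
    private final long seedLo;
    private long counter = 0;

    public BlankNodeAllocDemo() {
        UUID seed = UUID.randomUUID();       // fresh per parser run
        this.seedHi = seed.getMostSignificantBits();
        this.seedLo = seed.getLeastSignificantBits();
    }

    // The same label within one run maps to the same id; across runs the
    // ids differ because the seed differs.
    public UUID labelled(String label) {
        long h = label.hashCode() & 0xffffffffL;
        return new UUID(seedHi, seedLo ^ h);
    }

    // Unlabelled blank nodes take the next counter value, shifted so it
    // cannot collide with the label-hash range above.
    public UUID unlabelled() {
        return new UUID(seedHi, seedLo ^ (++counter << 32));
    }
}
```

The appeal of this scheme is that no synchronization or SecureRandom call is needed per blank node - only one random draw per parser run.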

@afs
Member

afs commented Aug 6, 2022

IRIx is not the place to put a cache. IRIx is general IRI machinery for any purpose.

The session is provided by a FactoryRDF (FactoryRDFCaching extends FactoryRDFStd implements FactoryRDF). The cache then holds the URI Nodes.
