Optimize prefixes for queries #95

arj03 · 2021-01-22T00:59:19Z

This took a while to wrap my head around. Basically, before we were making it easy to create prefix indexes (just insert into tarr), but slow to query them (loop over the array). Instead, let's optimize for the query, because prefix indexes are used for very frequently run queries (key, votes, hasRoot etc.). The natural choice was to store prefixes as maps instead of arrays, mapping the lookup key to sequences. Besides somewhat slower index creation (25-70%), the size has almost doubled. I did not find a better solution than just storing them JSON stringified, but I'm sure there is a better way because we are just storing numbers here.
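To make the trade-off concrete, here is a minimal sketch of the two layouts. It is not the jitdb code: `prefixOf`, the build/query helpers and the sample keys are made up for illustration, and the prefix is assumed to be the first 4 bytes of the value read as a 32-bit integer.

```javascript
// Hypothetical helper: first 4 bytes of a string as an unsigned 32-bit prefix.
function prefixOf(value) {
  const buf = Buffer.alloc(4)
  buf.write(value, 0, 4, 'utf8')
  return buf.readUInt32LE(0)
}

// Array layout: one prefix per message, indexed by sequence number.
// Cheap to build (just write into the typed array).
function buildPrefixArray(values) {
  const tarr = new Uint32Array(values.length)
  values.forEach((v, seq) => {
    tarr[seq] = prefixOf(v)
  })
  return tarr
}

// Querying the array layout means looping over every entry.
function queryPrefixArray(tarr, value) {
  const target = prefixOf(value)
  const seqs = []
  for (let seq = 0; seq < tarr.length; ++seq) {
    if (tarr[seq] === target) seqs.push(seq)
  }
  return seqs
}

// Map layout: prefix -> list of sequence numbers (an inverted index).
function buildPrefixMap(values) {
  const map = {}
  values.forEach((v, seq) => {
    const p = prefixOf(v)
    if (!map[p]) map[p] = []
    map[p].push(seq)
  })
  return map
}

// Querying the map layout is a single lookup.
function queryPrefixMap(map, value) {
  return map[prefixOf(value)] || []
}

const keys = ['%abcd1', '%wxyz2', '%abcd3', '%qrst4']
const tarr = buildPrefixArray(keys)
const map = buildPrefixMap(keys)

console.log(queryPrefixArray(tarr, '%abcd1'))
console.log(queryPrefixMap(map, '%abcd1'))
```

Both queries return the same candidate sequences. Because only a prefix is stored, two different keys can share a bucket (here `%abcd1` and `%abcd3`), so candidates still have to be checked against the real record in either layout.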

Query numbers for already created indexes using perf script:

old

running: key initial
query: 130.071ms
running: key 2
query: 16.11ms
running: key again
query: 14.645ms
running: latest root posts
query: 88.219ms
running: latest posts
query: 166.388ms
running: votes initial
query: 43.506ms
running: votes 2
query: 11.124ms
running: votes again
query: 4.939ms
running: author posts
query: 225.619ms
running: author posts again
query: 39.766ms

new

running: key initial
query: 294.774ms
running: key 2
query: 10.437ms
running: key again
query: 8.192ms
running: latest root posts
query: 77.361ms
running: latest posts
query: 137.998ms
running: votes initial
query: 93.16ms
running: votes 2
query: 0.902ms
running: votes again
query: 0.796ms
running: author posts
query: 414.573ms
running: author posts again
query: 33.201ms

staltz · 2021-01-22T08:58:45Z

Oh, as a Map, it's basically an inverted index now! I'll review and think about this.

arj03 · 2021-01-22T12:19:33Z

After my talk with @staltz I did some tests where I tried to create bitsets on top of the prefix arrays and use the bitsets to narrow down the number of rows we need to run through. Note that modern CPUs are blazingly fast at running through arrays, so we are up against 10ms for 1.5 million entries.

The bitsets for one example query look like this:

1434795 81918
0 8
1434795 81918
0 8
0 8
1434795 81918
0 8
0 8
739229 81918
717741 81918
649757 81918
581403 81918
716629 81918
852737 81918
1166050 81918
0 8
739782 81918
716175 81918
650761 81918
582353 81918
718284 81918
852146 81918
1165671 81918
0 8
740033 81918
717083 81918
650634 81918
582317 81918
718617 81918
851702 81918
1166308 81918

The first number is the number of matches, the second is the size of the bitset in bytes.

Using the 8 "best" bitsets we get the following results:

found 25
found 27
found 28
found 30
union bitsets: 11.472ms
target prefix union 45058
loopy: 7.105ms

Meaning it used 4 of those bitsets to narrow down the result to 45k instead of 1.5 million. Sadly the union operation is about the same speed as just running over the array, so I'm not sure if that idea is worth pursuing further.

I'll fix whatever is keeping the benchmark tests from running and see if we can save this to storage in a better way. Another possibility is to add the prefix map as a separate index type, so that we keep the existing prefix index as well.

arj03 · 2021-01-22T12:31:57Z

Another idea we discussed was not saving the 0 (key) buckets for the map implementation. Going to test that.

staltz · 2021-01-22T14:16:25Z

I actually worked on this branch for 10mins just to put the perf.js inside benchmark/index.js, and it worked. I didn't commit and push because I had to go out, but the code is ready. I'll later rebase onto this branch, so you don't need to do it. :) @arj03

github-actions · 2021-01-22T14:17:36Z

Benchmark results

Part	Duration
Load core indexes	7ms
Query 1 big index (1st run)	1080ms
Query 1 big index (2nd run)	321ms
Query 3 indexes (1st run)	632ms
Query 3 indexes (2nd run)	280ms
Paginate 1 big index	291ms

arj03 · 2021-01-22T14:19:12Z

Sweet. Thank you :)

staltz · 2021-01-22T14:30:19Z

Aaaactually that was on db2, not in jitdb. But still, probably useful.

arj03 · 2021-01-22T14:46:53Z

So it turns out that skipping 0 is a really good idea for fields where not all messages have values and you don't need to query for unknown, such as votes.

Before:

-rw-rw-r-- 1 arj arj 14233563 Jan 22 15:18 ssb/db2/indexes/key.32prefixmap
-rw-rw-r-- 1 arj arj 10661035 Jan 22 15:18 ssb/db2/indexes/value_author.32prefixmap
-rw-rw-r-- 1 arj arj 12015680 Jan 22 15:18 ssb/db2/indexes/value_content_vote_link.32prefixmap

After:

-rw-rw-r-- 1 arj arj 14233563 Jan 22 15:17 ssb/db2/indexes/key.32prefixmap
-rw-rw-r-- 1 arj arj 10661035 Jan 22 15:17 ssb/db2/indexes/value_author.32prefixmap
-rw-rw-r-- 1 arj arj  4634479 Jan 22 15:17 ssb/db2/indexes/value_content_vote_link.32prefixmap
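The size drop for the vote index follows from how many messages have no vote link at all. A rough sketch of the idea, assuming that a prefix of 0 stands for "field missing" (the helper name and the sample data are hypothetical, not taken from jitdb):

```javascript
// Build a prefix map, optionally leaving out the bucket for prefix 0.
function buildPrefixMap(prefixes, skipZero) {
  const map = {}
  for (let seq = 0; seq < prefixes.length; ++seq) {
    const p = prefixes[seq]
    // With skipZero, messages without the field are simply not indexed.
    if (skipZero && p === 0) continue
    if (!map[p]) map[p] = []
    map[p].push(seq)
  }
  return map
}

// Most messages are not votes, so most prefixes are 0 in this made-up sample.
const votePrefixes = [0, 0, 7, 0, 0, 0, 9, 0, 7, 0]
const full = buildPrefixMap(votePrefixes, false)
const sparse = buildPrefixMap(votePrefixes, true)

// Compare the serialized sizes of the two variants.
console.log(JSON.stringify(full).length, JSON.stringify(sparse).length)
```

The cost is that a query for "no value" can no longer be answered from this index, which is why it only fits cases where you don't need to query for unknown. An index where every message has a value (key, author) gains nothing, which matches the unchanged file sizes above.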

arj03 · 2021-01-22T15:05:17Z

That of course translates into a somewhat faster initial load on votes, if used. I'm not exactly sure where to go next with this. For votes this really makes a lot of sense: the indexes are smaller and much faster. Not really for author, so maybe add a new index type?

Existing prefix:

6,4M	ssb/db2/indexes/value_content_vote_link.32prefix

github-actions · 2021-01-22T15:09:13Z

Benchmark results

Part	Duration
Load core indexes	8ms
Query 1 big index (1st run)	1164ms
Query 1 big index (2nd run)	363ms
Query 3 indexes (1st run)	687ms
Query 3 indexes (2nd run)	305ms
Paginate 1 big index	344ms

staltz · 2021-01-22T17:02:40Z

Yep, I'm all for making different types of indexes and tailoring the operators to use different types of indexes.

github-actions · 2021-01-23T08:22:53Z

Benchmark results

Part	Duration
Load core indexes	10ms
Query 1 big index (1st run)	1038ms
Query 1 big index (2nd run)	377ms
Query 3 indexes (1st run)	660ms
Query 3 indexes (2nd run)	265ms
Paginate 1 big index	289ms

github-actions · 2021-01-23T08:23:27Z

Benchmark results

Part	Duration
Load core indexes	7ms
Query 1 big index (1st run)	885ms
Query 1 big index (2nd run)	275ms
Query 3 indexes (1st run)	517ms
Query 3 indexes (2nd run)	207ms
Paginate 1 big index	225ms

arj03 · 2021-01-23T08:25:51Z

@staltz this should now be ready. I ended up with a new useMap flag that will then create a new file. I tested that going from an ordinary prefix index to a map prefix index works. This should be good for votesFor and hasRoot. I was also thinking that this could be really good for about.

staltz · 2021-01-23T09:52:20Z

Nice! I checked the code and looks neat so far. I have a few more things I'd like to review.

Other question: have you tried BIPF instead of JSON.parse/stringify?

arj03 · 2021-01-23T09:55:44Z

Right, I forgot to say that. I did. Sadly it's not faster, especially for the saving part, and we don't get any of the "I can just get this part" benefit; we really need to decode the whole thing.
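For context, the JSON approach amounts to something like the following. This is a simplified sketch with a made-up file name and helper names, not the code in this PR; it only shows that loading is all-or-nothing.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Serialize the whole map (prefix -> sequence numbers) in one go.
function savePrefixMap(filename, map) {
  fs.writeFileSync(filename, JSON.stringify(map))
}

// Loading parses the entire file before a single bucket can be read.
function loadPrefixMap(filename) {
  return JSON.parse(fs.readFileSync(filename, 'utf8'))
}

// Hypothetical file name in the OS temp directory, removed after the round trip.
const file = path.join(os.tmpdir(), 'example.32prefixmap')
const original = { 7: [2, 8], 9: [6] }
savePrefixMap(file, original)
const loaded = loadPrefixMap(file)
fs.unlinkSync(file)

console.log(loaded[7])
```

Since every query needs the full map in memory anyway, a format that allows partial reads has little to offer for this access pattern.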

staltz · 2021-01-23T10:12:49Z

@arj03 I added a commit to benchmark prefix map indexes in CI. If I understood correctly, the 1st run of a prefix map is slightly slower than the 1st run of a prefix non-map index, but the 2nd run is noticeably faster (from ~16ms down to ~4ms). Is that your experience as well?

(CI will report bench results soon)

arj03 · 2021-01-23T10:13:35Z

Exactly

github-actions · 2021-01-23T10:16:32Z

Benchmark results

Part	Duration
Load core indexes	13ms
Query 1 big index (1st run)	865ms
Query 1 big index (2nd run)	264ms
Query 3 indexes (1st run)	529ms
Query 3 indexes (2nd run)	210ms
Paginate 1 big index	224ms
Query a prefix map (1st run)	224ms
Query a prefix map (2nd run)	6ms

staltz · 2021-01-23T10:35:15Z

index.js

+  function updatePrefixMapIndex(opData, index, buffer, seq, offset) {
+    if (seq > index.count - 1) {
+      const fieldStart = opData.seek(buffer)
+      if (fieldStart) {


@arj03 Thanks. Did you also see my comment about ~fieldStart? Or is this intentionally testing for "field must not be at the beginning of the buffer"?

arj03 · 2021-01-23

It was keeping it in line with normal prefix indexes. It seems you are correct that both have a bug if the field is at position 0 in the buffer.

staltz · 2021-01-23

Oh, I didn't realize that normal prefix indexes had that too! Oops
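The problem discussed in this thread can be reproduced in isolation. The sketch below assumes the seek function returns -1 when the field is missing and a byte offset otherwise; `fakeSeek` is a stand-in written for this example, not the real `opData.seek`.

```javascript
// Stand-in seek: offset of the marker byte, or -1 if it is absent.
function fakeSeek(buffer, marker) {
  return buffer.indexOf(marker)
}

const buffer = Buffer.from([0x2a, 0x01, 0x02])
const atStart = fakeSeek(buffer, 0x2a) // field at the very beginning
const missing = fakeSeek(buffer, 0xff) // field not present

// Truthiness check: rejects offset 0 and accepts -1.
const truthy = (pos) => Boolean(pos)
// Bitwise NOT check: ~(-1) is 0, so only -1 is rejected.
const tilde = (pos) => Boolean(~pos)
// The most explicit variant of the same test.
const explicit = (pos) => pos !== -1

console.log(truthy(atStart), tilde(atStart), explicit(atStart))
console.log(truthy(missing), tilde(missing), explicit(missing))
```

Under that assumption, `if (fieldStart)` fails in both directions, while `if (~fieldStart)` and `if (fieldStart !== -1)` behave the same.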

staltz · 2021-01-23T10:35:44Z

Note, slowEqual also needs to support useMap

github-actions · 2021-01-23T10:40:24Z

Benchmark results

Part	Duration
Load core indexes	6ms
Query 1 big index (1st run)	899ms
Query 1 big index (2nd run)	272ms
Query 3 indexes (1st run)	520ms
Query 3 indexes (2nd run)	245ms
Paginate 1 big index	232ms
Query a prefix map (1st run)	227ms
Query a prefix map (2nd run)	6ms

arj03 · 2021-01-23T10:51:19Z

Pushed up a new commit. I'm away for the rest of the day, but feel free to merge if you want to try this out in manyverse :) And thanks for the review

github-actions · 2021-01-23T10:55:20Z

Benchmark results

Part	Duration
Load core indexes	6ms
Query 1 big index (1st run)	900ms
Query 1 big index (2nd run)	268ms
Query 3 indexes (1st run)	517ms
Query 3 indexes (2nd run)	236ms
Paginate 1 big index	247ms
Query a prefix map (1st run)	240ms
Query a prefix map (2nd run)	7ms

github-actions · 2021-01-23T11:28:14Z

Benchmark results

Part	Duration
Load core indexes	7ms
Query 1 big index (1st run)	1072ms
Query 1 big index (2nd run)	375ms
Query 3 indexes (1st run)	633ms
Query 3 indexes (2nd run)	248ms
Paginate 1 big index	301ms
Query a prefix map (1st run)	269ms
Query a prefix map (2nd run)	8ms

github-actions · 2021-01-23T11:35:12Z

Benchmark results

Part	Duration
Load core indexes	9ms
Query 1 big index (1st run)	1114ms
Query 1 big index (2nd run)	396ms
Query 3 indexes (1st run)	699ms
Query 3 indexes (2nd run)	289ms
Paginate 1 big index	305ms
Query a prefix map (1st run)	286ms
Query a prefix map (2nd run)	8ms

Add useMap flag for inverted (map) prefix indexes

de88057

arj03 force-pushed the prefix-map branch from 3423e94 to de88057 Compare January 23, 2021 08:17

add benchmark for prefix map indexes

5afae5f

staltz reviewed Jan 23, 2021

Minor optimization to prefix map indexes

713303f

staltz reviewed Jan 23, 2021

staltz requested changes Jan 23, 2021

Fix prefix map name & fix useMap in slowEqual

15e14f1

minor fixes to updater of prefix indexes

0559868

staltz approved these changes Jan 23, 2021

staltz merged commit bf42131 into master Jan 23, 2021

staltz deleted the prefix-map branch January 23, 2021 11:28

staltz mentioned this pull request Jan 25, 2021

some operators use useMap and prefix ssbc/ssb-db2#137

Merged
