Commit 0904a3a

Author: Ville Tuulos
Committed: Sep 23, 2011
add a comment about a new possible memory-conserving construction interface for discodb
1 parent e7fa012 commit 0904a3a

1 file changed, +25 -0 lines changed

contrib/discodb/src/ddb_cons.c

@@ -14,6 +14,31 @@
 #include <ddb_delta.h>
 #include <ddb_cmph.h>
 
+/* Idea:
+DiscoDB's memory footprint can be huge in the worst case. Consider e.g.
+
+DiscoDB((title, str(i)) for i, title in enumerate(file('wikipedia-titles')))
+
+which is pretty much the worst case: all keys and values are unique, so
+keys_map and values_map just waste space for nothing. Of course there's no way
+DiscoDB could know this in advance.
+
+We could provide an alternative interface where the user can maintain the
+key/value -> id mapping and hence use all the domain information to conserve
+memory. The interface could look as follows:
+
+uint64_t value_id = ddb_cons_new_value(const struct ddb_entry *value);
+uint64_t key_id = ddb_cons_new_key(const struct ddb_entry *key);
+int ret = ddb_cons_add_id(struct ddb_cons *db, uint64_t key_id, uint64_t value_id);
+
+In this scenario DiscoDB does not need to maintain internal mappings at all,
+only two flat arrays for keys (id -> deltalist) and (id -> key) and one for
+values (id -> value).
+
+This would be especially convenient in the situations where keys and/or values
+are unique or grouped - neither user nor discodb needs to maintain a mapping,
+just a one-time id would suffice.
+*/
 
 #define BUFFER_INC (1024 * 1024 * 64)
 
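
The id-based interface proposed in the comment could be sketched roughly as follows. This is only an illustration under stated assumptions, not DiscoDB code: `struct entry`, `struct flat_cons`, `flat_new_key`, `flat_new_value`, `flat_add_id` and `flat_free` are hypothetical names, the explicit container argument on the key and value functions is an assumption (the prototypes in the comment omit it), and a plain growable array of value ids stands in for a real deltalist.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct ddb_entry; the field layout is an assumption. */
struct entry {
    const char *data;   /* not copied: the caller keeps the bytes alive */
    uint32_t length;
};

/* Growable array of value ids; a simplified stand-in for a deltalist. */
struct id_list {
    uint64_t *ids;
    uint64_t len, cap;
};

/* Hypothetical constructor state: three flat arrays, no hash maps. */
struct flat_cons {
    struct entry *keys;      /* key id -> key */
    struct id_list *lists;   /* key id -> value ids */
    struct entry *values;    /* value id -> value */
    uint64_t num_keys, keys_cap, lists_cap;
    uint64_t num_values, values_cap;
};

#define FLAT_ERR UINT64_MAX

/* Return an array with room for one more element, doubling when full.
   Returns NULL on allocation failure; the old array stays valid. */
static void *grow(void *arr, uint64_t *cap, uint64_t len, size_t elem)
{
    if (len < *cap)
        return arr;
    uint64_t ncap = *cap ? *cap * 2 : 1;
    void *p = realloc(arr, ncap * elem);
    if (p)
        *cap = ncap;
    return p;
}

/* Store a value and return its id (its index in the values array). */
uint64_t flat_new_value(struct flat_cons *c, const struct entry *value)
{
    void *p = grow(c->values, &c->values_cap, c->num_values, sizeof *c->values);
    if (!p)
        return FLAT_ERR;
    c->values = p;
    c->values[c->num_values] = *value;
    return c->num_values++;
}

/* Store a key with an empty id list and return its id. */
uint64_t flat_new_key(struct flat_cons *c, const struct entry *key)
{
    void *p = grow(c->keys, &c->keys_cap, c->num_keys, sizeof *c->keys);
    if (!p)
        return FLAT_ERR;
    c->keys = p;
    p = grow(c->lists, &c->lists_cap, c->num_keys, sizeof *c->lists);
    if (!p)
        return FLAT_ERR;
    c->lists = p;
    c->keys[c->num_keys] = *key;
    memset(&c->lists[c->num_keys], 0, sizeof *c->lists);
    return c->num_keys++;
}

/* Link an existing key id to an existing value id. Returns 0 or -1. */
int flat_add_id(struct flat_cons *c, uint64_t key_id, uint64_t value_id)
{
    if (key_id >= c->num_keys || value_id >= c->num_values)
        return -1;
    struct id_list *l = &c->lists[key_id];
    void *p = grow(l->ids, &l->cap, l->len, sizeof *l->ids);
    if (!p)
        return -1;
    l->ids = p;
    l->ids[l->len++] = value_id;
    return 0;
}

/* Release all arrays owned by the constructor state. */
void flat_free(struct flat_cons *c)
{
    for (uint64_t i = 0; i < c->num_keys; i++)
        free(c->lists[i].ids);
    free(c->keys);
    free(c->lists);
    free(c->values);
    memset(c, 0, sizeof *c);
}
```

In the all-unique case from the comment, a caller would start from a zero-initialized `struct flat_cons`, call `flat_new_key` and `flat_new_value` once per input line, and pass the two returned ids straight to `flat_add_id`. Neither side keeps a key/value -> id map; the ids are simply array indices.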
