Skip to content

Trie based index for key storage

the-very requested to merge github/fork/ya-goodfella/trie_based_index into master

Created by: ya-goodfella

Trie-based implementation for keys storage. Provides an average complexity O(|key|) for all operations with worst-case performance proportional to the depth of compressed trie. Unfortunately, it can be used only for filterable indexes, but it makes lookup operations much faster. Space requirments remained reasonable comparing to other implementations and depends on dataset:

100K random strings with const prefix:
fixed: 2500008 bytes
variable: 3300012 bytes
trie: 2707918 bytes

64K random 8 byte data:
fixed: 524296 bytes
variable: 1048588 bytes
trie: 1961066 bytes

~100K 4 byte integers from range [20000, 250000000):
fixed: 388180 bytes
variable: 1164528 bytes
trie: 1917660 bytes

Benchmarks below:

Benchmark                                                                                                         Mode   Cnt   Score    Error  Units
ByteArraySortedSetBenchmarks.price_fixed                                                                         thrpt    25  12,137 ±  1,851  ops/s
ByteArraySortedSetBenchmarks.price_trie                                                                          thrpt    25  30,216 ±  1,428  ops/s
ByteArraySortedSetBenchmarks.price_variable                                                                      thrpt    25   8,071 ±  0,871  ops/s
ByteArraySortedSetBenchmarks.random_8bytes_64K_fixed                                                             thrpt    25  18,333 ±  0,523  ops/s
ByteArraySortedSetBenchmarks.random_8bytes_64K_trie                                                              thrpt    25  46,772 ±  2,117  ops/s
ByteArraySortedSetBenchmarks.random_8bytes_64K_variable                                                          thrpt    25  10,669 ±  0,271  ops/s
ByteArraySortedSetBenchmarks.random_prefixed_string_100K_fixed                                                   thrpt    25   5,767 ±  0,175  ops/s
ByteArraySortedSetBenchmarks.random_prefixed_string_100K_trie                                                    thrpt    25  14,527 ±  0,671  ops/s
ByteArraySortedSetBenchmarks.random_prefixed_string_100K_variable                                                thrpt    25   4,304 ±  0,116  ops/s
ByteArraySortedSetBenchmarks.price_fixed                                                                          avgt    25   0,077 ±  0,009   s/op
ByteArraySortedSetBenchmarks.price_trie                                                                           avgt    25   0,035 ±  0,002   s/op
ByteArraySortedSetBenchmarks.price_variable                                                                       avgt    25   0,101 ±  0,002   s/op
ByteArraySortedSetBenchmarks.random_8bytes_64K_fixed                                                              avgt    25   0,054 ±  0,001   s/op
ByteArraySortedSetBenchmarks.random_8bytes_64K_trie                                                               avgt    25   0,023 ±  0,001   s/op
ByteArraySortedSetBenchmarks.random_8bytes_64K_variable                                                           avgt    25   0,096 ±  0,002   s/op
ByteArraySortedSetBenchmarks.random_prefixed_string_100K_fixed                                                    avgt    25   0,180 ±  0,003   s/op
ByteArraySortedSetBenchmarks.random_prefixed_string_100K_trie                                                     avgt    25   0,066 ±  0,002   s/op
ByteArraySortedSetBenchmarks.random_prefixed_string_100K_variable                                                 avgt    25   0,221 ±  0,012   s/op

Additionally I fixed bug in LongArrayBitSet.arraySize() and made possible comparision of empty UnsignedByteArray/Buffer.

Merge request reports