Compressed segment trees and merging sets in O(N logN)

Revision en5, by bicsi, 2020-09-29 18:54:30

Hello!

Today I will present a data structure that I have worked on for my Bachelor thesis a while ago, called (by me) a PreTree (link to thesis). This is a type of "compressed" trie that is arguably easier to implement than a regular compressed trie, and it achieves merging in amortized $$$O(N L)$$$ complexity for $$$N$$$ keys of equal length $$$L$$$. I will present an implementation as a segment tree, though, as it is easier to understand.

Prerequisite: know segment trees, and "sparse" segment trees (a.k.a. bitwise tries).

Main idea

We will keep a regular segment tree, with the twist that the root contains the minimum key. Basically, a PreTree on keys in the universe $$$[b, e)$$$ is a binary tree $$$T$$$ where either $$$T$$$ is empty, or the following conditions hold:

  1. $$$T$$$ holds the lowest key present in the range $$$[b, e)$$$
  2. $$$T$$$'s left son is a PreTree on values $$$[b, \frac{b + e}{2})$$$
  3. $$$T$$$'s right son is a PreTree on values $$$[\frac{b + e}{2}, e)$$$

Because $$$T$$$ is a one-to-one mapping between tree vertices and keys, then the memory complexity of $$$T$$$ is obviously $$$O(N)$$$.

The reason for the name PreTree comes from it being a "pretty tree" (of course, beauty is subjective), and also because their pre-order traversals yield keys in sorted order.

Implementation

In order to make the implementation easier, we will define a single big operation called "merging" two PreTrees. Suppose we have two trees $$$T1$$$ and $$$T2$$$. Merging the two trees would be done as the following recursive procedure:

function Merge(T1, T2, b, e):
  if T1 is empty then return T2
  if T2 is empty then return T1
  
  m = (b + e) / 2
  if T1.k > T2.k then swap T1 and T2
  T1.l = Merge(T1.l, T2.l, b, m)
  T1.r = Merge(T1.r, T2.r, m, e)
  T2.l = T2.r = empty
  
  if T1.k != T2.k:    
    ## We have to insert T2 into the corresponding child
    if T2.k >= m:
      T1.r = Merge(T1.r, T2, m, e)
    else:
      T1.l = Merge(T1.l, T2, b, m)
  return T1

Insertion

In order to insert a key into the tree, just merge it with a singleton tree.

Erase

This is a bit more involved. Notice, however, that any PreTree $$$T$$$ is a min-heap on the keys. Find the key and then pop it as you would do in a heap.

Predecessors

In order to find the greatest key smaller than a given $$$k$$$, one can do the following:

function Smaller(T, k):
  if T is empty or T.k >= k:
    return NOT_FOUND
  ret = Smaller(T.r, k)
  if ret is NOT_FOUND:
    ret = Smaller(T.l, k)
  if ret is NOT_FOUND:
    ret = T
  return ret

Finding the smallest key greater than or equal to a given $$$k$$$ is possible, albeit a little more complicated. The simplest way would be to also keep the maximum on the subtree. I will omit the actual implementation.

Performance

The Merge function seems not too efficient at first glance. However, the amortized complexity of it is bounded by $$$O(N log(U))$$$, where $$$U$$$ is the size of the universe, and $$$N$$$ is the total number of nodes. In order to see why is that, we will have to look at the depths of each node in the tree.

Let's suppose $$$T1$$$ and $$$T2$$$ have distinct keys. Each non-trivial call to the Merge function increases the depth of at least one vertex. The reason that is true is because, after each call, the root node with the higher key will get "demoted" one level downwards, therefore increasing its depth by one. As depths of a node can never exceed $$$O(log(U))$$$, the total complexity cannot exceed $$$O(N log(U))$$$.

Follow-up 1: Prove that this still holds for non-distinct keys!

Follow-up 2: What about erase operations? How do they affect the amortized analysis?

Follow-up 3: What about (Treap) splitting operations? Can you split T into two trees, depending on some (monotonic) predicate? How does that affect complexity?

Applications

One of the most standard applications of this is to improve the $$$O(N log^2(N))$$$ bound on various problems involving the small-to-large trick. It has the additional improvement of having $$$O(N)$$$ memory without making the implementation a lot harder.

Example problems:

Could you tell me about some other problems involving some form of sparse segment trees and other tasks where this could come handy, to add them to the blog?

Tags #segment tree, compressed trie, #data structure

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en6 English bicsi 2020-09-29 21:59:59 2
en5 English bicsi 2020-09-29 18:54:30 178
en4 English bicsi 2020-09-29 14:16:23 316
en3 English bicsi 2020-09-29 13:40:07 4 Tiny change: 'ion/94180781) (here we' -> 'ion/94180797) (here we'
en2 English bicsi 2020-09-29 13:38:38 6
en1 English bicsi 2020-09-29 10:03:23 4478 Initial revision (published)