Compressed segment trees and merging sets in O(N logN)

№	Пользователь	Рейтинг
1	tourist	3690
2	jiangly	3647
3	Benq	3581
4	orzdevinwang	3570
5	Geothermal	3569
5	cnnfls_csy	3569
7	Radewoosh	3509
8	ecnerwala	3486
9	jqdai0815	3474
10	gyh20	3447

№	Пользователь	Вклад
1	maomao90	173
2	awoo	164
3	adamant	163
4	TheScrasse	160
5	nor	157
6	maroonrk	156
7	-is-this-fft-	152
8	Petr	146
8	orz	146
10	pajenegod	145

Hello!

Today I will present a data structure that I have worked on for my Bachelor thesis a while ago, called (by me) a PreTree (link to thesis). This is a type of "compressed" trie that is arguably easier to implement than a regular compressed trie, and it achieves merging in amortized $$$O(N L)$$$ complexity for $$$N$$$ keys of equal length $$$L$$$. I will present an implementation as a segment tree, though, as it is easier to understand.

Prerequisite: know segment trees, and "sparse" segment trees (a.k.a. bitwise tries).

Main idea

We will keep a regular segment tree, with the twist that the root contains the minimum key. Basically, a PreTree on keys in the universe $$$[b, e)$$$ is a binary tree $$$T$$$ where either $$$T$$$ is empty, or the following conditions hold:

$$$T$$$ holds the lowest key present in the range $$$[b, e)$$$
$$$T$$$'s left son is a PreTree on values $$$[b, \frac{b + e}{2})$$$
$$$T$$$'s right son is a PreTree on values $$$[\frac{b + e}{2}, e)$$$

Because $$$T$$$ is a one-to-one mapping between tree vertices and keys, then the memory complexity of $$$T$$$ is obviously $$$O(N)$$$.

Implementation

In order to make the implementation easier, we will define a single big operation called "merging" two PreTrees. Suppose we have two trees $$$T1$$$ and $$$T2$$$. Merging the two trees would be done as the following recursive procedure:

function Merge(T1, T2, b, e):
  if T1 is empty then return T2
  if T2 is empty then return T1
  
  m = (b + e) / 2
  if T1.k > T2.k then swap T1 and T2
  T1.l = Merge(T1.l, T2.l, b, m)
  T1.r = Merge(T1.r, T2.r, m, e)
  T1.l = T1.r = empty
  
  if T1.k != T2.k:    
    ## We have to insert T1 into the corresponding child
    if T2.k >= m:
      T1.r = Merge(T1.r, T2, m, e)
    else:
      T1.l = Merge(T1.l, T2, b, m)
  return T1

Insertion

In order to insert a key into the tree, just merge it with a singleton tree.

Erase

This is a bit more involved. Notice, however, that any PreTree $$$T$$$ is a min-heap on the keys. Find the key and then pop it as you would do in a heap.

Predecessors

In order to find the greatest key smaller than a given $$$k$$$, one can do the following:

function Smaller(T, k):
  if T is empty or T.k >= k:
    return NOT_FOUND
  ret = Smaller(T.r, k)
  if ret is NOT_FOUND:
    ret = Smaller(T.l, k)
  if ret is NOT_FOUND:
    ret = T
  return ret

Finding the smallest key greater than or equal to a given $$$k$$$ is possible, albeit a little more complicated. The simplest way would be to also keep the maximum on the subtree. I will omit the actual implementation.

Performance

The Merge function seems not too efficient at first glance. However, the amortized complexity of it is bounded by $$$O(N log(U))$$$, where $$$U$$$ is the size of the universe, and $$$N$$$ is the total number of nodes. In order to see why is that, we will have to look at the depths of each node in the tree.

Let's suppose $$$T1$$$ and $$$T2$$$ have distinct keys. Each non-trivial call to the Merge function increases the depth of at least one vertex. The reason that is true is because, after each call, the root node with the higher key will get "demoted" one level downwards, therefore increasing its depth by one. As depths of a node can never exceed $$$O(log(U))$$$, the total complexity cannot exceed $$$O(N log(U))$$$.

Follow-up 1: Prove that this still holds for non-distinct keys!

Follow-up 2: What about erase operations? How do they affect the amortized analysis?

Follow-up 3: What about (Treap) splitting operations? Can you split T into two trees, depending on some (monotonic) predicate? How does that affect complexity?

Applications

One of the most standard applications of this is to improve the $$$O(N log^2(N))$$$ bound on various problems involving the small-to-large trick. It has the additional improvement of having $$$O(N)$$$ memory without making the implementation a lot harder.

Example problems:

343D. Water Tree | AC Submission (here we can implicitly encode the keys in the nodes' value, yielding a very elegant and short implementation)

Could you tell me about some other problems involving some form of sparse segment trees and other tasks where this could come handy, to add them to the blog?

Rev.	Кто	Когда	Δ	Комментарий
en6	bicsi	2020-09-29 21:59:59	2
en5	bicsi	2020-09-29 18:54:30	178
en4	bicsi	2020-09-29 14:16:23	316
en3	bicsi	2020-09-29 13:40:07	4	Tiny change: 'ion/94180781) (here we' -> 'ion/94180797) (here we'
en2	bicsi	2020-09-29 13:38:38	6
en1	bicsi	2020-09-29 10:03:23	4478	Initial revision (published)