ismail-but-noob's blog

By ismail-but-noob, 4 months ago, In English

Motivation

Dynamic CHT with sets is hard to write. I haven't learned the Li Chao Tree, but I will share this technique anyway, as it can be applied to other data structures.

AbdelmagedNour explained this to me $$$8$$$ months ago, and it introduced a new way of thinking to me. I could not find any references to something similar on the internet, so I decided to give it my best shot here.

If this already has a name, please comment it.

Prerequisites:

There are no prerequisites; however, the problem I have posted for checking your implementation uses the Convex Hull Trick. Please read this blog if you don't know about it.

The problem:

Assume we have a data structure which can add an element in $$$O(A)$$$ and query in $$$O(Q)$$$; however, elements have to be added in monotonic order of some valid comparison function. If we receive the elements in monotonic order, we have no problems! But what if we don't? My claim is that it is possible to add all $$$N$$$ elements in $$$O(AN \log N)$$$ total (amortized $$$O(A \log N)$$$ per element), and query for our property in $$$O(Q \log N)$$$. Usually we want to query the maximum/minimum element, or any property for which we can easily merge $$$2$$$ possible answers.

A classical example of this is the convex hull trick data structure. This data structure allows us to insert lines of the form $$$y = mx + c$$$, and query the minimum/maximum value of $$$y$$$ for a given $$$x$$$. Insertion works in $$$O(n)$$$ total (amortized $$$O(1)$$$ per line), where $$$n$$$ is the number of lines added. However, this assumes that lines are inserted in monotonic order of their slopes. What if they aren't?
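
As a concrete example of such a structure, here is a minimal sketch of a monotone CHT for maximum queries (the names and layout are my own illustration, not a reference implementation); note that insert must be called in non-decreasing order of slope, which is exactly the restriction we want to lift:

#include <vector>
#include <algorithm>
using namespace std;

struct CHT {
    vector<long long> M, C; // line i is y = M[i] * x + C[i]

    // True if the middle line b becomes useless once the new line (m, c)
    // is appended after lines ..., a, b (slopes strictly increasing).
    bool bad(int a, int b, long long m, long long c) const {
        return (__int128)(c - C[a]) * (M[a] - M[b]) <=
               (__int128)(C[b] - C[a]) * (M[a] - m);
    }

    void insert(long long m, long long c) { // slopes must be non-decreasing
        if (!M.empty() && M.back() == m) {  // parallel lines: keep the higher one
            if (C.back() >= c) return;
            M.pop_back(), C.pop_back();
        }
        while ((int)M.size() >= 2 && bad((int)M.size() - 2, (int)M.size() - 1, m, c))
            M.pop_back(), C.pop_back();
        M.push_back(m), C.push_back(c);
    }

    long long query(long long x) const { // maximum of M[i] * x + C[i]
        // On the hull, the index of the best line is non-decreasing in x,
        // so we can binary search for it.
        int lo = 0, hi = (int)M.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (M[mid] * x + C[mid] <= M[mid + 1] * x + C[mid + 1]) lo = mid + 1;
            else hi = mid;
        }
        return M[lo] * x + C[lo];
    }
};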

Initially we create an array $$$blk$$$ of size $$$\lfloor\log N\rfloor + 1$$$. $$$blk[i]$$$ will store a version of our data structure holding $$$2^i$$$ elements; initially, $$$blk[i]$$$ is empty for all $$$i$$$.

Now, to add the first element, we just add it to $$$blk[0]$$$; but for the second element, we take the contents of $$$blk[0]$$$, merge them with the second element (as in merge sort), insert the result into $$$blk[1]$$$, and clear $$$blk[0]$$$. The third element is inserted into $$$blk[0]$$$; the fourth is merged with $$$blk[0]$$$ and $$$blk[1]$$$, and they are all inserted into $$$blk[2]$$$ (while $$$blk[0]$$$ and $$$blk[1]$$$ are emptied).

A simple code demonstrating it would look something like this:

DS merge(DS& a, DS& b) {
  // Merges the 2 sorted data structures; how this works depends on the data structure.
}

void insert(T element) {
  DS current;
  current.insert(element);
  int index = 0;
  // While the current level is occupied, absorb it and carry to the next
  // level -- exactly like adding 1 to a binary number.
  while (!blk[index].empty()) {
    current = merge(current, blk[index]);
    blk[index].clear();
    index++;
  }
  blk[index] = current;
}
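
For instance, with the CHT sketch above, merge could look like this: both hulls keep their lines sorted by slope, so we merge the two sorted lists as in merge sort and re-insert everything into a fresh hull (again, just one possible way to do it):

CHT merge(CHT& a, CHT& b) {
    CHT res;
    size_t i = 0, j = 0;
    while (i < a.M.size() || j < b.M.size()) {
        // take the line with the smaller slope first
        bool fromA = j == b.M.size() || (i < a.M.size() && a.M[i] <= b.M[j]);
        if (fromA) res.insert(a.M[i], a.C[i]), i++;
        else       res.insert(b.M[j], b.C[j]), j++;
    }
    return res;
}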

Why is the complexity amortized $$$O(AN \log N)$$$?

Notice that each element you add passes through at most $$$\log N$$$ places in the $$$blk$$$ array, since its level increases by one with every merge it takes part in. We add $$$N$$$ elements, and each element costs $$$O(A)$$$ every time it is re-inserted into a data structure, so the total amortizes to $$$O(AN \log N)$$$.

Now, to query this data structure, we take the best possible answer across all non-empty data structures in $$$blk$$$.

A simple code demonstrating it would look like this:

T query(U input) {
  T ans = identity;
  for (auto& ds : blk) {
    if (ds.empty()) continue;
    // better() returns whichever answer is better (or merges them);
    // again, this depends on the data structure.
    ans = better(ans, ds.query(input));
  }
  return ans;
}

The complexity of this is clearly $$$O((Q+X) \log N)$$$, where $$$O(X)$$$ is the complexity of the better function.
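
Putting the pieces together for the CHT case, a usage sketch might look like this (continuing the two sketches above; LOG = 20 is an arbitrary bound of mine, and better is simply max with $$$-\infty$$$ as the identity):

#include <climits>
#include <cstdio>

const int LOG = 20; // supports up to 2^20 lines
CHT blk[LOG];

void addLine(long long m, long long c) { // slopes may arrive in any order now
    CHT cur;
    cur.insert(m, c);
    int i = 0;
    while (!blk[i].M.empty()) {
        cur = merge(cur, blk[i]);
        blk[i] = CHT{}; // empty this level
        i++;
    }
    blk[i] = cur;
}

long long queryMax(long long x) {
    long long ans = LLONG_MIN; // identity for max
    for (int i = 0; i < LOG; i++)
        if (!blk[i].M.empty()) ans = max(ans, blk[i].query(x));
    return ans;
}

int main() {
    addLine(2, 0);   // y = 2x
    addLine(-1, 10); // y = -x + 10 (inserted out of slope order, which is the point)
    addLine(0, 4);   // y = 4
    printf("%lld\n", queryMax(3)); // max(6, 7, 4) = 7
}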

As practice for this technique, you can solve this problem.

Extra Problems

If you know any more problems which apply this technique (with any data structure), please comment and I'll add them to the list :)

P.S. This is my first educational blog; if there are any mistakes or improvements I can make, please criticize me in the comments :)


»
4 months ago, # |

I didn't see that coming :)

One thing I would like to add: I know only two useful applications for this one (I don't know if there are any more). The first one is the CHT mentioned in the blog, as this method is much faster than just using sqrt decomposition.

The other one is making insertions into an Aho-Corasick: say I have a set of patterns and two types of queries, either insert one more pattern, or query for a certain string "how many patterns from the set appeared in it as a substring". Again, using this method gives us an $$$O(N \log N)$$$ solution, where $$$N$$$ is the total length of the input, which is much better than using sqrt decomposition.

  • »
    »
    4 months ago, # ^ |

    For CHT, a Li Chao Tree would be simpler, and without the second $$$O(\log n)$$$ factor.

    • »
      »
      »
      4 months ago, # ^ |

      Once, there was a problem where I couldn't use a Li Chao Tree because of double precision (even though when comparing $$$2$$$ lines I can avoid the division), so I solved it with sqrt decomposition at the time; this technique is still useful to know, I think. Sadly, I don't have the link to the problem now; I only remember it was on DMOJ.

»
4 months ago, # |

This is idea 4 of General ideas by adamant!

  • »
    »
    4 months ago, # ^ |

    Interesting blog!

    I hope that mine is more beginner-friendly so I will keep it :)

    Thanks for the insight!

»
4 months ago, # |

One cool application of this trick: 710F - String Set Queries. I wrote an explanation about this in this blog.

»
4 months ago, # |

I don't really know what it's called, but I've heard "add to merge data structure".
The simplest application of this technique is an std::set without erasure, with $$$O(\log n)$$$ insertion and $$$O(\log^2 n)$$$ lookup (a sketch follows below).
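
A minimal sketch of that insert-only set (my own illustration; MergeSet is a made-up name): one sorted vector per level, merged exactly as in the blog, with a binary search per level on lookup.

#include <algorithm>
#include <utility>
#include <vector>
using namespace std;

// Insert-only "std::set": level i is either empty or a sorted vector
// of exactly 2^i elements.
struct MergeSet {
    vector<vector<int>> blk;

    void insert(int x) { // amortized O(log n)
        vector<int> cur = {x};
        size_t i = 0;
        for (; i < blk.size() && !blk[i].empty(); i++) {
            vector<int> merged(cur.size() + blk[i].size());
            std::merge(cur.begin(), cur.end(),
                       blk[i].begin(), blk[i].end(), merged.begin());
            cur = std::move(merged);
            blk[i].clear();
        }
        if (i == blk.size()) blk.emplace_back();
        blk[i] = std::move(cur);
    }

    bool contains(int x) const { // O(log^2 n): one binary search per level
        for (const auto& v : blk)
            if (binary_search(v.begin(), v.end(), x)) return true;
        return false;
    }
};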

On a more serious note, we can make this technique a bit more general. Consider the following problem:
You're given $$$Q$$$ queries of the form
- Add string $$$s$$$ into the dictionary
- Query the number of occurrences of each word from the dictionary in a given text $$$t$$$

The static problem (without additions) is doable in $$$O(\sum|s| + |t|)$$$ via the Aho-Corasick algorithm.

Now, to solve the dynamic problem, let's build more Aho-Corasicks. Let's say the size of an Aho-Corasick is the sum of the lengths of the strings in its dictionary. We keep one of size in the range $$$[1, 2)$$$, one in $$$[2, 4)$$$, one in $$$[4, 8)$$$, and so on. If we ever have two Aho-Corasicks in the same range, we can merge them and get one in the next range. So by applying the same technique we build at most $$$O(\log \sum |s|)$$$ data structures, each of which can be queried in $$$O(|t|)$$$. The total complexity is $$$O(\log(\sum|s|) \cdot (\sum |s| + \sum |t|))$$$.
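
Here is a sketch of just that size-based bucketing (my own illustration; I only model the dictionaries and sizes, and mark with a comment where the real solution would build the Aho-Corasick automaton of each bucket):

#include <string>
#include <utility>
#include <vector>
using namespace std;

const int LOG = 40;
vector<string> bucket[LOG]; // total length of bucket[i]'s strings is in [2^i, 2^{i+1})
long long bucketSize[LOG];

int level(long long sz) { return 63 - __builtin_clzll(sz); } // floor(log2(sz)), GCC/Clang builtin

void addPattern(string s) { // assumes s is non-empty
    vector<string> cur = {std::move(s)};
    long long sz = cur[0].size();
    int i = level(sz);
    // Two dictionaries with sizes in [2^i, 2^{i+1}) merge into one whose size
    // is in [2^{i+1}, 2^{i+2}), so the level strictly rises with every merge.
    while (!bucket[i].empty()) {
        for (auto& t : bucket[i]) cur.push_back(std::move(t));
        sz += bucketSize[i];
        bucket[i].clear(), bucketSize[i] = 0;
        i = level(sz);
    }
    bucket[i] = std::move(cur), bucketSize[i] = sz;
    // Here, rebuild the Aho-Corasick automaton of bucket[i]; a text query
    // then runs t through every non-empty bucket's automaton and sums the counts.
}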

Another cool thing about it: we can perform deletions via the same data structure. Just store the deleted strings in a second copy of the structure and subtract its query value. The same constraint as for a Fenwick tree holds, though (the queried value must have an inverse element).