XOR basis without linear algebra

Revision en5, by Everule, 2022-02-17 23:59:34

I think that linear algebra, is a unnecessarily intimidating tool, to solve a problem I believe is far easier, and needlessly intimidates those without formal knowledge of linear algebra, which is a significant part of codeforces. I know I did. Therefore I want to show a step by step problem solving process that will lead to the concept of linear basis at the end, with no formal knowledge required.

We start with the elementary problem of linear basis. Given an array $$$A = [a_1, a_2, \ldots a_n]$$$, where $$$a_i \le 2^d$$$, Choose some subset of elements in $$$A$$$, and XOR them. Find the number of distinct elements you could make.

We start by using a simple dp. Let $$$X_i$$$ be the set of elements you could make with the first $$$i$$$ elements of $$$A$$$. Then its easy to see that $$$X_{i+1}$$$ contains $$$x$$$ and $$$x \bigoplus a_{i+1}$$$ for every $$$x \in X_i$$$. This is an $$$O(2^dn)$$$ algorithm, which is too inefficient.

We need to put some constraints on what $$$X_i$$$ looks like to make further progress. Let us assume $$$x,y$$$ are two elements in $$$X_i$$$. Then $$$x \bigoplus y$$$, must also be in $$$X_i$$$. This is because I can just take the elements I used to make $$$x$$$ and the elements I used to make $$$y$$$. The elements used in both simply cancel out. This means that the new set of elements is also a subset of $$$[a_1, a_2, \ldots, a_i]$$$.

This provides us with an equally powerful converse.

Let $$$x$$$ be some element in $$$X_i$$$, and $$$y$$$ some element not in $$$X_i$$$. Then $$$x \bigoplus y$$$, cannot be in $$$X_i$$$, because if I could make $$$x$$$ and $$$x \bigoplus y$$$, I could make $$$x \bigoplus x \bigoplus y = y$$$.

Intuitively, this tells us that anything with a new element already exists, or is always new. Another way of thinking about this is, we want to attack $$$y$$$ with some set of elements in $$$[a_1, a_2, \ldots a_i]$$$, and kill it by getting it to $$$0$$$.

If I could kill $$$x$$$ and $$$y$$$, I can kill $$$x \bigoplus y$$$ by killing $$$x$$$ first, then $$$y$$$.

If I can kill $$$x$$$ but not $$$y$$$, neither can I kill $$$x \bigoplus y$$$, because I could already get $$$y$$$ to $$$x \bigoplus y$$$, but I couldn't kill that either.

I recommend spending a minute internalizing this.

Now let us try adding a new element $$$a_{i+1}$$$ to $$$X_i$$$ again. I check if I can make $$$a_{i+1}$$$. If I can, I can simply ignore it. If I cannot, then $$$x \bigoplus a_{i+1}$$$ for $$$x \in X_i$$$ is a new element too. So the size of $$$X$$$ will double. Obviously the set can double only $$$d$$$ times. This gives an algorithm in $$$O(n + d2^d)$$$ or $$$O(n + 2^d)$$$ depending on implementation.

if $$$2^d$$$ is large, this algorithm doesn't work either, as we cannot explicitly store $$$X$$$. This means we will need to compress $$$X$$$ in some way. To do that, notice that there are only at most $$$d$$$ elements that matter, because the ones that I ignored, may as well not have been there. So just storing the elements $$$[a_{i_1}, a_{i_2}, \ldots, a_{i_k}]$$$, that doubled the size of $$$X$$$, is enough. $$$X$$$ is just the set of the XORs of every subset of the compressed array.

While we successfully compressed $$$X$$$, we lost our ability to quickly check if $$$a_{i+1}$$$ is in $$$X$$$. We need to get that back.

We cannot guarantee much about the structure of $$$[a_{i_1}, a_{i_2}, \ldots a_{i_k}]$$$. So it is hard to see an algorithm that does this quickly. However it is important to notice that its not the elements itself that are important, but more so what is the set that is generated by the compressed elements. So any edits made to the array, is okay as long it doesn't change the produced set.

We now show that the set $$$[a_{i_1}, a_{i_2}, \ldots a_{i_k}]$$$ and $$$[a_{i_1} \bigoplus a_{i_2}, a_{i_2}, \ldots a_{i_k}]$$$, produce the same $$$X$$$. Or more generally, I can take any element, and xor it with some other element, and it will produce the same set $$$X$$$. This is easy to prove, because if I was using $$$a_{i_2}$$$ in the first set, I use the first and second element in the second set. If I was using both $$$a_{i_1}$$$ and $$$a_{i_2}$$$, then I just need the second element. It is equally good.

While this is enough to solve the given problem, it doesn't do the idea enough justice. In fact, any invertible XOR transformation from $$$[a_1, a_2, \ldots, a_k]$$$ to $$$[b_1, b_2, \ldots, b_k]$$$ will produce the same set.

A XOR transformation is one where $$$b_i$$$ is the XOR of $$$a_j$$$ for all $$$j$$$ in some subset of $$$1,2,3,\ldots, k$$$. An invertible XOR transformation, is one where there exists some XOR transformation from $$$b$$$ back to $$$a$$$.

Let $$$X_a$$$ be the set of elements that are produced by $$$[a_1, a_2, \ldots, a_k]$$$ and similarly define $$$X_b$$$. Then because $$$b_i$$$ belongs to $$$X_a$$$ by definition, every element of $$$X_b$$$ also belongs to $$$X_a$$$, implying $$$X_b \subseteq X_a$$$. But as the operation is invertible, we can similarly argue that $$$X_a \subseteq X_b$$$. This is only possible if $$$X_a = X_b$$$.

Any set of the smallest size that produces $$$X$$$ is a XOR basis, and we can choose any basis of $$$X$$$ we want.

Let us now ask ourselves, what do I want my basis to look like. If some element had a "unique" bit, it would be easy to check if the element should be added to make $$$y$$$. I just need to see if the unique bit exists in $$$y$$$. In fact, we can do this. Let us look at the bits in decreasing order.

If no element has this bit set, we can ignore this bit. If there is already just one element with this bit set, its good already. If there are more than one, we just pick any element, and XOR it with all the other elements with this bit set to make this element the only element with this bit set.

Now since we know if this element is included, by checking the unique bit, we know when to XOR $$$y$$$ with this element. Now we need to check if the new $$$y$$$ can be made from the other elements of the basis. The other elements form a smaller basis, and we can repeat the process, and its trivial to check in a basis with only $$$0$$$. Therefore inductively we have found an efficient, $$$O(poly(d))$$$ way to check if $$$y$$$ is in the basis.

I will leave optimizing this to O(d) as an exercise, as my goal was to explain the conceptual underlying forces that allow the algorithm to work.

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en9 English Everule 2022-02-18 01:05:10 95
en8 English Everule 2022-02-18 00:16:13 53
en7 English Everule 2022-02-18 00:12:06 3 Tiny change: 'know I did. Therefor' -> 'know I didn't. Therefor'
en6 English Everule 2022-02-18 00:02:46 2 Tiny change: 'g this to O(d) as an exe' -> 'g this to $O(d)$ as an exe' (published)
en5 English Everule 2022-02-17 23:59:34 22
en4 English Everule 2022-02-17 23:57:49 1288
en3 English Everule 2022-02-17 23:39:27 801 Tiny change: 'ldots, a_k$ and simi' -> 'ldots, a_k]$ and simi'
en2 English Everule 2022-02-17 23:26:38 3452 Tiny change: 'plus x \bihoplus y = ' -> 'plus x \bigoplus y = '
en1 English Everule 2022-02-17 22:48:03 581 Initial revision (saved to drafts)