Fast polynomial composition

Правка en2, от adamant, 2022-04-15 14:37:36

Hi everyone!

Today I want to describe an efficient solution of the following problem:


Composition of Formal Power Series. Given $$$A(x)$$$ and $$$B(x)$$$ of degree $$$n-1$$$ such that $$$B(0) = 0$$$, compute $$$A(B(x)) \pmod{x^n}$$$.


The condition $$$B(0)=0$$$ doesn't decrease the generality of the problem, as $$$A(B(x)) = P(Q(x))$$$, where $$$P(x) = A(x+B(0))$$$ and $$$Q(x) = B(x) - B(0)$$$. Hence you could replace $$$A(x)$$$ and $$$B(x)$$$ with $$$P(x)$$$ and $$$Q(x)$$$ when the condition is not satisfied.

Solutions that I'm going to describe were published in Joris van der Hoeven's article about operations on formal power series. The article also describes a lot of other common algorithms on polynomials. It is worth noting that Joris van der Hoeven and David Harvey are the inventors of breakthrough $$$O(n \log n)$$$ integer multiplication algorithm in the multitape Turing machine model.

Naive methods

An obvious solution would be to evaluate $$$B(x)$$$ in $$$n^2$$$ points, then evaluate $$$A(B(x_i))$$$ in these points and interpolate $$$(A \circ B)(x)$$$. This solution would require $$$O(n^2 \log^2 n)$$$ operations with an enormous constant. Not the best idea for $$$n \leq 8000$$$.

Another possible approach is to go from the definition of the polynomial composition:

$$$ A(B(x)) = \sum\limits_{i=0}^{n-1} a_i B^i(x). $$$

You could compute $$$B^2(x), B^3(x), \dots, B^{n-1}(x)$$$ modulo $$$x^n$$$ one by one and sum them up, which would take $$$O(n^2 \log n)$$$ operations.

Divide and conquer

If we write $$$A(x) = A_0(x) + x^t A_1(x)$$$, then

$$$ A(B(x)) = A_0(B(x)) + B^t(x)A_1(B(x)). $$$

With this decomposition, you can reduce the computation of $$$A(B(x))$$$ to $$$2$$$ recursive calls to $$$A_0(B(x))$$$ and $$$A_1(B(x))$$$ with $$$O(n \log n)$$$ overhead for the multiplication. If you use $$$t = \lfloor \frac{n}{2} \rfloor$$$, the degree of $$$A_0$$$ will be at most $$$\lfloor \frac{n}{2} \rfloor$$$ and the degree of $$$A_1$$$ will be $$$\lceil \frac{n}{2} \rceil$$$.

Considering possible values of $$$t$$$ over all recursion levels they will be at most $$$O(\log n)$$$ different values, hence you can compute necessary powers of $$$B(x)$$$ in advance, which would require $$$O(n \log^2 n)$$$ cumulative time for the pre-processing.

The overall time complexity for this algorithm is still $$$O(n^2 \log n)$$$, but it allows significant constant-time optimizations, which is enough to pass the Library Checker problem. For this one need to carefully consider how many terms do you really need on each level of recursion.

In a more generic situation when $$$\deg A = p$$$ and $$$\deg B = q$$$, it is possible to prove the alternative complexity estimation of $$$O(pq \log^2 n)$$$, which is a bit better when $$$p$$$ or $$$q$$$ are significantly smaller than $$$n$$$.

Square root trick

In 1978, Brent and Kung suggested a faster algorithm to compute $$$A \circ B$$$ assuming that formal power series are taken over the division ring. Instead of decomposing $$$A(x)$$$, it is possible to decompose $$$B(x) = B_0(x) + x^q B_1(x)$$$ and use the Taylor expansion:

$$$ A(B(x)) = \sum\limits_{k=0}^{\lfloor n / q \rfloor} x^{kq} B_1^k(x)\frac{A^{(k)}(B_0(x))}{k!} + O(x^n). $$$

For this expression, it generally holds that

$$$ \frac{\partial}{\partial x} A^{(k)}(B(x)) = B'(x) A^{(k+1)}(B(x)), $$$

which allows to compute $$$A^{(1)}(B_0(x)), A^{(2)}(B_0(x)),\dots$$$ in $$$O(\frac{n^2}{q} \log n)$$$ by successively multiplying them with $$$B_0'(x)$$$ and integrating, starting with $$$A^{(\lfloor n/q \rfloor)}(B_0(x))$$$, which is computed with the divide and conquer algorithm above with the complexity $$$O(nq \log^2 n)$$$.

Optimal value of $$$q$$$ to minimize both complexities is $$$q = \sqrt{\frac{n}{\log n}}$$$, with which the final complexity is

$$$ O( n \sqrt{n \log n} \log n) $$$
Теги tutorial, polynomials, power series

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
en9 Английский adamant 2022-06-12 16:29:38 286
en8 Английский adamant 2022-04-15 21:09:14 8
en7 Английский adamant 2022-04-15 20:36:50 13
en6 Английский adamant 2022-04-15 19:20:54 143
en5 Английский adamant 2022-04-15 18:33:46 346
en4 Английский adamant 2022-04-15 17:43:50 1020
en3 Английский adamant 2022-04-15 16:59:42 1317 Tiny change: 'for $n=pq$ (to compute full composition), the runn' -> 'for $n=pq$, the runn' (published)
en2 Английский adamant 2022-04-15 14:37:36 1526 Tiny change: '} + O(x^n)\n$$' -> '} + O(x^n).\n$$'
en1 Английский adamant 2022-04-15 05:02:00 2436 Initial revision (saved to drafts)