a_little_cute's blog

By a_little_cute, history, 5 months ago, In English

Can we solve problem 1920C - Partitioning the Array faster than $$$O(n \cdot d(n)) \approx O(n^{4/3})$$$?

I'm quite curious about it.


»
5 months ago, # |

Of course.

»
5 months ago, # |

How could we solve the problem in $$$O(N \cdot d(N))$$$? For each divisor $$$k$$$ we need to calculate the GCD of the differences for each index in $$$[0..k-1]$$$, which would result in a final complexity of $$$O(N \log N \cdot d(N))$$$ if I am not mistaken.

  • »
    »
5 months ago, # ^ |

    Taking the gcd (using the Euclidean algorithm) of elements in an array of size $$$n$$$ and max element $$$x$$$ takes $$$O(n + \log x)$$$ time.
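
For concreteness, here's a minimal sketch of how this plays out for the problem (a hedged reconstruction of the gcd-of-differences check discussed in this thread, with a toy input, not anyone's actual submission):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Plain Euclidean gcd. Taking the gcd of a whole array this way costs
// O(n + log max_a) in total: the running gcd only ever shrinks, so the
// expensive modulo steps across all calls telescope into O(log max_a).
long long gcd_ll(long long a, long long b) {
    while (b) { a %= b; swap(a, b); }
    return a;
}

// One divisor k: the partition works iff the gcd of all |a[i] - a[i+k]|
// is not exactly 1 (0 means every residue class is constant, which is fine).
bool good(const vector<long long>& a, int k) {
    long long g = 0;
    for (size_t i = 0; i + k < a.size(); i++)
        g = gcd_ll(g, llabs(a[i] - a[i + k]));
    return g != 1;
}

int main() {
    vector<long long> a = {1, 4, 7, 10, 13, 16};   // toy input
    int n = (int)a.size(), ans = 0;
    for (int k = 1; k <= n; k++)                   // O(n) per divisor of n
        if (n % k == 0 && good(a, k)) ans++;
    cout << ans << '\n';                           // O(n * d(n)) overall
}
```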

    • »
      »
      »
5 months ago, # ^ |
Rev. 3

Ah, my ignorance of basic number theory shows once again. Maybe I could have passed with a much better time if I had implemented the GCD function using the Euclidean algorithm instead of using __gcd, because I got close to a 1.5-second runtime in-contest (which seems around $$$N \log N \cdot d(N)$$$).

Upd: According to https://stackoverflow.com/questions/24981380/implementation-of-the-in-built-gcd-method-in-c, __gcd is already implemented using the Euclidean algorithm... Maybe the constants are blowing up the runtime.

      • »
        »
        »
        »
5 months ago, # ^ |
Rev. 2

        Your solution is $$$O(n\log n\ d(n))$$$ since you're sorting all elements for every divisor.

        • »
          »
          »
          »
          »
5 months ago, # ^ |

          You both are completely correct, it appears that I may have been a bit too drunk off of the buzzer-beater AC on C to properly analyze my solution.

      • »
        »
        »
        »
5 months ago, # ^ |

        The most recent gcd implementation (at least in the libstdc++ version I have) uses binary GCD, which doesn't enjoy this nice property. Binary GCD in general is pretty fast, though.

        (Here's a reference for why the claim I mentioned earlier is true).

I went through your implementation, and it looked like you were doing some other operations and sorting too, which could have added extra overhead.

For reference, here are links to my implementations: Euclidean GCD (300 ms), inbuilt GCD (650 ms), and inbuilt GCD without pragmas (700 ms). The fact that there is a slight speedup with pragmas might mean that the binary GCD part is getting sped up by specific BMI/BMI2 instructions.
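
For illustration, here is roughly what a binary (Stein's) GCD looks like; this is a generic sketch rather than libstdc++'s exact code, and __builtin_ctzll is the GCC/Clang intrinsic that those BMI/BMI2 instructions would speed up:

```cpp
#include <cstdint>

// Binary GCD: shifts and subtractions instead of division. Fast per call,
// but an array gcd built on it loses the amortized O(n + log x) bound of
// the Euclidean version mentioned above.
uint64_t binary_gcd(uint64_t a, uint64_t b) {
    if (a == 0) return b;
    if (b == 0) return a;
    int shift = __builtin_ctzll(a | b);   // shared power of two
    a >>= __builtin_ctzll(a);             // make a odd
    while (b != 0) {
        b >>= __builtin_ctzll(b);         // make b odd too
        if (a > b) { uint64_t t = a; a = b; b = t; }
        b -= a;                           // odd - odd = even; next iteration strips the 2s
    }
    return a << shift;
}
```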

        • »
          »
          »
          »
          »
5 months ago, # ^ |
Rev. 2

btw, __gcd is probably always Euclidean

std::gcd: 241431575, ~950 ms

__gcd: 241568039, ~400 ms

          • »
            »
            »
            »
            »
            »
5 months ago, # ^ |

I find this quite interesting, since std::__gcd is libstdc++'s own implementation of GCD, and a divergence between std::gcd and std::__gcd points either to them forgetting to update it or to some other, more important reason that they might have found.

        • »
          »
          »
          »
          »
5 months ago, # ^ |
Rev. 2

Can you explain how my most naive GCD implementation runs in 187 ms? I can only imagine that comparing elements in a different order terminates faster on this particular test suite.

        • »
          »
          »
          »
          »
5 months ago, # ^ |

Thanks for the analysis! I have removed the mentioned overheads, and it indeed performs much better now (259 ms).

»
5 months ago, # |

Can you please explain why $$$d(n)$$$ is $$$O(n^{1/3})$$$?

»
5 months ago, # |

I think the problem is solvable in something like $$$n\cdot\log(n)^2$$$, more precisely, it's $$$\sigma(n)\cdot\log(n)\cdot\omega(n)$$$, where $$$\sigma(n)$$$ is the sum of divisors of $$$n$$$ and $$$\omega(n)$$$ is the number of prime factors of $$$n$$$.

First, we come up with a function $$$g(x,d) \ \forall \ d \ |\ n, 0\le x < d$$$ (there are $$$\sigma(n)$$$ such values we need to calculate), and we'll calculate these values in decreasing order of the divisors.

We define

$$$g(x,d) = \gcd(a_x - a_{d+x}, \cdots, a_{n-2\cdot d+x} - a_{n-d+x})$$$

Now, notice that if $$$d = p \cdot q$$$ for some prime factor $$$p$$$ of $$$d$$$, then

$$$g(y,q) = \gcd(a_y - a_{y+q}, \cdots , a_{(p-2)\cdot q + y} - a_{(p-1)\cdot q + y}, g(y, d), \cdots , g((p-1)\cdot q + y, d))$$$

which takes $$$O(p)$$$ time to compute for a given $$$y$$$. Since there are $$$q$$$ such values of $$$y$$$, the total computation time is $$$O(p\cdot q) = O(d)$$$. So, when iterating over the divisors in decreasing order, after we've computed the values of $$$g(x,d)$$$ for all $$$0 \le x < d$$$, we do this computation for every prime factor of $$$d$$$, which makes the cost per divisor $$$O(d\cdot \omega(d))$$$. Since we're iterating in descending order of divisors, by the time we reach a given divisor its set of values has already been computed, and we can repeat the same step for all of its prime factors.

The total complexity hence turns out to be $$$\sum_{d \ | \ n} O(d \cdot \omega(d)) = O(\sigma(n) \cdot \omega(n))$$$. I believe the complexity could be brought down even further if we were somehow smarter about which prime factors to consider, as the same divisor gets recomputed multiple times in this approach.
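
To make the scheme concrete, here is a rough, unoptimized sketch of the divisor-descending computation (a reconstruction under the definitions above, not the commenter's code; the final check assumes the usual gcd-of-differences criterion for the problem):

```cpp
#include <bits/stdc++.h>
using namespace std;

// g[d][x] corresponds to g(x, d) above: the gcd of consecutive differences
// with step d inside residue class x (0 for a single-element class, since
// gcd(0, t) = t).
int main() {
    vector<long long> a = {5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38};
    int n = (int)a.size();

    vector<int> divs;                        // divisors of n, descending
    for (int d = 1; d <= n; d++) if (n % d == 0) divs.push_back(d);
    sort(divs.rbegin(), divs.rend());

    map<int, vector<long long>> g;
    g[n].assign(n, 0);                       // base case: one element per class

    for (int d : divs) {
        vector<long long> gd = g[d];         // already filled in by a larger divisor
        int m = d;
        for (int p = 2; p <= m; p++) {       // distinct prime factors of d
            if (m % p) continue;
            while (m % p == 0) m /= p;
            int q = d / p;
            vector<long long> gq(q, 0);
            for (int y = 0; y < q; y++) {
                for (int j = 0; j < p; j++)       // reuse the stored block gcds
                    gq[y] = __gcd(gq[y], gd[y + j * q]);
                for (int j = 0; j + 1 < p; j++)   // p-1 fresh differences stitch them
                    gq[y] = __gcd(gq[y], llabs(a[y + j * q] - a[y + (j + 1) * q]));
            }
            g[q] = gq;   // the same q is rebuilt once per prime p with p*q | n --
        }                // exactly the redundancy mentioned above
    }

    // Divisor k is usable iff the overall gcd of its classes is not exactly 1.
    int ans = 0;
    for (int k : divs) {
        long long t = 0;
        for (long long v : g[k]) t = __gcd(t, v);
        ans += (t != 1);
    }
    cout << ans << '\n';
}
```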