Блог пользователя Apple_Method

Автор Apple_Method, история, 3 года назад, По-английски

Hi everyone,

I was thinking about the following string problem:

You are given a string of length n, and q queries. Each query is a substring of the original string (s[l ... r]). For each query, you want to find the number of prefixes of the substring that are equal to a suffix. For example:

abcabc

2

1 3

1 5

For the first query, the answer should be 1, and the second query, the answer should be 2 (abcab == abcab, and ab == ab).

I haven't made much progress so far (I tried getting aho-corasick and suffix array to work, but I didn't find a way). Assuming $$$n \le 10^5$$$ and $$$q \le 10^5$$$, none of my ideas work.

Is there a solution to this problem, or is the lower bound on the time complexity $$$O(n^2+q)$$$ or $$$O(nq)$$$ (the best solutions i have found so far).

Because the problem has no source (I was thinking about problem ideas and stumbled upon this one) there might be no answer, but the problem seems so simple that there has to be one.

Excuse me if there are any mistakes here, as this is my first blog ever

  • Проголосовать: нравится
  • +40
  • Проголосовать: не нравится

»
3 года назад, # |
  Проголосовать: нравится +8 Проголосовать: не нравится

Auto comment: topic has been updated by Apple_Method (previous revision, new revision, compare).

»
3 года назад, # |
  Проголосовать: нравится +8 Проголосовать: не нравится

Auto comment: topic has been updated by Apple_Method (previous revision, new revision, compare).

»
3 года назад, # |
  Проголосовать: нравится +17 Проголосовать: не нравится

We can count the number of borders (prefixes that are suffixes) by repeatedly taking the longest proper border.

Suppose the longest proper border of $$$s$$$ has length $$$|s|-p$$$. Then $$$s_i=s_{i+p}$$$ for each $$$1\le i\le |s|-p$$$, so there's a border of length $$$|s|-kp$$$ for each valid $$$k\ge 1$$$. We can use this to skip to the shortest border of length at least $$$|s|/2$$$ (we won't miss any borders at least up to here). This reduces the length of the string by a constant factor, so will only happen $$$O(\log{n})$$$ times.

So we just need to quickly find the longest proper border of a substring. This seems like it could be done with string suffix data structures, but I haven't figured out how.

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +16 Проголосовать: не нравится

    Oh, Thanks!

    I will think about how to solve the subproblem, but now it seems so much simpler.