Apple_Method's blog

By Apple_Method, history, 3 years ago, In English

Hi everyone,

I was thinking about the following string problem:

You are given a string of length n, and q queries. Each query is a substring of the original string (s[l ... r]). For each query, you want to find the number of prefixes of the substring that are equal to a suffix. For example:

abcabc

2

1 3

1 5

For the first query, the answer should be 1, and the second query, the answer should be 2 (abcab == abcab, and ab == ab).

I haven't made much progress so far (I tried getting aho-corasick and suffix array to work, but I didn't find a way). Assuming $$$n \le 10^5$$$ and $$$q \le 10^5$$$, none of my ideas work.

Is there a solution to this problem, or is the lower bound on the time complexity $$$O(n^2+q)$$$ or $$$O(nq)$$$ (the best solutions i have found so far).

Because the problem has no source (I was thinking about problem ideas and stumbled upon this one) there might be no answer, but the problem seems so simple that there has to be one.

Excuse me if there are any mistakes here, as this is my first blog ever

  • Vote: I like it
  • +40
  • Vote: I do not like it

»
3 years ago, # |
  Vote: I like it +8 Vote: I do not like it

Auto comment: topic has been updated by Apple_Method (previous revision, new revision, compare).

»
3 years ago, # |
  Vote: I like it +8 Vote: I do not like it

Auto comment: topic has been updated by Apple_Method (previous revision, new revision, compare).

»
3 years ago, # |
  Vote: I like it +17 Vote: I do not like it

We can count the number of borders (prefixes that are suffixes) by repeatedly taking the longest proper border.

Suppose the longest proper border of $$$s$$$ has length $$$|s|-p$$$. Then $$$s_i=s_{i+p}$$$ for each $$$1\le i\le |s|-p$$$, so there's a border of length $$$|s|-kp$$$ for each valid $$$k\ge 1$$$. We can use this to skip to the shortest border of length at least $$$|s|/2$$$ (we won't miss any borders at least up to here). This reduces the length of the string by a constant factor, so will only happen $$$O(\log{n})$$$ times.

So we just need to quickly find the longest proper border of a substring. This seems like it could be done with string suffix data structures, but I haven't figured out how.

  • »
    »
    3 years ago, # ^ |
      Vote: I like it +16 Vote: I do not like it

    Oh, Thanks!

    I will think about how to solve the subproblem, but now it seems so much simpler.