YouKn0wWho's blog

By YouKn0wWho, 5 years ago, In English

There's a string $$$S$$$ of length $$$N$$$ containing $$$1s$$$ and $$$0s$$$ only, you don't know the string but you know $$$N (N<=1000)$$$. You can ask at most $$$1024$$$ questions: is $$$P$$$ a substring of $$$S$$$? $$$P$$$ can be of any length. Then you have to know the string $$$S$$$.

This comment claims that this problem has a deterministic solution as well as a nice randomization solution. But I am thinking of the solution for a pretty long time and nothing popped out of my tiny brain. Can you help me?

Note: I am interested in the randomized solution more. But any solution will quench my thirst!

  • Vote: I like it
  • +27
  • Vote: I do not like it

»
5 years ago, # |
  Vote: I like it +45 Vote: I do not like it

I am not sure whether "a proper" (with negligible probability of failure) randomized solution exists, but the following should work in the contest enviroment, assuming that problem is not a multitest one.

Let's maintain some substring $$$p$$$ of $$$s$$$. Initially, $$$p$$$ is empty and on each stage we try to extend $$$p$$$ by appending a random symbol $$$c$$$ to the right of $$$p$$$. There are three possibilities.

  • a) $$$p + c$$$ is a substring of $$$s$$$.

  • b) $$$p + \overline{c}$$$ is a substring of $$$s$$$, where $$$\overline{c}$$$ denotes $$$0$$$ for $$$c = 1$$$ and $$$1$$$ for $$$c = 0$$$.

  • c) Neither $$$p + c$$$, nor $$$p + \overline{c}$$$ are substrings of $$$s$$$. That happens if and only if $$$p$$$ occurs in $$$s$$$ as a suffix and does not have other occurences.

That leads us to an algorithm that makes at most $$$2002$$$ queries: ask interactor about strings $$$p + c$$$ and $$$p + \overline{c}$$$. After two such queries we can either extend $$$p$$$ or realize that $$$p$$$ is a proper suffix of $$$s$$$.

In the second case, we can start appending symbols to the front of $$$p$$$. Indeed, if $$$p$$$ is a proper suffix of $$$s$$$, then either $$$0 + p$$$, or $$$1 + p$$$ is a suffix of $$$s$$$ (no exceptions now). So, we can find the exact value of $$$s$$$ in $$$|s| - |p|$$$ queries. Along with at most $$$2(|p|+1)$$$ queries we used to reach the state when we are sure that $$$p$$$ is a suffix of $$$s$$$, the total number of queries does not exceed $$$|s|+|p|+2 \leqslant 2|s| + 2 = 2002$$$.

Let's optimize the number of queries used in order to fit in $$$1024$$$ queries. Let's investigate the three situations I named a)-c) more closely. If the alternative a) happens, then we extended $$$p$$$ by one symbol at the cost of one query — good enough.

What will we do if alternative a) does not happen? We can't tell situations b) and c) apart immediately, without expending a query each time, which is too wasteful.

However, we can assume that situation b) happened and pretend that $$$p + \overline{c}$$$ is a substring of $$$s$$$. That is, we will append $$$\overline{c}$$$ to $$$p$$$ from the right and continue the algorithm anyway, pretending that we "know" that $$$p + \overline{c}$$$ is a substring of $$$s$$$. Of course, we can't do that indefinitely, we will make a mistake at some moment.

Suppose that we did incorrectly pretend that $$$p + \overline{c_1}$$$ is a substring of $$$s$$$ when $$$p$$$ was a suffix of $$$s$$$, but neither $$$p + c_1$$$, nor $$$p + \overline{c_1}$$$ are substrings of $$$s$$$. Then all our subsequent queries ($$$p + \overline{c_1} + c_2$$$, $$$p + \overline{c_1} + \overline{c_2}$$$, e.t.c) won't be substrings of $$$s$$$.

On the other hand, if we did not make a mistake yet (because $$$p$$$ occurs in $$$s$$$ as a non-suffix substring), then either $$$p + 0$$$ or $$$p + 1$$$ is a substring of $$$s$$$. Therefore, by choosing $$$c$$$ randomly, we have a $$$1/2$$$ probability of asking a query that indeed is a substring of $$$s$$$.

All in all, we can do the following: extend $$$p$$$ by the above algorithm, which uses one query per character, until we get $$$19$$$ consequitive negative answers to our queries. By the above paragraph, unless we made a mistake somewhere along these $$$19$$$ queries, the probability of this happening is very low ($$$2^{-19}$$$, to be exact). So we can safely assume that a mistake did indeed happen. Suppose that $$$q$$$ was the last string with positive answer. Since then, the strings $$$q + c_1$$$, $$$q + \overline{c_1} + c_2$$$, $$$\ldots$$$, $$$q + \overline{c_1} + \overline{c_2} + \ldots + c_{19}$$$ all turned out to be not substrings of $$$s$$$. By above, $$$q + \overline{c_1} + \overline{c_2} + \ldots + \overline{c_{19}}$$$ is not a substring of $$$s$$$ as well with probability at least $$$1 - 2^{-19}$$$.

Now, we are left with the situation when we have some a string $$$q$$$, which is a substring of $$$s$$$, substring $$$q + \overline{c_1} + \overline{c_2} + \ldots + \overline{c_{19}}$$$, which is not, and, more importantly, all superstrings of $$$q$$$ that are still substrings of $$$s$$$ have the form $$$q + \overline{c_1} + \overline{c_2} + \ldots + \overline{c_k}$$$ with some $$$k$$$ from $$$0$$$ to $$$18$$$: we ruled out all other possibilities, because of negative answers to queries $$$q + c_1$$$, $$$q + \overline{c_1} + c_2$$$, $$$\ldots$$$, $$$q + \overline{c_1} + \overline{c_2} + \ldots + c_{19}$$$.

Now we can do binary search for largest $$$k$$$ such that $$$q + \overline{c_1} + \ldots + \overline{c_k}$$$ is a substring of $$$s$$$ (requires only $$$\lceil \log_2 18 \rceil = 5$$$ queries. After that, we are left with some proper suffix $$$p$$$ of $$$s$$$ and we used at most $$$|q| + 19 + 5 \leqslant |p| + 24$$$ queries. Now, use remaining $$$|s| - |p|$$$ queries to find out value of $$$s$$$ by appending symbols to the front of $$$p$$$.

Of course, there is no magical properties in the numbers $$$19$$$ and $$$5$$$, they were chosen in order to minimize probability of mistaking unluckiness for situation c) occuring somewhere along the way, while still fitting in query limit. We can roughly estimate this probability by $$$1000 \cdot 2^{-19} \leqslant 2 \cdot 10^{-3}$$$: we had at most $$$|s|$$$ opportunities to detect an occurence of situation c) when it did not actually occur, and each such false positive had a probability of $$$2^{-19}$$$.

The probability of getting AC if there are $$$100$$$ tests in the problem is at least $$$1 - 2 \cdot 100 \cdot 10^{-3} = 4/5$$$. While not very impressive, this still means that you are more likely to get AC with this solution then not.

  • »
    »
    5 years ago, # ^ |
    Rev. 4   Vote: I like it 0 Vote: I do not like it

    Awesome man. Such a beautiful explanation! Thank you. I guess it will work.