Need help in very interesting OA problem

#	User	Rating
1	tourist	3690
2	jiangly	3647
3	Benq	3581
4	orzdevinwang	3570
5	Geothermal	3569
5	cnnfls_csy	3569
7	Radewoosh	3509
8	ecnerwala	3486
9	jqdai0815	3474
10	gyh20	3447

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	162
4	TheScrasse	159
5	nor	157
6	maroonrk	156
7	-is-this-fft-	152
8	orz	146
9	Petr	145
9	pajenegod	145

Many of my friends asked this interesting OA problem to me:

Given an array of strings $$$A$$$ of size $$$N$$$ and two integers $$$B$$$ and $$$C$$$.

Let $$$D$$$ be the number of strings of length $$$C$$$ that contains exactly $$$B$$$ of the strings in $$$A$$$ as substring.

Return $$$D$$$ mod $$$10^9+9$$$.

Constraints

$$$1 \le N \le 6$$$

$$$1 \le |A[i]| \le 50$$$

All the $$$N$$$ strings are distinct

$$$0 \le B \le N$$$

$$$1 \le C \le 50.$$$

Note that if a substring belonging to set $$$A$$$ occurs multiple times in my resulting string, it is counted only once.

My approach:

Let $$$Z$$$ be the size of the alphabet $$$(26)$$$.

Let $$$dp[i][j][k][m]$$$ denote the number of strings satisfying the constraints:

1) It is of length $$$i$$$.

2) The longest suffix present in it which is a proper prefix of some string belonging to set $$$A$$$ is the substring $$$[k...i]$$$ and the string whose proper prefix is the $$$j^{th}$$$ string in set $$$A$$$, in case of multiple such strings in $$$A$$$ choose the one with longest length. In case no such suffix exists, we can put a fake "empty string" at index $$$0$$$ in set $$$A$$$ (rest of the strings are numbered from $$$1$$$ to $$$N$$$) and assume that substring is $$$[i+1 , i]$$$.

3) The substrings (belonging to set $$$A$$$) which has already been occurred is denoted by mask $$$m$$$, more formally, if the $$$w^{th}$$$ string in $$$A$$$ has already occurred, then $$$w^{th}$$$ bit of $$$m$$$ is set, otherwise its not.

I'll write transitions via forward style dp, if I'm adding the $$$(i+1)^{th}$$$ character, then, it might "complete" some substrings, by this I mean, some suffix which was a proper prefix of some string in $$$A$$$ before adding character will now be a complete string belonging to set $$$A$$$. Note that all such strings will be the suffix of that longest suffix.

So, some new bits in mask $$$m$$$ will be set, All this can be calculated, since we already know the longest suffix, in fact lets precalculate, $$$info[i][j][k]$$$ which gives a tuple $$$(bitmask, L, idx)$$$. If we consider the prefix of length $$$j$$$ of $$$i^{th}$$$ string in set $$$A$$$ and add character $$$k$$$ at the end, $$$w^{th}$$$ bit in bitmask is set iff, entire $$$w^{th}$$$ string in $$$A$$$ occurs as a substring in that prefix after adding character $$$k$$$, $$$L$$$ denotes the length of the longest suffix in resulting string (after adding character $$$k$$$) that is a proper prefix of $$$idx^{th}$$$ string in set $$$A$$$, this precomputation can be done naively in $$$O(N*C*Z*N)$$$.

So, after adding $$$(i+1)^{th}$$$ character (denote it by $$$c$$$), new mask is $$$ (m | info[j][i-k+1][c][0])$$$, new length of longest suffix is $$$info[j][i-k+1][c][1]$$$ so, add its contribution towards state $$$dp[i+1][info[j][i-k+1][c][2]][i+2-info[j][i-k+1][c][1]][(m | info[j][i-k+1][c][0])]$$$.

I think the complexity would be $$$O(C*N*C*2^{N}*Z)$$$, which might pass. However, I think I'm overkilling it and there has to be simpler solution. I'm not even sure whether my solution is correct or not.

Comments (10)

Write comment?

ShivanshJ

7 months ago, # |

Auto comment: topic has been updated by ShivanshJ (previous revision, new revision, compare).

→ Reply

induk_v_tsiane

← Rev. 2 →

Edit: incorrect

Geothermal

7 months ago, # ^ |

How do you ensure that we count all strings that occur? For example, suppose that you expect the strings to appear in the order 1, 2, 3, 4, 5, 6 and are counting occurrences where four of the strings appear. How are you ensuring that string 5 never appears before any of the first four strings?

I do not, thank you for correcting me.

A simpler solution (in the sense of requiring less original thought after applying a standard algorithm; the underlying idea is probably basically the same) that has the same time complexity as yours is to build an Aho-Corasick automaton on the $$$N$$$ input strings. For each node in the automaton, create a bitmask indicating which of the $$$N$$$ input strings could end in this node.

Then, run a DP where your state is the number of characters used, a mask indicating which of the $$$N$$$ strings have appeared in the string you're building, and the node of the Aho-Corasick automaton corresponding to the string you've built so far. To transition, iterate over the next letter in the string, transition to the corresponding Aho-Corasick node, and add any input strings appearing in that node to your bitmask.

There are $$$O(C2^N N |A[i]|)$$$ states and $$$26$$$ transitions per state, giving the same complexity as your solution.

Thanks! I don't know Aho Corasick, time for me to learn this!

bicsi

It's literally the same solution though, is it not?

navneet.h

← Rev. 3 →

I am little confused in understanding the task.

(Edit: I got it now , so we need to find A_i as substring of assumed Good string S not in other way. I thought we need to find number of substring in A_i or in total A)

yup N <= 6, DP bitmask seems the good intuitive way here.

S3cr3t_C0d3r

oh! i have a solution of O(|A|)

ShivanshJ's blog