Text Editor — Facebook Hacker Cup Qualification Round 2016

Revision en9, by zscoder, 2016-01-12 04:06:49

Abridged Problem Statement : You have a list of N distinct words, and you want to print K of them. You have a very basic text editor : It supports 3 types of operations: typing a letter, deleting the previous letter, and printing the current document. You need to leave the document empty after printing K words. What's the minimum number of operations required to get the job done?

Constraints : 1 ≤ K ≤ N ≤ 300 The total length of all N words in each set will be at most 100, 000 characters.

Solution :

Name the words s1, s2, ..., sN. Let s0 = "". We initially have s0 on the document. After some operations, we want to print k of s1, s2, ..., sN and change the document to s0. For some pair of string si, sj, how many operations do we need to print sj if we have a document with si? Let p be the length of the Longest Common Prefix of si and sj. We can compute this naively in O(min(|si|, |sj|)). Then, we need to delete |si| - p characters, type |sj| - p characters, and print it (1 operation), for a total of |si| + |sj| - 2p + 1 operations. We can precalculate this value for each pair of words si, sj in a short time.

Now, suppose we have chosen our K words s1, s2, ..., sK. In which order do we print them in order to use a minimum amount of operations? We claim that in fact, if we print the strings in lexicographical order, we will use the minimum amount of operations. Suppose not, let s1, s2, ..., sK be arranged in lexicographical order. Suppose we print the words in the sequence s1, s2, ..., si - 1, sj, si + 1, ..., sj - 1, si, sj + 1, ..., sK, with i < j. Let a1, a2, ..., K be the length of word s1, s2, ..., sK respectively and denote the length of the Longest Common Prefix of sa and sb by L(a, b). Note that L(i, j) ≥ L(i, k) if i < j < k. Suppose to the contrary the longest common prefix of si and sk have longer length than the longest common prefix of si and sj. Then, let p be the longest common prefix of si and sk. The prefix of sj does not contain p but this contradicts the fact that si, sj, sk are increasing in lexicographical order. Thus, the inequality holds. We'll use this inequality many times in the calculations below. It can be easily seen that the total number of operations used is 2(a1 + a2 + ... + aK) - 2(L(1, 2) + L(2, 3) + ... + L(i - 1, j) + L(j, i + 1) + ... + L(j - 1, i) + L(i, j + 1) + ... + L(K - 1, K)). Compare this to when the words are printed in lexicographical order. Firstly, if j > i + 1, we just need to compare L(i - 1, j) + L(j, i + 1) + L(j - 1, i) + L(i, j + 1) and L(i - 1, i) + L(i, i + 1) + L(j - 1, j) + L(j, j + 1). We want the former to be at most the latter. Indeed, L(i - 1, i) ≥ L(i - 1, j), L(i, i + 1) ≥ L(i, j - 1), L(j - 1, j) ≥ L(i + 1, j), L(j, j + 1) ≥ L(i, j + 1), since the words are arranged in lexicographical order. So, summing up yields the desired inequality. Now suppose j = i + 1. We want to compare L(i - 1, i + 1) + L(i + 1, i) + L(i, i + 2) and L(i - 1, i) + L(i, i + 1) + L(i + 1, i + 2). Again, we have L(i - 1, i) ≥ L(i - 1, i + 1), L(i + 1, i + 2) ≥ L(i, i + 2), as desired. So, printing the words lexicographically is the optimal way.

Now, we already have our cruical step. It remains to select the K words that minimize the amount of operations. We can do this by dynamic programming. First sort the N words in lexicographical order. Let dp[end][parts] be the minimum number of operations needed to print parts number of words all of whose index is at most end and the last word printed has index end. We want min(dp[1][k], dp[2][k], ..., dp[n][k]). If end < parts, we let dp[end][parts] be INF.

Let cost(i, j) be the number of operations needed to print word j if our current document is word i. Recall that we already know how to compute cost(i, j) in O(1) since we already precomputed the values.

Now, how do we find dp[end][parts]? It is the minimum of dp[i][parts - 1] + cost(i, end) among 1 ≤ i ≤ end - 1. Time complexity is O(n2k). Finally, don't forget to add the cost from s0 to si. (in the beginning and end).

Tags editorial, fhc, facebook hacker cup, dp, string

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en9 English zscoder 2016-01-12 04:06:49 0 (published)
en8 English zscoder 2016-01-10 16:53:19 2 Tiny change: '$L(i, j) \le L(i, k)$' -> '$L(i, j) \ge L(i, k)$'
en7 English zscoder 2016-01-10 16:52:55 471
en6 English zscoder 2016-01-10 14:20:10 9
en5 English zscoder 2016-01-10 14:15:03 841
en4 English zscoder 2016-01-10 14:02:48 641
en3 English zscoder 2016-01-10 13:56:30 1227
en2 English zscoder 2016-01-10 13:32:56 553 Tiny change: 'cters.\n\nName t' -> 'cters.\n\nSolution :\n\nName t'
en1 English zscoder 2016-01-10 13:25:03 645 Initial revision (saved to drafts)