How to merge a set of 3-grams?

Revision en3, by egor.okhterov, 2016-12-03 17:47:10

We have all the possible 3-grams of lower case letters (A = {a, b, c, ..., z}):
A3 = A × A × A = 
{
  (a, a, a), 
  (a, a, b), 
  (a, a, c), 
    ..., 
  (a, a, z), 
  (b, a, a), 
    ..., 
  (z, z, y), 
  (z, z, z), 
}

|A3| = 263 = 17576

  • If we try to merge aaa with bbb we have to append bbb to the end of aaa, because they don't overlap:
    merge(aaa, bbb) = aaabbb

  • If we merge aaa with aab we can create a smaller string:
    merge(aaa, aab) = aaab

In the best case, if we manage to always do a second kind of merge, we will be extending the resulting string by 1 character. There are 17576 triplets, so there can be at most 17575 single character extensions. This will result in 17575 + 3 = 17578 characters string.

In the worst case, we will append all of the 3-grams to each other, thus creating a 17576 * 3 = 52728 characters string.

How to combine all of them to get the total string of the minimum length?

Tags de bruijn sequence, de bruijn graph, strings

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
ru3 Russian egor.okhterov 2016-12-03 18:40:02 135 Мелкая правка: '*\n\n---\n![ ](htt' -
en4 English egor.okhterov 2016-12-03 18:36:17 117 Tiny change: ' length?**' -
en3 English egor.okhterov 2016-12-03 17:47:10 6 Tiny change: '-grams of smaller case le' -> '-grams of lower case le'
ru2 Russian egor.okhterov 2016-12-03 17:10:40 2 (опубликовано)
ru1 Russian egor.okhterov 2016-12-03 17:08:48 1380 Первая редакция перевода на Русский (сохранено в черновиках)
en2 English egor.okhterov 2016-12-03 15:28:51 948 Tiny change: 'e letters:\n$A_3 = $' - (published)
en1 English egor.okhterov 2016-12-03 15:05:57 182 Initial revision (saved to drafts)