How to merge a set of 3-grams?

Правка en2, от egor.okhterov, 2016-12-03 15:28:51

We have all the possible 3-grams of smaller case letters (A = {a, b, c, ..., z}):
A3 = A × A × A = 
{
  (a, a, a), 
  (a, a, b), 
  (a, a, c), 
    ..., 
  (a, a, z), 
  (b, a, a), 
    ..., 
  (z, z, y), 
  (z, z, z), 
}

|A3| = 263 = 17576

  • If we try to merge aaa with bbb we have to append bbb to the end of aaa, because they don't overlap:
    merge(aaa, bbb) = aaabbb

  • If we merge aaa with aab we can create a smaller string:
    merge(aaa, aab) = aaab

In the best case, if we manage to always do a second kind of merge, we will be extending the resulting string by 1 character. There are 17576 triplets, so there can be at most 17575 single character extensions. This will result in 17575 + 3 = 17578 characters string.

In the worst case, we will append all of the 3-grams to each other, thus creating a 17576 * 3 = 52728 characters string.

How to combine all of them to get the total string of the minimum length?

Теги de bruijn sequence, de bruijn graph, strings

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
ru3 Русский egor.okhterov 2016-12-03 18:40:02 135 Мелкая правка: '*\n\n---\n![ ](htt' -
en4 Английский egor.okhterov 2016-12-03 18:36:17 117 Tiny change: ' length?**' -
en3 Английский egor.okhterov 2016-12-03 17:47:10 6 Tiny change: '-grams of smaller case le' -> '-grams of lower case le'
ru2 Русский egor.okhterov 2016-12-03 17:10:40 2 (опубликовано)
ru1 Русский egor.okhterov 2016-12-03 17:08:48 1380 Первая редакция перевода на Русский (сохранено в черновиках)
en2 Английский egor.okhterov 2016-12-03 15:28:51 948 Tiny change: 'e letters:\n$A_3 = $' - (published)
en1 Английский egor.okhterov 2016-12-03 15:05:57 182 Initial revision (saved to drafts)