vinayaka's blog

By vinayaka, history, 7 months ago, In English

Hi,

My knowledge on String algorithms is poor so I am studying it now.

I solved 271D - Good Substrings using a Trie, but I am trying to solve it using hashing.

I am able to calculate a polynomial rolling hash and compare substrings based on the hash, but my solution is getting WA on test 8 (230797737). I am thinking this is because of collisions, since the expected answer is also pretty high.

I read a topic on double polynomial rolling hash to reduce probability of collisions. I understand that if we use two hashes of order 10^9 then probability will be less because it is equivalent to using a 10^18 modulo.

I am unable to understand how to implement it.

Can anyone help?

Please point me towards an article/blog/submission that has code on implementing it.

Thank you.

  • Vote: I like it
  • +7
  • Vote: I do not like it

»
7 months ago, # |
  Vote: I like it +3 Vote: I do not like it

Just hash it with two different primes. You can check my submission: https://codeforces.com/contest/271/submission/230803346

  • »
    »
    7 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I was confused on how to use the two hashes, now I understand we need to store them in a set, and they both combined point to a substring, this was the knowledge gap! Thank you!

    And..what does this code mean?

    struct hash_pair {
        template <class T1, class T2>
        size_t operator()(const pair<T1, T2>& p) const
        {
            auto hash1 = hash<T1>{}(p.first);
            auto hash2 = hash<T2>{}(p.second);
     
            if (hash1 != hash2) {
                return hash1 ^ hash2;              
            }
             
            // If hash1 == hash2, their XOR is zero.
              return hash1;
        }
    };
    
    unordered_set<pair<int,int>, hash_pair> countSet;
    
    • »
      »
      »
      7 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Earlier you were hashing with one prime and storing in hash set. Now you are hashing with two different primes so you will have to store it in hash set of pair. C++ does not have built in hash for pair, this part is just hashing the pair

»
7 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Using two different bases works as well, which has the added benefit that you can randomize them at runtime to avoid getting hacked.

»
7 months ago, # |
Rev. 2   Vote: I like it +1 Vote: I do not like it

You can just avoid modulo operator entirely. Instead, allow them to overflow, which is essentially behaves like modulo $$$2^{64}$$$ (if you use 64-bit integers).

  • »
    »
    7 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    This worked! 230852302 I didn't think of integer overflows like this till now. Thank you very much!

»
7 months ago, # |
  Vote: I like it 0 Vote: I do not like it

just roll the hashes