islam-al-aarag's blog

By islam-al-aarag, 10 years ago, In English

I was solving problem C from the last testing round 386C - Diverse Substrings
The idea I got was to handle substrings starting at different indices separately:
   1. You will do a precomputation next[i][26] telling you the next occurrence of each of the lower case letters from index i.
   2. You start at index i and each time you jump to the unseen character (using the precomputed array) with the least next occurrence    say k and all substrings starting at i and ending between the last jump and the current jump has diversity of the size of seen    characters so far.
The implementation was direct 5706503 and it was accepted.
I was playing around to optimize it when I thought I shouldn't loop on all characters and check what has been seen or not but instead maintain an actual set of the unseen characters so far and loop over them exclusively. The implementation was direct too 5706533.
To my surprise, this code timed out. I still can't explain why, it's supposed to be doing less loop iterations and hence less comparisons.
Any help would be appreciated.

  • Vote: I like it
  • -10
  • Vote: I do not like it

»
10 years ago, # |
  Vote: I like it +4 Vote: I do not like it

The AC submission was only 18 milliseconds under the time limit. Maybe just making the extra HashSet pushed the runtime over 1 second.

  • »
    »
    10 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I know that original code wasn't that fast but I am comparing the old to the new one. The new code is supposed to be faster as it contains less comparisons.

    • »
      »
      »
      10 years ago, # ^ |
        Vote: I like it +3 Vote: I do not like it

      Faster? With HashSet instead of boolean array? Of course it cannot be faster.

      • »
        »
        »
        »
        10 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Not because of the hash set.
        Imagine at a certain point you have k (k <= 26) unseen characters. In the next step when searching for the next jump:
        - In the old code, you will still loop over 26 letters and check whether it's seen or not in the array and eventually mark the selected one as seen. So that makes finding in 26 steps and marking in 1 step.
        - In the new code you will loop over k characters and mark the selected in log(k)
        So instead of exactly 27 steps in the old code pre update you do k + log k which is less when k <=22 (i.e., after you see 4 characters)

        • »
          »
          »
          »
          »
          10 years ago, # ^ |
            Vote: I like it +2 Vote: I do not like it

          Do you think these operations have the same perfomance?

          • »
            »
            »
            »
            »
            »
            10 years ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            What operations?

            • »
              »
              »
              »
              »
              »
              »
              10 years ago, # ^ |
              Rev. 2   Vote: I like it 0 Vote: I do not like it

              Doing add/remove/query on a HashSet has a larger constant factor than updates to a boolean array do.