harryzhengipsum's blog

By harryzhengipsum, history, 18 months ago, In English

On 704A - Thor, the policy-based hash table (gp_hash_table) version easily passes under the time limit, but the unordered_set version TLEs on test case 56.

With gp_hash_table: 181117819

With unordered_set: 181133673

Both submissions use a custom hash function, so it can't be due to collisions. On test case 56 the unordered_set version suddenly gets TLE, even though it's only slightly slower than the gp_hash_table version on all the other test cases. Why is this?

»
18 months ago, # |

It is getting TLE because erasing from unordered_set is slow; a normal set should get AC.

  • »
    »
    18 months ago, # ^ |

    Is it because unordered_set returns the iterator to the next element when calling .erase()? I didn't know that could result in an almost 10x slowdown. Are there any ways to prevent that, or should we simply resort to using gp_hash_table/set?

    • »
      »
      »
      18 months ago, # ^ |

      It's because when an unordered_set is cleared, the entries are removed, but the underlying allocation (the bucket array) stays the same size. Then when you clear or iterate over it, the implementation may have to scan that entire allocation to find the elements. In other words, those operations are $$$O(N)$$$ for $$$N$$$ being the hash table's current capacity, not the number of elements actually in it.

      So if you add a lot of elements to a set, then clear it or iterate over it many times, it'll be slow.

      One way to avoid this is to construct a new set instead of clearing the current set.

      Vectors / dynamic arrays do not have this problem: even if the allocation is much larger than the number of actual elements, a vector still knows how many elements it has and where they are (at the beginning), so it only has to iterate over or clear those. But a hashmap/hashset can have its elements scattered anywhere in its allocation, and it would typically need extra metadata and implementation work to iterate and clear in time proportional to the number of elements rather than to its capacity.
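
      A minimal sketch illustrating this (not from the thread; the sizes are illustrative, and the behavior described is that of GCC's libstdc++, which Codeforces uses):

      ```cpp
      #include <bits/stdc++.h>
      using namespace std;

      int main() {
          unordered_set<int> s;
          for (int i = 0; i < 1'000'000; ++i) s.insert(i);  // grows the bucket array
          s.clear();
          // The elements are gone, but the capacity lingers.
          cout << "size: " << s.size()
               << ", bucket_count: " << s.bucket_count() << '\n';

          // Each iteration clears a set holding a single element, yet every
          // clear() still touches the whole (roughly million-entry) bucket array.
          auto t0 = chrono::steady_clock::now();
          for (int rep = 0; rep < 1000; ++rep) {
              s.insert(rep);
              s.clear();
          }
          auto t1 = chrono::steady_clock::now();
          cout << "1000 clears of a 1-element set: "
               << chrono::duration<double>(t1 - t0).count() << " s\n";
      }
      ```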

      • »
        »
        »
        »
        18 months ago, # ^ |

        But .erase() and .find() should both be $$$O(1)$$$ regardless of hashset/table size. Plus, it TLEs even when the hashsets are cleared by using ={}.

        Without using .clear(): 181154447

      • »
        »
        »
        »
        18 months ago, # ^ |

        Actually, I got it to work by clearing it via swapping with a temporary empty set. I'm still not seeing where the actual $$$O(N)$$$ bottleneck is, though (is it because .clear() is $$$O(N)$$$ each time?), since .erase() and .find() are both $$$O(1)$$$ on a hashset with few collisions.

        With swapping: 181154447
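
        For reference, the difference between .clear() and the swap trick is whether the oversized bucket array survives; a small sketch (not the actual submission, variable names are illustrative):

        ```cpp
        #include <bits/stdc++.h>
        using namespace std;

        int main() {
            unordered_set<int> a, b;
            for (int i = 0; i < 1'000'000; ++i) { a.insert(i); b.insert(i); }

            a.clear();                     // empties the set but keeps the big bucket array
            unordered_set<int>().swap(b);  // empties the set AND releases the bucket array
                                           // (the temporary takes the old table and frees
                                           //  it when it goes out of scope)

            cout << "after clear(): bucket_count = " << a.bucket_count() << '\n';
            cout << "after swap():  bucket_count = " << b.bucket_count() << '\n';
        }
        ```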

        • »
          »
          »
          »
          »
          18 months ago, # ^ |

          Yes, clearing is also proportional to the size of the allocation / the capacity, not the number of elements (because the elements could be anywhere in the allocation).

»
18 months ago, # |

Do this instead of unreadItems[b].clear()

Spoiler
»
18 months ago, # |

Everyone hates unordered_set