harryzhengipsum's blog

By harryzhengipsum, history, 18 months ago, In English

On 704A - Thor, the policy-based hash table (gp_hash_table) version easily passes under the time limit, but the unordered_set version TLEs on test case 56.

With gp_hash_table: 181117819

With unordered_set: 181133673

Both submissions use a custom hash function, so it can't be due to collisions. On test case 56 the unordered_set version suddenly gets TLE, even though it's only slightly slower than the gp_hash_table version on all the other test cases. Why is this?

»
18 months ago, # |

It is getting TLE because erasing from unordered_set is slow; a normal set should get AC.

  • »
    »
    18 months ago, # ^ |

    Is it because unordered_set returns the iterator to the next element when calling .erase()? I didn't know that could result in an almost 10x slowdown. Are there any ways to prevent that, or should we simply resort to using gp_hash_table/set?

    • »
      »
      »
      18 months ago, # ^ |

      It's because when an unordered_set is cleared, the entries are removed, but the underlying allocation (the bucket array) stays the same size. Then when you clear or iterate over it, the implementation may have to scan that entire allocation to find the elements. In other words, those operations are $$$O(N)$$$ for $$$N$$$ being the hash table's current capacity, not the number of elements actually in it.

      So if you add a lot of elements to a set, then clear it or iterate over it many times, it'll be slow.

      One way to avoid this is to construct a new set instead of clearing the current set.

      Vectors / dynamic arrays do not have this problem: even if the allocation is much larger than the number of actual elements, a vector still knows how many elements it has and where they are (at the beginning), so it only has to iterate over or clear those. But a hashmap/hashset can have its elements scattered anywhere in its allocation, and it would typically need extra metadata and implementation work to iterate and clear in time proportional to the number of elements rather than to its capacity.
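
      A minimal sketch illustrating this (not from the thread; the sizes are illustrative, and the behavior described is that of GCC's libstdc++, which Codeforces uses):

      ```cpp
      #include <bits/stdc++.h>
      using namespace std;

      int main() {
          unordered_set<int> s;
          for (int i = 0; i < 1'000'000; ++i) s.insert(i);  // grows the bucket array
          s.clear();
          // The elements are gone, but the capacity lingers.
          cout << "size: " << s.size()
               << ", bucket_count: " << s.bucket_count() << '\n';

          // Each iteration clears a set holding a single element, yet every
          // clear() still touches the whole (roughly million-entry) bucket array.
          auto t0 = chrono::steady_clock::now();
          for (int rep = 0; rep < 1000; ++rep) {
              s.insert(rep);
              s.clear();
          }
          auto t1 = chrono::steady_clock::now();
          cout << "1000 clears of a 1-element set: "
               << chrono::duration<double>(t1 - t0).count() << " s\n";
      }
      ```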

      • »
        »
        »
        »
        18 months ago, # ^ |

        But .erase() and .find() should both be $$$O(1)$$$ regardless of hashset/table size. Plus, it TLEs even when the hashsets are cleared by using ={}.

        Without using .clear(): 181154447

      • »
        »
        »
        »
        18 months ago, # ^ |

        Actually, I got it to work by clearing it via swapping with a temporary empty set. I'm still not seeing where the actual $$$O(N)$$$ bottleneck is, though (is it because .clear() is $$$O(N)$$$ each time?), since .erase() and .find() are both $$$O(1)$$$ on a hashset with few collisions.

        With swapping: 181154447
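
        For reference, the difference between .clear() and the swap trick is whether the oversized bucket array survives; a small sketch (not the actual submission, variable names are illustrative):

        ```cpp
        #include <bits/stdc++.h>
        using namespace std;

        int main() {
            unordered_set<int> a, b;
            for (int i = 0; i < 1'000'000; ++i) { a.insert(i); b.insert(i); }

            a.clear();                     // empties the set but keeps the big bucket array
            unordered_set<int>().swap(b);  // empties the set AND releases the bucket array
                                           // (the temporary takes the old table and frees
                                           //  it when it goes out of scope)

            cout << "after clear(): bucket_count = " << a.bucket_count() << '\n';
            cout << "after swap():  bucket_count = " << b.bucket_count() << '\n';
        }
        ```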

        • »
          »
          »
          »
          »
          18 months ago, # ^ |

          Yes, clearing is also proportional to the size of the allocation / the capacity, not the number of elements (because the elements could be anywhere in the allocation).

»
18 months ago, # |

Do this instead of unreadItems[b].clear()

Spoiler
»
18 months ago, # |

Everyone hates unordered_set