partial_sort running time analysis

4 года назад, # |

That would require using a heap which has $$$O(1)$$$ insertion, and those usually have a really bad constant.

→ Ответить

4 года назад, # ^ |

How is the O(n.logk) achieved then? If not using heaps/PQs.

→ Ответить

4 года назад, # ^ |

For the $$$O(n$$$ $$$log$$$ $$$k)$$$ algorithm you can have a heap with both $$$O(log$$$ $$$size)$$$ insertion and deletion of first element, for which an ordinary binary heap works which is pretty fast in practice. For the $$$O(n + k$$$ $$$log$$$ $$$k)$$$ algorithm you would need to have a heap with $$$O(1)$$$ insertion and $$$O(log$$$ $$$size)$$$ deletion of first element, which usually have a much higher constant factor.

→ Ответить

4 года назад, # ^ |

+15

You do not need to insert all elements one by one. That would take O(n.logn) just to make the heap.

If I remember my Data Structures class correctly, one can simply use heapify to build a heap of n numbers in O(n) time. Doesn't necessarily need O(1) insertion for that.

→ Ответить

4 года назад, # ^ |

Yeah, that's actually true. In that case I don't know the answer.

→ Ответить

gzchenyin

4 года назад, # ^ |

a heap could be made in a complexity of O(n) by using the function makeheap(). It is a practical algorithm .It works in the way of making heap from the bottom to the top. In this problem，we don't need to insert the element N times.

→ Ответить

4 года назад, # ^ |

Yes, that was already mentioned.

→ Ответить

okwedook

4 года назад, # |

+16

It can be implemented using nth_element and sort in-place and $$$O(n + klogk)$$$ time. So every bigger time seems strange to me.

→ Ответить

riadwaw

4 года назад, # ^ |

nth-element guarantees only average complexity iirc

→ Ответить

balbit

4 года назад, # ^ |

Doesn't it also use median of medians, which guarantees complexity?

→ Ответить

4 года назад, # ^ |

We can use median of medians as pivot-selecting method and it will ensures Linear complexity, while afaik nth_element uses quickselect which can lead us to bad partitions and only ensures linear complexity on an average.

→ Ответить

4 года назад, # ^ |

Median of medians/ QuickSelect takes O(n) time to find i-th smallest number. So to find all 1 <= i <= k it'll take O(kn) time right?

→ Ответить

4 года назад, # ^ |

← Rev. 3 →

Quick select has worst case O(N^2). So instead of using Quick select, we can use median of medians algorithm to partition the array at position k(like nth_element does) as it ensures linear complexity and then sort the k elements which will be O(n + klogk) as okwedook suggested above.

→ Ответить

4 года назад, # ^ |

Oh so we just need to partition once. I was thinking maybe for every i<=k we need to run the algorithm again. Yep makes sense! Thanks

→ Ответить

4 года назад, # ^ |

In nth_element pure quickselect isn't used; it tries quickselect for the first $$$O(log$$$ $$$n)$$$ layers and then it ends by using a heap on the remaining part of the array, so on average the complexity is $$$O(n)$$$ while in the worst case it's $$$O(n$$$ $$$log$$$ $$$n)$$$.

→ Ответить

4 года назад, # ^ |

← Rev. 2 →

Any sources for this? The first answer to this question says that it is nowhere mentioned that which algorithms it uses if quick select fails due to bad partitions.

→ Ответить

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35968

4 года назад, # ^ |

https://github.com/gcc-mirror/gcc/blob/d9375e490072d1aae73a93949aa158fcd2a27018/libstdc%2B%2B-v3/include/bits/stl_algo.h#L1964

→ Ответить

4 года назад, # ^ |

Ok, thanks for pointing out i will update the statement above.

→ Ответить

Slides: https://github.com/CppCon/CppCon2018/blob/master/Presentations/a_little_order_delving_into_the_stl_sorting_algorithms/a_little_order_delving_into_the_stl_sorting_algorithms__fred_tingaud__cppcon_2018.pdf

4 года назад, # ^ |

+10

I just realised another "theoretical" way to achieve this $$$O(n+k\log k)$$$ bound. We could heapify the array in $$$O(n)$$$ time. Then construct another heap (starting from empty) and insert $$$(value,index)$$$ pairs from old array to new one. Every time pop one pair and push two (its children in the original array). Of course coding this would be a mess, but it would guarantee the theoretical upper bound of $$$O(n+k\log k)$$$.

→ Ответить

Gassa

4 года назад, # |

+12

An interesting question.

There is a presentation from CppCon 2018 by Fred Tingaud which addresses the matter.

Video, presumably: https://www.youtube.com/watch?v=-0tO3Eni2uo.

Here's a short summary from slide 87:

The typical usage of std::partial_sort is to sort a small subset of elements in a big container.
The STL implementers chose a faster $$$O(N \cdot \log(k))$$$ algorithm that performs well for this typical use-case at the expense of other scenarios.

→ Ответить