purplesyringa's blog

By purplesyringa, history, 2 months ago, In English

Just do this!!

#include <iostream>

int main() {
    // These two lines are the whole trick: stop syncing C++ streams with C
    // stdio, and untie cin from cout so that reads don't force flushes.
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(nullptr);

    // Your code here
    int n;
    std::cin >> n;
    long long sum = 0;
    for (int i = 0; i < n; i++) {
        int x;
        std::cin >> x;
        sum += x;
    }
    std::cout << sum << std::endl;
}

This will speed up your code so much you won't have to care about spending time in I/O anymore!! Give likez plz :3

Full text and comments »

  • Vote: I like it
  • +288
  • Vote: I do not like it

By purplesyringa, history, 5 months ago, In English

TL;DR: the popular empirical bounds for FFT stability overestimate the precision by a few bits -- multiplication might thus produce a wrong answer even after thorough testing on random inputs.

Introduction

So here's how people typically advise you to use FFT after proving various theorems and providing a $$$\mathcal{O}(n \log n)$$$ algorithm.

Suppose you want to multiply two big integers, stored in binary. Split them into groups of $$$B$$$ bits, called $$$a_0, a_1, \dots$$$ and $$$b_0, b_1, \dots$$$ respectively, so that the integers are equal to $$$a_0 + a_1 x + \dots$$$, $$$b_0 + b_1 x + \dots$$$ for $$$x = 2^B$$$. Multiply the polynomials $$$(a_0 + a_1 x + \dots) (b_0 + b_1 x + \dots)$$$ via FFT and substitute $$$x = 2^B$$$ to obtain the $$$n$$$-bit product.

$$$B$$$ is a variable here -- the algorithm works for every $$$B$$$, but larger values of $$$B$$$ are faster. We're limited by the precision of 'double', though, because 'double' can only store integers from $$$0$$$ to $$$2^{53} \approx 9 \cdot 10^{15}$$$ exactly (e.g. $$$2^{53}+1$$$ cannot be represented by a double). So we certainly require $$$2B + \log_2 \frac{n}{B} \le 53$$$ to be able to represent the product, but the actual limit is not $$$53$$$ but a bit less because of precision loss at intermediate stages. So there's an incentive to use $$$B$$$ as large as possible, but exactly how far we can push it cannot be predicted because of the sheer complexity of floating-point error analysis.
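To get a feel for the bound (a back-of-the-envelope example of mine, not from the original analysis): multiplying two $$$n = 2^{20}$$$-bit integers with $$$B = 16$$$ gives $$$2 \cdot 16 + \log_2 \frac{2^{20}}{16} = 32 + 16 = 48 \le 53$$$, so the exact product coefficients fit into a double with 5 bits to spare -- and it is exactly those few spare bits that the FFT's intermediate rounding eats into.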

So here's what we're gonna do: start with some large value of $$$B$$$, run a few polynomials through the algorithm, and check whether the product is correct. If it isn't, decrease $$$B$$$ and repeat. If it is, call it a day and use this value of $$$B$$$ for the rest of your life. That's what I've always done. That's what I did when writing my own bigint library, but I wanted to be more thorough, so I stress-tested it for 20 hours in 8 threads on random input. Zero errors.
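In code, that procedure might look roughly like this. This is a minimal sketch of my own: the limb type, the function names, and the range of $$$B$$$ tried are all assumptions, and fft_multiply / exact_multiply stand for your FFT-based product and an exact reference such as schoolbook multiplication.

#include <cstdint>
#include <functional>
#include <random>
#include <vector>

using Limbs = std::vector<uint32_t>;
using MulFn = std::function<Limbs(const Limbs&, const Limbs&, int)>;

// Find the largest B (in some plausible range) that survives `iters` random
// tests on `len`-limb inputs. This is exactly the empirical procedure
// described above -- and exactly the one the rest of this post argues against.
int pick_b(int len, int iters, const MulFn& fft_multiply, const MulFn& exact_multiply) {
    std::mt19937 rng(12345);
    for (int B = 21; B >= 8; B--) {
        std::uniform_int_distribution<uint32_t> dist(0, (1u << B) - 1);
        bool ok = true;
        for (int it = 0; it < iters && ok; it++) {
            Limbs a(len), b(len);
            for (auto& x : a) x = dist(rng);
            for (auto& x : b) x = dist(rng);
            ok = fft_multiply(a, b, B) == exact_multiply(a, b, B);
        }
        if (ok) return B;  // largest B that never failed on random data
    }
    return 8;  // fall back to a very conservative limb size
}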

Twist

The problem is -- this is bullshit. The fact that the algorithm works on random data says nothing about how it behaves in the worst case, for one simple reason. For random data, the errors accumulated during the intermediate stages are more or less independent and can be approximated as a one-dimensional random walk. The neat part about a $$$k$$$-step 1D random walk is that, on average, the distance from $$$0$$$ is around $$$\sqrt{k}$$$, as opposed to the maximal possible value of $$$k$$$. So the error is severely underestimated.
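To put rough numbers on it: over $$$k = 10^6$$$ accumulation steps, the error on random data typically drifts by about $$$\sqrt{k} = 10^3$$$ units of rounding error, while an adversarial input can push it all the way towards $$$k = 10^6$$$ -- about $$$\log_2 \sqrt{k} \approx 10$$$ extra bits of error that testing on random inputs will never show you.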

What is the worst case for FFT, then?

Full text and comments »

  • Vote: I like it
  • +217
  • Vote: I do not like it

By purplesyringa, history, 21 month(s) ago, In English

...or "How legacy systems fight obstacles no sane person would introduce in the first place".

Hello, Codeforces!

I am sure that those of you who tried to host their own programming competitions or write their own utilities for contest management are aware of the state of affairs regarding contest hosting. There are lots of incompatible formats, no one knows exactly how stuff ought to work, Polygon is full of nondescript and seemingly contradictory options, ej-polygon never seems to work quite correctly, modifications have to be applied to jury archives to make them work on your particular installation, non-trivial problem types require constant maintenance and hacks, ad infinitum.

Personally, I stumbled upon all these problems when I tried to get my own judge system working and suddenly discovered that the difficult part is not judging complex problems, but handling artificial complexity introduced by legacy software and standards. So here I am, attempting to solve these problems.


Firstly, I introduce a new format, which I shall call the problem.xml format; as is perhaps obvious, it is based on the Polygon format. I introduced one or two special cases here and there to ensure it's 99% compatible with the archives Polygon generates these days. Unlike Polygon's format, it's completely documented, and as little leeway as possible is allowed without compromising efficiency.

This format enables many, almost arbitrary problem types, in addition to the default types: input/output, interactive, run-twice, and problems with graders. For example:

  • Problems with custom scorers (better known as valuers by Ejudge adopters) are supported. This means that the points of a solution are not necessarily equal to the sum of points of each test; rather, any relations are possible, up to negative scores, efficiency-rated programs, and "write a program that prints your username".

  • What I shall call formulaic problems, when the user solution outputs a formula or a universal algorithm that is then executed by the judge.

  • Optional compilation on each test, which would come in handy for some practical development competitions.

  • Output-only problems, meaning that the user is asked to submit a ZIP archive that contains the answers to each test.

  • Arbitrary strategies (optional support), which allow problemsetters to generalize the above as they see fit: run-thrice problems, compile-time testing, and CTF-like competitions are within reach with just a few lines of code.

  • Arbitration, allowing for marathon problems, which means that the points awarded to a submission may depend on the results of other submissions.

For existing platforms that support the Polygon format or something similar, few modifications will be necessary to get everything but strategies and arbitration working: I'm hoping that major vendors are up to the task. Strategy support is somewhat tricky to implement, so that part is up in the air for the moment (not that this diminishes the rest of the work). Arbitration should theoretically be easy to get working, but might require some modifications to the software architecture.

The draft of the specification is available here: https://github.com/imachug/problem-xml-specs. While I think that the spec is mostly finished and I'm publishing it here for better visibility, I'd be glad to listen to your input and apply changes where necessary!


Tagging some people for better visibility—your input is highly appreciated: geranazavr555, MikeMirzayanov, grphil, andrewzta, dkirienko.

Full text and comments »

  • Vote: I like it
  • +296
  • Vote: I do not like it

By purplesyringa, history, 2 years ago, In English

Unfortunately, most derivations of the probability of a polynomial hashing collision are invalid, wrong, or misleading, and finding reliable public sources with proofs is incredibly difficult. This article is a formal analysis of the method. Its goal is to complement well-known empirical facts with theory, provide bounds on the probability of collision, justify common choices, and advise against certain popular parameters.

There is no code in this article, and it's mostly theoretical. It's more of a summa of everything we know about polynomial hashing in integers than a guide for beginners. You'll probably find something of interest still. I do, however, describe some methods of cracking hashes (I can provide code and additional explanation if someone asks in the comments) and give some general guidelines in the last section of the article.

Table of contents

  • The concept of polynomial hashing
  • Classical justification of the coprimality requirement
  • The pitfall
  • Probability of size-bounded collision
  • Probability of value-bounded collision
  • Back to basics
  • Conditions of surjectivity
  • Handling lengths
  • Side note on coprimality
  • The two scenarios
  • Reversal
  • Thue-Morse sequence
  • Attempt to reuse Thue-Morse sequence for other moduli
  • Birthday paradox
  • Lower bounds for anti-hash tests
  • Particularities
  • Key takeaways

Full text and comments »

  • Vote: I like it
  • +308
  • Vote: I do not like it

By purplesyringa, history, 2 years ago, In English

I'd like to share this one unpopular but awesome algorithm that I've only heard mentioned once. Kapun's algorithm finds a hash collision for moduli as large as $$$10^{18}$$$. I know, we have lots of algorithms that do much better than that (and I'm writing an article on that at the moment, stay tuned), but Kapun's algorithm is really surprising in its simplicity. It's even more surprising that its correctness is so difficult to prove.

Here we go.

A polynomial hash of a string $$$s$$$ is defined as

$$$ H(s) = s_0 b^0 + s_1 b^1 + s_2 b^2 + \dots + s_{n-1} b^{n-1} \pmod M. $$$

We want to find two strings with the same polynomial hash using only two characters: $$$0$$$ and $$$1$$$.
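As a quick illustration (a sketch of my own, not code from the post), the hash above can be computed like this for a string over {'0', '1'}; unsigned __int128 (a GCC/Clang extension) keeps the intermediate products exact for $$$M$$$ up to around $$$10^{18}$$$:

#include <cstdint>
#include <string>

// H(s) = s_0 b^0 + s_1 b^1 + ... + s_{n-1} b^{n-1} (mod M)
uint64_t poly_hash(const std::string& s, uint64_t b, uint64_t M) {
    uint64_t h = 0, pw = 1 % M;
    for (char c : s) {
        uint64_t digit = c - '0';                      // characters are '0' or '1'
        h = (h + (unsigned __int128)digit * pw) % M;   // add s_i * b^i
        pw = (unsigned __int128)pw * b % M;            // advance to b^(i+1)
    }
    return h;
}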

We can reformulate this problem in another way. Let $$$a_i = b^i \mod M$$$, then we want to find two distinct subsets of $$$a$$$ with the same sum modulo $$$M$$$. Now forget about the modulus: let's just find two subsets with the same sum.

Firstly, when is this possible? There are $$$2^n$$$ possible subsets and at most $$$n(M - 1) + 1$$$ distinct sums. If $$$2^n > n(M - 1) + 1$$$, that is, roughly $$$\frac{2^n}{n} \gg M$$$, there are certainly two subsets with the same sum. Generating $$$M$$$ or even just $$$\sqrt{M}$$$ strings (if you use the birthday paradox) is impossible for large $$$M$$$, but there's still a way through.
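To put numbers on it (my own back-of-the-envelope check, not from the post): for $$$M = 10^{18}$$$, taking $$$n = 70$$$ already suffices, since $$$\frac{2^{70}}{70} \approx 1.7 \cdot 10^{19} > 10^{18}$$$ -- yet enumerating anywhere near $$$\sqrt{M} = 10^9$$$ subsets, let alone $$$2^{70}$$$ of them, is clearly out of the question.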

It turns out there is a deterministic algorithm that attempts to solve this problem and is very easy to implement!

Full text and comments »

  • Vote: I like it
  • +246
  • Vote: I do not like it

By purplesyringa, history, 2 years ago, In English

The first day of IATI 2021 finished about half an hour ago. Please feel free to discuss the competition here!

Problems

Results

Editorials

^ Please feel free to provide links when they are available

Full text and comments »

  • Vote: I like it
  • +130
  • Vote: I do not like it

By purplesyringa, 3 years ago, In English

This is not an exit.

I am looking at the post called Is Mike Mirzayanov dictator? at the moment. I'm reading through the comments, I'm looking at people's reactions, I'm reading Mike's replies, and... I am utterly disappointed.

I demand progress.

For context, I do believe Codeforces is the best website for competitive programming contests at the moment. I sincerely give credit to Mike for creating Codeforces and providing participants and contest authors with a platform for interaction completely for free. I am confident that Codeforces is the most appropriate platform for sharing knowledge, exchanging tricks and methods, and for collaboration between contest participants, authors, and coordinators.

Please consider this an open letter, for this is the will not of a single person, but of many. Please consider signing it by stating so in a comment under the post, should you be inclined to do so.

Full text and comments »

  • Vote: I like it
  • +527
  • Vote: I do not like it

By purplesyringa, history, 3 years ago, translation, In English


Polygon is not the world's greatest evil. Polygon doesn't even have one particular defect that makes it horrible. You can't say exactly what's wrong with Polygon. Polygon seems to work, it seems like every feature is supported, but if you touch it here, it falls apart, and if you touch it there, a problem (pun not intended) appears. Or not, depending on your luck. Polygon is like PHP. For those who haven't read the famous rant, I'll quote it for you:

I can't even say what's wrong with PHP, because-- okay. Imagine you have uh, a toolbox. A set of tools. Looks okay, standard stuff in there.

You pull out a screwdriver, and you see it's one of those weird tri-headed things. Okay, well, that's not very useful to you, but you guess it comes in handy sometimes.

You pull out the hammer, but to your dismay, it has the claw part on both sides. Still serviceable though, I mean, you can hit nails with the middle of the head holding it sideways.

You pull out the pliers, but they don't have those serrated surfaces; it's flat and smooth. That's less useful, but it still turns bolts well enough, so whatever.

And on you go. Everything in the box is kind of weird and quirky, but maybe not enough to make it completely worthless. And there's no clear problem with the set as a whole; it still has all the tools.

Now imagine you meet millions of carpenters using this toolbox who tell you “well hey what's the problem with these tools? They're all I've ever used and they work fine!” And the carpenters show you the houses they've built, where every room is a pentagon and the roof is upside-down. And you knock on the front door and it just collapses inwards and they all yell at you for breaking their door.

That's what's wrong with PHP.

Full text and comments »

  • Vote: I like it
  • +180
  • Vote: I do not like it

By purplesyringa, history, 3 years ago, In English

This is the tenth time this has happened: I stumble upon a controversial blog, write a large comment that would be useful both to the OP and to other people, and when I finally post it, Codeforces tells me the post has been deleted and my giant rant doesn't get saved. I usually breathe in and out and get over it, but this time I figured I have enough spare contribution to post a complaint, so I can share my thoughts with the community.

It'd be great if comments were saved to drafts just like posts. This would be useful in case of network problems or power failure too, and would just improve UX overall.

Full text and comments »

  • Vote: I like it
  • +45
  • Vote: I do not like it

By purplesyringa, history, 3 years ago, In English

Hello, Codeforces!

A few days ago MohammadParsaElahimanesh posted a blog titled Can we find each Required node in segment tree in O(1)? Apparently what they meant was finding each node in $$$\mathcal{O}(ans)$$$, according to ecnerwala's explanation. But I was too dumb to realize that and accidentally invented a parallel node resolution method instead, which speeds up segment trees a lot.

A benchmark for you first: 30 million RMQs on a 32-bit integer array of 17 million elements. It was run in a custom test on Codeforces on Apr 6, 2021.

  • Classic implementation from cp-algorithms: 7.765 seconds, or 260 ns per query
  • Optimized classic implementation (the one I was taught): 4.452 seconds, or 150 ns per query (75% faster than classic)
  • Bottom-up implementation (sketched after this list): 1.914 seconds, or 64 ns per query (133% faster than optimized)
  • Novel parallel implementation: 0.383 seconds, or 13 ns per query (400% faster than bottom-up, or 2000% faster than the classic implementation)
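For reference, here is a minimal sketch of the bottom-up implementation from the benchmark above (the classic iterative segment tree, here for range minimums); the code and names are my own rather than the exact benchmarked code, and the novel parallel version is described in the full post.

#include <algorithm>
#include <cstdint>
#include <vector>

// Iterative (bottom-up) segment tree for range minimum queries.
struct SegTree {
    int n;
    std::vector<int32_t> t;  // t[n..2n-1] are the leaves, t[1..n-1] the internal nodes

    explicit SegTree(const std::vector<int32_t>& a) : n(a.size()), t(2 * a.size()) {
        std::copy(a.begin(), a.end(), t.begin() + n);
        for (int i = n - 1; i > 0; i--)
            t[i] = std::min(t[2 * i], t[2 * i + 1]);
    }

    // Minimum on the half-open interval [l, r).
    int32_t query(int l, int r) const {
        int32_t res = INT32_MAX;
        for (l += n, r += n; l < r; l >>= 1, r >>= 1) {
            if (l & 1) res = std::min(res, t[l++]);
            if (r & 1) res = std::min(res, t[--r]);
        }
        return res;
    }
};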

Full text and comments »

  • Vote: I like it
  • +402
  • Vote: I do not like it

By purplesyringa, history, 3 years ago, In English

Hi, Codeforces!

I've just made a script that publishes all new posts (and their updates) on Codeforces to a Telegram channel, with formatting and stuff.

The notifications are quite fast: they should propagate in less than 5 minutes.

Enjoy!

Full text and comments »

  • Vote: I like it
  • +63
  • Vote: I do not like it

By purplesyringa, history, 3 years ago, translation, In English

Hello, everyone! You might have noticed that Codeforces has temporarily changed the logo to a new one, but it seems the admins can't decide which is better. Let's vote on the logo and see what the community itself likes! Upvote the comment below for the logo you like.

Full text and comments »

  • Vote: I like it
  • +214
  • Vote: I do not like it

By purplesyringa, history, 3 years ago, In English

Hello, Codeforces!

The current state of affairs is that running your own contest on Codeforces is quite difficult. First, Codeforces doesn't accept single-problem proposals, so if you don't have many CP friends and can't come up with 6 good problems, you're out of luck. Second, many trash problems get proposed, and the coordinators have to spend a lot of time filtering them out and then explaining why these problems were rejected. This status quo is bad for everyone, both participants and problem setters, so I'm thinking of a way to fix the situation.

Here is my idea. People generate trash problems because, when even a single one of their problems is rejected, the whole contest is in danger, so problem setters 'bruteforce' problems to plug the hole. What if we let problem setters propose just a single problem, and then merged problems from different people into a contest? This would reduce the fraction of bad problems, because setters wouldn't propose them just to fill holes.

The question is how to bypass the bus factor: the coordinators. I think there are many 2300+/red users who could help moderate problem proposals. There are many people from Div 1 who like competitive programming but don't write Codeforces rounds very often, perhaps because of Codeforces' style or timing. I think such people would be ideal moderators: they have experience, they won't cheat, they are ready to help, and they're interested in looking 'behind the scenes'.

As a bonus, testers who would like to test rounds but don't have problem-setting friends will be able to take part in the process. And these testers don't have to be high-rated participants: it's quite common that neither the contest authors nor the testers see a simple solution based on, say, a greedy algorithm, or that the tests are too weak, and this is only noticed by experts during the round.

In TL;DR form, the idea is to help testers, moderators and problem setters find each other to reduce the burden on Codeforces coordinators and run quality rounds more often.

What do you think?

Full text and comments »

  • Vote: I like it
  • +66
  • Vote: I do not like it

By purplesyringa, history, 3 years ago, In English

I'm interested in non-classic problems here on Codeforces, so I've looked through the problems with the special tag. The ones with a non-standard format are:

  1. Problems that can only be solved in a single special language, such as Q# or a secret language. Example: a single problem 409B - Mysterious Language or the whole Kotlin Heroes 5: ICPC Round contest.
  2. Problems with a stateful checker, which detects how many submissions one has made. Example: 409H - A + B Strikes Back.

I also vaguely remember a problem where one had to output their username with some limitations on the source code, but it might have just been a comment on a blog.

Anyway, I can't find out how to create any similar problem on Polygon. Obviously there's a whitelist of supported languages, but what about allowing the user to add an interpreter for the language, written in C? Or allowing the checker to read the original code rather than its output? I'm interested in how this is implemented for the official contests and whether I can do the same in a gym.

As for the second type, it'd be useful if the checker could get meta-information such as the user's handle or ID, and access a local DB or use the network as a state store. I couldn't find any sane documentation, so I'm asking here.

Full text and comments »

  • Vote: I like it
  • +134
  • Vote: I do not like it