physics0523's blog

By physics0523, history, 2 weeks ago, In English

Sadly, until today, many people use single $$$\rm{mod} = 10^9+7$$$ or so on for hashing, so why not write how to blow up them easily, with minimized thinking process?

The key idea is simple. Just use Birthday attack.

  1. Write a function to calculate the hash value
  2. Write a random generator of string
  3. When there are $$$k$$$ possible distinct hash values, look up $$$O(\sqrt{k})$$$ random candidates
  4. Now we get a pair of strings that has exactly the same hash value!

In this method, the only things we should consider are that $$$k$$$ is small enough (around $$$k \le 10^{12}$$$ ), and the number of possible candidates is large enough. We don't need to care about what the hashing function is!

Example Code(C++)

Exercise:
Find two $$$18$$$-digit integers $$$x,y$$$ such that:

  • Each digit of $$$x,y$$$ are $$$1,3,5,7,9$$$
  • $$$x \equiv y \mod 998244353$$$
Short Explanation
Example Code(C++)
... then how to avoid attacking?
  • Vote: I like it
  • +159
  • Vote: I do not like it

»
2 weeks ago, # |
Rev. 9   Vote: I like it +22 Vote: I do not like it

For candidate hash quantities at the 1e18 level (which is actually quite common, for example, when the moduli are 998244353 and 1e9+7), because a collision at the 1e9 level is required for a birthday attack, the memory requirement becomes too large. It can be considered to degrade this into a process of checking 4e8 at a time until a collision is found (expecting around 15 iterations). For the 4e8 level check, radix sort can be employed. If duplicates are found, only one O(n log n) sorting operation is needed to locate their specific positions, and then the entire program can be terminated.

Because in reality, sorting at the 4e8 level with unsigned long long (ull) is very fast. In an educational 12-hour hack or even in regular contest(if someone has pre-written programs and distributed them), finding a collision is expected.

Code

If you want to use this code, don't forget to adjust maxn (lower or higher) to suit your computer.

The larger maxn brings higher unit efficiency, so it should be large as much as possible if there is enough memory.

If there is no length requirement for the attack string, here are more

  • »
    »
    13 days ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    I also write a code by myself(because the rev. 5 cannot output the hack string).

    Code

    Successful hack records

»
2 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

Add randomness to your hash. For example, determine some constant values in your hashing function randomly when start executing.

I would want to add in a little bit more: try to avoid the pseudorandom nature of the RNG itself. If your seed is too explicitly stated, it could also be reverse-engineered.

»
2 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

As a participant you hacked, I wanted to say that probably many people know that their hashes are weak.

I was actually speed coding this problem in last 9 minutes & I managed to get it accepted.

The only thing that I thought it could defend my hash is randomized base, but it also got hacked because you can submit too many testcases with some range of bases you try to hack + I used a bad random srand(time(0)), I should've used double hash.

I believe this is a good lesson :)

»
2 weeks ago, # |
  Vote: I like it -87 Vote: I do not like it

This person hacked my submission and made me fail to get out of Specialist hellhole, instant downvote

  • »
    »
    2 weeks ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    downvote who?

  • »
    »
    13 days ago, # ^ |
    Rev. 2   Vote: I like it +3 Vote: I do not like it

    Harsh truth:

    • You only solved till E if not counting G1, and even D gave you massive issues (which should be an easy Div2B at most). You are neither ready nor qualified for blue just yet.
    • Rather than learning from the mistake (here the hacker even presented his discovery to the public), you bashed the one pointing that out for you. Hey, it was your mistake after all, and in case of not him hacking you but the testset having it at the far bottom, should you downvote the contest organizer as well? Mentally, you are also not ready for blue.

    TL;DR: You're blaming others for your own current incompetence, while learning chances are still plentiful. Oh, and surely downvoted you instantly as well.

    • »
      »
      »
      13 days ago, # ^ |
        Vote: I like it +4 Vote: I do not like it

      Well, from my point of view getting hacked is worse than failing systests, since in the last case you get to adore your results for some time rather than being crushed immediatly from realization that your solution is incorrect. So the dude above could just hack a few and leave the rest for systests

      • »
        »
        »
        »
        13 days ago, # ^ |
          Vote: I like it +23 Vote: I do not like it

        I see that as a fake pleasure to be honest, like a placebo to hide oneself from reality. And in a way, being crushed late when all hopes have grown the highest is the worst deal to one's mind.

        • »
          »
          »
          »
          »
          13 days ago, # ^ |
            Vote: I like it -15 Vote: I do not like it

          Pretests are made before the contest, fairly created to.....cut corners? instead of specifically aimed to break a certain submission. Getting hacked on.....usual/normal problems are fair and all — either the author hasn't bothered to generate strong enough tests (to the point of missing obvious details, like forgetting to change int to long long or initializing a 2e5 array when n<=5e5). It sucks, but at least it's entirely my fault (same with getting -5 on a trivial problem thánks to the absolute genius method of finding out the max value on a permutation). Hash hacking, on another hand, is literally "fuck you in particular" in every sense of the word. Tests in OI contests aren't supposed to be adaptive enough to destroy certain submissions in particular. It seems likely enough that you've participated in Vietnam's Olympiad in Informatics before, which (if I recall correcly) requires hashing sometimes. Imagine the test-making council SPECIFICALLY going after your submission in an attempt to.....fuck you up very bad. In a proper OI contest, that kind of behavior is...massive groudnds for the test-makers to get sued to oblivion for discrimination.

          In the field of problem of "weak in-contest tests", there has been...enough examples of much better, non-half-asssed instances. Most recently, in the previous Vietnamese TST (which happened a month ago), the testcase for problem 3 (could be 6?) was very weak initially, someone got 70 something pts out of it, the contest council genersted new test, update them live and the dude's back to 0pts again. The changes were made LIVE and was made as an attemnt to...correct the contest setter's mistakes. Another (important enough) instance was the horribly weak testcase in subtask 3 of IOI2016's Railroad: for some reason, noone managed to find test that breaks a horribly wrong algorithm (which only requires 6 numbers!) within the entire preparing process AND the 5 hour duration of the contest. So, at least 6 people got a free 30pts, which might as well just lead to a difference between no medals and Bronze, or even Bronze to Silver. The way they handlednit in a (probably) prroper, international contest? Contestants got to keep their score because it's the setters (?) fault that the tests were that weak AND NO ONE caught the issue. That's the way issues with weak tests were resolved in OI contests worth their salt.

          (Basically: hacks made to counter pathetic systests, while questionable, is still.....okay for me. Hash hack (in general), not so much. Just simply ban people from making hacks to problems where a big enough amount of contestant is expected to use hashing — it makes every problem less of an algorithmic challenge, and make things much more unecessarily complicated in aspects that had nothing to do with problem solving at all. People who managed to survive the hacking shitstorm either used Z-function (stronger than hashing, indeed) or used 5 different hash bases or something (which.....what the hell does it even have anything to do with coming up with the algo that solved the problem at all?). And, before you jump in and say something like "b...but hash collisions are a major security issue and would be exploited to hell in realistic contexts", who the hell even goes on Codeforces, a COMPETITVE programming site to practice proper, FUNCTIONAL programming?

          • »
            »
            »
            »
            »
            »
            13 days ago, # ^ |
              Vote: I like it +5 Vote: I do not like it

            I honestly never enrolled in any VOIs or Vietnamese TSTs before. I was a Chemistry major student in high school, and only came towards competitive programming in college days.

            However, using perspectives from an environment to expect at another, I can guarantee you will have to experience the frustration you seemingly just expressed.

            Codeforces isn't any OI contest, and it isn't any ICPC contest even. It's an independent platform, with its own rules and features. Take a simpler analogy: even in the world of FPS games, assuming you are proficient in one (for example Counter Strike), you don't just jump into another (Valorant maybe) and expect everything to be exactly the same, do you?

            Hacks in Codeforces have always been this way, as you are freely to see whatever solution you like and hack any you wish, and it's the duty of the problem setters (or maybe the system) to include such tests if they indeed separate "pretest passed" submissions into success or failure, and there has been no rules to stop any hacks you deemed "unreasonable". That's a thing you learnt here. I won't say it's an applied experience (it probably is to some, but I won't assume), but it is an experience, and it helps you, at least on the exact place you are taking contest at. And there are many, many exploits that have been done here in Codeforces. Morally right or not, it's for each of us to decide. Even when I agree with a lot of such exploits, I also have for my own a fair share of things I wouldn't do even if I had the ability (for example, tampering up with allocations on some bad implementation of G++ compilers).

            Speaking of "fuck you in particular", I once destroyed rand() implementations on a random-based problem on pretest. I'm glad you weren't a participant there, or else you would hate me a lot worse.

            • »
              »
              »
              »
              »
              »
              »
              13 days ago, # ^ |
                Vote: I like it -17 Vote: I do not like it

              Would say that it's....a hell lot fairer: the test were made before the contest started, without trying to target a certain participant in particular (I myself actually only used rand() for quite a long while, until I tried to actually generate tests for a haha funny problem with n<=5e5 and only get small values). There's a difference between "a troll test made to exploit a common enougu mistake (see the Div1+2 B on the latest MofKuroni round, at least the beta-tested version)" and "a test made specifically to kill a cerrtain submission that doesn't work with anything else" (and, heck, I even used a different base as the usual while staying at low-400 standing, so it's a bit more annoying and even more...specific. Though, that's the first time I ever implemented hashing after nearly 2 years so there,s...that. At least I now know that I can implement hashes well enough to work with non-adaptive tests.....)

          • »
            »
            »
            »
            »
            »
            13 days ago, # ^ |
              Vote: I like it +8 Vote: I do not like it

            Hash hacking, on another hand, is literally "fuck you in particular" in every sense of the word.

            I have to disagree; consider this: if the systests included all the possible input configurations, then those hackable solutions would not be passed. Since it is not practical to test for all inputs, it is kinda fair to outsource it to participants (like a bug bounty).

            • »
              »
              »
              »
              »
              »
              »
              13 days ago, # ^ |
              Rev. 4   Vote: I like it -10 Vote: I do not like it

              Personally speaking (and coming entirely from my perspective), hashing is somewhat similar to randomized algorithms: there ARE cases whhere they'll be wrong, but, more often than not (all but once every several billion times or something), they'll end up being correct. Can't remember the exact problem but a year ago or something there exists a 3000 DS problem that had been solved by some Chinese black magic fuckery something something (iirc it was randomizing 32 times, each time having 1/2 chance for the answer to be "wrong", and for the algo to actually be wrong, all 32 randomized runs has to fail). Could that be broken? Abssolutely — someone with enough time and desire to ruin someone's day after they managed to find a (potentially) outstanding, unexpected, practical solution could ABSOLUTELY find a test to deal with that kind of thing. Is it even.....fair? Not really — at that point, it's less about "coming up with stratergies that works" and more about "finding ways to deal with issues that normally wouldn't occur EVER, which, depending on your "problem designing philosophy", could range anywwhere from unecessary complications (highly looked down upon by setters worth their salt) to straight up petty bullshit (looked down upon by.....pretty much everyone).

              When the difference between the "brain usage level" between a randomized solution and a proper "deduction scheme" is too large, when there exists a proper scheme to get the informations you need & that is the main concept of the problem, randomized solutions are not favored by the setters and they deal with it by making the grader adaptive (which is STILL tests "generated" DURING the contest WITHOUT any access from the grader to the contestant's submission. One example (that I kinda known of — haven't actually attempted the problem) is IOI2020's Counting Mushrooms. If the grader wasn't adaptive, there are 2 ways to get the "good enough" efficiency needed tk solve the problem (one that's only solved by yours truly William Lin, for all that matters):

              • Using mathematics to calculate a relatively optimal scenario, do some casework, brach & bound with samples of size 5, them/and/or run a DP on the undetermined mushrooms, which...requires a bit of though, definitely

              • Just use a braindead randomized solution lmao.

              In cases like that where the main CONCEPT of the task lies in coming up with a solution that got trivialized TOO HARD by a shitty randomized solution, obviously randomized solutions are....not very well-liked by the task author(s). HOWEVER, that doesn't stop randomized solution from being "wrong", AS LONG AS the authors can't manage to write a grader that could deal with it.

              In my opinion, chance-bassed solutions should only be considered "wrong" IF AND ONLY IF when considering the interactor/grader/test generator as a communication task, it is a solvable one. Think about it like this: "turn" the creation of a grader into a task itself, something among the lines of "You are given the participant's submission as an interactor. Using some queries, determine whether they're using a randomized algorithm/unintended shitty solutions or not. Break their solution and report that they're using some black magic fuckery, or return -1 if they're actually doing some fair stuffs". There has been ENOUGH adaptive interactors/grader lying around for one to know that it's entirely possible in certain cases.

              Now, consider that dual/transformed problem, but for hashing instead of dealing with randomized algorithm. Is that problem solvable? HELL NO LMAO, people have been using hashing in programming contests for.....at least 20 years as of now (as far as I know of), and to this day, NO ONE, I repeat, NO ONE have been able to create an adaptive test-generator that could deal with hashing the way they managed to create ones that could deal with chance-based solutions. Not the thousands of Reds around, not the countless professors, not Tarjan, not Knuth, not even Tourist himself (again, as far as I could know of). One would think that with how "improper" of ánolution hashing is, people would've tried to find ways to FAIRLY break it, yet all the attempts were to no avail. Which makes anti-hashing hack test that requires access to the contestant's solution and human brain doing things (instead of fully automating the submission-breaking process) looks like petty bullshits coming from people who really, really wants to deliberately ruin someone's daybfor their enjoyment in my point of view.

              TL:DR: this is a competitive programming website. I have to come up with ways to "break" your tasks without human inteference. If you want to break my chance-based solution, treat the task of breaking my submission as an interactive/communicative one, and SOLVE IT. It's simply not fair for the submission hackers to be able to generate custom tests against automated solutions, the equivalent of that from the contestant's side would be.....literally turning Codeforces into an output-only-only site, and anyone who knows what an output only is would immediately realize how silly and broken that is

              • »
                »
                »
                »
                »
                »
                »
                »
                13 days ago, # ^ |
                  Vote: I like it +16 Vote: I do not like it

                If you want to use probability in a problem, but your probability is functionally bad, why should it pass? You can make claims on what is good enough to be functionally good, but on cf, and I think a reasonable general claim, is it's good enough if others can't break it.

                You can add multiple mods or whatever to make hashing this form of functionally good.

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  13 days ago, # ^ |
                    Vote: I like it 0 Vote: I do not like it

                  Breaking once every several billion times or so is.....good enough for CP standards, believe it or not.

                  While adding several mods makes it harder to hack, that...basically has no algorithmic value at all. It has nothing to do with the "key" part of the solution (checking whether two strings are equal or not, which, more often than not, is just a minor fraction of the main concept of the problem. Adding in additional, irrelevant, uneeded difficulty that has nothing to do with the main algo is just shitty problem design

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  13 days ago, # ^ |
                  Rev. 3   Vote: I like it 0 Vote: I do not like it

                  CP standards change over time. If you look at much older problems (maybe older than cf), solutions can often pass even with fundamental problems in idea (and sometimes they still do today, just less often).

                  Constraints increase and wrong probabilities need to decrease as compute increases, solutions need to minimize minor errors as edge cases get better, and participants need to update methods when communities find ways to break them. Theoretically good ideas with bad constant factors also killed. CP is not all about theory, it's about whether it passes under constraints of judge today, and today on cf those constraints include what participants can break.

        • »
          »
          »
          »
          »
          13 days ago, # ^ |
          Rev. 2   Vote: I like it 0 Vote: I do not like it

          Well again, from my experience, you kinda get enough time to enjoy your performance, so that when systesting comes, you are settled down and can actually admit your mistakes

          • »
            »
            »
            »
            »
            »
            13 days ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            Fair enough, that's one way to behave that I never really experienced, to just take the joy while it lasts then going back to serious mode. Perhaps in the mentality issues, we can agree to disagree right here. ;)

            • »
              »
              »
              »
              »
              »
              »
              13 days ago, # ^ |
                Vote: I like it +3 Vote: I do not like it

              I agree, I just wanted to point out that if author of the blog feels the same, his intention might've been hurting ones ego instead of actually teaching him something. As you can see with the dude above, with this method he only got enraged and didn't want to read the blog.

    • »
      »
      »
      13 days ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      The failure of solving other problems means nothing.In general,we all have different fields that we are good at.Putting the trash reason in front of your comment makes your comment look like trash ;)

    • »
      »
      »
      12 days ago, # ^ |
        Vote: I like it +3 Vote: I do not like it

      I get the second point but slighly confused about the first one? Is blue really a point you should be ready for? And is someone really qualified to gatekeep by looking at a single contest? I could see myself having problems with E or F on a bad day, and I've failed a simple div2 C in the latest educational round. Is there someone I should write so that they reduce my rating since I'm definitely not even qualified to be green? (Once again, not defending the original position, just surprised to see the amount of sheer gatekeeping)

      • »
        »
        »
        »
        12 days ago, # ^ |
          Vote: I like it +3 Vote: I do not like it

        To be honest, I was a little bit subjective there. I guess I might be a bit hypocritical even when I was practically similar a few rounds before.

        What I tried to view this situation was that, the guy really hadn't had any proper peak to ascertain that his true potential could be higher, and he was inconsistent at stuff much lower than his expected goal (the problem I gatekept him was something supposedly to be green/cyan boundaries), thus I rendered him "not ready".

        Yeah, I might be slightly toxicly strict sometimes... That part, I am at fault.

  • »
    »
    13 days ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    hahaha, nooooooooooobs always blame others.Instead of crying out for your weak submission,Retiring is a better choice.XDD

»
2 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

if i'm using three powers with random values and random mod from 1e9 ..2e9 >> is it possible to be hacked

»
2 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

if i'm using large mod with candidates 1e9 how i'll handle operations like Hash[i] = Hash[i+1] *(p%mod) i mean the range of datatype i use

  • »
    »
    13 days ago, # ^ |
      Vote: I like it +1 Vote: I do not like it

    You meant $$$p$$$ being close to $$$10^9$$$, am I right?

    long long is a braindead datatype you could use that will surely not go overflown if done right.

    Still, if your mod is still $$$10^9 + 7$$$, then int is still possible. Just that you need to temporarily do your calculations on long long. Something like this:

    Hash[i] = (1LL * Hash[i+1] * p) % mod

»
13 days ago, # |
  Vote: I like it +10 Vote: I do not like it

So how to blow up two mods's hash? PEIMUDA once told me, but I didn't comprehend, and forgot. According to him, there is an efficient algorithm.

  • »
    »
    13 days ago, # ^ |
      Vote: I like it +10 Vote: I do not like it
  • »
    »
    13 days ago, # ^ |
      Vote: I like it -11 Vote: I do not like it

    You can use the two strings you got from the birthday attack for the first hash function as the alphabet for another birthday attack on the second hash function (i.e. do another stage of birthday attack on the second hash function with the generated strings being a random concatenation of the strings you got from the attack on the first hash function).

    With a smarter application of the birthday attacks, you can increase the alphabet size for the later stages of birthday attacks and generate smaller strings that can be used to hack a wider range of hash functions (if someone is randomizing the base for example).

    • »
      »
      »
      13 days ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      What would be the complexity? Especially the memory requirement.

      • »
        »
        »
        »
        13 days ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        memory requirement may be $$$O(\sqrt m\log^2m)$$$, not very big though.

»
13 days ago, # |
  Vote: I like it +3 Vote: I do not like it

This guy hacked 2 out of 3 my AC submissions in 1968G1 and spared me the last one, I should learn KMP by now