C++20 is back - Codeforces

#	User	Rating
1	ecnerwala	3649
2	Benq	3581
3	orzdevinwang	3570
4	Geothermal	3569
4	cnnfls_csy	3569
6	tourist	3565
7	maroonrk	3531
8	Radewoosh	3521
9	Um_nik	3482
10	jiangly	3468

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	163
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	152
8	SecondThread	147
9	orz	146
10	pajenegod	145

In contests and in the custom invocation, after a pause, one can again find GNU G++20 64-bit. However, the slowdown issue because of which the language disappeared persists, there are still snippets of code that can slow down the execution on Codeforces servers by a factor of 100 or so.

So what is the official position of Codeforces headquarters and of our community on that?

Are there general methods of constructing testcases that can exploit this GCC/Windows bug and therefore slow down solutions (like there are anti-hash tests for solutions using unordered_set)? Are there methods that slow down both contest mode and custom invocation mode (since 249807302 is only slow in the contest mode whereas 253289473 is only slow in the custom invocation)?
If so, should we expect such tests in future contests, will such methods be used during the test preparation in contests?
If this is deemed an unwanted bug, will participants be protected from unscrupulous hacks and tests? (For example, if the solution was hacked or FSTed because of a compiler bug rather than because of anything else, will the author be able to appeal against it and get their solution accepted?)
Is there no bug in 32-bit version? If it's 64-bit only, maybe CF team could add a C++20 32-bit compiler? UPD: 32-bit version can also be slowed down, see 253400416.
Is there this bug in Microsoft Visual C++ compilers (32 or 64 bits)? If so, is it the reason why we cannot see it among the options?
Are there generic methods of protecting against the bug (except switching to 32 bits)? I read (and enjoyed) a blogpost by ToxicPie9 regarding this issue, he suggested to add several lines of code to the solution. However, it does not always help: 253290209 still gets TL (and the author said about that).
Has anyone a clue what is going on, why do the aforementioned programs run slowly? Is this bug reported to the GNU or Microsoft developers who are responsible for this strange behavior?
Why does behavior in custom invocation and in contest/gym mode differ?
Is this bug reproducible on other platforms? For instance, on godbolt there are some windows compilers. Is there a way to slow down this code: https://godbolt.org/z/5G73dfezK (by cleverly choosing the value of evil or anything else)? UPD: probably no way since Godbolt uses UCRT rather than MSVCRT. UPDUPD: this code is slow on Godbolt: https://godbolt.org/z/WE4Mxv9cK, so UCRT is also flawed.
Is there surely no such bug on Linux? (Anyway, with regard to Codeforces it is probably a pointless question. I am quite certain that transition of all invokers to Linux is the last thing MikeMirzayanov would prefer to do since it is for sure really costly in time, effort, architecture changes, and/or money.)

#include <bits/stdc++.h> int main() { int MBs; std::cin >> MBs; int len = MBs * 1024 * 1024; for (int i = 0; i < 1000; ++i) { char* a = new char[len]; std::memset(a, 0, len); delete[] a; } return 0; }

use std::{ alloc::{GlobalAlloc, Layout, System}, sync::atomic::{AtomicU64, Ordering}, }; pub struct Counter(AtomicU64, AtomicU64, AtomicU64, AtomicU64); impl Counter { pub const fn new() -> Self { Counter( AtomicU64::new(0), // number of allocs AtomicU64::new(0), // number of deallocs AtomicU64::new(0), // last size AtomicU64::new(0), // last alignment ) } pub fn count(&self) -> (u64, u64, u64, u64) { ( self.0.load(Ordering::Relaxed), self.1.load(Ordering::Relaxed), self.2.load(Ordering::Relaxed), self.3.load(Ordering::Relaxed), ) } } unsafe impl GlobalAlloc for Counter { unsafe fn alloc(&self, layout: Layout) -> *mut u8 { self.0.fetch_add(1, Ordering::Relaxed); self.2.store(layout.size() as u64, Ordering::Relaxed); self.3.store(layout.align() as u64, Ordering::Relaxed); System.alloc(layout) } unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) { self.1.fetch_add(1, Ordering::Relaxed); self.2.store(layout.size() as u64, Ordering::Relaxed); self.3.store(layout.align() as u64, Ordering::Relaxed); System.dealloc(ptr, layout) } } #[global_allocator] static A: Counter = Counter::new(); fn test2(n: i32, m: i32, k: i32, l: i32) { let mut ans = 0usize; let init_count = A.count(); let mut v: Vec<Vec<u32>> = Vec::with_capacity(n as usize); let second_count = A.count(); // printlns will contribute to allocations as well, so ignore those allocations // println!("init_count = {:?}", init_count); // println!("second_count = {:?}", second_count); for i in 0..n { let before = A.count(); v.push(Vec::with_capacity(m as usize)); let after = A.count(); // println!("before = {:?}", before); // println!("after = {:?}", after); for j in 0..m { v[i as usize].push(j as u32); ans += v[i as usize][j as usize] as usize; } } for i in 0..k { let before2 = A.count(); let mut x = Vec::with_capacity(l as usize); let after2 = A.count(); // println!("before2 = {:?}", before2); // println!("after2 = {:?}", after2); for j in 0..l { x.push(j as u32); ans += x[j as usize] as usize; } } println!("{}", ans); } // this code roughly corresponds to: // int main() { // vector<vector<int>> TLE(40000, vector<int>(7)); // for (int j = 0; j < 60000; ++j) { // vector<int>(7); // any size from 7 to 10 works // } // } pub fn main() { test2(40000, 7, 60000, 7); }

#include <limits> #include <type_traits> using uint128_t = __uint128_t; int main() { static_assert(std::is_unsigned<uint128_t>{}, ""); static_assert(std::is_integral<uint128_t>{}, ""); static_assert(std::numeric_limits<uint128_t>::digits == 128, ""); }

#include <iostream> #include <string> #include <cinttypes> #include "windows.h" #include "VersionHelpers.h" #include "cpuid.h" // https://zhuanlan.zhihu.com/p/28322626 void getCPUID() { uint32_t data[4]; char str[48]; for (int i = 0; i < 3; ++i) { __cpuid_count(0x80000002 + i, 0, data[0], data[1], data[2], data[3]); for (int j = 0; j < 4; ++j) reinterpret_cast<uint32_t*>(str)[i * 4 + j] = data[j]; } std::cout << str << std::endl; } //https://www.cnblogs.com/LyShark/articles/15019661.html //https://bbs.kanxue.com/thread-252559.htm void getSysVersion() { std::string ret = std::string(IsWindowsServer() ? "Windows Server" : "Windows") + " "; typedef void(__stdcall * NTPROC)(DWORD*, DWORD*, DWORD*); DWORD dwMajor, dwMinor, dwBuildNumber; NTPROC proc = (NTPROC)GetProcAddress( LoadLibrary(TEXT("ntdll.dll")), "RtlGetNtVersionNumbers" ); uintptr_t PEB; proc(&dwMajor, &dwMinor, &dwBuildNumber); #ifdef _WIN64 PEB = __readgsqword(0x60); dwBuildNumber = *((INT*)(PEB + 0x120)); #else PEB = __readfsdword(0x30); dwBuildNumber = *((INT*)(PEB + 0xAC)); #endif if (dwMajor >= 6) { ret += std::to_string(dwMajor) + "." + std::to_string(dwMinor) + "." + std::to_string(dwBuildNumber); } else { // win 8 以下 float f_ret; SYSTEM_INFO info; GetSystemInfo(&info); OSVERSIONINFOEX os; os.dwOSVersionInfoSize = sizeof(OSVERSIONINFOEX); if (GetVersionEx((OSVERSIONINFO*)&os)) { f_ret = os.dwMajorVersion + os.dwMinorVersion * 0.1; ret += std::to_string(f_ret); } } std::cout << ret << std::endl; } //https://www.cnblogs.com/lidabo/p/7554473.html void getMemoryInfo() { constexpr int kMaxInfoBuffer = 1 << 8; constexpr int GBYTES = 1 << 30; constexpr int MBYTES = 1 << 20; constexpr int KBYTES = 1 << 10; constexpr float DKBYTES = KBYTES; std::string memory_info; MEMORYSTATUSEX statusex; statusex.dwLength = sizeof(statusex); if (GlobalMemoryStatusEx(&statusex)) { uint64_t total = 0, remain_total = 0, avl = 0, remain_avl = 0; double decimal_total = 0, decimal_avl = 0; remain_total = statusex.ullTotalPhys % GBYTES; total = statusex.ullTotalPhys / GBYTES; avl = statusex.ullAvailPhys / GBYTES; remain_avl = statusex.ullAvailPhys % GBYTES; if (remain_total > 0) decimal_total = (remain_total / static_cast<double>(MBYTES)) / DKBYTES; if (remain_avl > 0) decimal_avl = (remain_avl / static_cast<double>(MBYTES)) / DKBYTES; decimal_total += (double)total; decimal_avl += (double)avl; char buffer[kMaxInfoBuffer]; //sprintf_s(buffer, kMaxInfoBuffer, "total %.2f GB (%.2f GB available)", decimal_total, decimal_avl); std::sprintf(buffer, "total %.2f GB (%.2f GB available)", decimal_total, decimal_avl); memory_info.append(buffer); } std::cout << memory_info << std::endl; } int main() { std::cout << "===OS information===" << std::endl; getSysVersion(); std::cout << "===CPU information===" << std::endl; getCPUID(); std::cout << "===Memory information===" << std::endl; getMemoryInfo(); }

Comments (48)

Write comment?

nor

5 weeks ago, # |

← Rev. 2 →

+19

I'd also like to point out that there are multiple C standard libraries for Windows — for example, MSVCRT and UCRT. The C standard library is what is responsible for memory allocation among other things. As ToxicPie9 pointed out, it is likely that Codeforces uses MSVCRT. UCRT is generally better than MSVCRT in every way (and is the default on newer Windows systems, but can be installed on older Windows systems too). More information is also present on the MSYS2 website, which also provides some clues on how to use UCRT.

So one solution to the problem, that does not involve using Linux, could be to switch to a different C standard library and check if the issue persists.

Note that the issue with memory allocations on MSVCRT is also documented on this stack overflow question from 9 years ago which compares cygwin with MSVCRT — surprisingly, cygwin wins in this case. One of the comments says that MSVCRT is not really a "production" library — it is supposed to be a minimal library that ensures that there is support for C stuff on every Windows system. And as pointed out earlier, all new Windows systems come with UCRT installed instead of MSVCRT, so MSVCRT should not be used at all anyway.

I would definitely suggest Codeforces to look into trying out both UCRT and cygwin, and if possible, provide compilers for both on CF so the community can figure out which one does not have random issues.

→ Reply

orz

5 weeks ago, # ^ |

I got it! That is why on Godbolt I was not able to reproduce it — UCRT is used there.

Red0

-38

I don't know why people disliked this :/ so... Deleted

vstiff

I am quite certain that transition of all invokers to Linux is the last thing MikeMirzayanov would prefer to do since it is for sure really costly in time, effort, architecture changes, and/or money.

I believe it's possible to keep all existing windows based infrastructure, and just add some linux based test runners in docker (or podman). The benefit is that it would make it trivial adding new languages, like common lisp (a saw some requests somewhere) without the risk to damage host libraries.

ShaoNianTongXue5307

where did clang go, i really need it to check my codes ub

Qwerty1232

← Rev. 4 →

+17

6: I believe it is possible to protect against this bug by using the following kind of allocator

code

#include <cassert>

constexpr size_t MEM = 2e8; // set just below the memory limit
char my_data[MEM];
size_t my_ptr = 0;
void* operator new(size_t s) {
    my_ptr += s;
    assert(my_ptr <= MEM);
    return my_data + (my_ptr - s);
}
void operator delete(void*) { ; }
void operator delete(void*, size_t) { ; }

Although it doesn't recycle memory, it is still an effective way to deal with the problem, since almost all solutions don't heavily depend on memory recycling.

The code as written is UB — names starting with double underscore are reserved for the compiler/standard library implementers.

Just to note, such an allocator is called a bump allocator, and can be found in KACTL for example.

Thank you for pointing out that this is UB

maxplus

← Rev. 3 →

+40

1: In a way, yes, but these testcases aren't very stable. Adding or removing an allocation of a suitable size will change the set of evil testcases, so abusing this bug to hack a solution would be much more complicated than hacking unordered_set.

4: Yes, 32-bit versions are affected. See TL 1000ms, WA 46ms.

9: Yes, UCRT is affected too. For this particular code, compiler and environment 12581 is an example of an evil constant.

hos.lyric

4 weeks ago, # |

+66

Considering how "standard" C++ 64bit is, I believe these things change the contest experience drastically and so it is worth having an official annoucement post fixed at the top of Codeforces.

4 weeks ago, # ^ |

+20

I totally agree with you. And if there is no official announcement post, I at least expect MikeMirzayanov to comment on items (2), (3), partly (5), and (8) somewhere else as nobody else knows what CF headquarters is planning to do regarding the situation. It is true that many people, including me, extremely rarely need anything but 64-bit GNU C++20.

Unfortunately, two easiest ways might be to completely ignore the problem, or to switch invokers to Linux. I understand why both of them might seem undesirable for various reasons.

_Kee

10: It's very hard to ensure non-existence of such bugs on Linux, but (1) I've never heard of them (I've been using Linux for a long time and have basic knowledge of memory allocators), (2) I searched the web for a while and could not find similar issues on Linux (OTOH I could find another nasty slowdown bug on Windows), and (3) if a similar bug was found on glibc allocator, it would be more likely to get fixed.

A little more context on (3): on Linux, memory allocators are mostly implemented in a user-land library called glibc (a library is basically the same thing as a DLL on Windows). If a similar bug existed on Linux, I'm very certain that it would be in glibc and not in the kernel (the core of the Linux OS). Bugs in user-land libraries are easier to handle -- basically libraries are compiled in the same way as you compile normal programs -- and I believe a performance bug like this would be considered very serious by glibc people and will get fixed fairly quickly.

I managed to find a (much milder) slowdown on Linux (on glibc, can't comment on other libraries), though it is nowhere near as concerning as the slowdown on Windows.

The value of DEFAULT_MMAP_THRESHOLD_MAX is generally 32MiB on 64-bit systems, so according to https://man7.org/linux/man-pages/man3/mallopt.3.html, mmap is used instead of sbrk. This leads to a difference in the performance characteristics at this cutoff, but it's still relatively pretty good. In this benchmark, with an input of 32, we are allocating and memsetting a 32MiB array to all 0s 1000 times (we need the memset else Linux does not give us the memory unless we use it). On my personal laptop it goes from 2s to 3.8s when input changes from 31 to 32 (considering that this is a huge memory allocation, it's still very fast).

Nice find! On my local (somewhat dated) environment I actually observed a much severe slowdown (~2.5s for MBs=31, ~14s for MBs=32), but I think this behavior is expected.

In my understanding, for the cases MBs<=31, the allocator keeps returning the same memory, so the benchmark basically writes the data into the same memory again and again, which is generally fast due to the caches. For MBs >= 32, the memory is returned to the OS (i.e. unmapped) every time delete[] is called, so the program is likely to receive a different chunk of memory in each iteration. In this case, the cache is unusable, so this is why the program runs slowly. Also there're tons of page faults happening, which also contribute to the slowness (actually, this can be the major reason of slowness).

Overall, the results don't look too bad to me, given that your benchmark writes more than tens of GBs of memory.

Interestingly your results seem to be reproducible on godbolt too. Which kernel are you using (output of uname -r ideally)? I am on 6.8.1, and there have been a significant number of optimizations for the memory management part of the kernel in the last two major releases.

I'm using Fedora's stock kernel; uname -r says "6.7.10-200.fc39.x86_64".

In the case of MBs==32, I can see the kernel time dominates the total running time (according to a sample run: 13.7s real, 3.3s user, 10.4s sys), so some kernel stuff may play some role here, like security hardening.

warks

My understanding here is limited, could this in turn also affect other languages via propagation through compiler-bootstrapping?

Interesting point. If the only reason of these slowdowns is Microsoft Windows, there might be killer snippets in other languages, too.

Compiler bootstrapping should not lead to any issues — the main issue is how malloc is implemented in the relevant DLLs on the Windows judge.

I was trying to reproduce this issue with Rust, but directly using the examples in the original blogpost seems to not reproduce the bug. I am guessing that it is still possible, but haven't succeeded yet.

For anyone who wants to reproduce the issue in Rust, here's some relevant stuff:

Default allocator documentation — apparently on Windows, they use HeapAlloc, so the hack for HeapAlloc should work here as well.
Some code:

Spoiler

I dug into the Rust source for how HeapAlloc is being used, and found this: https://github.com/rust-lang/rust/blob/master/library/std/src/sys/pal/windows/alloc.rs#L139 — this means that the usage is quite similar.

However, it seems that translating the code that breaks HeapAlloc on C++20 (GCC13) into Rust (with exactly the same allocations) does not work — relevant submission. Since the behaviour is also different between GCC versions as well as across C++ standards (and even between custom invocation and submission), a part of it seems to be some issue with libraries installed on Codeforces. It is quite easy to experiment with constants (look for evil and total in the code), so interested people can tweak the constants and try to figure out which ones work. It's possible that they don't work at all, but I've only tried around 10 such constants.

+13

I tried using a simple script (link) to find the "evil" constants for Rust, and it actually worked TL, WA

Nice! No wonder I was not able to find the right constants by hand, 10991 is pretty far from 10000 :)

Now that we have a proof of concept of this issue coming up on a programming language other than C++, will Codeforces migrate to a different OS, potentially a Linux VM or just Linux on bare metal, MikeMirzayanov? From what I have been recently told, setting up an online judge on Windows is much harder than on Linux, so I am a bit hopeful about it.

Even python seems to be affected by this bug WA ~500ms, WA ~60ms (n = 10^5) and TL 1000ms, WA ~250ms (n = 10^6)

Evil constant was found by similar script (but the same script found nothing for PyPy)

However, in PyPy there are other sorts of strange slowdown bugs, see https://codeforces.com/blog/entry/126654?#comment-1124635, for instance.

TheScrasse

+28

As a coordinator, I don't know how to deal with point 2. (for example, in the upcoming CodeTON Round 8 (Div. 1 + Div. 2, Rated, Prizes!)), since there is no official answer to point 3.. So I would like a reply on this blog by MikeMirzayanov.

AkiLotus

Regarding 2 and 3, and this is my opinion with the experience of a past problemsetter.

I would only prepare counter tests if an exploit is stable throughout and not case specific. From what I am understanding right now (please correct me if I'm wrong), the evil array has a pattern, but there's not yet a universal constant that would guarantee to fail every forms of "obviously exploitable" implementation, thus I will not construct those. If there is a universal constant though, it will be included, even at pretests — I'd rather have participants figuring out in contest instead of future frustrations due to FSTs.

Speaking of which — FSTs will happen regardless of problemsetter's decision, due to hacking feature (and the fact that successful hacks on a submission which was originally AC in system test will be automatically added into the testset). However, I have an intuition that the variety of counter tests would be fewer if the contest is rated for everyone (i.e. not ruling out Div1 people), since rating-related consequences are obvious, lowering the incentive to attempt a mass hack on different evil arrays.

MikeMirzayanov

+148

TLDR: Very soon, I will migrate the testing servers to Windows Server 2022. I have spent a great deal of effort in recent days to accomplish this.

Good news: most of these issues are not reproducible there (more precisely, we do not know how to reproduce them), and it will be possible to update some compilers/interpreters that could no longer be updated for Windows 7.

Bad news: modern operating systems have deep protections against speculative vulnerabilities (like Spectre, Meltdown, and others), and therefore, on the whole, any solution will work a bit longer (by my estimates, plus 5-15%).

P.S. We will likely switch to LLVM's clang++ instead of g++. It performs better in some tests.

Are there general methods of constructing testcases that can exploit this GCC/Windows bug and therefore slow down solutions (like there are anti-hash tests for solutions using unordered_set)? Are there methods that slow down both contest mode and custom invocation mode (since 249807302 is only slow in the contest mode whereas 253289473 is only slow in the custom invocation)?

I'm not entirely sure about the current situation. I hope that after migrating to Windows Server 2022, there won't be such methods.

If so, should we expect such tests in future contests, will such methods be used during the test preparation in contests?

I hope that after my work, it will not be possible to exploit the operating system/compiler bugs so explicitly. Therefore, when preparing problems, one should not expect that such tests will be specifically constructed.

However, I urge you to clearly distinguish between two different scenarios in your mind: a bug and a feature of implementation. For example, exploiting collisions in hash tables, quadratic tests for sorting in Java — these are examples of using features of the standard library. The presence of such tests is a feature of problem preparation, and they may either be present or absent in the test set.

If this is deemed an unwanted bug, will participants be protected from unscrupulous hacks and tests? (For example, if the solution was hacked or FSTed because of a compiler bug rather than because of anything else, will the author be able to appeal against it and get their solution accepted?)

In any fairly large and complex software, there are bugs. This is especially true for compilers with their aim to optimize the resulting code. Sometimes the line between a bug and a feature is thin and undefined. I am not ready to take responsibility on our part for individual attention to cases of probable bugs in compilers. In other words, if a participant writes code that encounters unexpected behavior from the compiler/OS, we will not be able to individually understand and help in this situation. On the other hand, I am closely monitoring the situation and paying attention to widespread instances of any such problems.

Is there no bug in 32-bit version? If it's 64-bit only, maybe CF team could add a C++20 32-bit compiler? UPD: 32-bit version can also be slowed down, see 253400416.

I think that 32-bit versions are stable. But currently, I am working on resolving the situation more systematically, rather than fighting it using half measures.

Is there this bug in Microsoft Visual C++ compilers (32 or 64 bits)? If so, is it the reason why we cannot see it among the options?

Read the previous response.

Are there generic methods of protecting against the bug (except switching to 32 bits)? I read (and enjoyed) a blogpost by ToxicPie9 regarding this issue, he suggested to add several lines of code to the solution. However, it does not always help: 253290209 still gets TL (and the author said about that).

I really hope that switching to a modern OS (and likely to clang++ as well) will resolve the issue. However, I'm fairly certain that this won't be the last bug in the compiler/OS, and over time, we will discover new ones.

Has anyone a clue what is going on, why do the aforementioned programs run slowly? Is this bug reported to the GNU or Microsoft developers who are responsible for this strange behavior?

I haven't delved deeply into this issue. It seems that in the current setup, even if somehow resolved, the problem will emerge in some other unexpected place. I prefer to try moving forward.

Why does behavior in custom invocation and in contest/gym mode differ?

Many bugs in the world are quite unstable to reproduce. This is an example of one of them. The method of launching for solutions during testing and during custom invocation is the same.

Is this bug reproducible on other platforms? For instance, on godbolt there are some windows compilers. Is there a way to slow down this code: https://godbolt.org/z/5G73dfezK (by cleverly choosing the value of evil or anything else)? UPD: probably no way since Godbolt uses UCRT rather than MSVCRT. UPDUPD: this code is slow on Godbolt: https://godbolt.org/z/WE4Mxv9cK, so UCRT is also flawed.

It seems to me that this problem is clearly reproduced only on specific versions of Windows 7. This does not mean that there are no other problems in Linux or newer versions of Windows. Tests by maxplus (thank you!) show that there are some irregularities in other setups as well, but we expect that they have little impact on the testing of problem solutions. In any case, I plan to make the main setup one in which such problems do not reproduce at all.

Is there surely no such bug on Linux? (Anyway, with regard to Codeforces it is probably a pointless question. I am quite certain that transition of all invokers to Linux is the last thing MikeMirzayanov would prefer to do since it is for sure really costly in time, effort, architecture changes, and/or money.)

Apparently, this particular bug does not reproduce on Linux. However, even on Codeforces, we lived for a long time without noticing it. I don't know and cannot confidently assert what happens on other systems, especially considering the wide variety of versions, hardware, security patches, etc.

Regarding the switch to Linux: yes, it is a labor-intensive and complex task. We have a huge number of problems (checkers, author solutions, generators, etc.) that may not work or work unpredictably on other systems. In some cases, they are written in languages/dialects whose support in Linux significantly differs.

I should note that the number of participants using Windows is twice as many as those using Linux and iOS combined.

+34

Regarding this, will clang++ have any major differences compared to g++? I am sorry if this question sounds naive or something, but I'm quite in the dark regarding the technical details / implementations behind compilers.

+49

From what we have seen above, MSVCRT, UCRT and the direct low-level Windows API (HeapAlloc, also used by MSVCRT and UCRT) all three are hackable. Any compiler will use one of these three implementations, so the only possible fix will be if HeapAlloc is fixed in the new OS (and the corresponding runtime that is used is also fixed). The compiler has nothing to do with this issue.

Migrating from g++ to clang++ will also prevent usage of optimization pragmas, and might introduce some peculiarities to the behaviour of submissions, as well as break compatibility with mirrors/other contests (for example, most onsite contests use g++ and not clang++). GCC also generally produces better code (at least on Linux), and has better warnings (again, at least on Linux). There might be fewer compiler bugs in Clang, but what really matters is whether those compiler bugs are relevant for competitive programming or not — and GCC is much more thoroughly tested (both generally and for competitive programming) compared to Clang. It makes sense to add both compilers and let users decide — this is what AtCoder does, for example.

OTOH, I am hopeful that with the OS update, there will also be newer programming languages added to Codeforces that a lot of people have been requesting for more than a decade now.

EDIT: If the include path is changed when migrating from g++ to clang++ (and/or if g++ is not installed side-by-side), it will also become impossible to use bits/stdc++.h and pbds, which a lot of people use.

+15

So in short we can start having issues with bits/stdc++.h, pbds and pragmas. Just what I'm fearing... (it's okay, I can still adapt, but it'll be a nuisance in some first days)

Much thanks for the insight!

Maybe the set of allowed pragmas will change a bit. As for pb_ds, this is going to be sad. But I won't commiserate over the loss of bits/stdc++.h, it's of no good but slowing down the compilation several times.

+24

I don't know of any way of setting pragmas globally in Clang. It is not possible to set O3 or Ofast or O1 or O0 even for specific functions (only optnone works, which disables all optimizations for that function which is useless), but it is possible to specify targets using something like [[gnu::target("avx2,bmi,bmi2")]] just before the function declaration.

People often use bits/stdc++.h alongside precompiled headers — in fact, the official reason this file exists is to allow precompiled headers. It ends up being faster than manual includes that you might have in your template. However, if we are using C++23, there is module support that, if Codeforces enables, would lead to even better compilation times on CF (and it is better to use them locally too). It is not trivial to enable module support though. My favourite reason to use modules is that it gets rid of the includes and a simple line import std; at the top imports the whole standard library and compiles in barely noticeable time. This is however conditional on C++23 support from compilers.

tatianyi

3 weeks ago, # ^ |

The clang++ is just a compiler frontend that doesn't come along with an STL implementation. On Windows, it could use MSC++'s MSSTL, or use MinGW64-UCRT's libstdc++. I would call the former "clang-cl", and the latter "MinGW clang".

The clang-cl would be compatible with most MSC++ compile commands, and MinGW clang would be compatible with most GCC compile commands. MikeMirzayanov didn't mention which clang would be migrated to (or both?). If it's clang-cl, almost all the behaviour would be as similar as MSC++ (except VLA extension), such as 64-bit long double type, using std::_Signed128 instead of __int128 for 128-bit integers support, lack of some builtin functions like __lg (although it can be replaced by C++20 std::bit_width(x)-1), etc.

However, clang could manually set its compile target to MinGW, or automatically do this if it's installed with MinGW (e.g. the packed one with winlibs, currently it provides clang17 with UCRT and clang18 with MSVCRT, but clang18 with UCRT may be coming soon). In this case, all headers that libstdc++ has could be used normally, even with full precompile header bits/extc++.h, and other behaviour would be compatible with GCC, like #pragma GCC with the usage of optimization pragmas, and __int128 and 128-bit long double and more. It's so compatible that you could hardly notice that you are not using GCC.

Although GCC is much more thoroughly tested as so many OJ are using this, Clang would have a much more aggressive optimisation and better performance due to many tests of participants (say, much faster or even TLEs become ACs after switching GCC to clang on Atcoder). What's more, both clang-cl and MinGW clang have the almost same optimisation ability for my test codes on Windows, thus is no worry about its code generation. And of course, it makes sense to add both compilers and let users decide.

I meant that generally clang comes bundled on systems that have libc++ instead of libstdc++ (and Codeforces would probably just use the default), and assumed that it would be the case on Windows too. Given how many issues this website has related to language versions, it is probably safe to assume that their knowledge of setting up compilers is a bit less specialized than what would be optimal, so I thought this would be a fair assumption to make.

#pragma GCC optimize is a compiler feature though, and not a library feature, so I don't see how that helps. On my system where I use libstdc++, it doesn't get recognized as a valid pragma by clang, confirming that this is the case.

Of course, clang is a production-ready compiler, so there is no reason to believe that it will generate worse code. Whatever one compiler optimizes, the other often catches up with in a few months generally. I was just speaking from my experience while in uni, when I found that GCC was generally the first one to include certain optimizations (though it might not matter for Codeforces because it has never had bleeding-edge compilers). clang uses LLVM meanwhile GCC uses something of its own, so developing the compiler is slightly easier for clang too, which might make it better in the long run.

However, my concerns about the usability of clang in competitive programming were about things like this issue that was fixed surprisingly late (quadratic std::sort was fixed in 2021, even when C++11 had guaranteed $$$O(n \log n)$$$ sorting). I have not seen libstdc++ ignore the standard for this long, for something this important to competitive programming. There are also subtle differences with how GCC and clang handle library features — for example, std::move is a compiler built-in in clang instead of being a library feature (this is actually gives better baseline performance with -O0 compared to GCC, but the difference vanishes at least at -O2).

Well, in my simple tests, things don't seem to be the same as you say. The clang on Linux or Windows is usually bundled with libstdc++; it seems that only on MacOS is the clang by default bundled with libc++.

The bad performance of many STL implementations of libc++ is surely concerning. I got to know that for C++17std::gcd on Apple Clang's libc++, it would use the Euclidean algorithm instead of Stein's.

However, as for me, I can hardly use libc++ for clang on Windows. I tried to install and configure that but failed. The result is, that I can only use clang-cl or MinGW clang version, whose default configuration is using msSTL or MinGW libstdc++.

In my test case, the clang-cl that uses msSTL would report warning: unknown pragma ignored [-Wunknown-pragmas], but the MinGW clang doesn't. It does use MinGW libstdc++, but in my point of view, it is compatible with almost all GCC's compiler features and library features.

So as a newbie participant, in my point of view, I still hold the point that for competitive programming, the MinGW clang is almost no different in the usage experience, even the clang-cl would have better performance than the "default" libc++ clang. Maybe what you are concerned about would not appear on the Windows platform with those two versions?

Bundling is a bit different across compilers. What you note happens with Linux because most distros simply just use GCC for everything, and libstdc++ is provided as a part of GCC itself (this bundling is non-standard for compilers). So if you install clang on such a system, it will almost always be configured to detect libstdc++ that GCC had installed for you — for example, see this SO post. On systems that have libc++ bundled, clang is usually the recommended compiler (I think you might have misunderstood my comment, which could also have been better written). This also makes people believe that clang should only be used with libc++. And this is not really bad advice — since compiler extensions sometimes break when things are mixed and matched. For example, a while back, 128 bit unsigned integers did not work when using clang with libstdc++ but worked with clang with libc++ and GCC with libstdc++. Similarly, these kinds of incompatibilities sometimes crop up, so using libc++ is the safer choice in such cases. Not saying it is impossible to use libstdc++ with clang (I use it on my personal system), but one day down the line someone will report an issue on Codeforces and then Codeforces will start using libc++ instead, and as you pointed out, libc++ implementations are not very polished.

Coming to the situation on Windows, I am surprised you failed to set up libc++ on Windows — the documentation states that Windows is officially supported for the libc++ project, so I would expect the installation process to be much easier (but then it is also Windows...).

It's also unexpected that clang-cl with MinGW libstdc++ does not warn you about the pragmas. Clang supports quite a few GCC pragmas as in the doc, but #pragma GCC optimize and #pragma GCC target are not among them (and since competitive programmers only care about these, I simplified in one of the previous comments and made the remark that pragmas don't work on Clang in general). Pragmas are just preprocessor directives, so it is the compiler that will take care of them instead of the standard library. I recommend trying to compile the following program and looking at the executable sizes with and without passing -O3 as a flag. I would be surprised if the sizes are the same on Windows (they are different on Linux for example) — because that would mean that clang-cl does not have the same support for optimization levels as clang.

#pragma GCC optimize("O3")
#include <bits/stdc++.h>
int main() {
    int n, ans = 0; std::cin >> n;
    std::vector<int> a(n, 10);
    for (auto x : a) ans += x;
    std::cout << ans << '\n';
}

I'm just a Windows user and rarely use Linux. It's my fault that I didn't read the doc thoroughly, and compile and build the libc++ myself. For me, I usually don't compile those libs. Instead, I tend to use existing packages. I install the MSC++ and clang-cl using VCInstaller, and install the MinGW clang using msys2-mingw-w64-ucrt toolchain which acts like a "Linux environment" that comes with GCC by default.

The latter is way much easier than building with CMake. Thus, it naturally comes with libstdc++ as it's bundled with GCC toolchain and library. On the other hand, if you try to sync with CodeForces' language config (I mean, see the word "winlibs" in that quote), you could download the GCC + LLVM/Clang/LLD/LLDB release, which is similar to the aforementioned "msys2-mingw-w64-ucrt" version.

In this case, I used your code and compiled it with the -Wall flag. It does generate a warning message, and the #pragma GCC directive is ignored, the executable sizes are the same here. Seems like I made a bit of a serious error in my testing. I'm sorry.

Although users can not use the pragma directive to enable O3 optimization, the O2 for clang is already nice enough for competitive programming. I'm not saying that clang O2 is necessarily better than GCC O2 or even O3. After all, this thing isn't absolute as we encounter different results. It's enough for this use, I think.

Apart from that, as for the __int128 problem, I simply added unsigned __int128 x; line and it successfully compiled. However, for this code you mentioned,

it failed to compile both for MinGW clang and MinGW GCC(11.4 and 13.2, ucrt).

Clang

./Main.cpp:8:2: error: static assertion failed due to requirement 'std::is_unsigned<unsigned __int128>{}': 
        static_assert(std::is_unsigned<uint128_t>{}, "");
        ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./Main.cpp:9:2: error: static assertion failed due to requirement 'std::is_integral<unsigned __int128>{}': 
        static_assert(std::is_integral<uint128_t>{}, "");
        ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 errors generated.

Even unexpectedly, on some online compilers like Customtest of CodeForces and Godbolt, it still failed to compile.

GCC

<source>: In function 'int main()':
<source>:8:28: error: static assertion failed
    8 |         static_assert(std::is_unsigned<uint128_t>{}, "");
      |                            ^~~~~~~~~~~~~~~~~~~~~~~~
<source>:9:28: error: static assertion failed
    9 |         static_assert(std::is_integral<uint128_t>{}, "");
      |                            ^~~~~~~~~~~~~~~~~~~~~~~~
Compiler returned: 1

As far as I tested, only GCC 12.2 on AtCoder could pass this code. It's confusing and may come with some additional compile flags that control those behaviours.

Although users can not use the pragma directive to enable O3 optimization, the O2 for clang is already nice enough for competitive programming.

Indeed. I was more concerned about things like loop unrolling and Ofast to get in some extra optimizations that the compiler would ordinarily not do (and for good reason in most cases). The lack of global target pragmas bugs me a bit too — the syntax I mentioned in a previous comment allows you to compile with specific targets for your own functions, and not the header files you include, so a lot of the code still remains potentially unoptimized. All ways I can think of involve marking functions with attributes (and I am not completely sure whether these attributes are applied recursively down the AST).

The reason why you face the issue with the static_asserts is that 128 bit integers are a compiler extension, and as such they don't have to be bound by the standard. A simpler example — if you try to using std::cin or std::cout with 128 bit integers, you would find that it does not compile with certain compilers (if any compilers at all). The point of sharing that link was that there might be some features that just are not supported by compilers if you use a "non-native" standard library implementation with them — it might have been fixed now, but there is potential for breakage in other features due to lack of official support.

nicksms

If the issue persists after upgrade, would it be possible to try enabling the segment heap (see page 7 of this)? Based on what I've skimmed it adds some more CPU overhead and is slower in some cases, but might not have the same blowup that the LFH does.

Thank you for your answer!

When approximately to expect this transition to the new OS?

+37

The plan is to begin the migration on Monday morning.

DevUt

+18

Will you be posting a separate blog regarding the changes ? I'm just curious when will we be able to try the new stuff hands on.

Thanks for your work!

The new Windows Server 2022 is deployed already.

You can use the code in custom test to get info about the system:

For Windows only

And now the output is

===OS information===
Windows Server 10.0.20348
===CPU information===
Intel(R) Core(TM) i3-8100 CPU @ 3.60GHz
===Memory information===
total 15.86 GB (8.88 GB available)

2200520100138

3 weeks ago, # |

-23

I an writing a code on codeforces and in C++17 is giving right answer but C++20 is giving wrong

Now here is the submission link for both code. I write it on yesterday so I requesting you to also look this issue

[submission:https://codeforces.com/contest/1795/submission/255050449] Giving Runtime Error

[submission:https://codeforces.com/contest/1795/submission/255050551] Perfectly Exicuted in c++17.

-10

Please Look this issue also

-is-this-fft-

+14

Your code contains undefined behavior.

On line 50, you do if (a[i] == *it), where it = lower_bound(b.begin(), b.end(), a[i]). Sometimes a[i] is bigger than any element of b (e.g. when $$$a = [10^9, 10^9]$$$ and $$$b = [1, 1]$$$). If that happens, then it is equal to b.end(), and dereferencing that is undefined behavior. In other words, anything is allowed to happen. You just got lucky that this ended up doing what you wanted to do with C++17.

You also have another similar bug, on line 70 j can sometimes be -1, and accessing b[-1] is again undefined behavior.

Here's how you can find bugs like this on your own in the future. It's not a lot of work, it took me perhaps 5 minutes.

Compile with --fsanitize=address and --fsanitize=undefined.
Run your code on thousands of small, randomly generated test cases. If something like this happens, you'll see diagnostic info on what went wrong. It looks quite scary, but it has line numbers so you'll find out where the bug was.

In general when you have issues like this where something works on one compiler and doesn't work on another, your code usually has mistakes like this. It's very rare that this is a compiler bug that MikeMirzayanov needs to look at. The people who found the issue that necessitated the changes in the blog above did far more thorough testing.

Thanks I got my mistake.

I was thinking the run time error was due to any issue of C++20.

Xellos

+16

Note that it's not necessary to change a whole system or all libraries to use a different heap allocator implementation. A colleague recently told me about GNU malloc sucking for multithreaded memory fragmentation, with jemalloc or tcmalloc as a solution. It's possible to directly link or preload... on Linux, at least, but the principle's there.

orz's blog