Hi everyone!
Some time ago I started developing a linear algebra library for Library Checker, and it turned out to be much fun. I'd like to write this blog to outline how one can implement a linalg library with a very good Library Checker performance, while also avoiding highly technical low-level details that are common in other similar blogs. Instead, we will learn a few high level ideas that will let us write a code that the compiler will then optimize using all these fancy techniques in our stead.
Huge thanks to ToxicPie9 and magnus.hegdahl for explaining me some of the optimizations used here! You may also look into sslotin's entry about how it can be sped up even further with more advanced and low-level techniques.
Tl'dr.
In this blog, I present my matrix library, which maintains a simple and elegant implementation of basic matrix operations without advanced techniques, while also having a remarkable performance on most Library Checker linear algebraic tasks:
- $$$1024\times 1024$$$ matrix product: 400ms, roughly top-15% of Library Checker submissions.
- $$$200\times 200$$$ matrix pow: 27ms with Frobenius form, top-1 performance. 200ms with binary exponentiation, top-10 performance.
- $$$500 \times 500$$$ determinant: 29ms, top-2 performance.
- $$$nm \leq 2.5 \cdot 10^5$$$ rank of matrix: 38ms, top-1 performance.
- $$$500 \times 500$$$ system of linear equations: 39ms, top-1 performance.
- $$$500 \times 500$$$ inverse matrix: 84ms, top-2 performance (current top-1 is the same code + fast IO).
- $$$500 \times 500$$$ characteristic polynomial: 93ms, top-1 performance.
Unlike most implementations, I do not manually optimize each of these tasks separately, but rather use a uniform optimized approach to implementing them via simple array operations, and only the later ones are somewhat optimized (with a very simple high-level approach). This also allows to avoid a significant code bloat into unreadable mess, which is common for regular Library Checker's superfast solutions.
Note: All the implementations above are focused on integer arithmetics module a prime number known at compilation time. Results of operations when the underlying type does not form a field (e.g. pure integers, composite mod, etc) may be unexpected, and will most likely be so if any division is expected.
While the library theoretically also supports floating point underlying data types (as in, compiles when they're plugged in), this functional is completely experimental, untested and may easily behave in an unexpected manner.