Tutorial on SIMD vectorization to speed up brute force

Правка en1, от maomao90, 2022-01-02 10:52:20

I decided to write a blog on this as I was doing a problem on local judge and I decided to try to speed up my brute force code. However, it was quite difficult to find resources on SIMD vectorization, so I decided to try to compile some of the resources I found together to hopefully allow more people to learn to scam brute force solutions

Introduction

SIMD stands for single instruction, multiple data. SIMD allows us to give vector instructions which will allow the code to run faster. Vector instructions are instructions that handle short (2-16) vectors of integers / floats / characters in a parallel way by making use of the extra bits of space to do operations simultaneously.

To make use of SIMD, we have to add the following code at the top of the code.

#include <nmmintrin.h>

#pragma GCC target("avx2")

There are three 128 bit data types for integers, floats and doubles.

__m128i i; // stores 4 integers since each integer is 32 bit and 128/32=4
__m128 f; // stores 4 floats since each float is 32 bit and 128/32=4
__m128d d; // stores 2 doubles since each double is 64 bit and 128/64=2

The integer data type can also be used to store 32 characters.

In order to do operations on these data types, we will make use of SIMD intrinsics.

Теги brute-force, simd, scam

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
en30 Английский maomao90 2024-04-07 12:38:41 746
en29 Английский maomao90 2023-07-31 12:10:52 18 Tiny change: 'ation.\n\n[cut]\n\nSyntax' -> 'ation.\n\nSyntax'
en28 Английский maomao90 2023-07-31 12:07:15 9 Tiny change: 'ation.\n\nSyntax' -> 'ation.\n\n[cut]\n\nSyntax'
en27 Английский maomao90 2022-01-04 15:10:01 0 (published)
en26 Английский maomao90 2022-01-04 15:08:59 445 Added mullo (saved to drafts)
en25 Английский maomao90 2022-01-03 15:44:16 0 (published)
en24 Английский maomao90 2022-01-03 15:43:41 1045 Added conclusion
en23 Английский maomao90 2022-01-03 14:09:40 19
en22 Английский maomao90 2022-01-03 14:01:41 454
en21 Английский maomao90 2022-01-03 13:54:48 141 More minor edits
en20 Английский maomao90 2022-01-03 13:49:50 342 Minor edits
en19 Английский maomao90 2022-01-03 11:54:29 19
en18 Английский jamessngg 2022-01-03 11:51:18 14
en17 Английский maomao90 2022-01-03 11:46:17 3 Tiny change: ' 3 spaces 4 intege' -> ' 3 spaces as 4 intege'
en16 Английский maomao90 2022-01-03 11:43:21 14367 Added more examples and references
en15 Английский maomao90 2022-01-03 10:10:08 2425 Added logical arithmetic
en14 Английский maomao90 2022-01-03 06:44:08 13 Added u to loadu and storeu
en13 Английский maomao90 2022-01-03 06:07:16 4 Tiny change: 'nly store 2 64-bit in' -> 'nly store two 64-bit in'
en12 Английский maomao90 2022-01-03 04:26:21 18 Tiny change: 'pi32(sum, (1 << 8) - 1);\n }' -> 'pi32(sum, 0b11111111);\n }'
en11 Английский iLoveIOI 2022-01-03 03:40:25 8
en10 Английский iLoveIOI 2022-01-03 03:39:44 28 Tiny change: '128 bits. Similar t' -> '128 bits. **Note that b is before a** Similar t'
en9 Английский iLoveIOI 2022-01-03 03:32:52 239
en8 Английский maomao90 2022-01-02 18:23:36 3815 Added examples and made minor edits
en7 Английский maomao90 2022-01-02 16:44:58 2611 Finalised syntax section
en6 Английский maomao90 2022-01-02 15:35:09 1080 Finish reordering
en5 Английский maomao90 2022-01-02 14:16:50 1267 Completed 2 functions of reordering
en4 Английский jamessngg 2022-01-02 13:40:06 38
en3 Английский maomao90 2022-01-02 13:33:21 2 Tiny change: 'de block\n~~~~~' -> 'de block\n\n~~~~~'
en2 Английский maomao90 2022-01-02 13:28:26 3049 Finish loading and arithmetic
en1 Английский maomao90 2022-01-02 10:52:20 1383 Initial revision (saved to drafts)