SIMD (Single Instruction, Multiple Data) instructions have been available on CPUs from many different vendors for some time now. Intel's first such extension was MMX (MultiMedia eXtensions), originally intended to speed up the processing of multimedia data, such as the compression and decompression of images and audio files, by parallelizing integer arithmetic operations. MMX was followed by the 128-bit SSE/SSE2/SSE3 (Streaming SIMD Extensions), which in turn have been augmented by the AVX and AVX2 (Advanced Vector Extensions) extensions. These offer the potential for considerable performance speed-ups thanks to the parallel nature of the instructions, which can perform multiple single- or double-precision floating-point operations in a single instruction call, such as FMA (Fused Multiply-Add). A remnant of the MMX naming era survives in the vector registers: XMM for SSE, YMM for AVX/AVX2, and ZMM for AVX-512. Whether code is vectorized, and which kind of vector registers are used, is best checked by generating an assembly output (option -S) and searching for "mm" in the resulting *.s file:

grep mm my_source.s

These SIMD instructions are quite similar to the instructions found in the assembly languages used by shaders on GPUs, and as a result many applications have been optimized by using these instructions to treat the CPU as the equivalent of a GPU core.

Vectorization is a special case of SIMD. SIMD code is sometimes written manually for important specific applications, but is more often produced automatically by compilers (referred to as auto-vectorization). Auto-vectorization is enabled through compiler options; the relevant options can be found in the compiler manuals, such as the Intel C++ Compiler User & Reference Guides or the manuals for the GNU compiler suite.