Explanations
GFlops
Flops is short for floating point operations per second. Most numeric calculations handle floating point numbers such as 1.34 or 0.7e-5. To compare the performance of different systems, we count the operations (only * and +/-) and measure the time required, which gives us a Flops rate. One GFlops is 1,000,000,000 (10^9) floating point operations per second.
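The counting-and-timing idea can be sketched in C as follows. This is only an illustration, not a calibrated benchmark: the function names gflops_rate and measure_axpy_gflops are made up for this example, and the kernel y[i] = a * x[i] + y[i] was chosen simply because it performs exactly one * and one + per element.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Convert an operation count and an elapsed time into a GFlops rate. */
double gflops_rate(double flop_count, double seconds)
{
    assert(seconds > 0.0);
    return flop_count / seconds / 1.0e9;
}

/* Hypothetical helper: time y[i] = a * x[i] + y[i] and report GFlops.
 * Returns -1.0 if the measurement could not be taken. */
double measure_axpy_gflops(size_t n, int repetitions)
{
    if (n == 0 || repetitions <= 0)
        return -1.0;

    double *x = malloc(n * sizeof *x);
    double *y = malloc(n * sizeof *y);
    if (x == NULL || y == NULL) {
        free(x);
        free(y);
        return -1.0;
    }
    for (size_t i = 0; i < n; ++i) {
        x[i] = 1.0;
        y[i] = 0.0;
    }

    const double a = 0.5;
    clock_t start = clock();
    for (int r = 0; r < repetitions; ++r)
        for (size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];   /* 2 Flops: one * and one + */
    clock_t stop = clock();

    /* Read one result so the compiler cannot discard the loop. */
    volatile double sink = y[n / 2];
    (void)sink;
    free(x);
    free(y);

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return -1.0;                  /* run was too short for the timer */

    double flop_count = 2.0 * (double)n * (double)repetitions;
    return gflops_rate(flop_count, seconds);
}
```

Note that clock() reports CPU time at a fairly coarse resolution, so n and repetitions should be large enough for the loop to run for a noticeable fraction of a second. The rate measured this way is usually well below the theoretical peak, because the loop also has to load and store its data.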
Intel Compiler
The compiler suite from Intel is a commercial product that requires licensing. We buy several of these licenses and offer them to our customers. To access the compiler and additional products, select the corresponding module and use ifort (FORTRAN), icc (C) or icpc (C++) for compilation.
Calculating Peak Performance
Here is a simple calculation for our systems. The best operation is a so-called fused multiply-add (FMA), which delivers 2 Flops every cycle (+/- and *). Without vectorization a core has 1 FMA unit, so a core running at 2 GHz (= 1/cycle time) has a peak performance of 4 GFlops (2 GHz times 2 Flops per cycle). AVX provides additional FMA units (see introduction). In double precision, AVX and AVX2 process 4 values per instruction and therefore reach 16 GFlops peak, while AVX512 processes 8 values and reaches 32 GFlops. The Skylake generation of Intel's CPUs may have up to 2 of these AVX512 units, resulting in a peak performance of 64 GFlops per core.
This is not quite true, as AVX instructions typically run at a lower clock frequency than normal instructions.
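The arithmetic above can be written down as a small C function. This is a sketch for illustration only: the name peak_gflops and its parameters are assumptions made for this example, and the lane counts (1 for scalar code, 4 for AVX/AVX2, 8 for AVX512) refer to double precision.

```c
#include <assert.h>

/* One fused multiply-add counts as two Flops (one * and one +/-). */
#define FLOPS_PER_FMA 2.0

/* Theoretical per-core peak in GFlops.
 *   clock_ghz  - clock frequency in GHz (use the reduced AVX clock if known)
 *   simd_lanes - doubles per vector instruction: 1, 4 (AVX/AVX2) or 8 (AVX512)
 *   fma_units  - number of FMA units per core
 */
double peak_gflops(double clock_ghz, int simd_lanes, int fma_units)
{
    assert(clock_ghz > 0.0 && simd_lanes > 0 && fma_units > 0);
    return clock_ghz * FLOPS_PER_FMA * (double)simd_lanes * (double)fma_units;
}
```

With a clock of 2.0 GHz, peak_gflops(2.0, 1, 1) gives the scalar figure of 4 GFlops and peak_gflops(2.0, 8, 2) gives the 64 GFlops quoted for two AVX512 units. Passing the lower frequency at which AVX code actually runs as clock_ghz gives a more realistic bound.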
Pragmas
The #pragma directive is the method specified by the C standard for providing additional information to the compiler, beyond what is conveyed in the language itself. Unfortunately, there is no common definition of pragmas across compilers, which makes their use problematic for portability. The IVDEP pragma is, to my knowledge, generally available. It tells the compiler that the subsequent loop carries no data dependencies between its iterations.
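A minimal sketch of how the IVDEP pragma might be used in C is shown below. The function scaled_add is a made-up example. The spelling of the pragma differs between compilers, which illustrates the portability problem: the Intel compilers accept #pragma ivdep, while GCC uses #pragma GCC ivdep. The preprocessor checks are one possible way to select between them, not the only one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: out[i] += s * in[i].
 * The compiler cannot prove that out and in do not overlap, so it may
 * assume a dependency between iterations. The pragma asserts that there
 * is none; the caller must guarantee that this is really the case. */
void scaled_add(double *out, const double *in, double s, size_t n)
{
    assert(out != NULL && in != NULL);

#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
#pragma ivdep
#elif defined(__GNUC__) && !defined(__clang__)
#pragma GCC ivdep
#endif
    for (size_t i = 0; i < n; ++i)
        out[i] += s * in[i];
}
```

The pragma is a promise made by the programmer, not a check. If the arrays do overlap in a way that creates a real dependency, the vectorized loop may silently produce wrong results.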