Compilers
Different compilers for the same language all obey the language standard, but they differ in their ability to generate optimized code and in the means, i.e. options and pragmas, by which the programmer can interact with them. Below is a collection of the most important options and pragmas for the Intel compiler and the GNU compiler.
Options and their Meaning
Meaning | Intel compiler | GNU compiler |
compiler invocation | icc (C) icpc (C++) | gcc (C) g++ (C++) |
Basic optimization | -O | -O |
compile only | -c | -c |
Vectorizing reductions | -ffast-math | |
Vectorizing AVX (256-bit) | -xavx | -mavx |
Vectorizing AVX (512-bit) | -xCOMMON-AVX512 | -mavx512f -mavx512cd |
Reports | -qopt-report=1 -qopt-report-phase=vec | -ftree-vectorize -fopt-info-loop-optimized |
Assembler code | -S | -S |
Optimizing function calls | -ipo | -flto |
OpenMP | -qopenmp | -fopenmp |
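As an illustration of the "Vectorizing reductions" and "Reports" rows, here is a minimal sketch of a floating-point sum in C. The file name sum.c and the function sum_array are hypothetical. Vectorizing a floating-point reduction changes the order of the additions, which can change rounding, so on x86 GCC only vectorizes such a loop when reassociation is allowed, for example via -ffast-math. The vectorization report shows whether the loop was actually vectorized.

```c
/* sum.c (hypothetical file name): a floating-point sum reduction.
 *
 * Possible GCC build line using options from the table:
 *   gcc -O3 -mavx -ffast-math -ftree-vectorize -fopt-info-loop-optimized -c sum.c
 *
 * Without -ffast-math (more precisely its -fassociative-math part),
 * GCC keeps the sequential summation order and does not vectorize
 * this loop on x86.
 */
#include <stddef.h>

double sum_array(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += v[i]; /* loop-carried dependency on sum: a reduction */
    }
    return sum;
}
```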
Pragmas
Meaning | Intel compiler | GNU compiler |
ignore potential dependencies | ivdep | GCC ivdep |
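The following sketch shows how the ivdep pragmas from the table might be applied to a loop whose dependencies the compiler cannot resolve at compile time. The function shift_scale and its parameters are hypothetical. The preprocessor guard is our addition so that each compiler only sees its own pragma; __INTEL_COMPILER is checked first because the Intel compiler also defines __GNUC__. The pragma is a promise made by the programmer, and in this sketch it is only correct if the caller passes k >= 0.

```c
/* Hypothetical kernel: a[i] = a[i + k] * c with an offset k unknown at
 * compile time. If k were negative, iteration i would read a value
 * written by an earlier iteration (a true dependency), so the compiler
 * must assume a dependency exists. The pragma tells it to ignore that
 * assumption; this is only valid when the caller guarantees k >= 0.
 * The array a must hold at least n + k elements.
 */
void shift_scale(float *a, int k, float c, int n)
{
#if defined(__INTEL_COMPILER)
#pragma ivdep
#elif defined(__GNUC__) && !defined(__clang__)
#pragma GCC ivdep
#endif
    for (int i = 0; i < n; ++i) {
        a[i] = a[i + k] * c;
    }
}
```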
AVX specific options
The Intel compiler provides several AVX-specific options, introduced in different compiler versions. The differences between them are not clear from the documentation, only from the generated code. In all cases the best results were observed with the option -axCOMMON-AVX512, which is documented as the option for any Intel processor that supports AVX-512 instructions. The option -axCORE-AVX512 is documented as the option for the Intel Xeon Processor Scalable Family, which covers, for example, all Skylake processors with AVX-512 instructions, so the documented difference between the two options is not really clear. However (see AOS vs. SOA), -axCORE-AVX512 appears unable to load elements stored in an AOS data layout into its AVX-512 registers, while -axCOMMON-AVX512 can.
-axCOMMON-AVX512 is therefore our recommendation.
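To make the AOS vs. SOA remark concrete, here is a sketch of the two data layouts in C; the type names, function names and the capacity NPART are hypothetical. In the AOS version the x components of consecutive particles are three floats apart in memory, so filling a vector register needs strided or gather loads; in the SOA version they are contiguous. Comparing the vectorization reports of the two loops under -axCOMMON-AVX512 and -axCORE-AVX512 is one way to check the behaviour described above; the outcome may depend on the compiler version.

```c
/* Two layouts for the same hypothetical particle data.
 * AOS: x, y, z of one particle are adjacent, so the x values of
 *      consecutive particles are 3 floats apart (strided access).
 * SOA: all x values are contiguous (unit-stride access).
 *
 * One way to compare the vectorization reports:
 *   icc -O3 -axCOMMON-AVX512 -qopt-report=1 -qopt-report-phase=vec -c layout.c
 *   icc -O3 -axCORE-AVX512 -qopt-report=1 -qopt-report-phase=vec -c layout.c
 */
#include <stddef.h>

#define NPART 1024 /* hypothetical fixed capacity of the SOA arrays */

typedef struct { float x, y, z; } ParticleAoS;

typedef struct {
    float x[NPART];
    float y[NPART];
    float z[NPART];
} ParticlesSoA;

/* Shift all particles along x, AOS layout: strided loads and stores. */
void shift_x_aos(ParticleAoS *p, size_t n, float dx)
{
    for (size_t i = 0; i < n; ++i)
        p[i].x += dx;
}

/* Same operation, SOA layout: contiguous loads and stores (n <= NPART). */
void shift_x_soa(ParticlesSoA *p, size_t n, float dx)
{
    for (size_t i = 0; i < n; ++i)
        p->x[i] += dx;
}
```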
From version 7.0 onwards, the GNU compiler is capable of generating AVX-512 code. Several options are required to achieve good vectorization. From a performance point of view, the most important one appears to be the option for link-time optimization (-flto). Without this option, calls to functions defined in other source files do not seem to be optimized properly, and information about non-aliased pointers is not handled properly. For performance we recommend the options:
-O3 -ftree-vectorize -mavx512f -mavx512cd -ffast-math -march=skylake-avx512 -flto
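A minimal sketch of the situation described above, assuming a hypothetical routine axpy that lives in its own source file (kernel.c, hypothetical) and is called from another one (main.c, hypothetical). The restrict qualifiers are our addition: they state the non-aliasing explicitly in C99, whereas the text does not say how the pointer information was provided in its measurements. The comment shows one possible build with the recommended options.

```c
/* kernel.c (hypothetical): in a real project this routine lives in a
 * different translation unit from its caller.
 *
 * Possible build with the recommended options. -flto is given both when
 * compiling and when linking, because with LTO the final optimization
 * and code generation happen at link time:
 *   CFLAGS="-O3 -ftree-vectorize -mavx512f -mavx512cd -ffast-math -march=skylake-avx512 -flto"
 *   gcc $CFLAGS -c kernel.c
 *   gcc $CFLAGS -c main.c
 *   gcc $CFLAGS kernel.o main.o -o app
 */

/* y[i] += a * x[i]. restrict promises that x and y do not overlap, so
 * the compiler does not need to guard the vectorized loop against
 * aliasing between them. */
void axpy(int n, float a, const float *restrict x, float *restrict y)
{
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```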