Helping the Compiler to Vectorize
Sometimes the compiler has insufficient information to decide whether to vectorize a loop. There are several ways to provide additional information to the compiler:
Pragmas
ivdep
Potential data dependencies can prevent auto-vectorization. #pragma ivdep may be used to tell the compiler that it may safely ignore any assumed data dependencies. Consider the following loop:
for (j=1; j<MAX; j++) C[j]=A[j]+B[j];
For some iteration j, the output C[j] might refer to the same memory location as one of the inputs A[j] or B[j]. This prohibits straightforward conversion of the loop into vector instructions. The compiler may then either keep the loop serial or generate a run-time test for overlap, so that the loop in the branch taken when no overlap is found can be converted into vector instructions. Run-time data-dependency testing is a generally effective way to exploit implicit parallelism in C or C++ code, at the expense of a slight increase in code size and some testing overhead. If the loop is only used in specific ways, however, you can assist the vectorizing compiler as follows:
- If the loop is mainly used for small values of MAX or for overlapping memory regions, you can simply prevent vectorization and, hence, the corresponding run-time overhead by inserting a #pragma novector hint before the loop.
- Conversely, if the loop is guaranteed to operate on non-overlapping memory regions, you can provide this information to the compiler by means of a #pragma ivdep hint before the loop, which informs the compiler that conservatively assumed data dependencies that prevent vectorization can be ignored:
#pragma ivdep
for (j=1; j<MAX; j++) C[j]=A[j]+B[j];
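The run-time overlap test mentioned above can be pictured roughly as follows. This is only a hand-written sketch of the idea, not actual compiler output: the helper disjoint and the function add_checked are hypothetical names, and the check is deliberately conservative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the n-element float ranges starting
   at p and q share no byte, 0 otherwise. */
static int disjoint(const float *p, const float *q, size_t n) {
    uintptr_t a = (uintptr_t)p;
    uintptr_t b = (uintptr_t)q;
    uintptr_t bytes = (uintptr_t)(n * sizeof(float));
    return a + bytes <= b || b + bytes <= a;
}

/* Hypothetical function: two versions of the same loop behind one test. */
void add_checked(size_t n, float *C, const float *A, const float *B) {
    if (disjoint(C, A, n) && disjoint(C, B, n)) {
        /* No overlap: iterations are independent, so this copy of the
           loop is the one a compiler could turn into vector code. */
        for (size_t j = 0; j < n; j++) C[j] = A[j] + B[j];
    } else {
        /* Possible overlap: keep the original serial order. */
        for (size_t j = 0; j < n; j++) C[j] = A[j] + B[j];
    }
}
```

The extra comparison and the duplicated loop body are the testing overhead and code-size increase referred to above; #pragma ivdep removes the need for them when the caller can guarantee non-overlapping arrays.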
Note that the GNU compiler requires the GCC prefix in the pragma, so the ivdep pragma reads:
#pragma GCC ivdep
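Because the spelling differs between compilers, one possible approach is to hide the pragma behind a macro. The following is a minimal sketch under stated assumptions: the macro name IVDEP and the function add_arrays are made up for illustration, and the compiler is detected through the predefined macros __INTEL_COMPILER, __INTEL_LLVM_COMPILER, __GNUC__ and __clang__.

```c
#include <stddef.h>

/* IVDEP is a hypothetical macro name, not part of any compiler. */
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
#define IVDEP _Pragma("ivdep")
#elif defined(__GNUC__) && !defined(__clang__)
#define IVDEP _Pragma("GCC ivdep")
#else
#define IVDEP /* unknown compiler: emit no hint */
#endif

/* The caller must guarantee that C does not overlap A or B;
   the pragma is an assertion, the compiler does not check it. */
void add_arrays(size_t n, float *C, const float *A, const float *B) {
    IVDEP
    for (size_t j = 0; j < n; j++) C[j] = A[j] + B[j];
}
```

On compilers that match neither branch, the macro expands to nothing and the loop is compiled without the hint.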
loop_count
This pragma may be used to advise the compiler of the typical trip count of the loop. This may help the compiler decide whether vectorization is worthwhile and whether it should generate alternative code paths for the loop. It is not available for the GNU compiler. For the Intel compiler it may read:
#pragma loop_count (3)
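As an illustration, the hint might be attached to a loop that typically runs over three elements. This is a sketch only: the function scale_vec is hypothetical, and the pragma is guarded by the Intel predefined macros so that other compilers do not see it.

```c
/* Hypothetical helper that scales a short vector in place. */
void scale_vec(float *v, float s, int n) {
    /* Hint for the Intel compiler only: n is typically 3. */
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
#pragma loop_count (3)
#endif
    for (int i = 0; i < n; i++) v[i] *= s;
}
```

The hint only describes the typical case; the loop remains correct for any value of n.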
vector always
This pragma asks the compiler to vectorize the loop if it is safe to do so, whether or not the compiler thinks that will improve performance (Intel only).
novector
This pragma asks the compiler not to vectorize a particular loop (Intel only).
Keywords
The restrict keyword may be used to assert that the memory referenced by a pointer is not aliased, i.e. that it is not accessed in any other way. Consider a function func that receives the pointers c, a and b and computes c[i] from a[i] and b[i] in a loop. For some iteration i, the output c[i] might refer to the same memory location as one of the inputs a[i] or b[i]. This is a case for the ivdep pragma, or we may use the restrict keyword defined in the C99 language standard. With this keyword, the first line of the function looks like:
void func(int n, float *restrict c, float *restrict a, float *restrict b) {
Though both the ivdep pragma and the restrict keyword help to vectorize the loop, the restrict keyword leads to leaner and faster assembly code, and its use is therefore strongly recommended.
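For reference, a complete function with this signature might look as follows. The loop body is an assumption modelled on the ivdep example (element-wise addition); only the first line is taken from the text above.

```c
/* restrict promises that c, a and b point to non-overlapping memory
   for the duration of the call; violating the promise is undefined
   behavior. The loop body is assumed for illustration. */
void func(int n, float *restrict c, float *restrict a, float *restrict b) {
    for (int i = 0; i < n; i++) c[i] = a[i] + b[i];
}
```

As with ivdep, the compiler does not verify the promise, so callers must only pass arrays that do not overlap.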