for f in $(grep avx /proc/cpuinfo | uniq); do if [ "${f##avx}" != "$f" ]; then echo "$f"; fi; done
If you are not familiar with the line above, here is some information about it.
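As a rough illustration, the same idea can be split into separate, commented steps. This is only a sketch: the function name `list_avx_flags` and its optional file argument are hypothetical and not part of the original command, and it assumes an x86 Linux system whose /proc/cpuinfo contains a `flags` line.

```shell
# Hypothetical helper: print every CPU flag that starts with "avx", one per line.
# The optional argument (default /proc/cpuinfo) is an assumption made so that
# the function can also be tried on a saved copy of the file.
list_avx_flags() {
    cpuinfo_file="${1:-/proc/cpuinfo}"
    # 1. Take the first "flags" line (it is repeated once per logical CPU).
    # 2. Put each space-separated flag on its own line.
    # 3. Keep only the flags whose name begins with "avx".
    grep -m1 '^flags' "$cpuinfo_file" | tr ' ' '\n' | grep '^avx'
}
```

Calling `list_avx_flags` without an argument reads /proc/cpuinfo; if nothing is printed, the CPU reports no AVX support.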
The output may contain the following lines:
name | instruction set | #floats | #doubles |
"empty" | no avx | | |
avx | set 1 - limited | 8 | 4 |
avx2 | set 2 - advanced | 8 | 4 |
avx512f | foundation | 16 | 8 |
avx512cd | conflict detection | 16 | 8 |
avx512dq | doubleword and quadword | 16 | 8 |
avx512bw | byte and word | 16 | 8 |
avx512vl | vector length | 16 | 8 |
If you are going to use AVX, the entry avx2 means that you can operate on 4 doubles at a time, so vectorized code may run up to 4 times faster. How to achieve this is explained below. The output lists only the instruction sets that are available; avx2 is a superset of avx, and avx512 is a superset of avx2. If several instruction sets are listed, only the highest one determines the number of operands processed per vector instruction (more details may be found here).
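The rule that only the highest listed set counts can be sketched as a small shell function. The name `avx_vector_width` is hypothetical, and the numbers it prints are simply those from the table above. It expects the flag names on standard input, one per line.

```shell
# Hypothetical helper: read AVX flag names (one per line) from stdin and print
# "<highest set> <floats> <doubles>" according to the table above,
# or "none" if no AVX flag is present.
avx_vector_width() {
    avx_flags=$(cat)
    # Surround the list with spaces so that each name can be matched as a whole word.
    case " $(printf '%s' "$avx_flags" | tr '\n' ' ') " in
        *" avx512f "*) echo "avx512f 16 8" ;;
        *" avx2 "*)    echo "avx2 8 4" ;;
        *" avx "*)     echo "avx 8 4" ;;
        *)             echo "none" ;;
    esac
}
```

For example, `printf 'avx\navx2\n' | avx_vector_width` reports avx2 with 8 floats and 4 doubles.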
Some CPUs contain more than one vector unit per core. Xeon Platinum, Gold 61XX, and Gold 5122 have two AVX-512 FMA units per core. Xeon Gold 51XX (except 5122), Silver, and Bronze have a single AVX-512 FMA unit per core. If there are two units, the number of operands that can be processed at once is twice the number given in the table above.
The model type of your CPU is available through
cat /proc/cpuinfo |grep "model name" | uniq
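The model name can be combined with the unit counts listed above to estimate how many doubles a core can process at once. The following is only a sketch. Both function names are hypothetical, the patterns cover only the Xeon models named in the text, and it assumes the model string contains the family name followed by the four-digit number (for example "Gold 6148").

```shell
# Hypothetical helper: map a "model name" string to the number of AVX-512 FMA
# units per core, using only the models named in the text.
# Prints "unknown" for any other CPU.
avx512_fma_units() {
    case "$1" in
        *Platinum*|*"Gold 61"[0-9][0-9]*|*"Gold 5122"*) echo 2 ;;
        *"Gold 51"[0-9][0-9]*|*Silver*|*Bronze*)        echo 1 ;;
        *)                                               echo unknown ;;
    esac
}

# Doubles processed at once per core: 8 doubles per AVX-512 register
# (see the table above) multiplied by the number of FMA units.
doubles_per_core() {
    fma_units=$(avx512_fma_units "$1")
    if [ "$fma_units" = unknown ]; then
        echo unknown
    else
        echo $((8 * fma_units))
    fi
}
```

The functions take the model string as their only argument, for instance the text after the colon in the "model name" line.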
In some cases you do not even have to change the code to gain this performance boost: auto-vectorization by the compiler will do most of the job.
The sets avx512dq, avx512bw and avx512vl are only of minor interest (details). They cover peripheral requirements such as float to 64-bit integer conversion, support for 8-bit and 16-bit elements, and the possibility to vectorize smaller loops.