这里简要介绍AVX指令集的一些基本指令,可以通过调用C++的库函数实现SIMD。
见并行编程
Data Type | Description |
---|---|
__m128 |
128-bit vector containing 4 floats |
__m128d |
128-bit vector containing 2 doubles |
__m128i |
128-bit vector containing integers |
__m256 |
256-bit vector containing 8 floats |
__m256d |
256-bit vector containing 4 doubles |
__m256i |
256-bit vector containing integers |
_mm<bit_width>_<name>_<data_type>
<bit_width>
: the return size, 128 - empty, 256 - 256<name>
: describes the operation performed by the intrinsic<data_type>
: the function’s primary argumentsInstructions | Description |
---|---|
ps | packed single-precision |
pd | packed double-precision |
epi8/epi16/epi32/epi64 | signed integers |
epu8/epu16/epu32/epu64 | unsigned integers |
si128/si256 | unspecified vector |
m128/m128i/m128d m256/m256i/m256d |
input vector types |
举例:_mm256_srlv_epi64
64-bit signed int -> 256-bit vector
#include <immintrin.h>
#include <stdio.h>
int main() {
/* Initialize the two argument vectors */
__m256 evens = _mm256_set_ps(2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0);
__m256 odds = _mm256_set_ps(1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0);
/* Compute the difference between the two vectors */
__m256 result = _mm256_sub_ps(evens, odds);
/* Display the elements of the result vector */
float* f = (float*) &result; // type conversion
printf("%f %f %f %f %f %f %f %f\n",
f[0], f[1], f[2], f[3], f[4], f[5], f[6], f[7]);
return 0;
}
编译时加-mavx
或-mavx2
_mm256_setzero_ps
_mm256_set1_ps
_mm256_set_ps
: predefined values_mm256_setr_ps
: reversed order_mm256_load_ps
_mm256_maskload_ps(address, integer vector)
: mask 1 read, 0 setzeroaligned_alloc(32, 64 * sizeof(float))
: 32-byte boundary_mm256_add/sub/mul_ps
_mm256_and/cmpeq_ps
_mm256_hadd/hsub_ps
_mm256_mullo_epi32
编译指令-mfma
_mm_fmadd_ps
: res = a * b + c
_mm_fmadd_ss
: res[0] = a[0] * b[0] + c[0]
_mm256_permute_ps
: based on 8-bit control value_mm256_shuffle_ps
: first 2, second 2