Dispatching SIMD instructions + SIMDPP + qmake

I'm developing a Qt widget that makes use of SIMD instruction sets. I've compiled three versions: SSE3, AVX, and AVX2 (simdpp allows switching between them with a single #define).

Now I want my widget to switch automatically between these implementations according to the best supported instruction set. The guide that comes with simdpp uses some makefile magic:

CXXFLAGS=""

test: main.o test_sse2.o test_sse3.o test_sse4_1.o test_null.o
    g++ $^ -o test

main.o: main.cc
    g++ main.cc $(CXXFLAGS) -c -o main.o

test_null.o: test.cc
    g++ test.cc -c $(CXXFLAGS) -DSIMDPP_EMIT_DISPATCHER \
        -DSIMDPP_DISPATCH_ARCH1=SIMDPP_ARCH_X86_SSE2 \
        -DSIMDPP_DISPATCH_ARCH2=SIMDPP_ARCH_X86_SSE3 \
        -DSIMDPP_DISPATCH_ARCH3=SIMDPP_ARCH_X86_SSE4_1 -o test_null.o

test_sse2.o: test.cc
    g++ test.cc -c $(CXXFLAGS) -DSIMDPP_ARCH_X86_SSE2 -msse2 -o test_sse2.o

test_sse3.o: test.cc
    g++ test.cc -c $(CXXFLAGS) -DSIMDPP_ARCH_X86_SSE3 -msse3 -o test_sse3.o

test_sse4_1.o: test.cc
    g++ test.cc -c $(CXXFLAGS) -DSIMDPP_ARCH_X86_SSE4_1 -msse4.1 -o test_sse4_1.o

Here is a link to the guide: http://p12tic.github.io/libsimdpp/v2.0~rc2/libsimdpp/arch/dispatch.html

I have no idea how to implement such behavior with qmake. Any ideas?

The first thing that comes to mind is to create a shared library with the dispatched code and link it to the project. Here I'm stuck again. The app is cross-platform, which means it has to compile with both GCC and MSVC (vc120, to be exact), which forces me to use nmake on Windows. I tried, really, but it was the worst experience of my whole programming life.

Thanks in advance, programmers of the world!

Sorry if this is a bit late; I hope I can still help.

You need to consider two areas: compile time and run time.

At compile time, you need to build code that supports the different features. At run time, you need code that decides which of those features the CPU can actually run.

What you want is a dispatcher:

FuncImpl.h:

#pragma once
void execAvx2();
void execAvx();
void execSse();
void execDefault();

FuncImpl.cpp:

// Compile this file once for each variant with different compiler settings
// (-mavx2, -mavx, -msse4.2, or no extra flag).
#include "FuncImpl.h"

#if defined(__AVX2__)

void execAvx2()
{
    // AVX2 impl
    // ...
}

#elif defined(__AVX__)

void execAvx()
{
    // AVX impl
    // ...
}

#elif defined(__SSE4_2__)

void execSse()
{
    // SSE4.2 impl
    // ...
}

#else

void execDefault()
{
    // Vanilla impl
    // ...
}

#endif
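
To confirm that each build of FuncImpl.cpp really received its -m flag, a small helper can report which branch of the #if chain the translation unit was compiled into. This is a sketch: the name compiledVariant is hypothetical, and it relies on the predefined macros GCC and Clang set for -mavx2, -mavx and -msse4.2 (MSVC uses /arch: switches and does not necessarily define the same set). It is declared static so that each of the four FuncImpl objects keeps a private copy and the linker sees no duplicate symbol; call it from inside that file's exec* function, for example when logging.

```cpp
// Hypothetical helper: reports which branch of the #if chain this
// translation unit was compiled into. 'static' gives each object file
// its own copy, so linking all variants together does not clash.
static const char* compiledVariant()
{
#if defined(__AVX2__)
    return "avx2";     // built with -mavx2 (which also defines __AVX__)
#elif defined(__AVX__)
    return "avx";      // built with -mavx (which also defines __SSE4_2__)
#elif defined(__SSE4_2__)
    return "sse4.2";   // built with -msse4.2
#else
    return "default";  // no extra instruction-set flags
#endif
}
```

The order of the checks matters for the same reason as in FuncImpl.cpp: the wider instruction sets imply the narrower macros, so the widest one has to be tested first.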

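DispatchFunc.cpp:

#include "FuncImpl.h"

// CPU feature checks, left as an exercise (see cpuid below).
bool CheckCpuAvx2Flag();
bool CheckCpuAvxFlag();
bool CheckCpuSseFlags();

// Decide at run time which code to run
void dispatchFunc()
{
    if (CheckCpuAvx2Flag())
    {
        execAvx2();
    }
    else if (CheckCpuAvxFlag())
    {
        execAvx();
    }
    else if (CheckCpuSseFlags())
    {
        execSse();
    }
    else
    {
        execDefault();
    }
}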
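
If dispatchFunc() is called in a hot loop, querying the CPU on every call is wasteful. A common refinement is to pick the implementation once and cache a function pointer. The sketch below is self-contained under a few assumptions: the exec* functions are hypothetical stand-ins that return an int identifying the path (in the real project they come from the FuncImpl objects and return void), and instead of hand-written cpuid checks it uses the GCC/Clang x86 builtin __builtin_cpu_supports; on other compilers it simply falls back to execDefault.

```cpp
// Hypothetical stand-ins for the FuncImpl variants; each returns an id
// so the chosen path is observable.
int execAvx2()    { return 3; }
int execAvx()     { return 2; }
int execSse()     { return 1; }
int execDefault() { return 0; }

using ExecFn = int (*)();

// Assumption: GCC or Clang targeting x86, where __builtin_cpu_supports
// exists. Any other compiler/target gets the plain C++ path.
static ExecFn selectImpl()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))   return execAvx2;
    if (__builtin_cpu_supports("avx"))    return execAvx;
    if (__builtin_cpu_supports("sse4.2")) return execSse;
#endif
    return execDefault;
}

// The CPU is queried only on the first call: the function-local static
// is initialized once (thread-safe since C++11) and reused afterwards.
int dispatchFunc()
{
    static const ExecFn impl = selectImpl();
    return impl();
}
```

The trade-off is one indirect call per invocation instead of a chain of feature checks, which is usually negligible next to the SIMD work itself.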

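To compile FuncImpl.cpp once per variant, you can create a set of QMAKE_EXTRA_COMPILERS.

SampleCompiler.pri (create one copy per variant, e.g. SseCompiler.pri, AvxCompiler.pri, Avx2Compiler.pri):

MyCompiler.name = MyCompiler         # Name (unique per variant, e.g. AvxCompiler)
MyCompiler.input = MY_SOURCES        # Variable holding the sources to compile (e.g. AVX_SOURCES)
MyCompiler.dependency_type = TYPE_C
MyCompiler.variable_out = OBJECTS    # Add the outputs to the normal link step
EXTRA_CXXFLAGS = -mavx               # -mavx / -mavx2 / -msse4.2, depending on the variant
# _var creates FileName_var.o => replace with your own variant suffix (_sse, _avx, _avx2)
MyCompiler.output = ${QMAKE_VAR_OBJECTS_DIR}${QMAKE_FILE_IN_BASE}_var$${first(QMAKE_EXT_OBJ)}
# $${EXTRA_CXXFLAGS} is expanded when qmake reads this line, so each .pri keeps its own flags
MyCompiler.commands = $${QMAKE_CXX} $(CXXFLAGS) $${EXTRA_CXXFLAGS} $(INCPATH) -c ${QMAKE_FILE_IN} -o ${QMAKE_FILE_OUT}
QMAKE_EXTRA_COMPILERS += MyCompiler  # Register the compiler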

MyProject.pro:

...
include(SseCompiler.pri)
include(AvxCompiler.pri)
include(Avx2Compiler.pri)
...

# Normal sources
# Will create FuncImpl.o and DispatchFunc.o
SOURCES += FuncImpl.cpp \
           DispatchFunc.cpp

# Give the other compilers their sources
# Will create FuncImpl_avx2.o FuncImpl_avx.o FuncImpl_sse.o
AVX2_SOURCES += FuncImpl.cpp
AVX_SOURCES += FuncImpl.cpp
SSE_SOURCES += FuncImpl.cpp

# Link all objects (each extra compiler appends its output to OBJECTS)
...

All you need now is to call dispatchFunc()!

Checking the CPU flags is left as an exercise for you: see cpuid.
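
For completeness, here is one possible sketch of the CheckCpuAvx2Flag(), CheckCpuAvxFlag() and CheckCpuSseFlags() helpers that DispatchFunc.cpp calls, since the app has to build with both GCC and MSVC. It is x86-only and assumes the MSVC intrinsics __cpuid, __cpuidex and _xgetbv on one side and the GCC/Clang builtin __builtin_cpu_supports on the other. "SSE" is taken to mean SSE4.2, matching the __SSE4_2__ branch in FuncImpl.cpp. On the MSVC path, the AVX check also asks the OS (via XGETBV) whether it saves the YMM registers, because a CPU flag alone is not enough.

```cpp
// x86-only sketch of the CPU checks used by dispatchFunc().
#if defined(_MSC_VER)
#include <intrin.h>     // __cpuid, __cpuidex
#include <immintrin.h>  // _xgetbv

bool CheckCpuSseFlags()
{
    int r[4];
    __cpuid(r, 1);
    return (r[2] & (1 << 20)) != 0;                 // ECX bit 20: SSE4.2
}

bool CheckCpuAvxFlag()
{
    int r[4];
    __cpuid(r, 1);
    const bool osxsave = (r[2] & (1 << 27)) != 0;   // ECX bit 27: OSXSAVE
    const bool avx     = (r[2] & (1 << 28)) != 0;   // ECX bit 28: AVX
    // XCR0 bits 1 and 2: the OS saves SSE and AVX (YMM) state.
    return osxsave && avx && (_xgetbv(0) & 0x6) == 0x6;
}

bool CheckCpuAvx2Flag()
{
    if (!CheckCpuAvxFlag())
        return false;
    int r[4];
    __cpuid(r, 0);
    if (r[0] < 7)                                   // leaf 7 not supported
        return false;
    __cpuidex(r, 7, 0);
    return (r[1] & (1 << 5)) != 0;                  // EBX bit 5: AVX2
}

#else  // GCC / Clang: the builtins already include the OS support check

bool CheckCpuSseFlags() { return __builtin_cpu_supports("sse4.2"); }
bool CheckCpuAvxFlag()  { return __builtin_cpu_supports("avx"); }
bool CheckCpuAvx2Flag() { return __builtin_cpu_supports("avx2"); }

#endif
```

Putting these in their own source file, compiled without any -m flags, keeps the detection code itself runnable on every CPU.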

These are just project defines: you set them with DEFINES += in your .pro file. You set the flags for the instruction sets you want to support, and simdpp takes care of selecting the best one for the processor at run time.

See, for example, the question "Add a define to qmake WITH a value?".

Here is a qmake .pro file for use with SIMD dispatchers. It is quite verbose, so if you target more instruction sets, it is better to generate the dispatch blocks with a script, write them to a .pri file, and include that file from your main .pro file.

TEMPLATE = app
TARGET = simd_test
INCLUDEPATH += .

QMAKE_CXXFLAGS = -O3 -std=c++17

SOURCES += main.cpp

SOURCES_dispatch = test.cpp
{
    # SSE2
    DISPATCH_CXXFLAGS = -msse2
    DISPATCH_SUFFIX = _sse2

    src_dispatch_sse2.name = src_dispatch_sse2
    src_dispatch_sse2.input = SOURCES_dispatch
    src_dispatch_sse2.dependency_type = TYPE_C
    src_dispatch_sse2.variable_out = OBJECTS
    src_dispatch_sse2.output = ${QMAKE_VAR_OBJECTS_DIR}${QMAKE_FILE_IN_BASE}$${DISPATCH_SUFFIX}$${first(QMAKE_EXT_OBJ)}
    src_dispatch_sse2.commands = $${QMAKE_CXX} $(CXXFLAGS) $${DISPATCH_CXXFLAGS} $(INCPATH) -c ${QMAKE_FILE_IN} -o ${QMAKE_FILE_OUT}
    QMAKE_EXTRA_COMPILERS += src_dispatch_sse2
}
{
    # SSE3
    DISPATCH_CXXFLAGS = -msse3
    DISPATCH_SUFFIX = _sse3

    src_dispatch_sse3.name = src_dispatch_sse3
    src_dispatch_sse3.input = SOURCES_dispatch
    src_dispatch_sse3.dependency_type = TYPE_C
    src_dispatch_sse3.variable_out = OBJECTS
    src_dispatch_sse3.output = ${QMAKE_VAR_OBJECTS_DIR}${QMAKE_FILE_IN_BASE}$${DISPATCH_SUFFIX}$${first(QMAKE_EXT_OBJ)}
    src_dispatch_sse3.commands = $${QMAKE_CXX} $(CXXFLAGS) $${DISPATCH_CXXFLAGS} $(INCPATH) -c ${QMAKE_FILE_IN} -o ${QMAKE_FILE_OUT}
    QMAKE_EXTRA_COMPILERS += src_dispatch_sse3
}
{
    # SSE41
    DISPATCH_CXXFLAGS = -msse4.1
    DISPATCH_SUFFIX = _sse41

    src_dispatch_sse41.name = src_dispatch_sse41
    src_dispatch_sse41.input = SOURCES_dispatch
    src_dispatch_sse41.dependency_type = TYPE_C
    src_dispatch_sse41.variable_out = OBJECTS
    src_dispatch_sse41.output = ${QMAKE_VAR_OBJECTS_DIR}${QMAKE_FILE_IN_BASE}$${DISPATCH_SUFFIX}$${first(QMAKE_EXT_OBJ)}
    src_dispatch_sse41.commands = $${QMAKE_CXX} $(CXXFLAGS) $${DISPATCH_CXXFLAGS} $(INCPATH) -c ${QMAKE_FILE_IN} -o ${QMAKE_FILE_OUT}
    QMAKE_EXTRA_COMPILERS += src_dispatch_sse41
}

Comments
  • You can't build the same file more than once with qmake. You'll need to find some way around that (split the ifdef'd code across multiple files, etc.)
  • The idea expressed in the makefile is to build multiple object files from a single .cpp file, link them all with changed names, and then use the right ones when necessary. This is what I don't know how to do. Defining a symbol is not the problem.
  • I see what you mean now, and I think peppe is right that there isn't an easy way to do that with Qt. You could create a different .pro file for each version of the file you want to build. You'd be making different libraries (TEMPLATE = lib) and then linking all of the library versions to your application. You'd probably have to make the #defines change the namespace of the functions/classes too; otherwise you'd have multiple versions of the same functions, which wouldn't link. Maybe simdpp already does that?
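
The namespace idea from the last comment can be sketched in plain C++, independent of simdpp. In this hypothetical example (FUNC_VARIANT_NS and scale are illustrative names, not part of any library), the predefined ISA macros choose a namespace, so compiling the same file with -mavx2, -mavx, -msse4.2 and with no flag yields avx2::scale, avx::scale, sse42::scale and generic::scale. Because the symbols differ, all four objects can be linked into one binary without duplicate-symbol errors.

```cpp
#include <cstddef>

// Pick a namespace from the instruction-set macros of this build.
// Widest ISA first, since -mavx2 also defines __AVX__ and __SSE4_2__.
#if defined(__AVX2__)
#define FUNC_VARIANT_NS avx2
#elif defined(__AVX__)
#define FUNC_VARIANT_NS avx
#elif defined(__SSE4_2__)
#define FUNC_VARIANT_NS sse42
#else
#define FUNC_VARIANT_NS generic
#endif

namespace FUNC_VARIANT_NS {

// One body, written once; with optimization enabled the compiler may
// auto-vectorize it for whatever instruction set this object targets.
void scale(float* data, std::size_t n, float factor)
{
    for (std::size_t i = 0; i < n; ++i)
        data[i] *= factor;
}

} // namespace FUNC_VARIANT_NS
```

The dispatcher would then declare each variant explicitly (for example `namespace avx2 { void scale(float*, std::size_t, float); }`) and call the one that matches the CPU flags, in the same way dispatchFunc() picks between execAvx2(), execAvx(), execSse() and execDefault().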