CPU features

From HPCWIKI
Revision as of 08:46, 24 March 2023

Compiler CPU flags

The TensorFlow source build can be customized to take advantage of CPU features that speed up the execution of TensorFlow code.


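The CPU flags available on the target system can be listed with the following command. The Linux kernel source (https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/include/asm/cpufeatures.h) helps unravel the meaning of each flag.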

$ grep flags /proc/cpuinfo


The optimization flags are supplied when configuring the TensorFlow source build. The following command derives them from the flags reported by the CPU:

$ grep -m1 flags /proc/cpuinfo | cut -d ":" -f 2 | tr '[:upper:]' '[:lower:]' | { read -r FLAGS; OPT="-march=native"; for flag in $FLAGS; do case "$flag" in "sse4_1" | "sse4_2" | "ssse3" | "fma" | "cx16" | "popcnt" | "avx" | "avx2") OPT+=" -m$flag";; esac; done; MODOPT=${OPT//_/.}; echo "$MODOPT"; }


The following is example output on an AMD EPYC processor:

-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2
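
The pipeline above can be unpacked into small functions that are easier to read and to test. The following is only a sketch: the helper names `cpu_flags_to_opts` and `opts_to_copts` and the sample flag list are hypothetical, introduced here for illustration. The first helper does the same flag-to-option mapping as the command above; the second prefixes each option with `--copt=`, the form in which Bazel accepts compiler options on its command line.

```shell
# Hypothetical helper: map a space-separated CPU flag list to compiler options.
cpu_flags_to_opts() {
    local opts flag
    opts="-march=native"
    for flag in $1; do
        case "$flag" in
            sse4_1|sse4_2|ssse3|fma|cx16|popcnt|avx|avx2)
                # The compiler spells sse4_1 as -msse4.1, so turn "_" into "."
                opts="$opts -m$(printf '%s' "$flag" | tr '_' '.')"
                ;;
        esac
    done
    printf '%s\n' "$opts"
}

# Hypothetical helper: prefix every option with --copt= for a Bazel command line.
opts_to_copts() {
    local copts opt
    copts=""
    for opt in $1; do
        copts="$copts --copt=$opt"
    done
    # Strip the leading space left by the first concatenation.
    printf '%s\n' "${copts# }"
}

# Sample flag list (an assumption); on x86 Linux, use the "flags" line
# of /proc/cpuinfo instead.
SAMPLE_FLAGS="fpu ssse3 fma cx16 sse4_1 sse4_2 popcnt avx avx2"
OPTS=$(cpu_flags_to_opts "$SAMPLE_FLAGS")
echo "$OPTS"
opts_to_copts "$OPTS"
```

On a real system the sample list could be replaced with `"$(grep -m1 flags /proc/cpuinfo | cut -d: -f2)"`. How the resulting options reach the build depends on the TensorFlow version: the `./configure` script prompts for optimization flags, and passing `--copt=` options directly to `bazel build` is an alternative. Verify either route against the build instructions for the release in use.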


{| class="wikitable"
! No !! Flag !! CPU Feature !! Additional Info
|-
| 1 || ssse3 || Supplemental Streaming SIMD Extensions 3 (SSSE-3) instruction set ||
|-
| 2 || sse4_1 || Streaming SIMD Extensions 4.1 (SSE-4.1) instruction set ||
|-
| 3 || sse4_2 || Streaming SIMD Extensions 4.2 (SSE-4.2) instruction set ||
|-
| 4 || fma || Fused multiply-add (FMA) instruction set ||
|-
| 5 || cx16 || CMPXCHG16B instruction (double-width compare-and-swap) ||
|-
| 6 || popcnt || Population count instruction (count number of bits set to 1) ||
|-
| 7 || avx || Advanced Vector Extensions ||
|-
| 8 || avx2 || Advanced Vector Extensions 2 ||
|}
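
Before building, it can be useful to know which of the eight flags listed above a given CPU lacks. A minimal sketch follows, assuming a space-separated flag list as input; the helper name `missing_flags` and the sample list are hypothetical.

```shell
# Hypothetical helper: print each of the eight flags that is absent
# from the space-separated flag list passed as the first argument.
missing_flags() {
    local wanted flag
    wanted="ssse3 sse4_1 sse4_2 fma cx16 popcnt avx avx2"
    for flag in $wanted; do
        # Pad with spaces so that "avx" does not match inside "avx2".
        case " $1 " in
            *" $flag "*) ;;
            *) printf '%s\n' "$flag" ;;
        esac
    done
}

# Sample list (an assumption) for a CPU without FMA and AVX2.
missing_flags "fpu ssse3 sse4_1 sse4_2 cx16 popcnt avx"
```

With this sample list the function prints `fma` and `avx2`, one per line. An empty result means all eight flags are present.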
