CPU features: Difference between revisions

Latest revision as of 09:43, 29 October 2023

Compiler CPU flags

Customize the TensorFlow source build to take advantage of the CPU features available on the target system, which can speed up execution of TensorFlow code.

The CPU flags available on the target system can be listed with the following command; the Linux kernel source explains the meaning of each flag.

$ cat /proc/cpuinfo | grep flags
...
flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca
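Because the flag list is long, it can help to test for a single flag rather than reading the line by eye. The sketch below is one way to do that; `has_cpu_flag` is a hypothetical helper name introduced here for illustration, not something provided by the kernel or by TensorFlow. It matches whole words only, since a plain substring search for `avx` would also match `avx2`.

```shell
# has_cpu_flag LIST FLAG
# Hypothetical helper: succeeds if FLAG appears as a whole word in the
# space-separated LIST (for example the contents of the "flags" line).
has_cpu_flag() {
    # Pad both sides with spaces so "avx" cannot match inside "avx2".
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

On a Linux x86 host this might be used as `has_cpu_flag "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)" avx2 && echo "avx2 available"`.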

Optimization flags will be supplied when configuring the TensorFlow source build. The following command is used to populate the optimization flags:

$ grep flags -m1 /proc/cpuinfo | cut -d ":" -f 2 | tr '[:upper:]' '[:lower:]' | { read FLAGS; OPT="-march=native"; for flag in $FLAGS; do case "$flag" in "sse4_1" | "sse4_2" | "ssse3" | "fma" | "cx16" | "popcnt" | "avx" | "avx2") OPT+=" -m$flag";; esac; done; MODOPT=${OPT//_/\.}; echo "$MODOPT"; }

The following is example output on an AMD EPYC processor:

-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2
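The one-liner above is compact but hard to read. As a rough sketch of the same logic, the function below takes the flag list as an argument and prints the option string. The name `cpu_opt_flags` is hypothetical and chosen only for this example, and the function is written in POSIX shell (using `tr` instead of the bash-only `${OPT//_/\.}` substitution) on the assumption that portability is wanted.

```shell
# cpu_opt_flags LIST
# Hypothetical helper: print compiler options for the CPU flags in the
# space-separated LIST. The body runs in a subshell so that the variables
# opt and flag do not leak into the caller.
cpu_opt_flags() (
    opt="-march=native"
    for flag in $1; do
        case "$flag" in
            sse4_1|sse4_2|ssse3|fma|cx16|popcnt|avx|avx2)
                # /proc/cpuinfo writes sse4_1; the compiler option is -msse4.1
                opt="$opt -m$(printf '%s' "$flag" | tr '_' '.')"
                ;;
        esac
    done
    printf '%s\n' "$opt"
)
```

On the target system it could be called as `cpu_opt_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"`. Options appear in the order the flags are listed in `/proc/cpuinfo`, as in the example output above.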

No	Flag	CPU Feature
1	ssse3	Supplemental Streaming SIMD Extensions 3 (SSSE-3) instruction set
2	sse4_1	Streaming SIMD Extensions 4.1 (SSE-4.1) instruction set
3	sse4_2	Streaming SIMD Extensions 4.2 (SSE-4.2) instruction set
4	fma	Fused multiply-add (FMA) instruction set
5	cx16	CMPXCHG16B instruction (double-width compare-and-swap)
6	popcnt	Population count instruction (count number of bits set to 1)
7	avx	Advanced Vector Extensions
8	avx2	Advanced Vector Extensions 2

@@ Line 1: / Line 1: @@
-===References===
+=== Compiler CPU flags ===
+Customize the TensorFlow source build to take advantage of the CPU features available on the target system, which can speed up execution of TensorFlow code.
+The CPU flags available on the target system can be listed with the following command; the [[Linux]] [https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/include/asm/cpufeatures.h kernel source] explains the meaning of each flag.<syntaxhighlight lang="bash">
+$ cat /proc/cpuinfo | grep flags
+...
+flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca
+</syntaxhighlight>
+[[Optimization]] flags will be supplied when configuring the TensorFlow source build. The following command is used to populate the optimization flags:
+ <code>$ grep flags -m1 /proc/cpuinfo | cut -d ":" -f 2 | tr '[:upper:]' '[:lower:]' | { read FLAGS; OPT="-march=native"; for flag in $FLAGS; do case "$flag" in "sse4_1" | "sse4_2" | "ssse3" | "fma" | "cx16" | "popcnt" | "avx" | "avx2") OPT+=" -m$flag";; esac; done; MODOPT=${OPT//_/\.}; echo "$MODOPT"; }</code>
+The following is example output on an AMD EPYC processor:
+ -march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2
+{| class="wikitable"
+! colspan="1" rowspan="1" |No
+! colspan="1" rowspan="1" |Flag
+! colspan="1" rowspan="1" |CPU Feature
+! colspan="1" rowspan="1" |Additional Info
+|-
+|1
+|ssse3
+|Supplemental Streaming SIMD Extensions 3 (SSSE-3) instruction set
+|
+|-
+|2
+|sse4_1
+|Streaming SIMD Extensions 4.1 (SSE-4.1) instruction set
+|
+|-
+|3
+|sse4_2
+|Streaming SIMD Extensions 4.2 (SSE-4.2) instruction set
+|
+|-
+|4
+|fma
+|Fused multiply-add (FMA) instruction set
+|
+|-
+|5
+|cx16
+|CMPXCHG16B instruction (double-width compare-and-swap)
+|
+|-
+|6
+|popcnt
+|Population count instruction (count number of bits set to 1)
+|
+|-
+|7
+|avx
+|Advanced Vector Extensions
+|
+|-
+|8
+|avx2
+|Advanced Vector Extensions 2
+|
+|}
+== References ==
 <references/>
