CPU features: Difference between revisions

From HPCWIKI
=== Compiler CPU flags ===
Customize the TensorFlow source build to take advantage of the CPU features available on the target system, which can speed up the execution of TensorFlow code.
The CPU flags available on the target system can be listed with the following command, and the Linux [https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/include/asm/cpufeatures.h kernel source] helps explain the meaning of each flag:
<code>$ grep flags /proc/cpuinfo</code>
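The same check can be scripted. The following is a minimal Python sketch, assuming a Linux x86 system whose <code>/proc/cpuinfo</code> contains a <code>flags</code> line (other architectures may label this line differently). The helper names <code>parse_cpu_flags</code> and <code>read_cpu_flags</code> are hypothetical and not part of any library.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags found on the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        # Each line looks like "key<tabs>: value"; split on the first colon.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No 'flags' line found (for example on a non-x86 system).
    return set()


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flags of the running machine (assumes Linux)."""
    return parse_cpu_flags(Path(path).read_text())
```

On the target machine, <code>"avx2" in read_cpu_flags()</code> then tells whether the CPU reports AVX2 support.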
{| class="wikitable"
! colspan="1" rowspan="1" |No
! colspan="1" rowspan="1" |Flag
! colspan="1" rowspan="1" |CPU Feature
! colspan="1" rowspan="1" |Additional Info
|-
|1
|ssse3
|Supplemental Streaming SIMD Extensions 3 (SSSE-3) instruction set
|
|-
|2
|sse4_1
|Streaming SIMD Extensions 4.1 (SSE-4.1) instruction set
|
|-
|3
|sse4_2
|Streaming SIMD Extensions 4.2 (SSE-4.2) instruction set
|
|-
|4
|fma
|Fused multiply-add (FMA) instruction set
|
|-
|5
|cx16
|CMPXCHG16B instruction (double-width compare-and-swap)
|
|-
|6
|popcnt
|Population count instruction (count number of bits set to 1)
|
|-
|7
|avx
|Advanced Vector Extensions (AVX) instruction set
|
|-
|8
|avx2
|Advanced Vector Extensions 2 (AVX2) instruction set
|
|}
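As an illustration of how the flags in the table could feed into a source build, the sketch below maps each flag to the corresponding GCC x86 option (<code>-mssse3</code>, <code>-msse4.1</code>, and so on) and wraps it in Bazel's <code>--copt</code>. This is only a sketch: that the TensorFlow build should receive exactly these options is an assumption, and the names <code>FLAG_TO_GCC_OPTION</code>, <code>gcc_options_for</code> and <code>bazel_copts</code> are hypothetical.

```python
# Flag names as they appear in /proc/cpuinfo, mapped to GCC x86 options.
FLAG_TO_GCC_OPTION = {
    "ssse3": "-mssse3",
    "sse4_1": "-msse4.1",
    "sse4_2": "-msse4.2",
    "fma": "-mfma",
    "cx16": "-mcx16",
    "popcnt": "-mpopcnt",
    "avx": "-mavx",
    "avx2": "-mavx2",
}


def gcc_options_for(cpu_flags):
    """Return the options for those table flags that the CPU reports, in table order."""
    return [opt for flag, opt in FLAG_TO_GCC_OPTION.items() if flag in cpu_flags]


def bazel_copts(cpu_flags):
    """Wrap each compiler option in Bazel's --copt= form (assumed build usage)."""
    return [f"--copt={opt}" for opt in gcc_options_for(cpu_flags)]
```

For a CPU reporting only <code>fma</code> and <code>avx</code> from the table, <code>bazel_copts</code> returns <code>--copt=-mfma</code> and <code>--copt=-mavx</code>. Flags not listed in the table are ignored.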
===References===
<references/>

Revision as of 08:42, 24 March 2023
