Optimize TensorFlow

From HPCWIKI
Revision as of 11:50, 29 October 2023 by Admin (talk | contribs)

Optimizing target CPU flags

Optimize TensorFlow for the target CPU by enabling every computation optimization that the CPU's instruction set supports.
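Before choosing flags, it helps to know which instruction-set extensions the CPU actually reports. The following is a minimal sketch, assuming a Linux system where the kernel exposes CPU flags in /proc/cpuinfo. The function name simd_flags and the short list of flag names it looks for are illustrative choices, not part of TensorFlow or GCC.

```shell
# Hypothetical helper: print which of a few common SIMD-related CPU flags
# appear in cpuinfo-style text read from stdin (one space-separated line).
simd_flags() {
  tr -s '[:space:]' '\n' \
    | grep -E -x '(sse4_1|sse4_2|avx|avx2|fma|avx512f)' \
    | sort -u \
    | paste -sd ' ' -
}

# Assumption: Linux. Other systems do not provide /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
  simd_flags < /proc/cpuinfo
fi
```

Any flag that is missing from the output is one the CPU does not advertise, so a build that requires it would not run on this machine.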

Compiler optimization flags

Are you wondering how much difference these instructions make on your machine when you compare the generic pip packages with a build customized for your system?

Pre-built pip packages do not enable every optimization flag your machine is capable of, and may not be tuned for your machine at all. One example is the set of GCC compiler optimization flags:

gcc -O<level> (the letter O, not the digit zero).

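  • -O0: Turns off optimization entirely. Fast compile times, good for debugging. This is the default when no -O level is given, which is probably not what you want for a production build.
  • -O1: Basic optimization level.
  • -O2: Recommended for most things. SSE / AVX instructions may be used, but loops are vectorized less aggressively than at -O3.
  • -O3: Highest of the numbered optimization levels. Also vectorizes loops more aggressively. Whether AVX instructions and registers are actually used depends on the -march / -m flags, not on the -O level alone.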
  • -Os: Small size. Basically enables -O2 options which do not increase size. Can be useful for machines that have limited storage and/or CPUs with small cache sizes.
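To see what a given level changes on your own compiler, GCC can report its optimizer settings with -Q --help=optimizers. The sketch below lists the flags that are enabled at -O3 but not at -O2. The helper name enabled_flags is hypothetical, and the result differs between GCC versions, so no particular output is assumed here.

```shell
# Hypothetical helper: from `gcc -Q --help=optimizers` output on stdin,
# print the names of the flags marked [enabled], sorted and de-duplicated.
enabled_flags() {
  awk '/\[enabled\]/ { print $1 }' | sort -u
}

# Run the comparison only if gcc is installed on this machine.
if command -v gcc >/dev/null 2>&1; then
  o2=$(mktemp)
  o3=$(mktemp)
  gcc -Q -O2 --help=optimizers | enabled_flags > "$o2"
  gcc -Q -O3 --help=optimizers | enabled_flags > "$o3"
  # comm -13 prints lines that appear only in the second file (-O3).
  comm -13 "$o2" "$o3"
  rm -f "$o2" "$o3"
fi
```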

If you are building TensorFlow on the same machine that will run it, you can simply use -O3 -march=native. If you are building on a different machine, run the command below on the machine that will run TensorFlow to see which flags GCC would set there (-march=native always resolves to the CPU GCC is running on), then pass those flags when configuring TensorFlow.

# This is the output on my machine:
$ gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
/usr/libexec/gcc/x86_64-pc-linux-gnu/7.3.0/cc1 -E -quiet -v - -march=znver1
-mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -mmovbe
-maes -msha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi
-mno-sgx -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm
-mno-hle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave
-mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf
-mno-prefetchwt1 -mclflushopt -mxsavec -mxsaves -mno-avx512dq -mno-avx512bw
-mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps
-mno-avx5124vnniw -mno-clwb -mmwaitx -mclzero -mno-pku -mno-rdpid --param
l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=512
-mtune=znver1
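The flag list above can be turned into Bazel options mechanically. The following is a minimal sketch, assuming the build is driven through Bazel's --copt option. The helper name to_copts is hypothetical, and it keeps only tokens beginning with -m (such as -march=..., -mavx2, -mno-avx512f), dropping the cc1 path and the --param pairs.

```shell
# Hypothetical helper: read GCC's cc1 line(s) on stdin and print each
# -m... flag prefixed with --copt=, joined on one line.
to_copts() {
  tr -s '[:space:]' '\n' \
    | grep -E '^-m' \
    | sed 's/^/--copt=/' \
    | paste -sd ' ' -
}

# Intended use (not executed here; run the gcc part on the target machine):
#   gcc -march=native -E -v - </dev/null 2>&1 | grep cc1 | to_copts
# The resulting options could then be appended to a command such as
#   bazel build -c opt <copts> <pip package target>
# where the target name depends on the TensorFlow version and is left unspecified.
```

Alternatively, the same -m flags can be entered when TensorFlow's configure step asks for optimization flags; the exact prompt wording varies between versions.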
