Optimize TensorFlow
Do you wonder how much of a difference those instructions actually make on your machine, comparing the generic pip packages with a build optimized for your system?
Pre-built pip packages do not enable every optimization your machine supports, and the flags they do use may not be tuned for your CPU. Take, for example, GCC's optimization-level flag:
gcc -O<number>
(O the letter, not the number).
- -O0: Turns off optimization entirely. Fast compile times, good for debugging. This is the default if you don't specify a level, which you probably don't want.
- -O1: Basic optimization level.
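- -O2: Recommended for most things. Enables nearly all optimizations that do not trade code size for speed. Loop auto-vectorization is limited at this level (off before GCC 12, restricted to a very cheap cost model from GCC 12 on).
- -O3: The most aggressive standards-compliant level. On top of -O2 it fully vectorizes loops. Whether the vectorized code uses SSE, AVX or AVX2 registers is decided by -march / -m flags, not by the -O level.
- -Os: Optimize for size. Enables the -O2 optimizations that do not typically increase code size. Can be useful for machines that have limited storage and/or CPUs with small cache sizes.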
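To check what each level actually turns on with your own compiler, GCC can print its optimizer settings with -Q --help=optimizers. Below is a minimal shell sketch, assuming GNU GCC is installed as gcc; the function name extra_opts is hypothetical, and the exact list it prints depends on your GCC version.

```shell
#!/bin/sh
# Sketch: list optimizer flags GCC enables at -O<to> but not at -O<from>,
# using GCC's own report (gcc -Q --help=optimizers).
# "extra_opts" is a hypothetical helper name, not a GCC tool.
extra_opts() {
  from="$1"   # baseline level, e.g. 2 or s
  to="$2"     # level to compare against, e.g. 3
  tmp_from=$(mktemp)
  tmp_to=$(mktemp)
  gcc -c -Q "-O$from" --help=optimizers > "$tmp_from"
  gcc -c -Q "-O$to" --help=optimizers > "$tmp_to"
  # diff prefixes lines from the second report with ">";
  # keep only the flags that report shows as [enabled].
  diff "$tmp_from" "$tmp_to" | grep '^>' | grep 'enabled'
  rm -f "$tmp_from" "$tmp_to"
}
```

For example, extra_opts 2 3 lists what -O3 adds over -O2, and extra_opts s 2 lists what -O2 enables that -Os leaves off.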
If you are building TensorFlow on the same machine that will run it, you can simply use -O3 -march=native. If you are building on a different machine, run the command below on the machine that will run TensorFlow to see which flags GCC would set there, then pass them when configuring TF.
# This is the output on my machine:
$ gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
/usr/libexec/gcc/x86_64-pc-linux-gnu/7.3.0/cc1 -E -quiet -v - -march=znver1 -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -mmovbe -maes -msha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mclflushopt -mxsavec -mxsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mmwaitx -mclzero -mno-pku -mno-rdpid --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=512 -mtune=znver1
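The cc1 line above is long, and most of it follows from the -march value. A minimal shell sketch for trimming it down, assuming that -march=<cpu> already implies the -m<feature> flags GCC lists after it (so -march and -mtune alone reproduce "native" on another build machine). The helper name cc1_cpu_flags and the "detect" argument are hypothetical.

```shell
#!/bin/sh
# Sketch: reduce the cc1 line printed by
#   gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
# to just its -march= and -mtune= values.
# Assumption: -march=<cpu> implies the -m<feature> flags listed after it.
# "cc1_cpu_flags" is a hypothetical helper name.
cc1_cpu_flags() {
  # One token per line, keep -march=/-mtune=, then join back with spaces.
  tr ' ' '\n' | grep -E '^-m(arch|tune)=' | paste -s -d ' ' -
}

# Run on the machine that will execute TensorFlow: sh script.sh detect
if [ "${1:-}" = "detect" ]; then
  gcc -march=native -E -v - </dev/null 2>&1 | grep cc1 | head -n 1 | cc1_cpu_flags
fi
```

For the output shown above this yields -march=znver1 -mtune=znver1, which can be entered (optionally with -O3) at the ./configure prompt asking for optimization flags to use with bazel's --config=opt. The build machine's GCC must recognize that CPU name.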