Transformer Engine
What is Transformer Engine (TE)?
Most deep learning frameworks train with FP32 by default. Full FP32 precision is not essential, however, to achieve full accuracy for many deep learning models. Using mixed-precision training, which combines the single-precision (FP32) format with a lower-precision (FP16) format when training a model, results in significant speedups with minimal differences in accuracy compared to FP32 training. With the Hopper and Ada GPU architectures, FP8 precision was introduced, which offers improved performance over FP16 with no degradation in accuracy. Although all major deep learning frameworks support FP16, FP8 support is not natively available in frameworks today.
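To make the mixed-precision idea concrete, below is a minimal sketch of FP16 mixed-precision training using PyTorch's native autocast and gradient scaling. This illustrates the FP16 support mentioned above, not TE itself; the layer sizes, optimizer, and loss are arbitrary choices for the example.

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid FP16 gradient underflow

inp = torch.randn(32, 1024, device="cuda")
target = torch.randn(32, 1024, device="cuda")

# Run the forward pass in FP16 where it is numerically safe;
# master weights remain in FP32.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    out = model(inp)
    loss = torch.nn.functional.mse_loss(out, target)

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```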
NVIDIA® Transformer Engine is a library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance with lower memory utilization in both training and inference.[1]
TE addresses the problem of FP8 support by providing two APIs: a Python API consisting of modules for easily building a Transformer layer, and a framework-agnostic C++ library including the structs and kernels needed for FP8 training. Together, these greatly simplify mixed-precision training for users.
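As a sketch of what this looks like in practice, the snippet below uses TE's PyTorch integration (transformer_engine.pytorch) to run a linear layer in FP8 under the fp8_autocast context manager. The layer dimensions and recipe settings here are arbitrary example values; TE also offers higher-level modules such as te.TransformerLayer for building a full Transformer layer.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# An FP8-capable linear layer from TE (dimensions are arbitrary).
model = te.Linear(768, 3072, bias=True)
inp = torch.randn(2048, 768, device="cuda")

# Recipe describing how FP8 scaling factors are maintained.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

# Forward pass with FP8 autocasting; the backward pass also
# uses FP8 where applicable.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

out.sum().backward()
```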
TE is open source and available on GitHub at https://github.com/NVIDIA/TransformerEngine.