Transformer Engine

From HPCWIKI

What is Transformer Engine (TE)

Most deep learning frameworks train with FP32 by default. However, full FP32 is not essential to achieve full accuracy for many deep learning models. Mixed-precision training, which combines single-precision (FP32) with a lower-precision format (FP16) when training a model, yields significant speedups with minimal accuracy differences compared to pure FP32 training. The Hopper and Ada GPU architectures introduced FP8 precision, which offers improved performance over FP16 with no degradation in accuracy. Although all major deep learning frameworks support FP16, FP8 support is not yet available natively in those frameworks.
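The trade-off between the formats comes down to their bit layouts. A short sketch, using the standard IEEE half-precision layout and the two FP8 variants (E4M3 and E5M2) supported by Hopper/Ada hardware, shows the maximum finite value each format can represent:

```python
# FP16 (IEEE half): 5 exponent bits (bias 15), 10 mantissa bits;
# the top exponent field is reserved for inf/NaN.
fp16_max = (2 - 2**-10) * 2**15          # 65504.0

# FP8 E5M2 (range-oriented): 5 exponent bits (bias 15), 2 mantissa bits;
# top exponent reserved for inf/NaN, like IEEE formats.
e5m2_max = (2 - 2**-2) * 2**15           # 57344.0

# FP8 E4M3 (precision-oriented): 4 exponent bits (bias 7), 3 mantissa bits;
# only the all-ones exponent-and-mantissa pattern encodes NaN, so the top
# exponent remains usable and the largest mantissa is 0b110 -> 1.75.
e4m3_max = (1 + 2**-1 + 2**-2) * 2**8    # 448.0

print(fp16_max, e5m2_max, e4m3_max)
```

The much smaller dynamic range of FP8, especially E4M3, is why FP8 training relies on per-tensor scaling factors, which TE manages automatically.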


NVIDIA® Transformer Engine is a library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.[1]


TE addresses the lack of native FP8 support by providing two APIs: a Python API consisting of modules to easily build a Transformer layer, and a framework-agnostic C++ library including the structs and kernels needed for FP8 training. Together, these greatly simplify mixed-precision training for users.

TE is open source under the Apache-2.0 license and provides:

  • Support for FP8 on NVIDIA Hopper and NVIDIA Ada GPUs
  • Support for optimizations across all precisions (FP16, BF16) on the NVIDIA Ampere GPU architecture and later
  • Optimizations (e.g. fused kernels) for Transformer models
  • Easy-to-use modules for building Transformer layers with FP8 support

Transformer Engine has been integrated with popular LLM frameworks such as:[2]

  • DeepSpeed
  • Hugging Face Accelerate
  • Lightning
  • MosaicML Composer
  • NVIDIA JAX Toolbox
  • NVIDIA Megatron-LM
  • NVIDIA NeMo Framework
  • Amazon SageMaker Model Parallel Library
  • Levanter
  • Hugging Face Nanotron - Coming soon!
  • Colossal-AI - Coming soon!
  • PeriFlow - Coming soon!
  • GPT-NeoX - Coming soon!

TE introduction videos

References