HPC System Setting

From HPCWIKI

AMD System Setting

This page lists the system settings required to configure a system for AMD Instinct™ accelerators and to tune it for optimal GPU performance.[1]

For host-processor tuning, see the “High Performance Computing (HPC) Tuning Guide for AMD EPYC™ Processors”.[2]

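In addition to the BIOS settings listed below, the following items are configured from the operating system (see Operating System Settings):

• Core C states

• AMD-IOPM-UTIL (on AMD EPYC™ 7002 series processors)

• IOMMU (if needed)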


Recommended settings for the system BIOS

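| BIOS Setting Location | Parameter | Value | Comments |
|---|---|---|---|
| Advanced > PCI Subsystem Settings | Above 4G Decoding | Enabled | GPU Large BAR Support |
| AMD CBS > CPU Common Options > Performance | Global C-state Control | Auto | Global Core C-States |
| AMD CBS > CPU Common Options > Performance | CCD/Core/Thread Enablement | Accept | |
| AMD CBS > CPU Common Options > Performance > CCD/Core/Thread Enablement | SMT Control | Disable | |
| AMD CBS > DF Common Options > Memory Addressing | NUMA nodes per socket | NPS1, 2, 4 | NUMA Nodes (NPS) |
| AMD CBS > DF Common Options > Memory Addressing | Memory interleaving | Auto | |
| AMD CBS > DF Common Options > Link | 4-link xGMI max speed | 18 Gbps | Set AMD CPU xGMI speed to highest rate supported |
| AMD CBS > DF Common Options > Link | 3-link xGMI max speed | 18 Gbps | Set AMD CPU xGMI speed to highest rate supported |
| AMD CBS > NBIO Common Options | IOMMU | Disabled | |
| AMD CBS > NBIO Common Options | PCIe Ten Bit Tag Support | Enable | |
| AMD CBS > NBIO Common Options | Preferred IO | Manual | |
| AMD CBS > NBIO Common Options | Preferred IO Bus | “Use lspci to find pci device id” | |
| AMD CBS > NBIO Common Options | Enhanced Preferred IO Mode | Enable | |
| AMD CBS > NBIO Common Options > SMU Common Options | Determinism Control | Manual | |
| AMD CBS > NBIO Common Options > SMU Common Options | Determinism Slider | Power | |
| AMD CBS > NBIO Common Options > SMU Common Options | cTDP Control | Manual | |
| AMD CBS > NBIO Common Options > SMU Common Options | cTDP | 240 | Set cTDP to 240 W |
| AMD CBS > NBIO Common Options > SMU Common Options | Package Power Limit Control | Manual | |
| AMD CBS > NBIO Common Options > SMU Common Options | Package Power Limit | 240 | Set Package Power Limit to 240 W |
| AMD CBS > NBIO Common Options > SMU Common Options | xGMI Link Width Control | Manual | |
| AMD CBS > NBIO Common Options > SMU Common Options | xGMI Force Link Width | 2 | Set AMD CPU xGMI width to 16 bits |
| AMD CBS > NBIO Common Options > SMU Common Options | xGMI Force Link Width Control | Force | |
| AMD CBS > NBIO Common Options > SMU Common Options | APBDIS | 1 | |
| AMD CBS > NBIO Common Options > SMU Common Options | DF Cstates | Auto | |
| AMD CBS > NBIO Common Options > SMU Common Options | Fixed SOC Pstate | P0 | |
| AMD CBS > UMC Common Options > DDR4 Common Options | Enforce POR | Accept | |
| AMD CBS > UMC Common Options > DDR4 Common Options > Enforce POR | Overclock | Enabled | |
| AMD CBS > UMC Common Options > DDR4 Common Options > Enforce POR | Memory Clock Speed | 1600 MHz | Set to the maximum memory speed; with 3200 MHz DIMMs: 1600 MHz |
| AMD CBS > UMC Common Options > DDR4 Common Options > DRAM Controller Configuration > DRAM Power Options | Power Down Enable | Disabled | RAM Power Down |
| AMD CBS > Security | TSME | Disabled | Memory Encryption |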
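The “Preferred IO Bus” entry in the table asks for a value found with lspci. The following is a minimal sketch of that lookup, assuming the BIOS field expects the PCI bus number of the device and that the device to prioritize is an AMD GPU (PCI vendor ID 0x1002). The helper names slot_to_bus and list_amd_buses are illustrative and not part of any AMD tool.

```shell
#!/bin/sh
# Convert an lspci slot ("[domain:]bus:device.function") to its bus number.
slot_to_bus() {
    slot=${1%:*}                 # drop the trailing ":device.function"
    printf '%s\n' "${slot##*:}"  # drop an optional leading "domain:"
}

# List AMD devices (PCI vendor ID 0x1002) together with their bus numbers.
# Assumption: the device to prioritize is an AMD GPU; for a NIC, change the
# vendor filter passed to "lspci -d".
list_amd_buses() {
    lspci -d 1002: | while read -r slot description; do
        printf 'bus %s: %s\n' "$(slot_to_bus "$slot")" "$description"
    done
}

# Only query the hardware when lspci is installed.
if command -v lspci >/dev/null 2>&1; then
    list_amd_buses
fi
```

The bus number is printed in hexadecimal, as lspci reports it. Whether the BIOS expects hexadecimal or decimal input depends on the platform vendor.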

NBIO Link Clock Frequency

The NBIOs (4x per AMD EPYC™ processor) are the serializers/deserializers (also known as “SerDes”) that convert and prepare the I/O signals for the processor’s 128 external I/O interface lanes (32 per NBIO).

LCLK (short for link clock frequency) controls the link speed of the internal bus that connects the NBIO silicon with the data fabric. All data between the processor and its PCIe lanes flows to the data fabric based on this LCLK frequency setting. For optimal PCIe performance, the link clock frequency of the NBIO components needs to be forced to the maximum frequency.

For AMD EPYC™ 7002 series processors, this setting cannot be modified via configuration options in the server BIOS alone. Instead, the AMD-IOPM-UTIL (see the AMD-IOPM-UTIL section below) must be run at every server boot to disable Dynamic Power Management for all PCIe Root Complexes and NBIOs within the system and to lock the logic into the highest performance operational mode.

For AMD EPYC™ 7003 series processors, configuring all NBIOs to be in “Enhanced Preferred I/O” mode is sufficient to enable highest link clock frequency for the NBIO components.

Memory Configuration

For the memory addressing modes, especially the number of NUMA nodes per socket/processor (NPS), the recommended setting is to follow the guidance of the “High Performance Computing (HPC) Tuning Guide for AMD EPYC™ Processors” [3] to provide the optimal configuration for host side computation.

If the system is set to one NUMA domain per socket/processor (NPS1), bidirectional copy bandwidth between host memory and GPU memory may be slightly higher (up to about 16% more) than with four NUMA domains per socket/processor (NPS4). For memory bandwidth sensitive applications using MPI, NPS4 is recommended. For applications that are not optimized for NUMA locality, NPS1 is the recommended setting.
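To confirm which NPS setting is in effect after a BIOS change, the NUMA nodes that the operating system sees can be counted. The following is a minimal sketch, assuming a Linux system that exposes nodes under /sys/devices/system/node. The function name count_numa_nodes is illustrative. A dual-socket system would typically report two nodes with NPS1 and eight with NPS4.

```shell
#!/bin/sh
# Count NUMA nodes by counting the nodeN directories under sysfs.
# The directory argument exists only so the function can be tested
# against a scratch directory.
count_numa_nodes() {
    node_dir=${1:-/sys/devices/system/node}
    count=0
    for d in "$node_dir"/node[0-9]*; do
        if [ -d "$d" ]; then
            count=$((count + 1))
        fi
    done
    printf '%s\n' "$count"
}

echo "NUMA nodes visible to the OS: $(count_numa_nodes)"
```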

Operating System Settings

CPU Core State - ‘C States’

An AMD EPYC CPU can be in one of several core states (C-states):

• C0: active. This is the active state while running an application.

• C1: idle.

• C2: idle and power gated. This is a deeper sleep state and has a greater latency when moving back to the C0 state than when the CPU is coming out of C1.


Disabling C2 is important when running with a high-performance, low-latency network. To disable power gating on all cores, run the following on Linux systems:

$ cpupower idle-set -d 2 
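After running the command, the result can be checked through the cpuidle entries in sysfs. The following is a minimal sketch, assuming the kernel exposes per-CPU idle states under /sys/devices/system/cpu/cpuN/cpuidle/stateM/disable, where a value of 1 means the state is disabled. The function name cpus_with_state_enabled is illustrative.

```shell
#!/bin/sh
# Print the CPUs whose idle state with the given index is still enabled.
# cpu_root defaults to the real sysfs tree; it is a parameter for testing.
cpus_with_state_enabled() {
    state=$1
    cpu_root=${2:-/sys/devices/system/cpu}
    for f in "$cpu_root"/cpu[0-9]*/cpuidle/state"$state"/disable; do
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = "0" ]; then
            cpu=${f#"$cpu_root"/}       # e.g. "cpu3/cpuidle/state2/disable"
            printf '%s\n' "${cpu%%/*}"  # keep only "cpu3"
        fi
    done
}

remaining=$(cpus_with_state_enabled 2)
if [ -z "$remaining" ]; then
    echo "Idle state 2 is disabled (or not exposed) on all CPUs"
else
    echo "Idle state 2 is still enabled on:" $remaining
fi
```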

Note that the cpupower tool must be installed, as it is not part of the base packages of most Linux® distributions. The package needed varies with the respective Linux distribution.

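For Ubuntu (the kernel-specific package linux-tools-$(uname -r) may also be required, because linux-tools-common provides only a wrapper script):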

$ sudo apt install linux-tools-common 

AMD-IOPM-UTIL

This section applies to AMD EPYC™ 7002 processors and describes how to optimize advanced Dynamic Power Management (DPM) in the I/O logic (see the NBIO description above) for performance. Certain I/O workloads may benefit from disabling this power management. The utility disables DPM for all PCIe root complexes in the system and locks the logic into the highest performance operational mode.

Disabling I/O DPM will reduce the latency and/or improve the throughput of low-bandwidth messages for PCIe InfiniBand NICs and GPUs. Other workloads with low-bandwidth, bursty PCIe I/O characteristics may benefit as well if multiple such PCIe devices are installed in the system.

The actions of the utility do not persist across reboots. There is no need to change any existing firmware settings when using this utility. The “Preferred I/O” and “Enhanced Preferred I/O” settings should remain enabled.

The recommended method to use the utility is either to create a system start-up script, for example, a one-shot systemd service unit, or run the utility when starting up a job scheduler on the system. The installer packages (see Power Management Utility at https://developer.amd.com/iopm-utility/) will create and enable a systemd service unit for you. This service unit is configured to run in one-shot mode. This means that even when the service unit runs as expected, the status of the service unit will show inactive. This is the expected behavior when the utility runs normally. If the service unit shows failed, the utility did not run as expected. The output in either case can be shown with the systemctl status command.

Stopping the service unit has no effect since the utility does not leave anything running. To undo the effects of the utility, disable the service unit with the systemctl disable command and reboot the system.

The utility does not have any command-line options, and it must be run with super-user permissions.
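For systems where the installer packages are not used, a one-shot service unit of the kind described above can be written by hand. The following is a minimal sketch that only prints the unit text. The function name print_oneshot_unit, the unit description, and the executable path /usr/local/sbin/amd-iopm-util are hypothetical, since the location where the utility is installed is not stated here.

```shell
#!/bin/sh
# Print a one-shot systemd unit that runs the given executable once at boot.
# The executable path is supplied by the caller; this sketch does not know
# where the AMD installer places the utility.
print_oneshot_unit() {
    exec_path=$1
    cat <<EOF
[Unit]
Description=Disable I/O DPM at boot (illustrative unit)

[Service]
Type=oneshot
ExecStart=$exec_path

[Install]
WantedBy=multi-user.target
EOF
}

# Hypothetical install path, used here only to show the generated text.
print_oneshot_unit /usr/local/sbin/amd-iopm-util
```

Because the unit does not set RemainAfterExit, systemctl status reports it as inactive after a successful run, which matches the behavior described above. systemd runs system services as root by default, which satisfies the super-user requirement.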

Systems with 256 CPU Threads - IOMMU Configuration

For systems that have 256 or more logical CPU cores (e.g., two 64-core AMD EPYC™ 7763 processors in a dual-socket configuration with SMT enabled), setting the IOMMU configuration to “disabled” can limit the number of available logical cores to 255. The reason is that the Linux® kernel disables x2APIC in this case and falls back to APIC, which can only enumerate a maximum of 255 (logical) cores.
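Whether a system is affected can be checked by comparing the logical CPU count expected from the hardware with the count the kernel reports. The following is a minimal sketch using the dual-socket, 64-core, SMT-enabled example from the text. The function name expected_logical_cpus is illustrative, and the three arguments should be replaced with the values of the system under test.

```shell
#!/bin/sh
# Logical CPUs expected from the hardware:
# sockets x cores per socket x threads per core.
expected_logical_cpus() {
    echo $(( $1 * $2 * $3 ))
}

# Example values from the text: two 64-core sockets with SMT (2 threads/core).
expected=$(expected_logical_cpus 2 64 2)
# Number of processors the running kernel has configured.
actual=$(getconf _NPROCESSORS_CONF)

if [ "$actual" -lt "$expected" ]; then
    echo "Kernel sees $actual of $expected logical CPUs; check the IOMMU setting"
else
    echo "Kernel sees $actual logical CPUs (expected $expected)"
fi
```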

If SMT is enabled by setting “CCD/Core/Thread Enablement > SMT Control” to “enable”, the following steps can be applied to the system to enable all (logical) cores of the system:

• In the server BIOS, set IOMMU to “Enabled”.

• When configuring the GRUB boot loader, add the following arguments for the Linux kernel (typically to GRUB_CMDLINE_LINUX in /etc/default/grub):

amd_iommu=on iommu=pt

• Update GRUB to use the modified configuration (on Debian and Ubuntu systems, use sudo update-grub instead):

sudo grub2-mkconfig -o /boot/grub2/grub.cfg


• Reboot the system.

• Verify IOMMU passthrough mode by inspecting the kernel log via dmesg.
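One way to perform this verification after the reboot is sketched below. It first checks that the two kernel arguments from the steps above are on the running kernel's command line, then prints the kernel messages that mention the IOMMU. The function name cmdline_has_iommu_pt is illustrative, and the exact wording of the IOMMU messages varies between kernel versions, so no expected output is shown.

```shell
#!/bin/sh
# Succeed only when a kernel command line contains both arguments
# from the steps above as separate words.
cmdline_has_iommu_pt() {
    case " $1 " in
        *" amd_iommu=on "*) ;;
        *) return 1 ;;
    esac
    case " $1 " in
        *" iommu=pt "*) ;;
        *) return 1 ;;
    esac
}

if cmdline_has_iommu_pt "$(cat /proc/cmdline)"; then
    echo "amd_iommu=on and iommu=pt are present on the kernel command line"
else
    echo "IOMMU arguments are missing from the kernel command line"
fi

# Show kernel messages that mention the IOMMU; reading the log may need root.
dmesg 2>/dev/null | grep -i iommu || echo "No IOMMU messages readable"
```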

References