RDMA: Difference between revisions

m (Admin moved page Verify that RDMA is working to RDMA)
No edit summary
Line 1: Line 1:
== Verify that RDMA is working ==
RDMA (Remote Direct Memory Access) is a network communication technology that was first applied in high-performance computing and has gradually become popular in data centers.
How to verify that the RDMA stack is working properly.




RDMA allows user programs to bypass the operating system kernel, and therefore most of the CPU work on the data path, and interact directly with the network card, which provides high bandwidth and very low latency.<ref>https://www.naddod.com/blog/rdma-high-speed-network-for-large-model-training</ref>


== Verify that RDMA kernel part is loaded ==
[[File:RDMA FLOW.png|center|frameless|557x557px]]
Check that the kernel part of the RDMA stack is working.


[root@localhost] # /etc/init.d/rdma status
== Types of RDMA ==


lsmod can show the loaded kernel modules.
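The same information that lsmod prints can be read from /proc/modules. The sketch below is a minimal illustration of that check, not a definitive test. The module list (ib_core, ib_uverbs, rdma_cm, rdma_ucm) is an assumption about a typical Linux RDMA stack, and the helper names are hypothetical. Drivers compiled directly into the kernel do not appear in /proc/modules, so a "missing" result is only a hint.

```python
from pathlib import Path

# Assumption: these core modules are what a typical Linux RDMA stack loads.
# Adjust the tuple for your distribution and NIC driver.
CORE_RDMA_MODULES = ("ib_core", "ib_uverbs", "rdma_cm", "rdma_ucm")


def loaded_modules(proc_modules_text: str) -> set:
    """Return the module names found in text formatted like /proc/modules."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first whitespace-separated field is the module name.
            names.add(fields[0])
    return names


def missing_rdma_modules(proc_modules_text: str, wanted=CORE_RDMA_MODULES) -> list:
    """Return the wanted modules that are not loaded, in the order given."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in wanted if name not in loaded]


def main() -> None:
    proc_modules = Path("/proc/modules")
    if not proc_modules.is_file():
        print("/proc/modules not found (not a Linux host?)")
        return
    missing = missing_rdma_modules(proc_modules.read_text())
    if missing:
        print("RDMA modules not loaded:", ", ".join(missing))
    else:
        print("All expected RDMA modules are loaded")


if __name__ == "__main__":
    main()
```

Keeping the parser separate from the file read makes it possible to run the check against saved lsmod or /proc/modules output from another node.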
# InfiniBand is a network protocol designed specifically for RDMA, and it ensures reliable data transmission at the hardware level. Although the InfiniBand specification was officially published in 2000, the InfiniBand Architecture (IBA) only came into wide use on cluster supercomputers after 2005. The main reason for the slow adoption is that InfiniBand requires its own dedicated hardware from L2 to L4, which makes it very costly for enterprises.
# RoCE (RDMA over Converged Ethernet) is a relatively cheaper option, although it still cannot be considered inexpensive. RoCE provides RDMA capabilities over Ethernet. In recent years, RoCE has developed rapidly as a substitute for InfiniBand because of InfiniBand's high cost, and many vendors specializing in Ethernet interconnects are actively promoting it. However, if the goal is a lossless network with RoCE, it is difficult to keep the overall network cost below 50% of what it would be with InfiniBand.
# iWARP is a network protocol that runs RDMA over TCP. Its advantage is that it works on today's standard TCP/IP networks: the only requirement is a network card that supports iWARP, which makes it especially suitable for enterprises with a tighter budget. Its disadvantage is that its performance is slightly worse than RoCE's. After all, you get what you pay for.
 
== GPUDirect RDMA ==
When training large-scale models on a multi-node cluster, the cost of inter-node communication is significant.
 
The combination of InfiniBand and GPUs enables a feature called [[GPUDirect]] RDMA, which allows GPUs on different nodes to communicate directly without involving host memory or the CPU. In other words, traffic between the GPUs of two nodes passes directly through the InfiniBand network interface cards instead of going through the CPU and host memory.
 
== References ==
<references />

Revision as of 09:17, 3 January 2024

RDMA (Remote Direct Memory Access) is a network communication technology that was first applied in high-performance computing and has gradually become popular in data centers.


RDMA allows user programs to bypass the operating system kernel, and therefore most of the CPU work on the data path, and interact directly with the network card, which provides high bandwidth and very low latency.[1]

RDMA FLOW.png

Types of RDMA

  1. InfiniBand is a network protocol designed specifically for RDMA, and it ensures reliable data transmission at the hardware level. Although the InfiniBand specification was officially published in 2000, the InfiniBand Architecture (IBA) only came into wide use on cluster supercomputers after 2005. The main reason for the slow adoption is that InfiniBand requires its own dedicated hardware from L2 to L4, which makes it very costly for enterprises.
  2. RoCE (RDMA over Converged Ethernet) is a relatively cheaper option, although it still cannot be considered inexpensive. RoCE provides RDMA capabilities over Ethernet. In recent years, RoCE has developed rapidly as a substitute for InfiniBand because of InfiniBand's high cost, and many vendors specializing in Ethernet interconnects are actively promoting it. However, if the goal is a lossless network with RoCE, it is difficult to keep the overall network cost below 50% of what it would be with InfiniBand.
  3. iWARP is a network protocol that runs RDMA over TCP. Its advantage is that it works on today's standard TCP/IP networks: the only requirement is a network card that supports iWARP, which makes it especially suitable for enterprises with a tighter budget. Its disadvantage is that its performance is slightly worse than RoCE's. After all, you get what you pay for.
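To see which of the three transports a given host is using, one option is to read the attributes the Linux RDMA subsystem exposes under /sys/class/infiniband. The sketch below is a rough heuristic under stated assumptions, not an authoritative detector: it assumes each port has a link_layer file containing "InfiniBand" or "Ethernet", that iWARP adapters report a node_type containing "RNIC", and that any other Ethernet port is RoCE. The function names are hypothetical.

```python
from pathlib import Path


def classify_transport(link_layer: str, node_type: str) -> str:
    """Guess the RDMA transport from sysfs link_layer and node_type text.

    Heuristic (assumption): an RNIC node type means iWARP, an InfiniBand
    link layer means InfiniBand, and any other Ethernet port means RoCE.
    """
    link = link_layer.strip().lower()
    node = node_type.strip().lower()
    if "rnic" in node:
        return "iWARP"
    if link == "infiniband":
        return "InfiniBand"
    if link == "ethernet":
        return "RoCE"
    return "unknown"


def scan_devices(sysfs_root: str = "/sys/class/infiniband") -> dict:
    """Map "device/port" to a guessed transport for every port under sysfs_root."""
    root = Path(sysfs_root)
    results = {}
    if not root.is_dir():
        # No RDMA devices registered, or not a Linux host.
        return results
    for dev in sorted(root.iterdir()):
        node_file = dev / "node_type"
        node_type = node_file.read_text() if node_file.is_file() else ""
        ports_dir = dev / "ports"
        if not ports_dir.is_dir():
            continue
        for port in sorted(ports_dir.iterdir()):
            link_file = port / "link_layer"
            link_layer = link_file.read_text() if link_file.is_file() else ""
            key = f"{dev.name}/{port.name}"
            results[key] = classify_transport(link_layer, node_type)
    return results


if __name__ == "__main__":
    for port, transport in scan_devices().items():
        print(f"{port}: {transport}")
```

An empty result only means that no RDMA device is registered with the kernel; it does not tell you whether the hardware is absent or the driver is simply not loaded.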

GPUDirect RDMA

When training large-scale models on a multi-node cluster, the cost of inter-node communication is significant.

The combination of InfiniBand and GPUs enables a feature called GPUDirect RDMA, which allows GPUs on different nodes to communicate directly without involving host memory or the CPU. In other words, traffic between the GPUs of two nodes passes directly through the InfiniBand network interface cards instead of going through the CPU and host memory.

References

  1. https://www.naddod.com/blog/rdma-high-speed-network-for-large-model-training