NVIDIA Mellanox MQM8790-HS2F in Action: Low-Latency Interconnect Optimization for RDMA/HPC/AI Clusters
April 10, 2026
A fast-growing AI research organization was facing a familiar pain point: their 200+ GPU cluster, used for large language model training and molecular dynamics simulations, was experiencing unpredictable job completion times. Despite powerful compute nodes, the existing 100Gb/s Ethernet fabric suffered from tail-latency spikes, packet drops under incast traffic patterns, and high CPU overhead from traditional TCP/IP stack processing. The team needed a solution that could deliver consistent sub-microsecond latency, fully support RDMA and GPUDirect, and scale without forklift upgrades. After evaluating the available options, they selected the NVIDIA Mellanox MQM8790-HS2F as the core building block for their next-generation cluster fabric.
The organization deployed the MQM8790-HS2F InfiniBand switch in a two-tier fat-tree topology connecting 128 compute nodes (each equipped with NVIDIA ConnectX-6 HDR adapters) and 4 storage nodes. With 40 QSFP56 ports running at 200Gb/s HDR, a single NVIDIA Mellanox MQM8790-HS2F provides 16Tb/s of aggregate non-blocking switching capacity, enough to replace two legacy Ethernet switches while reducing cabling complexity. The deployment leveraged the switch's native support for RDMA and GPUDirect, enabling direct memory access between GPUs on different servers without CPU intervention.
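For readers sizing a similar fabric, the arithmetic behind a non-blocking two-tier fat-tree built from 40-port switches is simple enough to sketch in a few lines. The 20-down/20-up port split per leaf below is the standard non-blocking assumption, not a figure disclosed by this deployment.

```python
# Back-of-envelope sizing for a non-blocking two-tier fat-tree built from
# 40-port HDR switches. The 20-down/20-up split per leaf is the usual
# non-blocking assumption, not a figure taken from this deployment.
import math

SWITCH_PORTS = 40          # QSFP56 ports per MQM8790-HS2F
ENDPOINTS = 128 + 4        # compute nodes + storage nodes, one HDR port each

downlinks_per_leaf = SWITCH_PORTS // 2                  # 20 host-facing ports
uplinks_per_leaf = SWITCH_PORTS - downlinks_per_leaf    # 20 spine-facing ports

leaf_switches = math.ceil(ENDPOINTS / downlinks_per_leaf)
# Each spine must terminate one uplink from every leaf.
spine_switches = math.ceil(leaf_switches * uplinks_per_leaf / SWITCH_PORTS)

print(f"leaves: {leaf_switches}, spines: {spine_switches}")
# -> leaves: 7, spines: 4 for 132 endpoints at full bisection bandwidth
```

Halving each leaf's ports between hosts and uplinks is what keeps the fabric non-blocking; oversubscribed designs trade some of those uplinks for more hosts per leaf at the cost of bisection bandwidth.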
Key implementation details included:
- Adaptive routing to automatically balance traffic across multiple paths, eliminating hot spots.
- SHARPv2 (Scalable Hierarchical Aggregation and Reduction Protocol) for in-network reduction, accelerating All-Reduce operations by up to 2.5x; a sketch of enabling this path from NCCL follows this list.
- Congestion control at the switch level, preventing head-of-line blocking common in lossy Ethernet environments.
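How these switch-side features surface to applications depends on the software stack. As a rough illustration, the sketch below shows how an NCCL-based PyTorch job might opt into in-network (SHARP) reductions via NCCL's CollNet path; the exact environment variables and their behavior vary by NCCL and HPC-X plugin version, so treat the names as assumptions to verify against your installation.

```python
# Minimal sketch: opting an NCCL-based PyTorch training job into in-network
# reductions (SHARP) via the CollNet plugin. Variable names reflect commonly
# documented NCCL / HPC-X knobs; verify against the versions in your stack.
import os

os.environ.setdefault("NCCL_COLLNET_ENABLE", "1")   # allow the CollNet/SHARP path
os.environ.setdefault("NCCL_IB_HCA", "mlx5")        # restrict NCCL to the ConnectX HCAs
os.environ.setdefault("NCCL_DEBUG", "INFO")         # log whether CollNet was actually used

import torch.distributed as dist

def init_distributed():
    # Rank, world size, and rendezvous address are expected from the launcher
    # (torchrun, Slurm, etc.); NCCL reads the variables above at init time.
    dist.init_process_group(backend="nccl")
    return dist.get_rank(), dist.get_world_size()
```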
Before purchase, the engineering team reviewed the MQM8790-HS2F datasheet and MQM8790-HS2F specifications to confirm compatibility with their existing Mellanox cables and transceivers. The MQM8790-HS2F-compatible ecosystem of HDR optical and copper cables allowed them to reuse 40% of their previous interconnect investment, significantly lowering the barrier to upgrade.
After migrating to the MQM8790-HS2F-based fabric, the organization documented three categories of improvements:
- Latency reduction: Average MPI ping-pong latency dropped from 2.1µs (Ethernet RoCE) to 0.82µs, with tail-latency spikes virtually eliminated; a minimal ping-pong benchmark sketch follows this list.
- Job throughput: Distributed training jobs (NCCL-based) completed 37% faster due to reduced communication overhead and SHARPv2 acceleration.
- CPU offload: RDMA over InfiniBand reduced CPU utilization for networking from ~15% to under 2%, freeing cores for computation.
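For context on how numbers like the ping-pong latency above are typically produced, here is a minimal mpi4py microbenchmark; the message size and iteration count are illustrative choices, not the parameters used by the team.

```python
# Minimal two-rank MPI ping-pong latency microbenchmark (mpi4py), of the kind
# that produces average-latency figures like those quoted above. Message size
# and iteration count are illustrative, not the values used in this deployment.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

ITERATIONS = 10_000
buf = np.zeros(8, dtype=np.uint8)   # small message to expose fabric latency

comm.Barrier()
start = MPI.Wtime()
for _ in range(ITERATIONS):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=0)
    elif rank == 1:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=0)
elapsed = MPI.Wtime() - start

if rank == 0:
    # One-way latency is half the round-trip time per iteration.
    print(f"avg one-way latency: {elapsed / ITERATIONS / 2 * 1e6:.2f} us")
```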
In a 128-GPU all-to-all communication benchmark, the MQM8790-HS2F InfiniBand fabric sustained 198Gb/s per port with zero packet loss, compared to 112Gb/s with 1.2% loss on the previous Ethernet fabric. For the simulation workloads run by the same team, job runtime variability was reduced by 78%, enabling tighter SLAs and predictable runtimes.
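An all-to-all test in the same spirit can be reproduced with PyTorch's NCCL backend, as in the hedged sketch below; the per-rank buffer size and iteration count are assumptions for illustration only.

```python
# Sketch of a GPU all-to-all bandwidth test in the spirit of the benchmark
# above, using PyTorch's NCCL backend. The ~1 GiB per-rank buffer and the
# iteration count are illustrative assumptions.
import time
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
rank, world = dist.get_rank(), dist.get_world_size()
# Simplification: assumes ranks are laid out round-robin per node; in practice
# use LOCAL_RANK from the launcher to pick the GPU.
torch.cuda.set_device(rank % torch.cuda.device_count())

numel = 1 << 28                        # 2**28 float32 values (~1 GiB per rank)
# numel must be divisible by the world size for equal splits.
send = torch.rand(numel, device="cuda")
recv = torch.empty_like(send)

ITERS = 20
torch.cuda.synchronize(); dist.barrier()
start = time.perf_counter()
for _ in range(ITERS):
    dist.all_to_all_single(recv, send)   # each rank exchanges equal slices with all peers
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

if rank == 0:
    bytes_moved = send.element_size() * numel * ITERS
    print(f"per-rank all-to-all throughput: {bytes_moved * 8 / elapsed / 1e9:.1f} Gb/s")
```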
This real-world deployment demonstrates that the MQM8790-HS2F is more than a spec-sheet hero: it delivers tangible benefits for production HPC and AI workloads. The combination of 200Gb/s HDR throughput, 40 high-density ports, and advanced in-network computing transforms cluster economics by reducing both job completion time and operational overhead. For IT leaders weighing the MQM8790-HS2F price against the performance gains, this case study suggests a sub-12-month ROI based on compute efficiency improvements alone; a back-of-envelope model is sketched below.
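In the sketch below, only the 200-GPU cluster size and the 37% job speedup come from this case study; the GPU-hour cost, utilization factor, and fabric upgrade cost are hypothetical placeholders meant to show the shape of the calculation rather than actual figures.

```python
# Back-of-envelope ROI model behind a "payback in under 12 months" style claim.
# Only the cluster size and 37% speedup come from the case study; the GPU-hour
# cost, utilization, and fabric cost are hypothetical placeholders.

GPUS = 200
GPU_HOUR_COST = 2.00        # $/GPU-hour, hypothetical fully loaded cost
UTILIZATION = 0.70          # fraction of hours spent in communication-bound jobs
SPEEDUP = 0.37              # fraction of job time recovered (from the case study)

monthly_gpu_hours = GPUS * 24 * 30 * UTILIZATION
monthly_savings = monthly_gpu_hours * SPEEDUP * GPU_HOUR_COST

FABRIC_COST = 250_000       # hypothetical switch + cable upgrade cost, $
payback_months = FABRIC_COST / monthly_savings
print(f"estimated payback: {payback_months:.1f} months")
```

With these placeholder inputs the model pays back in a few months; even with substantially more conservative cost and utilization assumptions it stays inside a 12-month window, which is the point of the claim above.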
As the organization plans to double its GPU count to 400+, it has already budgeted for additional MQM8790-HS2F units to maintain a non-blocking fat-tree architecture. The switch's ability to mix HDR and EDR speeds ensures a smooth migration path as older adapters are gradually replaced. For architects designing next-generation RDMA-centric clusters, the NVIDIA Mellanox MQM8790-HS2F offers a proven, production-ready backbone that scales from departmental AI research to exascale supercomputing.

