UnifabriX Uses CXL To Improve HPC Performance

CXL promises to remake the way computing systems are architected. It runs on PCIe and can extend the memory attached to individual CPUs, but its biggest promise is network-arbitrated memory pools that can allocate higher-latency memory as needed to CPUs or to software-defined virtual machines. CXL-based products are beginning to appear on the market in 2023.

CXL is looking to remake data centers, but the benefits of higher-latency memory for high-performance computing (HPC) applications weren't obvious, at least until UnifabriX demonstrated bandwidth and capacity advantages with its CXL-based smart memory node at the Supercomputing 2022 Conference (SC22). A recently released video shows UnifabriX demonstrations of memory and storage for HPC applications and the advantages they provide.

UnifabriX says the product is based on its Resource Processing Unit (RPU). The RPU is built into its CXL Smart Memory Node, shown below, a 2U rack-mountable server with EDSFF E3 media. The product holds up to 64TB of capacity in DDR5/DDR4 memory and NVMe SSDs.

The company says the product is CXL 1.1 and 2.0 compliant and runs on PCIe Gen5. It also says it is CXL 3.0 ready and supports both PCIe Gen5 and CXL expansion, as well as NVMe SSD access via CXL (SSD over CXL memory). The product is intended for bare-metal and virtualized environments across a wide range of applications, including HPC, AI and databases.

As with other CXL products, the memory node offers expanded memory capacity, but it can also deliver higher performance. In particular, at SC22 the memory node was used to run the HPCG performance benchmark, and the results were compared against the same benchmark run without the memory node. The results are shown below.

For the conventional HPCG benchmark, performance initially increases roughly linearly with the number of CPU cores. At around 50 cores, however, performance flattens out and no longer improves as more cores are added: even when 100 cores are available, only about 50 are effectively used, because the directly attached memory has no additional bandwidth to give. A simple roofline-style model of this behavior is sketched below.
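
To make the flattening concrete, here is a minimal roofline-style sketch in C. It is not the HPCG code or UnifabriX's software; the per-core compute rate, DRAM bandwidth and bytes-per-flop figures are assumed placeholders chosen only to show how sustained performance stops scaling once the memory-bandwidth ceiling is hit at roughly 50 cores.

```c
/* Roofline-style sketch: sustained performance is the smaller of what the
 * active cores can compute and what the local DRAM bandwidth can feed.
 * All numbers are assumed placeholders, not measurements from the SC22 demo. */
#include <stdio.h>

int main(void) {
    const double flops_per_core = 1.0e9;   /* assumed sustained FLOP/s per core   */
    const double dram_bandwidth = 200.0e9; /* assumed local DRAM bytes per second */
    const double bytes_per_flop = 4.0;     /* assumed HPCG-like memory traffic    */

    const double bandwidth_bound = dram_bandwidth / bytes_per_flop;

    for (int cores = 10; cores <= 100; cores += 10) {
        double compute_bound = cores * flops_per_core;
        double sustained = compute_bound < bandwidth_bound ? compute_bound
                                                           : bandwidth_bound;
        printf("%3d cores: %5.1f GFLOP/s (%s-bound)\n",
               cores, sustained / 1.0e9,
               compute_bound < bandwidth_bound ? "compute" : "bandwidth");
    }
    return 0;
}
```

With these placeholder numbers the compute bound crosses the bandwidth bound at about 50 cores, after which adding cores no longer helps, which is the shape of the curve described above.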

If the memory node is added to provide CXL-attached memory alongside the memory directly connected to the CPUs, performance scaling with cores can continue. The memory node improves overall HPCG performance by moving lower-priority data from the CPU's near memory to the CXL remote memory. This keeps the near memory from saturating and allows performance to keep scaling as more processor cores are added. As shown above, the memory node improved HPCG benchmark performance by more than 26%. A sketch of this kind of tiered data placement follows.
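
On Linux, a CXL Type 3 memory expander commonly shows up as a CPU-less NUMA node, so one plausible way to express this kind of tiering is through the standard libnuma interface. The sketch below is an illustration under stated assumptions, not UnifabriX's implementation: it assumes the local DDR5 is NUMA node 0 and the CXL-attached memory is node 1, and it places a lower-priority buffer on the CXL node so the hot working set keeps the local DRAM bandwidth to itself.

```c
/* Minimal libnuma sketch: keep hot data in local DRAM and push a
 * lower-priority buffer to CXL-attached memory, which Linux typically
 * exposes as a CPU-less NUMA node. Node numbers below are assumptions.
 * Build with: gcc tier.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LOCAL_NODE 0   /* assumed: CPU-attached DDR5            */
#define CXL_NODE   1   /* assumed: CXL memory exposed as node 1 */

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA policy is not available on this system\n");
        return EXIT_FAILURE;
    }

    size_t hot_bytes  = 1UL << 30;  /* frequently touched working set */
    size_t cold_bytes = 4UL << 30;  /* lower-priority data            */

    /* Hot data stays on the local node to preserve DRAM bandwidth. */
    void *hot  = numa_alloc_onnode(hot_bytes, LOCAL_NODE);
    /* Cold data goes to the CXL node, trading latency for capacity. */
    void *cold = numa_alloc_onnode(cold_bytes, CXL_NODE);
    if (!hot || !cold) {
        fprintf(stderr, "allocation failed\n");
        return EXIT_FAILURE;
    }

    memset(hot, 0, hot_bytes);
    memset(cold, 0, cold_bytes);
    /* ... run the bandwidth-hungry kernel against the hot buffer here ... */

    numa_free(hot, hot_bytes);
    numa_free(cold, cold_bytes);
    return EXIT_SUCCESS;
}
```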

The company worked closely with Intel on its CXL solution, and Intel mentions these results, along with other third-party testing, in its recent product report on its Infrastructure Processing Units (IPUs), "Intel Agilex FPGA Accelerators Bring Improved TCO, Performance and Flexibility to 4th Gen Intel Xeon Platforms."

In addition to providing memory capacity and bandwidth improvements, the memory node can provide NVMe SSD access via CXL as well. The company says it plans to include memory, storage and networking over the CXL/PCIe interface, hence the name UnifabriX. With networking included, its boxes could replace top-of-rack (TOR) solutions as well as provide memory and storage access.

The UnifabriX memory node, built around the company's Resource Processing Unit, provides a way to overcome directly attached DRAM bandwidth limitations in HPC applications using shared CXL memory.
