CXL Gets Off the Drawing Board


Although the Compute Express Link (CXL) protocol has reached its third iteration, workloads that use it are only now starting to become production ready.

CXL 3.0 was released in August 2022, and there has been a slew of announcements for 2.0-compatible products since the protocol’s inception. However, it’s early days when it comes to actual deployment.

In an exclusive interview with EE Times, Sanketh Srinivas, a technical staff engineer for product marketing for Microchip Technology’s data center solutions business unit, said that the company is currently sampling its 2.0 products with customers and aims to enter production later this year, which is in line with other industry players. 

“Most of the other companies and competitors are working on 2.0 today with silicon, and they’re also in development with 3.0 devices, which enables additional functionality,” Srinivas said.

The additional capacity and bandwidth of CXL 2.0 will help process the high volumes of data that come with AI and machine learning (ML) workloads, he added, by reducing latencies associated with storage because datasets won’t have to move back and forth. “CXL provides you the ability to expand your memory and have the whole data set in the memory.”

He said the fetch time for data is cut down significantly because an AI engine, ML engine or even the CPUs can have low latency access to this data in the memory.

Microchip’s CXL-ready SMC 2000 controller has been built into a memory module developed by Micron Technology that not only allows for capacity as high as a terabyte, but also enables it to handle an entire database, which Srinivas asserted will dramatically improve access and performance.

However, CXL has suffered from a “chicken and egg” problem, according to Srinivas. Last year, there was a great deal of interest from hyperscalers and integrators, but there were no CXL devices ready. Now that products are sampling, he said, they can work on applications and see benefits. “Now we are seeing practical implementation and how CXL technology can really help provide that additional memory capacity and bandwidth.”

CXL 2.0 platforms likely for 2024

Ryan Baxter, senior director for Micron’s compute and networking business unit, said it’s still early in the game for CXL—we’re not past the national anthem, let alone in the second or third inning. Aside from the collaboration with Microchip, Micron’s recent offerings include a CXL 2.0-compliant memory expansion module. He said the customer feedback so far is that the company’s CXL solutions are straightforward to implement.

Micron Technology’s recent CXL offerings include a 2.0-compliant memory expansion module. (Source: Micron)

Customers also like the latency they’re seeing, Baxter said, even if it’s only a few dozen nanoseconds of difference, not thousands. “We’re pretty excited about where customers are taking this.”

He noted there were some CXL-capable development boards on display at the 2023 Flash Memory Summit in August, while Micron is up and running with actual workloads. “That all being said, it’s still very, very early days.” 

Baxter said he expects some customers will be able to launch platforms with CXL 2.0 capabilities in 2024. “We’re just now seeing hardware with CXL capability.”

Aside from CXL products, Micron has launched an enablement program that acts as a one-stop-shop for anyone who’s thinking about using CXL, with guidance on thermal models, performance models and actual engineering to help them bring up Micron CXL devices on their systems. 

In terms of actual workloads, Baxter said CXL is showing good metrics when used for in-memory databases and online transactions. “It gives you some level of indication that these are the actual real world use cases that could benefit from CXL.” 

He explained that anything that can benefit from more memory capacity and bandwidth, like AI training and inference, could benefit from CXL. “AI is a bandwidth monster,” he said.

Hyperscalers are early CXL adopters

Mark Orthodoxou, a VP for Rambus overseeing its data center products, is seeing a similar adoption curve for CXL workloads, even with many products already announced, including the company’s retimers, which will go into production systems later this year. “What you have to remember is that the OEMs are building these servers themselves,” he said. “They have early versions of stuff that they’re experimenting with.”

Orthodoxou added that the OEMs’ end customers still don’t have the technology in their hands to sufficiently establish what the value proposition is—the proof-of-concept cycle is only now ramping up. 

The hyperscalers are a little different, Orthodoxou said, because they have well-understood workloads, so there will be earlier adoption in that segment. He explained that the hyperscalers have a clear understanding of what their total cost of ownership for implementing CXL solutions will be, but don’t want to talk about it openly. “It’s their secret sauce.” 

Orthodoxou noted that hyperscalers have more control over how CXL is going to be implemented, as well as earlier access to the silicon and the software. But if you are ordering a box that has CXL capabilities inside, there’s less control.

When it comes to actual workloads and deployments, Orthodoxou said an important distinction is that there are CXL-capable servers and there are servers that have deployed CXL-attached memory, and they will have different adoption rates. In-memory databases like SAP-HANA are obvious candidates for early CXL adoption, he said.

CXL solutions are a collaborative effort

What CXL allows is composability, and the CXL Consortium overseeing the protocol’s specifications isn’t the only group bringing together industry players to explore how best to compose systems and optimize the use of resources.

Last fall, the Open Compute Project (OCP) Foundation announced the formation of the Composable Memory Systems subgroup of data center operators, application developers, and equipment and semiconductor companies to establish architecture and nomenclature that will be published by the group as part of the composable memory system specification.

MemVerge CEO and co-founder Charles Fan

As part of its recent pivot from Optane-enabling technologies to CXL, MemVerge recently introduced its Endless Memory solutions with both SK Hynix and Samsung, which introduced the first CXL memory module. Project Endless Memory addresses the challenge of memory exhaustion in data-intensive applications. In an interview with EE Times, MemVerge CEO and co-founder Charles Fan said memory exhaustion can lead to out-of-memory (OOM) crashes or poor performance due to swap usage.

Endless Memory uses MemVerge’s Elastic Memory Service software along with a Niagara Pooled Memory System from SK Hynix so that hosts can dynamically allocate memory as needed to mitigate OOM errors and improve application performance. 

MemVerge’s collaboration with Samsung also involves XConn, which delivered the industry’s first CXL switch, and H3 Platform, which integrated the hardware and software components in a 2U rack-mountable system with 2 TB memory capacity that can be dynamically allocated to the computing hosts.

Endless Memory uses MemVerge’s Elastic Memory Service software along with a Niagara Pooled Memory System from SK Hynix so that hosts can dynamically allocate memory as needed to mitigate OOM errors and improve application performance. (Source: MemVerge)

Fan said the availability of CXL hardware is closer to becoming reality so that a software maker like MemVerge can develop and test solutions that enable elastic memory pooling. “The role of software here is a key enabler of CXL capabilities,” he said.

The OOM problem, he explained, is a common error when running distributed workloads like AI training, data processing and scale-out databases. “It is difficult to predict the workload on each node. Some of the nodes would use more resources than the others,” Fan said, adding that a heavier load can lead to slow performance or crashes. 

Software enables pooling and tiering

Endless Memory can predict when a node might run out of memory and automatically provision additional memory from the pool, Fan said. So long as there’s memory left in the pool, the node will never run out. “It’s a dynamic process of provisioning and deprovisioning.”
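To make the provisioning and deprovisioning cycle Fan describes more concrete, here is a minimal Python sketch. It is not MemVerge’s implementation: the class names, the 16 GB lease size and the 90%/50% utilization thresholds are all assumptions made for illustration, and a simple threshold check stands in for whatever prediction the real software performs.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A compute host (hypothetical model, not a MemVerge API)."""
    name: str
    local_gb: float            # direct-attached DRAM
    used_gb: float = 0.0       # current working set
    borrowed_gb: float = 0.0   # capacity currently leased from the pool

    @property
    def capacity_gb(self) -> float:
        return self.local_gb + self.borrowed_gb


@dataclass
class MemoryPool:
    """A shared pool that leases fixed-size chunks to nodes (assumed design)."""
    free_gb: float
    chunk_gb: float = 16.0     # assumed lease granularity
    high_water: float = 0.90   # lease more above this utilization
    low_water: float = 0.50    # give memory back below this utilization

    def rebalance(self, node: Node) -> None:
        # Provision: lease chunks while the node is close to exhaustion
        # and the pool still has a full chunk available.
        while (node.used_gb > self.high_water * node.capacity_gb
               and self.free_gb >= self.chunk_gb):
            self.free_gb -= self.chunk_gb
            node.borrowed_gb += self.chunk_gb

        # Deprovision: return chunks while the node would still sit under
        # the low-water mark after giving one back.
        while (node.borrowed_gb >= self.chunk_gb
               and node.used_gb
               < self.low_water * (node.capacity_gb - self.chunk_gb)):
            node.borrowed_gb -= self.chunk_gb
            self.free_gb += self.chunk_gb


if __name__ == "__main__":
    pool = MemoryPool(free_gb=64.0)
    node = Node("node-1", local_gb=64.0)
    for load in (60.0, 100.0, 20.0):   # working set grows, then shrinks
        node.used_gb = load
        pool.rebalance(node)
        print(load, node.borrowed_gb, pool.free_gb)
```

The gap between the two thresholds is deliberate: it keeps a node from leasing a chunk and immediately handing it back when its usage hovers near a single cutoff.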

Another MemVerge solution, Project Gismo (Global IO-free Shared Memory Objects), is also enabling CXL capabilities. It introduces what the company claims is the world’s first CXL-based multi-server shared memory architecture, which improves the performance of distributed applications by eliminating network I/O and data copies, according to Fan. “Whenever anything goes beyond memory and goes through I/O, that’s when things slow down by a couple orders of magnitude because of serialization,” he said.

By enabling real-time data sharing across multiple servers, Gismo eliminates the need for network I/O and reduces data transfer delays. “You basically get full memory speed access to the shared memory,” Fan said. “It makes another way for data to be transported and to be shared by direct memory access.”

Fan noted that hardware vendors share the same view that software is necessary for tiering and sharing. “It is similar to the role of an operating system to make a computer work,” he said. 

Tiering and sharing, meanwhile, are key to enabling distributed computing, which in turn enables AI workloads. In an exclusive interview with EE Times, Elastics.Cloud’s Bill Eichen said distributed architecture where everyone can use everybody’s memory is the holy grail.

An Elastics.Cloud CXL switch can connect memory modules in a distributed architecture to overcome the limits experienced by an in-memory database like SAP-HANA without the need for adding more compute. (Source: Elastics.Cloud)

By sharing and pooling, it’s possible to have disaggregated racks. “We have multiple server blades inside a system sharing everybody else’s memory,” Eichen said. 

A year ago, Elastics.Cloud demonstrated what it claimed was the first symmetric host-to-host memory pooling with CXL, wherein two CXL-enabled servers equipped with FPGA cards running Elastics.Cloud IP were connected via a CXL interface over a cable.

Eichen said this configuration allows the first server to access not only its own direct-attached memory, but also its expanded CXL-attached memory within the same server, as well as CXL-attached memory in the second server. Concurrently, the second server in the pair can access its own direct-attached memory, its own expanded CXL-attached memory, and the first server’s CXL-attached memory.
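The access pattern Eichen describes can be written down as a small reachability rule. The following Python sketch is only a model of that description, not Elastics.Cloud software: the `Region` type, the server names "A" and "B", and the `reachable` helper are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    owner: str   # server that physically hosts the memory
    kind: str    # "direct" (direct-attached) or "cxl" (CXL-attached)


def reachable(host: str, regions: list[Region], peers: set[str]) -> list[Region]:
    """Return the regions a host can address under the symmetric scheme:
    all of its own memory, plus the CXL-attached memory of its peers."""
    return [
        r for r in regions
        if r.owner == host or (r.owner in peers and r.kind == "cxl")
    ]


# Two servers, each with direct-attached and CXL-attached memory.
REGIONS = [
    Region("A", "direct"), Region("A", "cxl"),
    Region("B", "direct"), Region("B", "cxl"),
]

if __name__ == "__main__":
    for host, peer in (("A", "B"), ("B", "A")):
        print(host, reachable(host, REGIONS, {peer}))
```

Each server ends up with three addressable regions, and the peer’s direct-attached memory is the only one left out, which is what makes the arrangement symmetric.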

DRAM optimization first, SSDs later

Eichen said the most immediate use case for CXL is memory expansion. An Elastics.Cloud switch could overcome the limits experienced by an in-memory database like SAP-HANA. Today, an RDMA connection is required to add more servers, even though the extra compute isn’t needed, he said, but the Elastics.Cloud switch can connect the memory modules.

Phison CTO Sebastian Jean

Everyone’s on the bandwagon building CXL technology, Eichen added, although the reality today is that everything is 2.0 and 1.1—3.0 is a thing of the future. “You’re going to see most of the PCIe manufacturers ship a CXL device,” he said. 

Eichen explained that everything will eventually become CXL because it’s based on supported legacy standards. “This is much more evolution than revolution.”

If memory expansion for in-memory databases is at the top of the list for production-ready CXL workloads, SSDs are at the bottom. Phison’s Sebastian Jean said he isn’t seeing any solutions that incorporate them at this point. “There’s such a thing as biting off more than you can chew,” he said. “I don’t think that the industry is ready to pull the SSD into that whole kettle of fish.”

Jean did acknowledge there’s a place for the SSD in the CXL roadmap in the longer term. “Large language models will push us in that direction.”

The immediate focus is on making CXL a functional thing that scales and is usable. “The fundamental value proposition of CXL is DRAM only scaling so fast and you can only put so many DDR lanes on a motherboard,” Jean said. “For the applications that need more, what do you do? And that’s where CXL steps in.”
