Rambus Unveils HBM4E Controller Enabling C-HBM4E



Rambus has introduced one of the industry’s first memory controller IPs supporting HBM4E memory, designed to handle data transfer rates of up to 16 GT/s and deliver 4 TB/s of bandwidth per HBM4E memory stack.

The controller IP supports a range of proprietary reliability, availability, and serviceability (RAS) features, along with telemetry capabilities designed to improve the reliability and efficiency of memory subsystems, the company said. It can be integrated into ASICs expected to emerge in 2027–2028, as well as into custom HBM4E (C-HBM4E) base dies currently in development.

A Rambus chip (Source: Rambus)

Rambus HBM4E memory controller

Rambus said its HBM4E memory controller can be integrated into a conventional ASIC and combined with a third-party HBM4 physical layer (PHY) implementation to build a complete memory subsystem, with HBM4 stacks communicating with the ASIC using an interposer. Alternatively, the controller can be integrated into emerging custom C-HBM4E base dies and works with HBM4E memory devices directly to save shoreline inside ASICs and lower power consumption. This flexibility enables Rambus to address accelerators with various memory-subsystem implementations.

The main selling point of the controller is its support for data transfer rates of up to 16 GT/s per pin, enabling roughly 4 TB/s of memory bandwidth per HBM4E stack over a 2,048-bit interface. In large AI processors that integrate eight HBM stacks, such as Nvidia’s dual-chiplet B200, B300, and R200, this translates to a peak of 32 TB/s of aggregate memory bandwidth, four times the 8 TB/s that the HBM3E-equipped B200 and B300 deliver. As for maximum capacity, Rambus claimed its HBM4E controller is compliant with JEDEC’s HBM4E specification and can support up to 64 GB of memory per stack, as defined by the standard.
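The headline figures follow directly from the interface arithmetic. A quick sanity check in Python, using only the data rate, bus width, and stack count cited above:

```python
# Per-stack and aggregate HBM4E bandwidth from the figures in this article.
BITS_PER_BYTE = 8

def stack_bandwidth_tbs(data_rate_gts: float, bus_width_bits: int = 2048) -> float:
    """Peak bandwidth of one HBM stack in TB/s."""
    return data_rate_gts * bus_width_bits / BITS_PER_BYTE / 1000

per_stack = stack_bandwidth_tbs(16.0)                # 16 GT/s x 2,048 bits
print(f"Per HBM4E stack: {per_stack:.2f} TB/s")      # ~4.10 TB/s, rounded to 4 TB/s
print(f"Eight stacks:    {8 * per_stack:.1f} TB/s")  # ~32.8 TB/s, rounded to 32 TB/s
```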


A block diagram of an SoC with a Rambus HBM4E memory controller. (Source: Rambus)

Just like other HBM controller IP from Rambus, the HBM4E controller supports RAS capabilities designed to improve memory-subsystem reliability, as well as telemetry features designed to increase efficiency. RAS capabilities include error-correcting code to protect against link errors, cyclic redundancy checks to detect errors in stored data, PHY condition monitoring through registers, and severity-pin monitoring (a de facto standard, though not mandatory, HBM feature).

“These are proprietary features, but all of them are very well accepted by our customers, so they explicitly ask for those,” Nidish Kamath, director of product management for memory interface IP at Rambus, told EE Times. “The RAS features are related to error correction coding and cyclic redundancy checks to either protect against link errors or errors in the memory when stored. There are also additional features, such as giving more information about the PHY condition as can be garnered from accessing the registers in the PHY, looking at the severity pins coming back from the device.”

Telemetry features enable system designers to monitor controller queues, memory access patterns, and link utilization to optimize memory traffic and maximize effective bandwidth.

“On the telemetry side, it is important to give insight to the end customer about the reason why a certain traffic results in a certain realized bandwidth or a certain latency, so visibility into the state of the queues within the controller, like what the delays are, what are the current backup in the different queues, what could be tweaked in the controller configuration to get those queues more in line with a streamlined access and what are the parameters that need to be changed to move the efficiency of link utilization higher,” Kamath said. “Those all are part of that telemetry functionality. And again, it is beyond the JEDEC spec, so it is proprietary to our controller.”
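Rambus does not publish its telemetry register map, so the following is a purely conceptual sketch of the link-utilization metric Kamath describes, realized bandwidth against the link’s peak; the function name and sample values are hypothetical:

```python
# Conceptual sketch of a link-utilization telemetry metric: bytes actually
# moved in a sampling window versus what the link could have moved at peak.
def link_utilization(bytes_moved: float, window_s: float,
                     peak_tbs: float = 4.096) -> float:
    """Fraction of peak bandwidth realized during the sampling window."""
    realized_tbs = bytes_moved / window_s / 1e12
    return realized_tbs / peak_tbs

# e.g., 2.9 TB moved in one second over a 4.096 TB/s HBM4E link
print(f"Link utilization: {link_utilization(2.9e12, 1.0):.0%}")  # ~71%
```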

Rambus claimed it has secured more than 100 HBM design wins across previous generations, arguing that its experience supports first-time silicon success, a critical factor as designs move to leading-edge nodes and tape-out costs climb into the tens of millions of dollars.

Rambus silicon IP for AI (Source: Rambus)

Rambus positions its HBM4E controller IP as a component of next-generation AI, HPC, and other bandwidth-hungry data center accelerators and processors. In addition to HBM controller IP, Rambus said it offers a variety of IPs, including PCIe and CXL interconnect IP, as well as security technologies such as inline memory encryption and integrity protection. For now, encryption is not part of the company’s HBM offering, partly because HBM is primarily aimed at data center accelerators rather than edge devices, which typically rely on GDDR or LPDDR memory. Nonetheless, customers can add encryption if they want it, particularly in the case of customized C-HBM4E offerings.

“[Customers can use] encryption, but not compression,” Steven Woo, fellow and distinguished inventor at Rambus, told EE Times. “Compression is something that will be a part of future offerings.”

While compression may make sense for HBM in the future (given complexities with expanding per-stack capacity beyond 64 GB), it may not be absolutely needed today or tomorrow as companies in the AI world adopt low-precision data formats.

“In the AI space, companies such as Nvidia and Google are increasingly supporting very small data types, such as FP4,” Woo added. “In a sense, this functions as a form of compression. Many AI developers view it as a more effective approach to increasing the usable capacity of memory. If you reduce the data type to half the number of bits, you effectively achieve two-to-one compression. What makes this attractive is that, in many AI workloads, the resulting loss of accuracy is minimal. In practice, this allows systems to process roughly twice as many parameters for the same amount of memory bandwidth.”
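Woo’s point is easy to put in numbers. A small illustration of how shrinking the data type raises the effective capacity of a stack (format sizes are standard; the 64 GB ceiling comes from the HBM4E spec discussed below):

```python
# Halving the data type doubles the parameters that fit in the same stack,
# which is why low-precision formats act as a form of compression.
FORMAT_BITS = {"FP16": 16, "FP8": 8, "FP4": 4}
STACK_GB = 64  # maximum HBM4E stack capacity per the JEDEC spec

for fmt, bits in FORMAT_BITS.items():
    params_billion = STACK_GB * 8 / bits  # Gbit of capacity / bits per value
    print(f"{fmt}: {params_billion:.0f}B parameters per {STACK_GB} GB stack")
# FP16: 32B, FP8: 64B, FP4: 128B -- each step is an effective 2:1 compression
```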

HBM4E at a glance

Compared with previous HBM generations, HBM4 introduces a wider memory interface and higher internal concurrency: Each stack now features a 2,048-bit bus and officially supports transfer rates of up to 8 GT/s per pin, down from the roughly 9.4 GT/s available with HBM3E, keeping signaling manageable while still delivering a tangible performance gain thanks to the much wider interface. With HBM4E, JEDEC is expected to increase officially supported data rates to 12 GT/s–12.8 GT/s to boost memory bandwidth for future AI and HPC accelerators.

A comparison of HBM generations, from HBM’s 1,024-bit interface and 128 GB/s per stack to HBM4E’s 2,048-bit interface and 4,096 GB/s per stack (Source: Rambus)

From an architectural standpoint, HBM4 and HBM4E significantly increase internal parallelism compared with prior generations to take advantage of the ultra-wide external interface. Each stack exposes 32 independent memory channels, and every channel is divided into two pseudo-channels, which reduces bank conflicts and improves utilization under highly parallel AI and HPC workloads. While HBM4E offers a higher data transfer rate, its internal architecture remains the same as HBM4’s. Just like its predecessor, the standard supports 24 Gb and 32 Gb DRAM dies and allows 4-Hi, 8-Hi, 12-Hi, and 16-Hi stacking, enabling capacities of up to 64 GB per stack, in line with what the HBM4 specification offers.
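The channel arithmetic follows directly from the figures above; a brief worked example:

```python
# HBM4/HBM4E channel arithmetic from the figures in this section.
BUS_WIDTH = 2048       # bits per stack
CHANNELS = 32          # independent channels per stack
PSEUDO_PER_CHANNEL = 2

channel_width = BUS_WIDTH // CHANNELS               # 64 bits per channel
pseudo_width = channel_width // PSEUDO_PER_CHANNEL  # 32 bits per pseudo-channel
units = CHANNELS * PSEUDO_PER_CHANNEL               # 64 schedulable units

print(f"{CHANNELS} channels x {channel_width} bits = "
      f"{units} pseudo-channels x {pseudo_width} bits")
```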

In practice, HBM4 controller IP and PHYs from suppliers such as Rambus, Cadence, and Synopsys can reach 10 GT/s–12.8 GT/s speeds, whereas DRAM vendors have demonstrated operation of HBM4 stacks at 10 GT/s or beyond, which gives system designers additional timing and signal-integrity margin. With HBM4E, we are seeing a similar pattern: Rambus now offers an HBM4E controller supporting data rates of up to 16 GT/s. Now, it is up to developers of actual memory devices, ASICs, and IPs to deliver HBM4E physical interfaces (PHYs) that support such speeds.

“Cadence’s current HBM4E product (PHY + memory controller) at 12.8 GT/s performs well above the published HBM4 JEDEC spec of 8 GT/s,” Frank Ferro, group director of the Silicon Solutions Group (SSG) at Cadence, told EE Times. “Even so, due to the increasing demand for memory bandwidth in AI training, processor and hyperscale companies are pushing the industry to deliver even more bandwidth.”

In an interview with EE Times, Brett Murdock, senior director of memory interface product line at Synopsys, said, “For those history buffs among us, you will remember that the HBM protocol has been unique in terms of published standards from JEDEC, as the vendors have tended to lead the data rates rather than the standard itself, which differs from other protocols like LPDDR6 and DDR5 where the standard provides for higher data rates much earlier than the vendors can supply devices supporting those data rates. HBM4 follows this tradition since the standard goes up to 8 GT/s but we have seen Micron indicate they have 11 GT/s devices while Samsung and SK Hynix both indicate they have 11.7 GT/s devices. Our IP will support these devices as our HBM4 IP has been targeted for operation up to 12 GT/s.”

Getting to 16 GT/s with HBM4E

Widening HBM4’s interface to 2,048 bits was challenging for the industry, as it required a complete re-architecture of HBM stacks and controllers. However, pushing data transfer rates higher in HBM4E, to 12 GT/s and 16 GT/s, is far more complex than simply widening the interface. First, it requires increasing internal memory clocks to 3 GHz–4 GHz, which is complicated because HBM devices are more capacious, and therefore physically larger, than commodity DDR5 or LPDDR5 devices. Larger DRAM dies introduce timing constraints, as row activation, sensing, restoration, and refresh operations must remain reliable at high frequencies. Second, electrical signaling challenges escalate rapidly with signal travel distance at multi-GT/s speeds.

A high-bandwidth memory subsystem implementation using an interposer. (Source: Cadence)

As per-pin data rates climb toward the 12 GT/s–16 GT/s range, signals traveling through interposers, package traces, and TSVs suffer increasing attenuation, jitter, crosstalk, and reflections. These effects shrink timing margins and make it harder for receivers to distinguish valid bits, which forces designers to rely on stronger equalization, tighter impedance control, and more sophisticated PHY circuitry. The result is greater design complexity and rising power consumption just to maintain signal integrity.
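One way to see why margins shrink is the unit interval, the time slot within which each bit must be resolved; this is a standard calculation, not a figure from the article:

```python
# The unit interval (time per bit) shrinks linearly with data rate, which is
# why jitter, crosstalk, and reflections that were tolerable at lower speeds
# consume a critical share of the timing budget at HBM4E rates.
for rate_gts in (8, 12.8, 16):
    ui_ps = 1e12 / (rate_gts * 1e9)  # picoseconds per bit
    print(f"{rate_gts:>4} GT/s -> UI = {ui_ps:.1f} ps")
# 8 GT/s -> 125.0 ps, 12.8 GT/s -> 78.1 ps, 16 GT/s -> 62.5 ps
```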

“At 16 Gbps, the main limitation comes from interconnect physics—capacitance, parasitics, routing distance, and signal time-of-flight between the PHY and the memory device,” Kamath said. “Custom base dies and packaging innovations like hybrid bonding help reduce these effects, allowing higher performance over the same links.”

Indeed, to mitigate restrictions imposed by interposers, package traces, and PHY, the industry is exploring shorter routing paths, hybrid bonding, and custom base die designs that place interface logic closer to memory arrays.

Power delivery becomes another limiting factor because thousands of HBM I/O pins switch simultaneously at very high frequencies. I/O power scales with frequency and voltage, so faster signaling significantly increases PHY energy use and stresses on-package voltage regulation. Rapid current transients generate electrical noise that further complicates timing closure, which is why Rambus works closely with customers to ensure proper integration of its HBM controllers, the company said.
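A first-order dynamic-power model makes the scaling explicit. The capacitance and voltage figures below are illustrative assumptions, not vendor data, but they show why shorter, lower-capacitance paths (such as the TSV-only links of a custom base die, discussed later) pay off:

```python
# First-order dynamic I/O power: P = C * V^2 * f. Values are illustrative
# assumptions chosen to show the trend, not published vendor figures.
def io_power(cap_pf: float, vddq_v: float, rate_ghz: float) -> float:
    """Relative per-pin switching power (arbitrary units)."""
    return cap_pf * vddq_v**2 * rate_ghz

interposer_link = io_power(cap_pf=1.5, vddq_v=0.75, rate_ghz=16)
tsv_only_link   = io_power(cap_pf=0.5, vddq_v=0.75, rate_ghz=16)
ratio = tsv_only_link / interposer_link
print(f"TSV-only link: ~{ratio:.0%} of interposer-link power")  # ~33%
```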

“With that kind of switching activity going through this memory interface, power and thermal are also concerns,” Kamath said. “So, we do work with our end customers to tweak the mix in terms of their physical implementations to get to lower power based on whatever the PHY and the process node offers.”

Thermal constraints are particularly severe for high-performance HBM4 stacks because they are vertically integrated using TSVs and positioned immediately adjacent to high-power GPUs and AI accelerators. Faster signaling increases switching losses in PHY circuits and raises heat density within the memory stack, which is hard to cool. Furthermore, elevated temperatures worsen leakage, increase refresh overhead, and reduce long-term reliability. Unlike CPUs, stacked memory devices cannot reliably operate above about 95°C, which complicates their cooling when paired with water blocks designed for ASICs that can exceed 105°C.

Finally, manufacturing variation and controller complexity further complicate speed increases. Minor differences in TSV dimensions, interposer routing, and package warpage can disrupt timing margins at extreme data rates, which lowers yield and raises costs. Meanwhile, the alternatives are limited: Increasing the link width beyond 2,048 bits is impractical today, while bonding HBM4 arrays directly on top of logic dies remains a work in progress, according to various industry insiders.

Despite all these challenges, the industry, spearheaded by Rambus, Cadence, and Synopsys, has managed to reach 16 GT/s for both the HBM4E controller and the PHY. Customers can now license the Rambus HBM4E controller and pair it with a physical interface developed by Cadence or Synopsys.

“Although not formally announced, the latest HBM4E PHY and memory controller from Cadence support 16 GT/s of performance,” Ferro said. “The 16 GT/s HBM4E IP is available now for customer designs.”

To achieve 16 GT/s, Cadence not only had to work closely with partners among memory vendors, but also had to optimize the design of the interposer (the most common way to connect HBM stacks to a host processor), the memory controller, and the shape of its PHY.

“HBM4E performance of 16 GT/s is achieved through a combination of the PHY/controller memory subsystem design and the interposer design,” Ferro said. “Silicon interposers are the most commonly used solution to route all these signals. The design of the interposer is critical to meet the system performance. The interposer design team analyzes the number of routing and ground layers required, trace widths, signal and ground placement to minimize signal integrity and power integrity effects. The shape of the PHY is also important to the end customer in order to make the most efficient use of the die shoreline while minimizing the impact on the compute area. Today’s processors use reticle size dies to pack the most compute power and the maximum number of HBM PHY/controllers along the die shoreline, so Cadence works closely with customers to customize the shape of the PHY to ensure the best use of their silicon area.”

Murdock added, “Naturally, we are working very closely with the HBM vendors to understand their roadmap and ensure we have IP to enable the ecosystem—and this includes HBM4E up to 16 GT/s. In fact, our HBM4 controller was updated in January to operate at 16 GT/s, and we anticipate taping out our HBM4E PHY later this year.”

Building IP that goes beyond JEDEC’s recommendations

Notably, ‘E’ versions of HBM standards remain on the market for longer periods than ‘base’ versions. The latter are largely transitional, while the former are more mature and employ all the yield- and performance-boosting techniques developed for the base specification.

As Murdock noted, the industry continues to prioritize bandwidth and is willing to use faster HBM stacks either to get higher performance or at least to gain some extra timing, signal-integrity, or power margin. Yet developing memory controllers, physical interfaces, and HBM stacks that exceed standard specifications is not easy.

“The biggest challenge when building IP that goes beyond what the standard defines is ensuring there are no surprises when it comes to the operation of the memory,” Murdock said. “We need to collaborate very tightly with the DRAM vendors to ensure our design will support changes to the memory needed for the higher data rates, whether that be a little more voltage to a particular supply or an increase to a key timing parameter.”

An HBM test system board used to bring up HBM controllers and physical interfaces. (Source: Cadence)

Speaking of voltages and timings, the HBM4 specification already allows vendors to deviate from the typical core (Vddc, 1.0 V–1.05 V), I/O (Vddq, 0.7 V–0.9 V), and I/O/Tx-driver (Vddql, 0.4 V) voltages if they need to. Furthermore, they can define custom speed bins and scale timings accordingly, which somewhat simplifies work on semi-custom memory subsystems but also makes collaboration between memory, PHY, and controller vendors mandatory rather than optional.
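As a descriptive sketch of that flexibility (the data structure and override mechanism are hypothetical, while the rail values are the JEDEC-typical ones quoted above):

```python
# JEDEC-typical HBM4 supply rails, with room for the vendor-specific
# overrides the spec permits. A descriptive sketch, not a register map.
JEDEC_TYPICAL_RAILS_V = {
    "vddc":  (1.00, 1.05),  # core supply range, volts
    "vddq":  (0.70, 0.90),  # I/O supply range, volts
    "vddql": (0.40, 0.40),  # I/O and Tx driver supply, volts
}

def with_vendor_overrides(overrides: dict) -> dict:
    """Merge vendor-custom rail settings over the JEDEC-typical values."""
    return {**JEDEC_TYPICAL_RAILS_V, **overrides}

# Hypothetical vendor speed bin running Vddq slightly higher for margin:
print(with_vendor_overrides({"vddq": (0.75, 0.95)}))
```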

“As the DRAM vendors are leading the JEDEC standard rather than following, we must be proactive to enable the ecosystem,” Murdock added. “We do assess ongoing opportunities and needs to deliver data rates beyond industry standards based on our customer discussions [to achieve the] right balance providing higher bandwidth than the current standard without compromising energy efficiency and silicon footprint.”

Standard HBM4E or customized C-HBM4E?

One important detail about the Rambus HBM4E controller is its versatility: It can be integrated both into ASICs (i.e., a standard implementation) and into custom C-HBM4E base dies, which greatly expands the company’s addressable market. Building an advanced HBM4E controller for ASIC integration requires making it compatible with a first-party or third-party PHY (in the case of Rambus, there are only third-party PHYs). Building a sophisticated HBM4E controller for C-HBM4E base dies, by contrast, requires working with HBM4E memory makers to ensure compatibility with their TSV PHYs, which is why Synopsys, which offers both HBM4 controllers and PHYs, has to offer different controller IPs for HBM4/HBM4E and C-HBM4E.

“In a traditional JEDEC HBM implementation, the HBM controller is integrated directly with our HBM PHY where we control both sides of the interface,” Murdock said. “When integrating the HBM controller on a CHBM base die it must integrate directly with a TSV PHY provided by one of the DRAM vendors, and this interface is not something which the DRAM vendors have a standard to follow. [Hence] to successfully enable the ecosystem, we need to closely collaborate with each of the DRAM vendors to ensure our CHBM controller can be seamlessly integrated with their TSV PHY.”

A comparison of HBM3E, HBM4, and C-HBM4E base dies, showing the shift from a DRAM-process base die to advanced logic processes and the integration of the memory controller and custom PHY in C-HBM4E (Source: TSMC)

Custom HBM is a rather big deal. Although Marvell has been the most vocal company about its approach to custom HBM, virtually all developers of high-performance processors that have used HBM before have evaluated C-HBM4E too, according to Rambus. For now, however, the deployment question comes down to whether companies can amortize development costs across multiple product lines or whether they just need performance gains without the additional ecosystem complexity associated with custom base dies and C-HBM4E memory.

“Nearly all of our leading-edge customers targeting HBM4E speeds above roughly 12.8 GT/s have evaluated both custom base die and standard implementations,” Kamath said. “The decision ultimately depends on how effectively they can reuse the base die design across their product portfolio. For example, if a customer has multiple data center projects serving different use cases, but can deploy the same chiplet across those programs, they are more inclined to adopt a custom base die approach, even if it involves closer alignment with one or two memory vendors. In contrast, customers focused primarily on maximizing performance across a broader HBM fleet may find that interposer-based designs deliver comparable performance without the additional complexity of custom base dies.”

Rambus said it can achieve a 16 GT/s data transfer rate both with a standard implementation that involves a PHY and an interposer and with a custom base die implementation. However, because of tighter integration and a greatly simplified physical interface, a C-HBM4E implementation may be more power efficient.

“When using a custom base die approach, power consumption typically decreases because the solution becomes more tightly integrated,” Kamath said. “This effectively opens up a different TDP or design envelope for the system. […] Most of the power in the memory interface subsystem is consumed by the PHY as the energy is required to drive signals across the interconnect. So, the PHY and the interconnect together account for the majority of the power in the memory interface. When this path is shortened using a custom base die approach, the PHY becomes simpler. Instead of driving signals across an interposer substrate, the PHY only needs to drive TSV connections, which reduces capacitance and simplifies the PHY design. As a result, PHY power consumption decreases. This also reduces the need for complex equalization.”

In addition to lowering PHY power consumption, custom base dies provide DRAM vendors opportunities to further tune performance and efficiency.

“At the same time, it gives memory vendors additional flexibility to adjust timing parameters in the custom base die implementation,” Kamath added. “With shorter TSV paths and reduced time-of-flight, vendors can potentially optimize their HBM devices to achieve lower end-to-end latency. Overall, a custom base die architecture opens up further opportunities to improve the performance and efficiency of the entire memory subsystem.”

Maximizing performance while reducing power consumption makes C-HBM4E an interesting option for system developers compared with traditional implementations, which is why Rambus expects adoption to grow over time. However, the approach also adds complexity to the supply chain.

“The industry is still working through the logistics, as the supply chain becomes more complex with custom base die solutions,” Kamath said. “Instead of just the IP vendor, the end customer, and the design house, a fourth party—the memory vendor—also becomes involved. These four participants must coordinate timelines, target process nodes, and align different levels of ASIC design sophistication across the project. As a result, the industry is still going through a learning curve. Looking ahead to HBM5, I expect [custom base die] approach to become table stakes. Today, only leading data center customers seriously evaluate custom versus standard implementations, but over time we will likely see startups and other enterprise deployments of HBM exploring the same options.”

As a result, when choosing between traditional and custom base die HBM4E integration, system developers must take into account power efficiency, deployment strategy, and product fleet considerations rather than just the maximum performance, according to Rambus. One more factor could be HBM4E capacity in the final product.

HBM capacity increase?

The official HBM4E specification is set to increase the data transfer rate from 8 GT/s to at least 12 GT/s, which has the potential to raise per-stack memory bandwidth to roughly 3 TB/s. With the joint work between Cadence, Rambus, Synopsys, and DRAM makers, even higher HBM4E data transfer rates are possible, though it remains to be seen whether mass-produced HBM4E memory subsystems will actually run at 16 GT/s. So far, the industry has never seen a production AI or HPC accelerator running HBM at its maximum speed, though this might change with HBM4E, considering that this memory standard is likely to stay around for a while. But while HBM4E extends the performance of HBM4, it does not extend its per-device and per-stack capacities, which is arguably its main limitation. This is where C-HBM4E with a narrow interface may help.

Samsung’s HBM4 stacks (Source: Samsung)

Both the HBM4 and HBM4E specifications enable integration of up to 16 memory devices with up to 32 Gb of capacity each per stack, letting memory makers build HBM4/HBM4E stacks of up to 64 GB. Initially, memory makers intend to offer 36 GB HBM4 stacks (12-Hi × 24 Gb), with 48 GB stacks following later. According to projections from Micron Technology and Nvidia, 64 GB configurations are expected to become available only in late 2027 or even 2028, roughly coinciding with Nvidia’s plans to pair its Rubin Ultra GPU with up to 1 TB of HBM4E memory across 16 stacks.
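The capacity figures are straightforward stack arithmetic:

```python
# Stack-capacity arithmetic from the figures in this section.
def stack_capacity_gb(stack_height: int, die_gbit: int) -> float:
    """Capacity of one HBM stack in GB (Gbit per die / 8 bits per byte)."""
    return stack_height * die_gbit / 8

print(stack_capacity_gb(12, 24))       # 36 GB: initial HBM4 offering
print(stack_capacity_gb(12, 32))       # 48 GB: follow-on stacks
print(stack_capacity_gb(16, 32))       # 64 GB: the HBM4/HBM4E ceiling
print(16 * stack_capacity_gb(16, 32))  # 1,024 GB: a 16-stack, Rubin Ultra-class GPU
```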

There are several fundamental reasons that prevent HBM device and stack capacity increases, including architectural limits in the memory addressing scheme and declining channel efficiency as larger DRAM dies require longer refresh cycles.

“Die capacity is partly constrained by the JEDEC specification, particularly the number of addressing bits and supported operating modes,” Kamath explained. “This is one reason memory vendors do not aggressively pursue higher densities. Another limitation comes from refresh and other background operations. As die capacity increases, refresh cycles take longer and consume a larger portion of available memory time, reducing channel efficiency. This effect becomes more pronounced at the higher operating temperatures associated with 16 GT/s signaling, where refresh overhead increases further. From the memory-device standpoint, achieving 16 GT/s is a lower-hanging fruit than solving more fundamental challenges such as thermal constraints or refresh penalties. Together, these factors discourage vendors from pushing to significantly higher per-die capacities.”
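Refresh overhead scales as the fraction of channel time consumed by refresh. The timing parameters below are illustrative assumptions chosen to show the trend Kamath describes, not published HBM4E values:

```python
# Refresh overhead = tRFC / tREFI: the share of channel time spent
# refreshing instead of serving traffic. Values are illustrative.
def refresh_overhead(trfc_ns: float, trefi_ns: float) -> float:
    return trfc_ns / trefi_ns

# Larger dies need longer refresh cycles (higher tRFC); high temperatures
# typically halve the refresh interval (lower tREFI), compounding the loss.
print(f"Small die, normal temp: {refresh_overhead(260, 3900):.1%}")       # ~6.7%
print(f"Large die, normal temp: {refresh_overhead(410, 3900):.1%}")       # ~10.5%
print(f"Large die, hot (2x refresh): {refresh_overhead(410, 1950):.1%}")  # ~21.0%
```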

Without the ability to increase HBM capacity per processor using memory stacks beyond 64 GB, system developers will likely need to increase the number of stacks per processor. This is difficult because reticle-sized ASICs can accommodate only a limited number of 2,048-bit HBM4/HBM4E interfaces. They can, however, accommodate a larger number of narrower interfaces, such as Marvell’s C-HBM4E implementation with a 512-bit interface, connecting more C-HBM4E memory stacks to the ASIC over the same shoreline without sacrificing bandwidth.
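The shoreline argument reduces to the interface-width ratio; a minimal sketch, assuming comparable per-pin rates on both interface types:

```python
# Shoreline arithmetic behind the narrow-interface argument: interface
# width is the first-order proxy for the die-edge area each stack needs.
WIDE_IF_BITS = 2048    # standard HBM4/HBM4E interface
NARROW_IF_BITS = 512   # Marvell-style C-HBM4E interface, per the article

stacks_multiplier = WIDE_IF_BITS // NARROW_IF_BITS
print(f"Same shoreline fits ~{stacks_multiplier}x more C-HBM4E stacks")
# 4x the stacks at 1/4 the width each keeps aggregate bandwidth roughly
# constant while multiplying attachable capacity by ~4.
```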

Although expanding HBM4E capacity per memory subsystem beyond what is possible with 64 GB HBM4E stacks on a 2,048-bit I/O is not central to Rambus’s announcement, its controller could prove useful for this purpose in future C-HBM4E designs.

