Leading chipmakers in recent years spent tens of billions of dollars on advanced-chip-packaging facilities—to prepare for building processors in multi-chiplet packages that will offer consistent performance increases and ensure continuity of Moore’s law.
Analysts interviewed by EE Times took pains to explain the spend. But before we go there, let’s first take a close look at the numbers.
Advanced-chip-packaging revenue is expected to grow from $44.3 billion last year to $78.6 billion by 2028, according to Yole Intelligence. Meanwhile, the traditional chip packaging market was valued at around $47.5 billion last year and is projected to grow to $57.5 billion by 2028. The whole chip packaging market is expected to reach $136 billion by 2028, Yole estimates.
Given rapidly expanding demand for advanced packaging, it is not surprising that leading chipmakers and “outsourced semiconductor assembly and test,” or OSAT, companies spent around $14.5 billion on advanced packaging fabs and tools in 2022, Yole estimated. Intel was projected to lead the market with an investment of around $4 billion, followed by TSMC with about $3.6 billion, Samsung Foundry with circa $2 billion, and ASE Group with roughly $1.7 billion. Due to this year’s chip market downturn, the companies are expected to trim their advanced packaging CapEx budgets to $11.9 billion, which is still a massive sum.
There are many reasons why the advanced packaging market and multi-chiplet designs are set to thrive in the coming years.
Firstly, chip production costs are increasing as foundries raise their quotes with every new production node—as fab equipment gets more expensive.
That makes production of large monolithic chips particularly expensive when you factor in defect density and yields. So, making two smaller chips and then stitching them together may be a lot cheaper than making one huge monolithic die.
“If you shrink the die, you get higher yield,” G. Dan Hutchinson, vice chair at TechInsights, told EE Times.
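Hutchinson’s point can be sketched with the classic Poisson yield model, Y = e^(−A·D₀). The die areas and the defect density below are assumed, purely illustrative values, not figures from TechInsights:

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the simple Poisson model
    Y = exp(-A * D0), with A the die area and D0 the defect density."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)

D0 = 0.1  # defects/cm^2 -- an assumed, illustrative defect density

y_mono = poisson_yield(800, D0)  # one large monolithic die
y_half = poisson_yield(400, D0)  # one of two half-size chiplets

print(f"800 mm^2 monolithic die yield: {y_mono:.1%}")  # -> 44.9%
print(f"400 mm^2 chiplet yield:        {y_half:.1%}")  # -> 67.0%
```

Because chiplets can be tested and binned before assembly, the effective silicon yield tracks the per-chiplet figure, which is why halving the die can more than offset the added packaging cost.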
Secondly, it makes perfect sense to disaggregate designs.
While logic continues to scale in power, performance and area with every new process technology, scaling of analog and SRAM circuits essentially stopped at the 5-nm to 3-nm nodes, which makes chip designers less inclined to adopt the latest nodes for entire designs. Instead, they disaggregate designs and produce each chiplet on the process technology best suited to it.
Furthermore, once you have enough different chiplets, you can mix and match them in a product to build a solution tailored for a particular workload.
“This is no longer a question of, if everybody’s going this [multi-chiplet] way,” Hutchinson said. “It is not even a question of when; when has already happened. The more they look at this, the more they find, there are advantages in the performance of the chips, there are advantages in your control of IP, there are so many market advantages.”
The third reason is perhaps less obvious. The maximum exposure field of a contemporary EUV scanner is 26 mm (slit) by 33 mm (scan), or 858 mm² (sometimes called the maximum reticle size). Next-generation High-NA EUV scanners (0.55 numerical aperture) will retain the 26-mm slit but halve the scan to 16.5 mm, shrinking the exposure field to 429 mm².
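The arithmetic is simple enough to check directly; the 814 mm² comparison figure is the Nvidia GH100 die size cited elsewhere in this article:

```python
# Current 0.33-NA EUV scanners expose a 26 mm (slit) x 33 mm (scan) field;
# High-NA (0.55) keeps the slit but halves the scan direction to 16.5 mm.
low_na_field = 26 * 33      # 858 mm^2
high_na_field = 26 * 16.5   # 429 mm^2

gh100_die = 814  # mm^2, Nvidia's GH100

print(gh100_die <= low_na_field)   # True: fits in today's reticle
print(gh100_die <= high_na_field)  # False: must be split or stitched
```

Any single die larger than 429 mm² simply cannot be printed in one High-NA exposure, which is what makes chiplets or stitching compulsory for full-reticle-class designs.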
As a result, somewhere beyond 1.8 nm, chip developers will have to use multi-chiplet or multi-tile designs for high-performance applications. In fact, Intel once planned to deploy High-NA with its 1.8-nm fabrication technology. Instead, the company opted for conventional 0.33-NA EUV scanners with double patterning and/or pattern shaping, so High-NA will arrive in fabs at nodes beyond 1.8 nm.
“When you go High-NA, the exposure field size gets cut in half,” Hutchinson said. “So, it is becoming really a big imperative to make [multi-chiplet designs with advanced packaging] happen. It is in terms of what are the strategic drivers driving them to do this.”
Intel bets big on EMIB
Intel in recent years made huge bets on its Embedded Multi-Die Interconnect Bridge (EMIB) for 2.5D integration and Foveros for 3D integration. Many of its important products rely on these advanced packaging technologies. The list includes the Ponte Vecchio compute GPU; the Sapphire Rapids and Sapphire Rapids HBM processors for datacenters and supercomputers; and the upcoming Meteor Lake, Arrow Lake and Lunar Lake client CPUs.
The number of Intel’s multi-chiplet products already on the market and in the pipeline shows that such designs make good financial sense for Intel, Jon Peddie, president of Jon Peddie Research, told EE Times. “Multi-chip, chiplet, stacked chips all exploit some of Intel’s intrinsic strengths — manufacturing at sub-atomic levels, materials science, and signaling technology. Chiplets is not just about trying to hook up a bunch of processors like a Lego kit. It is about correctly designing, managing, and manufacturing in volume.”
Intel is among the leaders when it comes to packaging technologies, and it is poised to use them extensively. Yet, the company remains relatively conservative, sources told EE Times.
“If you talk to any analyst, they will tell you that Intel is the world’s leader in the technology,” Hutchinson said. “And yet, they will also tell you the sad part is, Intel has not used it as much as it should.”
Because Intel bets so much on its advanced packaging technologies, it should not come as a surprise that it also spends billions on advanced packaging facilities. The company’s packaging facility near Rio Rancho, N.M., that is set to come online this year cost the company about $3.5 billion.
Meanwhile, this plant is not the only Intel advanced packaging facility in the U.S. The company’s CH4, near Chandler, Arizona, is also capable of using Intel’s EMIB and Foveros technologies. It has also been used to build samples of the company’s Meteor Lake CPU.
“Arizona is where the [packaging] research is done,” Hutchinson said. “Arizona has always been the packaging R&D center, and New Mexico is more of a manufacturing site. Then [Intel] also has packaging in other parts of the world, but it is not the most advanced.”
In mid-June, Intel said it was also set to build a $4.6 billion advanced packaging facility in Poland by 2027.
The size of the investment hints at the scale and capabilities of the upcoming plant, as well as how important the facility will be for Intel. It is entirely logical for the chip giant to invest heavily in advanced packaging, since the company has historically controlled production of its chips from silicon wafer to complete device. Intel wants to keep it that way, not only for its own products but also for its foundry services. So the money it is spending on such facilities is set to exceed the sums spent by TSMC.
“One of Intel’s real advantages for their foundry offerings will be the ability to do full-service, full flows from the wafer in to package out,” Hutchinson said.
TSMC: Billions on packaging facilities
In late June, TSMC said it planned to build a $2.87 billion facility for advanced chip packaging in Taiwan. The plant is expected to come online several years down the road.
“To meet market needs, TSMC is planning to establish an advanced packaging fab in the Tongluo Science Park,” TSMC said in prepared remarks, noting that it expects to create 1,500 job opportunities.
The upcoming Tongluo facility will probably be similar to TSMC’s recently launched Advanced Backend Fab 6, which supports all of the company’s 3DFabric packaging technologies, including frontend 3D stacking techniques such as chip-on-wafer (CoW) and wafer-on-wafer (WoW), as well as backend packaging technologies like integrated fan-out (InFO, chip first) and chip-on-wafer-on-substrate (CoWoS, chip last).
The $2.87 billion investment in the Tongluo advanced packaging facility is not the only big investment TSMC is making in packaging.
In the last few months, several media outlets have reported that TSMC can barely meet demand for its CoWoS packaging because of overwhelming demand for Nvidia’s compute GPUs used for artificial intelligence (AI) and high-performance computing (HPC) applications. To meet demand for its CoWoS technology in 2025, the company is buying new tools to install in existing facilities, doubling CoWoS capacity by the end of next year.
“But for the back end, the advanced packaging side, especially for the CoWoS, we do have some very tight capacity to—very hard to fulfill 100% of what customers needed,” C.C. Wei, chief executive of TSMC, said during the company’s earnings call in July. “So, we are working with customers for the short term to help them to fulfill the demand, but we are increasing our capacity as quickly as possible.”
He said the company somewhat expects the tightness to ease up toward the end of next year, noting that CoWoS capacity will be doubled in comparison with this year.
Being the world’s largest contract maker of chips, TSMC earns huge profits by making some of the world’s most complex processors, such as Nvidia’s H100 compute GPU.
Yet, the company understands that, in the future, many of its customers will rely on multi-chiplet designs and will need not just individual pieces of silicon but multiple dies on a silicon interposer, working in concert. That requires a big investment in advanced packaging technologies and appropriate production facilities.
“One of the reasons why TSMC has done really well with [advanced packaging] is because they got into it very early and they understood it very early,” Hutchinson said.
While TSMC can and will offer advanced packaging services to its clients, it wants traditional OSAT providers to catch up and provide similar services.
In fact, to popularize its 3DFabric packaging technologies in the industry, the company formed its 3DFabric Alliance, which encompasses developers of electronic design automation (EDA) tools, IP designers, contract chip design companies, memory makers, advanced substrate producers, fab tool makers, and OSATs.
In the best-case scenario, TSMC would like companies like ASE Technology, Amkor Technology, and JCET to offer advanced packaging to customers, which is why it is willing to license its methods to chip assemblers.
Furthermore, ASE already has a number of its own advanced packaging technologies akin to those used by TSMC (e.g., FoCoS, fan-out chip-on-substrate). But OSATs may not be inclined to offer such services just yet, since they require huge investments and pose huge risks: a failure during multi-chiplet packaging renders several good chiplets useless.
“We think [chipmakers] need to offer OSATs more technical support/profit incentives if they want OSATs to be more engaged in the CoWoS (Chip on Wafer on Substrate) market,” Szeho Ng, an analyst with China Renaissance Securities, wrote in a note to clients. “OSATs’ foray into the ‘WoS’ process tallies with our long-held view–it is generic die attach and doable under the traditional disaggregated foundry-OSAT model where both parties manage their own processes. We are skeptical about OSATs’ penetration in ‘CoW,’ as more front/backend process crossover raises execution risks. Any CoW rework costs are costly for OSATs, which unlike front-end fabs lack the attractive front-end wafer profit avenue to fund their backend forays.”
The issue could be mitigated if OSATs’ reimbursement for yield deficiencies were capped at a level all participating parties agree on and clear responsibilities were established, the analyst notes. Since that has not happened yet, one reason TSMC has to invest billions in advanced packaging is that its OSAT partners are less inclined to offer such “margin dilutive” CoWoS services.
Another reason TSMC has to pour billions into advanced packaging is that companies like ASE Technology, which earned $21.831 billion in 2022, have considerably lower CapEx budgets. ASE’s capital expenditures totaled $440 million in the first half of 2023, and the company has said it will spend another $580 million to $600 million on production tools in the second half of this year.
Samsung: Progressing Rapidly
Samsung Foundry is the third contract maker of chips that has both leading-edge lithography fabrication processes and advanced packaging technologies.
While the company does not invest in advanced packaging capacity as aggressively as Intel and TSMC, it has a number of sophisticated packaging technologies, including 2.5D I-Cube (interposer-based), H-Cube (hybrid interposer, or hybrid FCPBGA) and 3D X-Cube.
When it comes to production of logic in general and high-profile products in particular, Samsung Foundry is considerably smaller than Intel and TSMC, so its advanced packaging technologies may not be as well-known as those of Intel and TSMC.
Yet, Samsung is ramping up its advanced packaging services very quickly. The company earned $3.1 billion packaging chips with its I-Cube, H-Cube, and X-Cube technologies in 2021, and its packaging division grew that revenue to $4 billion in 2022.
Justified and beneficial
Multi-chiplet designs are an inherent part of the semiconductor industry’s future: It is easier to maximize yields of smaller dies, it does not make sense to make analog and interface circuitry on leading-edge processes, and High-NA EUV scanners will cut maximum die size in half, making stitching compulsory for any high-performance design.
Furthermore, many designs are going to use 2x, 4x, 6x reticle size packages to pack all the logic they will need for AI and HPC applications in the near future.
“Chiplets are at the forefront of our industry and provide the most efficient and cost-effective solution for the world’s data infrastructure through their highly customizable nature,” said Sudhir Mallya, a marketing executive at Alphawave Semi, a contract chip designer and IP provider. “Given their smaller dies, they have higher yields, which lowers manufacturing costs and power consumption. Additionally, they provide a ‘more than Moore’s’ ability to address the compute needs of AI apps compared with traditional GPUs that have been used to train AI models, while also providing a more-flexible product configuration.”
Leading chip designers understood years ago that smaller dies take less time to reach high yields, which is why various forms of multi-chip-module (MCM) processors have appeared for decades in servers (e.g., IBM Power5), clients (Intel Pentium D), and even game consoles (ATI Xenos with eDRAM).
In modern history, AMD led the way with its disaggregated Ryzen and Epyc designs comprising multiple CCDs (core complex dies) and a single IOD (input/output die) connected using Infinity Fabric, a technology that AMD has perfected for its Instinct MI300-series datacenter APUs and compute GPUs.
Then Intel announced a slew of its multi-chiplet CPUs and GPUs in 2020-2021, after which Apple came up with its M1 Ultra and M2 Ultra dual-die processors.
Disaggregating compute logic and I/O (e.g., Ryzen and Epyc) or producing two smaller dies instead of one big die (Instinct MI250, M1/M2 Ultra) are perhaps the most obvious disaggregation scenarios. Design disaggregation can be very rewarding: in some cases, chip designers can save 30% to 40% of costs by splitting up their designs, according to TechInsights’ Hutchinson, who wrote a TechInsights Chip Insider paper on the topic a couple of years ago. “If you take a design and you fraction it up to critical and non-critical [chiplets], and then you factor in the differences in mask layers, you can actually save 30% to 40% of your cost,” he recalled.
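A back-of-the-envelope version of that critical/non-critical split can be sketched as follows. Every input here (per-mm² wafer costs, die area, packaging overhead) is an assumed, illustrative number, not a figure from TechInsights’ model, and yield and mask-layer effects are ignored:

```python
# Illustrative only: assumed costs show the mechanism, not real pricing.
adv_cost_per_mm2 = 0.25    # $/mm^2 on a leading-edge node (assumed)
old_cost_per_mm2 = 0.08    # $/mm^2 on a mature node (assumed)

total_area = 600           # mm^2 of total design (assumed)
critical_fraction = 0.4    # share of logic that truly needs the new node

# One monolithic die, entirely on the leading-edge node:
monolithic_cost = total_area * adv_cost_per_mm2

# Split design: only the critical chiplet uses the expensive node.
critical_cost = total_area * critical_fraction * adv_cost_per_mm2
noncritical_cost = total_area * (1 - critical_fraction) * old_cost_per_mm2
packaging_overhead = 15.0  # extra $ for advanced packaging (assumed)

disaggregated_cost = critical_cost + noncritical_cost + packaging_overhead
saving = 1 - disaggregated_cost / monolithic_cost
print(f"cost saving from disaggregation: {saving:.0%}")
```

With these made-up inputs, the saving lands around 31%, within the range Hutchinson describes; the real analysis also factors in mask-layer counts and the yield differences discussed earlier.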
In addition to disaggregating their designs, AMD and Apple solved two significant problems with their processors: implementing high-bandwidth interconnects between two or more dies with performance comparable with internal chip connections and presenting these processors as one to software (i.e., unify resources, memory pool, etc.).
While AI and HPC software is tailored to use all the compute resources it can get, graphics applications are hard to scale across multiple GPUs. So Apple did quite a job with its M2 Ultra: while many companies have tried, Apple clearly succeeded in making such a multi-GPU design work properly in its latest system-in-package.
“Others already have, before Apple,” Peddie said. “AMD was the first with their heterogeneous software program HSA in 2013 that became the Radeon Open eCosystem (ROCm) umbrella in 2020. Intel initiated their oneAPI in 2019, and in SoC land Qualcomm has been doing it since 2007 with the introduction of Snapdragon and its multi-processor architecture.”
There are numerous high-end chips with massive die sizes on the market.
For example, Nvidia’s GH100 has a die size of 814 mm², and based on unofficial information, the company’s H100 compute GPUs are in short supply not because of a lack of silicon (i.e., low yields) but because TSMC’s advanced packaging capacity is fully booked.
“Companies like TSMC, Intel, and to some extent Samsung have gotten so good [that you can get] reasonable yields on a full field die,” Hutchinson said. “So, your integration limit becomes not just the limit of the resolution, but also the limit of the exposure field size and litho tool [reticle].”
Since companies like Nvidia can get reasonable yields with their massive compute GPUs or FPGAs, it may seem that they are less inclined to disaggregate their designs. But stitching two or more similar dies together is something that chip designers will have to do in the High-NA era anyway, Hutchinson asserted. Therefore, it makes a lot of sense to learn how to use such designs now, which is what AMD and Apple are doing.
“You are at a point now, where once you get to a full exposure field die, you are going to have to do [multi-chiplet] anyway with High-NA, you better start trying to figure out how it works today so you do not hit this technology and [it] kills you.”
Disaggregation of chip designs can be done in multiple ways. For example, Intel disaggregated not only compute tiles but also Rambo cache and HBM memory PHY in Ponte Vecchio. AMD followed suit with its Navi 31 GPU, which has its Infinity Cache and GDDR6 memory PHY spread over six separate chiplets.
Enabling die-to-die interconnects with bandwidth and latency comparable to on-die interconnections is hard and expensive. But as long as a chip designer gets higher yields and better manufacturability, and the lower silicon cost offsets the cost of advanced packaging, companies will use it.
“When I did that [multi-chiplet cost] modeling several years ago, the big advantage was, you could offset the additional package costs by the fact that you were getting higher yield among all the dies,” Hutchinson recalled.
All three leading chipmakers, along with OSATs, are advancing their 2.5D and 3D packaging technologies by shrinking interconnect pitches to enable denser die-to-die links.
These next-generation advanced packaging technologies will be more expensive than existing versions of CoWoS, EMIB, Foveros, or X-Cube.
Meanwhile, so long as costs of packaging are offset by lower costs of silicon, chip developers are going to use them.
Furthermore, over time everything gets cheaper.
“Once the process is running smoothly, and if the volume is high enough, engineering (of almost anything) will find better, more efficient, and therefore less expensive ways of doing things,” Peddie said. “What keeps a process from being improved is the ROI for the improvement. If you are only making 10 of something a year, that does not support very much investigation in how to make it better or faster. But when you are punching out a million a day of something you have got budget (and necessity) to make it more efficient.”
Advanced packaging technologies bring a plethora of opportunities to chip designers. But they also present challenges, among them signal management, the impedance characteristics of chips made on different nodes, power consumption, packaging yields and packaging costs.
One of the promises of the multi-chiplet design ideology is that it lets developers mix and match chiplets made on different nodes to get the performance and features that existing and emerging applications demand. But chips made on different nodes use different voltages, have different impedances, and can even have different Z-heights, making their integration a nightmare.
As Peddie puts it, Intel’s Meteor Lake and Ponte Vecchio system-in-packages (SiPs) are modern miracles built by different fabs on different production nodes in different parts of the world.
“[Multi-chiplet designs are all] about signal management engineering to get the correct propagation and impedance characteristics across a tiny piece of composite materials at GHz speeds and pico-second rise times,” he said. “We blithely talk about these modern miracles and take them for granted. But that is because very few of the people who just talk about it have ever struggled to attach an oscilloscope or logic analyzer to a pin without influencing the behavior of the device they are trying to measure.”
To ensure that chiplets developed by different companies and made on different nodes are compatible with each other and can be combined in a single product package, leading chip designers and manufacturers developed Universal Chiplet Interconnect Express (UCIe), an open specification that defines chiplet-to-chiplet interconnection with the aim to build a ubiquitous ecosystem.
But UCIe is in its infancy and there are experts in the industry who have doubts that the UCIe 1.0 specification will be enough to build a robust ecosystem of chiplets.
Yet, the specification will continue to develop, and only time will tell whether UCIe will enable an ecosystem even remotely as abundant as PCIe’s.
Power consumption of multi-chiplet designs compared to integrated designs is also a thing to consider. While it may be cheaper to build a multi-tile solution, a monolithic one may offer higher performance and lower power consumption and thus be more efficient.
“If you go out to a chiplet, it costs you a lot in power and speed,” Hutchinson said. “It is just not as much as if you go from the chip to another chip on the board, […] it depends on the design and the distance and all that, but you get these orders of magnitude [higher power consumption for interconnections when compared to] a fully integrated device. […] So there is always an inherent advantage to integrating everything into a single die for that reason. But at some point, it breaks because you try to pack too much into it.”
Meanwhile, higher per-chiplet yields may enable developers to throw in more transistors into their multi-chiplet designs compared to what they would have used for a monolithic design, which will ensure higher performance.
Since advanced packaging requires clean rooms, sophisticated equipment and dozens of complex steps, the concept of yield fully applies to multi-chiplet SiPs, too.
Assembling a processor from five or more chiplets made on different nodes (e.g., Intel’s Meteor Lake consists of five chiplets) sounds plausible from many points of view, but high packaging-process yield is crucial: if assembly fails, all the chiplets go in the bin, which means losses for the chipmaker and/or OSAT.
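The compounding is easy to see: even if every die-attach step is individually high-yielding, the package-level yield is the product of all of them. The per-step yield below is an assumption for illustration, since Intel and TSMC do not disclose such figures:

```python
# Assumed per-step assembly yield; real figures are not disclosed.
assembly_yield_per_step = 0.99
n_chiplets = 5  # e.g., Meteor Lake's five chiplets

package_yield = assembly_yield_per_step ** n_chiplets
print(f"package-level assembly yield: {package_yield:.1%}")  # -> 95.1%
```

So even a 1% per-step failure rate scraps roughly one in twenty finished packages, and each failed package discards five known-good chiplets at once.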
Both Intel and TSMC argue that their advanced packaging yields are very high, but they certainly do not disclose any numbers.
Intel acknowledged this year that large substrates, such as those used for massive system-in-packages like Ponte Vecchio, tend to warp, which poses yield risks and makes it difficult to assemble them onto a motherboard. For now, Intel seems to be satisfied with what it has. But, to ensure that its future SiPs do not bend and perform better (as they integrate things like optical interconnects), the company plans to implement glass substrates instead of organic substrates in the second half of this decade. Such a move requires a lot of changes and investment and to a substantial degree it is necessitated by the ongoing transition to multi-chiplet designs.
While advanced packaging methods in general, and multi-chiplet designs in particular, present ample opportunities for chip developers, these are very complex technologies that present numerous challenges to producers of microelectronics and OSATs.
Yet, every new technology pushes the limits of what is possible, and companies like Intel have learned how to solve problems and make breakthroughs over the last 60-plus years in microelectronics.
“TSMC, Intel, Samsung [and other chipmakers] have thousands of incredibly smart scientists and engineers who do not sit around all day sipping tea and playing Wordle,” Peddie said, noting that advanced packaging technologies did not just come out of the blue. “They came from hundreds, maybe thousands, of experiments. They were devised by thousands of hours of simulation runs, chalk talk, and sleepless nights.
“We are impressed with what they are doing today,” he added. “But in the labs, they are trying to solve the problems, and overcome the barriers of what will be manufactured three to five years from now, and thinking about what and how they will build transistors 10 years from now.”