Why network-on-chip has displaced crossbar switches at scale


In my first article of this series about interconnect design, I explained why on-chip communication has become central to a system-on-chip (SoC) architecture. Interconnect decisions determine bandwidth, throughput, quality-of-service (QoS), power usage, safety, and cost. Here, the difference between a world-class achievement and a shortcoming starts with the choice of communication architecture.

When the number of elements that need to communicate in a chip is small, a simple crossbar approach to the interconnect function is a possible choice. However, when the number of elements in the system starts to grow, and the distance between them becomes large with respect to the intended clock period, crossbars no longer work and a network-on-chip (NoC) approach is required. But let’s have a look at the crossbar first.

[Image: diagram of SoC crossbar connections]
Figure 1 A crossbar connects every input to every output. Source: On-Chip Networks, Second Edition

Crossbar switches

Crossbar architectures have been around for a long time, going back to the telephone switching systems of the 1930s. In a crossbar, there is a communication path between every source of traffic and every destination of traffic, and all paths can work in parallel as long as there is no contention, that is, as long as no two sources need to send traffic to the same target at the same time. Contention is managed by an arbiter for each target, and flow control is applied at the source. This architecture is the foundation of many legacy interconnect families like the Arm NIC-400 and the former Sonics SMX.
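As a rough illustration of per-target arbitration, the sketch below resolves one cycle of crossbar requests. The round-robin policy, the function name `arbitrate`, and the integer source IDs are assumptions made for the example; the crossbar families named above may use different arbitration schemes.

```python
from collections import defaultdict

def arbitrate(requests, last_grant):
    """Resolve one cycle of crossbar requests.

    requests:   dict mapping source id (int) -> target name
    last_grant: dict mapping target name -> source id granted last time
                (updated in place; this is the round-robin pointer)
    Returns a dict mapping target name -> granted source id.
    """
    # Group the requesting sources by the target they want to reach.
    by_target = defaultdict(list)
    for source, target in requests.items():
        by_target[target].append(source)

    grants = {}
    for target, sources in by_target.items():
        sources.sort()
        previous = last_grant.get(target, -1)
        # Round-robin (assumed policy): pick the next source after the
        # previous winner, wrapping around to the lowest id if needed.
        winner = next((s for s in sources if s > previous), sources[0])
        grants[target] = winner
        last_grant[target] = winner
    return grants

# Sources 0 and 1 contend for target "A"; source 2 has "B" to itself.
pointer = {}
first_cycle = arbitrate({0: "A", 1: "A", 2: "B"}, pointer)
second_cycle = arbitrate({0: "A", 1: "A", 2: "B"}, pointer)
```

Paths to different targets are granted in the same cycle, while the two sources contending for the same target take turns. The losing source has to hold its request, which is where flow control at the source comes in.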

[Image: a photo of a Western Electric 100-point six-wire Type B crossbar switch and a black and white photo of telephone operators]
Figure 2 Telephone operators (right) managed banks of Western Electric 100-point six-wire Type B crossbar switches (left). Sources: Wikipedia, Smithsonian Institution

Crossbar-based switching has been used successfully in many designs and is still popular in some modern IP subsystems because it works well when the numbers of sources and destinations are small. But as SoC designs grew, it became apparent that the crossbar architecture, whose size grows quadratically with the number of connected elements, would become impractical and overdesigned (far too large) for its intended function.
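The quadratic growth can be made concrete with a first-order count of crosspoints. This is a simplified model that assumes one switching point per source/target pair and ignores data width, arbitration logic, and control signals; the function names are made up for the illustration.

```python
def crossbar_crosspoints(sources: int, targets: int) -> int:
    """One switching point per (source, target) pair."""
    return sources * targets

def growth_factor(ports_before: int, ports_after: int) -> float:
    """How much a square crossbar grows when its port count changes."""
    before = crossbar_crosspoints(ports_before, ports_before)
    after = crossbar_crosspoints(ports_after, ports_after)
    return after / before

if __name__ == "__main__":
    for ports in (4, 16, 64, 256):
        total = crossbar_crosspoints(ports, ports)
        print(f"{ports:>3} x {ports:<3} crossbar: {total:>6} crosspoints")
```

Under this model, doubling the number of ports quadruples the number of crosspoints, which is why a structure that is comfortable at a handful of ports becomes hard to justify at hundreds.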

A large modern SoC needs to let hundreds of IP blocks communicate, but its traffic patterns rarely require every IP block to communicate simultaneously. A large crossbar in such a design is therefore underused, with many paths idle at any one time. In addition, large crossbars are difficult to implement in modern chips because of the huge die area they consume. Wires do not shrink as fast as transistors in advanced processes, making wire-related congestion in a large crossbar a real challenge.

It is possible, of course, to partition a large crossbar into smaller units, and connect them to implement the desired topology. However, there are typically inefficiencies associated with combining crossbars, such as requiring a lot of logic at the interface between two crossbars to enforce the protocol rules chosen for the connection. This is required when requests and responses are combined in the crossbar, as is the case for most existing solutions based on the AMBA AXI protocol.

Other inefficiencies—such as the requirements for a uniform data width in a crossbar, a single clock, and a single protocol—make cascading crossbars a non-optimal choice for the communication infrastructure of the SoC.

Emergence of the NoC

The internet does not have the above-mentioned problems. Data is broken up into packets, transmitted over a distributed infrastructure made of interconnected switches, and reassembled at the appropriate destination. An NoC does similar things at the SoC level: it uses lightweight switches, a packetized protocol optimized for traveling long distances at the SoC scale, and a distributed implementation.

[Image: diagram of how a network-on-chip routes transactions]
Figure 3 A network-on-chip converts CPU and other IP block transactions (reads, writes, etc.) into packets that are routed through a network that is optimized for SoC requirements like quality-of-service, power consumption, die area, and wire count. Source: Arteris IP

While implementing a communication topology with cascaded crossbars could be described as a “coarse-grain” approach, because each crossbar is a big component, implementing the same function with an NoC can be described as a “fine-grain” approach to the problem. In an NoC, the transport protocol is chosen so that the switches required to carry the packets are very simple, which makes them small and fast.

For instance, the protocol will allow independent transport of requests and responses by independent networks, so the switches do not need to worry about things like tracking outstanding transactions, something that crossbars that combine request and response networks need to do. As a result, an NoC switch, for the same bandwidth, will be ~4× smaller than the equivalent AXI crossbar with all its logic required to track outstanding transactions. Because they are lightweight, combining switches in optimized topologies becomes much easier than doing the same with AXI crossbars.

Tracking of outstanding (in-flight) transactions is done once by the NoC, at the edge, in its network interface units, instead of being done at each crossbar of a crossbar cascade. The network interface units are in fact key components of the NoC, as they are in charge of converting the transaction protocol used by the IP block to the NoC's internal transport protocol. Because of this decoupling between IP transaction protocol and NoC transport protocol, the NoC can implement various “services” at the transport level, and these services are available whichever transaction protocol the IP uses to communicate.
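The edge tracking described above can be sketched as a small model of a network interface unit. The class names, the packet fields, and the tag-based bookkeeping are hypothetical and chosen only for illustration; they are not taken from any particular NoC product or protocol.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class Packet:
    """Hypothetical transport-level packet produced at the NoC edge."""
    tag: int          # identifies the originating transaction
    dest: int         # destination node id
    kind: str         # "read" or "write"
    address: int
    payload: bytes = b""

@dataclass
class NetworkInterfaceUnit:
    """Converts IP transactions to packets and tracks them until completion."""
    node_id: int
    outstanding: dict = field(default_factory=dict)
    _tags: count = field(default_factory=count)

    def issue(self, kind, dest, address, payload=b""):
        # Record the transaction here, at the edge, so that switches in the
        # middle of the network do not need to keep any per-transaction state.
        tag = next(self._tags)
        self.outstanding[tag] = (kind, address)
        return Packet(tag, dest, kind, address, payload)

    def complete(self, tag):
        # Called when the response packet carrying this tag comes back.
        return self.outstanding.pop(tag)

niu = NetworkInterfaceUnit(node_id=0)
request = niu.issue("read", dest=3, address=0x1000)
in_flight = len(niu.outstanding)       # one transaction is outstanding
finished = niu.complete(request.tag)   # the response arrives, the entry is retired
```

In this model the switches only ever see `Packet` objects, so the same transport could carry traffic from IP blocks that speak different transaction protocols, each behind its own interface unit.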

For instance, packetization and serialization at the transport level help deal with wire congestion over the long distances covered by the NoC: two 64-bit wide IP blocks can communicate over a 32-bit wide transport without even knowing it. The NoC can also handle other adaptations, such as crossing clock and power domains, converting data widths, and reconciling differences in protocol capabilities like burst support. Other services such as bandwidth and latency control, security, safety, and debug are all implemented on top of the NoC protocol and are offered to every connected IP independently of the native capabilities of its connection to the NoC.
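The width adaptation mentioned above can be sketched in a few lines: a 64-bit word is split into 32-bit pieces for the narrower transport and reassembled at the far end. The function names and the low-order-piece-first ordering are assumptions made for the illustration, not a description of a specific NoC transport format.

```python
def serialize(word: int, word_bits: int = 64, link_bits: int = 32) -> list:
    """Split a word into link-sized pieces, low-order piece first (assumed order)."""
    mask = (1 << link_bits) - 1
    pieces = -(-word_bits // link_bits)  # ceiling division
    return [(word >> (i * link_bits)) & mask for i in range(pieces)]

def deserialize(pieces: list, link_bits: int = 32) -> int:
    """Reassemble the original word from pieces produced by serialize()."""
    word = 0
    for i, piece in enumerate(pieces):
        word |= piece << (i * link_bits)
    return word

sent = 0x1122334455667788
on_the_wire = serialize(sent)          # two 32-bit transfers
received = deserialize(on_the_wire)    # identical to what was sent
```

The sending and receiving IP blocks both see a 64-bit word; only the transport in between knows that it travelled as two narrower transfers.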

Another advantage of NoC, and its conversion of IP transaction protocol to an NoC transport protocol and back, is that it’s easier to change IP blocks for platform-based design approaches where multiple derivative chip designs are created from an initial SoC platform architecture. This platform approach to SoC design has become more common as companies seek to respond more quickly to changes in market requirements while reducing design costs and risks.

If the SoC design team used a crossbar, which requires all connected components to expose identical interfaces, changes in IP block endpoints would necessitate bridging the new component to the crossbar protocol. An NoC, by contrast, has its own internal, independent transport protocol and isolates each endpoint at the point where the incoming IP block transactions are converted to packets, allowing easier “plug-and-play” changes to SoC architectures.

Not just for big designs

Connectivity for large production SoCs is now almost exclusively based on NoC architectures. A good example of using NoC technology from Arteris IP is Mobileye’s EyeQ ADAS SoC family. However, an NoC can also be useful for smaller chips. There is an interesting trend toward using NoCs in smaller designs that have large numbers of power domains or derivative/platform design requirements.

Take derivative design families. IP blocks can be added or removed without major changes to physical design and timing closure, which delivers the benefit that derivative reuse is supposed to provide. And, although we haven’t discussed power management yet, NoC technology makes it easy to create very sophisticated voltage and clock domain schemes that allow extremely low power consumption for IoT chips powered by small batteries and even MEMS.

Texas Instruments’ SimpleLink family of wireless MCUs is a good example, showing that the flexibility, scalability, and ease of IP reuse with NoC-based systems are finding a growing audience.

Next up: NoCs make power and safety management easier and more capable.

Benoit de Lescure is chief technology officer (CTO) at Arteris IP.
