October 23, 2006 -- Ask SOC designers to name the biggest problems they face and, invariably, timing closure and power dissipation are at the top of the list. There are several reasons for these problems, including increasing design and process complexity, but one of the major culprits is the use of clocked, synchronous buses to connect the various IP cores on a chip. Synchronous interconnect, and the global clocking methodology needed to support it, are simply not able to handle the types of SOC designs typically done at 130nm and below, and no amount of increased clock frequency or multicore architecture will remedy the situation. It's time to move to clock-less, self-timed interconnect for communication between the IP blocks on a chip. A self-timed interconnect fabric on an SOC, with the right tool suite to generate the complex circuitry, offers the chip designer many design and performance advantages beyond just power saving and faster timing closure.
Where synchronous buses go wrong
Synchronous bus architectures use a global clock to synchronize data between the various storage and processing elements connected to the bus. Originally, clock-based synchronous interconnect simplified the task of the chip designer, since data synchronization with a "master" clock was easy. But shrinking process nodes, more complex IP cores and higher on-chip data rates have complicated the job of the chip designer to the point where self-timed circuits now offer several clear-cut advantages over their synchronous counterparts.
Among the problems associated with synchronous bus-based chips are the following:
- Clock skew and clock balancing – As systems become larger, an increasing amount of design effort is needed to guarantee minimal skew in the arrival time of the clock signal at different parts of the chip. This results in longer design and verification cycles, especially for achieving timing closure.
- Worst-case design – In synchronous systems, performance is dictated by worst-case conditions. The clock period must be long enough to accommodate the slowest operation, even though the average delay of that operation is often much shorter.
- Processing and environmental variations – On-chip delay for an SOC can vary significantly due to processing variations, supply voltages, and operating temperatures. Synchronous designs must have their clock rates set to allow correct operation under expected PVT variations, which means slowing the global clock to achieve acceptable yield.
- IP core coupling – Since SOCs usually comprise a large number of complex IP cores, the task of coordinating the timing requirements of these cores, synchronized with a single clock across the entire chip, becomes very difficult.
- Excess power dissipation – Synchronous interconnect results in dynamic power dissipation during every clock transition, whether or not data is moving on that clock edge. Synchronous circuits also need additional clock drivers and buffers to limit clock skew, which also wastes power.
- Increased EMI – In a synchronous design, all activity is locked to a very precise frequency. As a result, nearly all the energy is concentrated in narrow spectral bands at the clock frequency and its harmonics, producing substantial electromagnetic interference (EMI) at these frequencies.
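The worst-case-design penalty in the list above can be made concrete with a back-of-the-envelope calculation. The delay values below are hypothetical, chosen only to illustrate the gap between worst-case and average-case timing:

```python
# Illustrative comparison: synchronous (worst-case) vs. self-timed
# (average-case) completion time for a sequence of operations.
# Delay values are hypothetical, in nanoseconds.
delays_ns = [1.2, 0.8, 2.5, 0.9, 1.1, 0.7, 2.4, 1.0]

# A synchronous bus must set its clock period to the slowest delay,
# so every operation pays the worst-case cost.
clock_period_ns = max(delays_ns)
sync_total_ns = clock_period_ns * len(delays_ns)

# A self-timed circuit completes each operation as soon as it is done,
# so total time tracks the average, not the worst case.
async_total_ns = sum(delays_ns)

print(f"synchronous: {sync_total_ns:.1f} ns")  # 20.0 ns
print(f"self-timed:  {async_total_ns:.1f} ns")  # 10.6 ns
```

With these numbers, the synchronous bus spends nearly twice as long as the self-timed one on the same work, purely because the clock period is pinned to the slowest operation.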
What self-timed interconnect can do for you
In a synchronous bus architecture with multiple clock regions, every clock must be a derivative of a master clock (Figure 1).
Figure 1. Synchronous bus. (The different colors represent different clock regions.)
A self-timed interconnect, however, replaces the rigid bus hierarchy with an interconnect topology optimized for the performance goals of a particular design (Figure 2). Furthermore, each clock region is completely independent from the remainder of the chip.
Figure 2. Self-timed interconnect.
With self-timed interconnect, you don’t have to worry about minimizing clock skew and balancing clock signals between different cores on a chip, since there is no longer a need for a synchronized global clock edge at the various cores. In a self-timed circuit, the speed of the circuit can change dynamically, allowing chip performance to be governed by average-case rather than worst-case delay. Due to their adaptive nature, self-timed circuits operate correctly under PVT variations, simply speeding up or slowing down as necessary. This adaptation also allows self-timed circuits to scale to new process nodes more easily than their synchronous, clock-based counterparts.
Another benefit of self-timed interconnect is that it connects IP blocks without the difficulties associated with clock synchronization in a traditional bus-based system. By decoupling the IP cores on a chip, a self-timed interconnect fabric eliminates the design difficulty associated with long clock lines while also reducing clock jitter and noise problems. Each IP core can therefore run at its own “natural” frequency; the clocking rates of the individual cores do not have to be artificial derivatives of a “master clock.” The speed of data along the self-timed interconnect links is unaffected by the clocks of the endpoints, since it is limited by wire speed and not by a global clock.
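The article does not spell out the signaling involved, but self-timed links are conventionally built on request/acknowledge handshaking rather than a shared clock edge. The sketch below models a four-phase handshake; the class name and structure are illustrative, not taken from any Silistix product:

```python
# Minimal model of the request/acknowledge (four-phase) handshaking
# that typically underlies a self-timed link. Illustrative only.

class SelfTimedLink:
    """One sender-to-receiver channel with a four-phase handshake."""

    def __init__(self):
        self.req = False   # request wire, driven by the sender
        self.ack = False   # acknowledge wire, driven by the receiver
        self.data = None   # data wires, valid while req is asserted

    def send(self, value, receiver):
        # Phase 1: sender places data on the wires and raises request.
        self.data = value
        self.req = True
        # Phase 2: receiver latches the data and raises acknowledge.
        receiver.append(self.data)
        self.ack = True
        # Phase 3: sender sees the acknowledge and drops request.
        self.req = False
        # Phase 4: receiver drops acknowledge; the link is idle again.
        self.ack = False

# Data moves whenever both ends are ready -- no clock is involved.
received = []
link = SelfTimedLink()
for word in [0x10, 0x20, 0x30]:
    link.send(word, received)
print(received)  # [16, 32, 48]
```

Because each transfer completes as soon as both endpoints have handshaked, the link's throughput is set by wire and logic delay, not by the period of any global clock, which is the decoupling the paragraph above describes.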
Self-timed circuits reduce synchronization power because they do not require the additional clock drivers and buffers used to limit clock skew. They also automatically power down unused components and do not waste power on clock transitions that move no data. Power in the interconnect, which is a significant portion of total chip power, is dissipated only when data is being transferred between endpoints, not on every clock edge.
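This power argument can be sketched with a toy activity count. The traffic pattern is hypothetical, and the counts simply stand in for dynamic-power switching events:

```python
# Toy model of dynamic-power events on an interconnect over ten cycles.
# A 1 marks a cycle in which data actually moves; a 0 marks an idle
# cycle. The traffic pattern is hypothetical.
traffic = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

# Synchronous bus: the clock network toggles every cycle, whether or
# not data is moving, so every cycle costs dynamic power.
sync_events = len(traffic)

# Self-timed fabric: handshakes (and hence switching activity) occur
# only when data is transferred between endpoints.
async_events = sum(traffic)

print(sync_events, async_events)  # 10 3
```

At the 30% utilization modeled here, the self-timed fabric incurs switching activity on fewer than a third of the cycles the synchronous bus pays for; at lower utilizations the gap widens further.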
Self-timed interconnect is not dependent on a global clock frequency. This means that activity in a self-timed circuit is uncorrelated, resulting in a more distributed, random noise signature, lower peak noise amplitude and, thus, lower EMI.
With the correct design tools to generate the complex self-timed circuits, the biggest obstacle to implementing self-timed interconnect between IP cores on a chip – the need for designers to possess the expertise to generate the interconnect – is eliminated. Good tools let you define the specific self-timed interconnect topology you need to optimize the performance tradeoffs (power, area and speed) of your chip and reap the benefits of a self-timed interconnect fabric for your SOC.
In much the same way as multicore processor architectures have become an accepted way to escape the problems of runaway clock speeds, self-timed circuits are poised to mitigate the problems associated with the rising complexity and increasing performance requirements for data transmission within SOC designs. Giving designers the tools and means to make self-timed interconnect implementation a part of their SOC design flow will let them stop butting their heads against the synchronous design wall.
By David Fritz.
David Fritz is the Chief Executive Officer at Silistix. Previously, David served as Vice President of Marketing and Business Development for ARC International, where he developed the Asian market. Before that, David was founder and president of Production Languages Corp., a pioneer in configurable processor technology, where he was awarded a US patent covering fundamental processes related to configurable processors. Production Languages Corp. was acquired by ZiLOG in 1999, after which he became vice president of ZiLOG’s Advanced Cores R&D and ZiLOG’s Development Systems Group.
Go to the Silistix, Ltd. website to learn more.