September 1, 2005 -- Design teams are continually seeking ways of maintaining their competitive edge and improving profitability. The search for solutions that will provide faster time to market, lower cost and higher performance for successive generations of products is never-ending. While most technologies represent incremental advances over previous generations, occasionally a disruptive technology is developed, providing a quantum leap over its predecessors. Adopting such technologies early can often help companies gain a commanding market lead over competitors who are slower to realize the advantages offered.
FPGAs have been popular for designs where fast turnaround time and low NRE are desirable. There are significant costs associated with using FPGAs for large designs, however. FPGAs suffer from high per-unit cost, low performance, low levels of logic integration and high power consumption. Low performance also causes other problems. For instance, significant additional time may be required to hand-optimize RTL code to achieve the same level of performance as faster technologies such as Platform ASICs.
In contrast, Platform ASICs have considerably lower unit costs, offer higher performance and logic integration, and use less power. Because they avoid the need to hand-optimize RTL, they also provide a critical time-to-market improvement.
Cost
There are several fundamental technology limitations that affect the minimum per-unit cost of FPGAs, as Figure 1 shows. The programmable logic and routing found in an FPGA result in very large die sizes, which reduce yield and significantly increase manufacturing cost. Related additional package costs, external non-volatile programming devices, and extra PCB routing also add to the overall FPGA per-unit cost.
Figure 1. Cost comparison
Compared to FPGAs, Platform ASICs have a low per-unit cost. They are typically based on a fine-grained architecture that implements logic with ASIC-like efficiency, but without the high NRE costs of an ASIC.
As a production technology, Platform ASICs address the key issues and provide designers with an excellent capability for medium volume production devices - typically in the range of 1,000 to 100,000 devices.
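The trade-off between NRE and per-unit cost can be sketched as a simple break-even calculation. The Python below is only an illustration: the function name and every dollar figure are hypothetical placeholders, not vendor pricing, and the model assumes total cost is simply NRE plus unit cost times volume.

```python
def break_even_volume(nre_a: float, unit_a: float, nre_b: float, unit_b: float) -> float:
    """Volume at which option B's total cost equals option A's.

    Assumes total cost = NRE + unit_cost * volume for each option, and that
    option B has the higher NRE but the lower unit cost.
    """
    if unit_a <= unit_b:
        raise ValueError("option B must have the lower unit cost")
    return (nre_b - nre_a) / (unit_a - unit_b)


if __name__ == "__main__":
    # Hypothetical figures chosen only to show the shape of the calculation.
    fpga_nre, fpga_unit = 0.0, 100.0               # assumed: no NRE, high unit cost
    platform_nre, platform_unit = 100_000.0, 20.0  # assumed: some NRE, low unit cost
    volume = break_even_volume(fpga_nre, fpga_unit, platform_nre, platform_unit)
    print(f"Break-even at about {volume:.0f} units")
```

Above the break-even volume the lower unit cost dominates; below it the lower NRE does. Real figures would need to come from actual quotes.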
Performance
Platform ASICs are typically at least three times faster than an FPGA based on an equivalent process technology. Their advantage is so significant that they still outperform FPGAs implemented in a process technology with a smaller transistor size. For instance, as shown in Figure 2, the RapidChip® Foundation™ family at 0.18 microns outperforms the industry's most advanced 90-nm FPGAs.
Figure 2. Performance comparison.
The Platform ASIC vs. FPGA performance gap is due to the limitations of an FPGA's logic cell and interconnect architecture. In contrast to the slow performance of an FPGA, empirical evidence shows that Platform ASICs achieve around 80 percent of cell-based ASIC performance.
FPGA vendors claim their devices run at high frequencies. This may be true, but it is a misleading indicator of real-world performance. The reality is that FPGAs deliver far lower performance than many Platform ASICs. For example, the 0.11-micron RapidChip Xtreme Platform ASIC family can achieve frequencies over 250 MHz with more than 25 levels of logic. In an equivalent FPGA, only around 5-8 levels of logic can reasonably be expected at the same frequency.
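To put the levels-of-logic figures in perspective, the short Python sketch below converts them into an average delay budget per logic level at 250 MHz. It is a back-of-the-envelope illustration only: the function name is hypothetical, and the model assumes the whole clock period is shared evenly across logic levels, ignoring setup time, clock skew, and clock-to-output delay.

```python
def per_level_budget_ps(freq_mhz: float, logic_levels: int) -> float:
    """Average delay budget per logic level, in picoseconds.

    Simplifying assumption: the full clock period is divided evenly across
    the logic levels (no allowance for setup time, skew, or clock-to-out).
    """
    period_ps = 1e6 / freq_mhz  # a 1 MHz clock has a period of 1e6 ps
    return period_ps / logic_levels


if __name__ == "__main__":
    cases = [("Platform ASIC, 25 levels", 25),
             ("FPGA, 8 levels", 8),
             ("FPGA, 5 levels", 5)]
    for label, levels in cases:
        print(f"{label}: {per_level_budget_ps(250, levels):.0f} ps per level")
```

Under these assumptions, each level (logic plus its share of routing) takes roughly 160 ps on the Platform ASIC, against 500 to 800 ps on the FPGA.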
Interconnect delay
The main reason for the performance gap between Platform ASICs and FPGAs is interconnect delay. Unlike the length-optimized metal routing of some Platform ASICs, FPGA routing is a combination of short, medium, and long fixed wire lengths connected via pass transistors. These active routing elements add significant delay to signal paths. This problem is exacerbated as design size increases. Large FPGA designs can have routing delays comprising up to 80 percent of the total path delay.
The complexity of FPGA routing makes it difficult to accurately predict timing until after routing is completed. The distance between two logic elements in an FPGA is not a good predictor of the timing delay between them. There are many possible routes between two points on an FPGA, and the timing variance between different paths can be considerable. In addition to the routing topology, the timing depends on the type of wire used and the number of buffers and pass transistors encountered. This explains the large discrepancy experienced between synthesis and post-layout/place-and-route results.
Unlike FPGA routing, Platform ASICs often use an optimally buffered interconnect scheme, so delays increase linearly and predictably with increasing wire length. For example, LSI's RapidChip technology has four to five routing layers and can route over logic. Therefore, it is possible to achieve highly optimal timing delays.
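The linear-delay behavior of a buffered wire can be illustrated with a first-order model. The sketch below compares the Elmore delay of an unbuffered distributed RC wire, which grows with the square of its length, against the same wire split into buffered segments. All resistance, capacitance, segment-length, and buffer-delay values are assumptions chosen for illustration; they do not describe RapidChip or any particular process.

```python
import math


def unbuffered_delay_ps(length_mm: float, r_ohm_per_mm: float, c_ff_per_mm: float) -> float:
    """Elmore delay of a distributed RC wire: 0.5 * R_total * C_total."""
    r_total = r_ohm_per_mm * length_mm
    c_total = c_ff_per_mm * length_mm
    return 0.5 * r_total * c_total * 1e-3  # ohm * fF = 1e-3 ps


def buffered_delay_ps(length_mm: float, r_ohm_per_mm: float, c_ff_per_mm: float,
                      segment_mm: float, buffer_delay_ps: float) -> float:
    """Delay of the same wire with a buffer driving each fixed-length segment.

    A final partial segment is charged as a full one, which is conservative.
    """
    segments = math.ceil(length_mm / segment_mm)
    per_segment = buffer_delay_ps + unbuffered_delay_ps(segment_mm, r_ohm_per_mm, c_ff_per_mm)
    return segments * per_segment


if __name__ == "__main__":
    R, C = 100.0, 200.0            # assumed: ohms per mm and fF per mm
    SEGMENT, BUFFER = 1.0, 20.0    # assumed: 1 mm segments, 20 ps per buffer
    for length in (2.0, 4.0, 8.0):
        plain = unbuffered_delay_ps(length, R, C)
        buffered = buffered_delay_ps(length, R, C, SEGMENT, BUFFER)
        print(f"{length:.0f} mm: unbuffered {plain:.0f} ps, buffered {buffered:.0f} ps")
```

In this model, each doubling of wire length quadruples the unbuffered delay but only doubles the buffered delay, which is what makes timing predictable from distance alone.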
These desirable routing characteristics allow physical synthesis tools, such as Synplicity's Amplify, to accurately predict the timing during placement. There are no post-route timing surprises, and no need to iterate in the synthesis/place-and-route timing loop that FPGA designers commonly face.
Fabric granularity
Another factor contributing to the Platform ASIC's 3x performance advantage is logic granularity. SRAM-based FPGA architectures have a fundamental limitation: a lookup table (LUT) based architecture, or some variation of one, is most commonly used to implement FPGA logic functions.
Most RTL design code does not map efficiently into a coarse-grained LUT-based structure. This reduces both performance and density and ultimately increases unit cost.
A coarse-grained logic fabric degrades performance for a number of reasons. Timing delays suffer from large incremental steps. Even when adding a single extra input to a logic term, additional LUTs and interconnections may be required, causing a significant jump in signal delay. Timing optimization is difficult because of this abrupt, step-function behavior. It is not possible to tune FPGA paths in small timing increments using different combinations of drive strengths and logic.
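The step-function behavior can be seen in a small calculation. The Python sketch below gives lower bounds on the number of LUTs and LUT levels needed to combine n inputs into one output (for example, a wide AND) using k-input LUTs. The 4-input LUT size and the function names are assumptions for illustration; real devices and mapping tools differ.

```python
def min_lut_count(num_inputs: int, lut_size: int = 4) -> int:
    """Lower bound on LUTs needed to reduce num_inputs signals to one.

    Each k-input LUT replaces at most k signals with one, so it removes at
    most (k - 1) signals from the total.
    """
    return -(-(num_inputs - 1) // (lut_size - 1))  # ceiling division


def min_lut_levels(num_inputs: int, lut_size: int = 4) -> int:
    """Lower bound on LUT levels: smallest d such that lut_size**d >= num_inputs."""
    levels, reach = 0, 1
    while reach < num_inputs:
        reach *= lut_size
        levels += 1
    return levels


if __name__ == "__main__":
    for n in (4, 5, 16, 17):
        print(f"{n} inputs: at least {min_lut_count(n)} LUTs, "
              f"{min_lut_levels(n)} levels")
```

Going from 4 to 5 inputs, or from 16 to 17, adds a whole LUT level and its associated routing, rather than the small delay increment a slightly larger gate would add in a fine-grained fabric.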
The logic structure used for Platform ASICs is often constructed from very small base-units consisting of several transistor pairs. One or more base-units may be used to construct a single library cell, which is often similar to a standard ASIC cell. There may be up to 500 unique types of logic cells of varying functionality and drive strength. This ensures optimal resource usage for the required level of performance.
During synthesis, RTL is mapped into the logic cells. These cells are physically implemented by adding metal routing layers to a pre-diffused base die. The metal routing connects the base-units necessary to form library cells. It also connects the library cells together to efficiently implement user logic.
Figure 3. RapidChip Platform ASIC vs. FPGA path optimization.
Ease of timing closure
In a design with demanding timing (performance) requirements, it is much easier to close timing with Platform ASICs than FPGAs. The granular, flexible nature of most Platform ASIC architectures enables design tools to perform timing optimization automatically and effectively. A fine-grained fabric has many timing benefits, as shown in Figure 3:
- A broad range of available cells ensures optimum cell delay for each function mapped
- Optimal path tuning is possible, including:
  - drive strength adjustment
  - path buffering
  - logic flattening and restructuring
In contrast, an FPGA's coarse-grained architecture limits these types of optimizations. In an FPGA, timing can change dramatically between synthesis and place and route, further complicating timing closure iterations. In some cases, meeting the system timing requirements is not possible without hand-optimizing the RTL. Techniques such as bus widening and pipelining may be required to meet performance goals.
Architectural modifications such as these can be very time consuming, and can add months to a project schedule where significant optimization is required. Even with these modifications, there is no guarantee that timing requirements will be met. In contrast, Platform ASIC architectures and tools allow maximum performance to be obtained with minimum effort. The higher-performance routing and logic fabric minimize the need for RTL optimization, reducing implementation time and complexity.
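As a rough illustration of why pipelining becomes necessary, the Python sketch below estimates how many pipeline stages a combinational path needs when the fabric can only sustain a limited number of logic levels per clock cycle. It reuses the levels-of-logic figures quoted in the Performance section; the function name is hypothetical, and the model ignores register overhead and assumes the logic can be split evenly between stages.

```python
def pipeline_stages(total_levels: int, levels_per_cycle: int) -> int:
    """Number of clock cycles (stages) needed to cover a path.

    Simplifying assumptions: logic can be cut evenly between registers and
    the registers themselves add no delay.
    """
    if total_levels < 1 or levels_per_cycle < 1:
        raise ValueError("level counts must be positive")
    return -(-total_levels // levels_per_cycle)  # ceiling division


if __name__ == "__main__":
    path_levels = 25  # a path that fits in one cycle on the Platform ASIC
    for fabric, per_cycle in [("Platform ASIC", 25), ("FPGA, 8 levels/cycle", 8),
                              ("FPGA, 5 levels/cycle", 5)]:
        stages = pipeline_stages(path_levels, per_cycle)
        print(f"{fabric}: {stages} stage(s), {stages - 1} extra cycle(s) of latency")
```

Each added stage means inserting registers into the RTL and rebalancing the surrounding logic so that data still lines up, which is where the schedule cost comes from.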
Another issue arises with frequent changes to the RTL such as bug fixes, feature enhancements, etc. Even small changes can have significant effects on the FPGA implementation, changing the critical paths and in some cases reducing the effectiveness of any previous optimization work.
Logic integration
Large designs often won't fit into a single FPGA. Due to the significant overhead of programmable routing, even the largest FPGA provides relatively low logic integration. For example, over 50 percent of an FPGA's die area may be occupied by fixed routing structures. Since Platform ASIC technology routes signals over the top of logic cells, it does not require dedicated routing channels that waste large portions of the die.
To work around the logic integration limitation, time consuming and complex partitioning tasks are required to fit a design into multiple FPGAs. In contrast, all but the largest designs will fit on a single Platform ASIC. Compared to FPGAs, Platform ASICs provide extremely high logic densities. As a result, the total number of available gates on a single Platform ASIC is many times higher than the largest FPGA.
Figure 4. FPGAs waste internal LUT resources.
FPGAs also waste a lot of internal LUT resources, as can be seen in Figure 4. RTL design code maps inefficiently into coarse-grained architectures, so only a proportion of the internal LUT resources will ever be utilized. Even if all the LUTs within an FPGA are used, a significant amount of the total logic available can remain unused. By contrast, the logic and routing in most Platform ASICs are not fixed. Resources are only used when needed, increasing logic integration levels and decreasing power consumption.
Power consumption
A further benefit of the Platform ASIC's fine-grained fabric and point-to-point routing is low power consumption. Compared to Platform ASICs, FPGAs typically dissipate many times more power. There are several reasons for this discrepancy:
- The routing capacitance of an FPGA is typically many times larger than that of a Platform ASIC. Compared to most Platform ASICs, FPGAs contain much longer routing tracks with significant parasitic capacitance, and the switching activity on these long routing tracks causes significant power dissipation.
- Unlike most Platform ASICs, FPGAs have fixed clock routing structures - all registers within a clock domain are connected to a clock tree, even if they are not used. When the clock is toggling, power is wasted in clock segments which connect unused registers.
- FPGAs require a significant number of transistors to configure LUTs and programmable routing. In a typical FPGA design, large portions of these transistors are not actually used; however, they still draw leakage current.
Figure 5. Power consumption (left to right: ASIC, RapidChip, FPGA.)
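The routing-capacitance and clock-tree points above follow from the standard first-order expression for dynamic switching power, P = α·C·V²·f. The Python sketch below evaluates it for two capacitance values. The activity factor, capacitances, supply voltage, and frequency are assumed values for illustration only, not measurements of any device, and leakage is not modeled.

```python
def dynamic_power_mw(activity: float, capacitance_pf: float,
                     vdd_volts: float, freq_mhz: float) -> float:
    """First-order dynamic power, P = activity * C * Vdd^2 * f, in milliwatts.

    Unit check: pF * V^2 * MHz = 1e-12 * 1e6 W = 1e-6 W = 1e-3 mW.
    """
    return activity * capacitance_pf * vdd_volts ** 2 * freq_mhz * 1e-3


if __name__ == "__main__":
    # All values below are assumptions chosen to show the scaling.
    ACTIVITY, VDD, FREQ = 0.2, 1.2, 250.0
    short_routes_pf = 100.0                 # assumed switched capacitance
    long_routes_pf = 5 * short_routes_pf    # assumed 5x larger routing capacitance
    for label, cap in [("shorter routing", short_routes_pf),
                       ("longer routing", long_routes_pf)]:
        power = dynamic_power_mw(ACTIVITY, cap, VDD, FREQ)
        print(f"{label}: {power:.1f} mW")
```

Because the expression is linear in capacitance, any multiple in switched routing or clock-tree capacitance appears as the same multiple in dynamic power at a given voltage and frequency.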
Summary
Platform ASICs offer significant benefits in most areas when compared with FPGAs. FPGAs play an important role for prototyping and very low volume production. For medium volume production, however, Platform ASICs such as the RapidChip Platform ASIC from LSI Logic provide the best-in-class implementation vehicle.
High performance, cost-optimized and production ready, Platform ASICs ensure project goals are rapidly achieved, even when pushing the performance envelope. Superior performance alleviates the need for hand-optimized RTL, which can often significantly reduce front-end design complexity and effort. Platform ASICs enable design teams to reduce their overall time to market and therefore achieve rapid success in the marketplace.
By Greg Martin, RapidChip Technical Marketing, LSI Logic Corp.