February 8, 2008 -- Today’s increasing design complexity has placed a heavy burden on design verification teams in terms of the amount of time spent on verification, as well as the huge file sizes that result from trying to gain enough visibility into the design for efficient debug and analysis. The often-quoted industry statistics — up to 70% of the actual total design cycle effort is spent in verification with nearly 40% of all designs still requiring at least one respin — point to the pain in verification, but where is the solution?
Companies designing complex systems and chips need an efficient means of identifying and fixing functional problems within the design in a way that meets the stringent schedule demands of market pressures. Unfortunately, even after the problems are identified, a huge amount of data is required for analysis and debug to trace the root cause, and ultimately fix the problem. With the acquisition and storage of all this data comes a whole new set of challenges.
Traditional methodologies are failing to keep pace with the increase in verification demands, in terms of both simulation speed and dump file size. Some solutions, such as today’s hardware accelerators, are often incompatible with software simulators and can be difficult to use. But what if we could speed up simulation without expanding the server farm, and enhance visibility without generating exponentially larger data files? The answer may be found through a combination of new hybrid simulation and visibility enhancement techniques, which can radically accelerate the process of finding, isolating, and debugging design problems while maintaining manageable data file sizes.
Verification bottlenecks
A critical aspect of verification is the actual simulation of test vectors on the register transfer or gate-level model, as well as the debug of any problems discovered by the simulation. Both simulation and debug have to be considered hand-in-hand, and neither is “complete” without the other. Unfortunately, today’s large designs are stretching both of these tasks to the limit.
Simulation performance, typically determined by the time it takes to run the simulation, has always been a key metric for engineering teams. Over the years, software simulators have continuously been optimized to address performance requirements. In addition, hardware-based verification engines have emerged, whereby the design model is transformed and mapped to the hardware so it can run much faster.
Verification complexity, however, is growing even faster than design size, simply because there is more physical real estate and much more complex functionality embedded on chip to verify. Software simulators run on general-purpose CPUs, where performance is limited by a phenomenon known as the "memory-cache miss" (a failure to find the requested data in cache memory), so software-only simulation approaches can only go so fast. Transforming a design to run on a hardware emulator or accelerator is difficult to begin with, and the difficulty is compounded as the design gets bigger. In addition, critical components of the design (most of the testbench, for example) typically cannot be effectively mapped into hardware. These un-mappable elements run on a software simulator that works in tandem with the hardware. Overall performance is consequently limited by the software simulator’s speed, as well as by the speed of the “backplane” interface between the hardware and the software simulator.
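To see why cache misses can come to dominate, consider the standard average memory access time relation: hit time plus miss rate times miss penalty. The sketch below applies it with assumed, purely illustrative latencies and miss rates; none of the numbers come from a measured simulator.

```python
def average_access_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time: hit time plus the expected cost of a miss."""
    return hit_ns + miss_rate * miss_penalty_ns


# Assumed values for illustration only (not measurements).
HIT_NS = 1.0             # latency when the data is found in cache
MISS_PENALTY_NS = 100.0  # extra latency when data must come from main memory

# A small design whose working set mostly fits in cache, versus a large one
# whose working set does not. Both miss rates are hypothetical.
small_design = average_access_ns(HIT_NS, miss_rate=0.02, miss_penalty_ns=MISS_PENALTY_NS)
large_design = average_access_ns(HIT_NS, miss_rate=0.30, miss_penalty_ns=MISS_PENALTY_NS)

print(f"small design: {small_design:.1f} ns per access")
print(f"large design: {large_design:.1f} ns per access")
print(f"slowdown:     {large_design / small_design:.1f}x")
```

Under these assumptions the CPU clock is unchanged, yet each memory access costs roughly ten times more once the design no longer fits in cache.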
While simulation speed is important, it is not the only aspect of the verification phase. The verification engine is designed to detect failures as fast as it can. When it detects a failure, however, an engineer has to find the root cause. To identify the root cause (the bug) of a functional discrepancy in a chip, engineers read out the internal values of signals in the chip (a process often called "dumping") into a format such as Novas’ Fast Signal Database (FSDB). Using debug tools, they can then view the values over time, correlate the signal values to the chip logic, and trace the unexpected behavior back to its origin.
The visibility gained during data dumping is essential for the process of debug. Large designs may have tens of millions of gates and may simulate for tens of millions of cycles before a discrepancy is found. The amount of data associated with dumping every signal value for every cycle of simulation for such designs can easily require hundreds (or more) of terabytes of storage. The dumping of such large amounts of data also consumes significantly more simulation time and slows overall performance. As a result, designers working on large designs are caught in a vicious cycle where more verification is needed because the design is large and complex. This further need for verification then becomes prohibitively expensive and ever more difficult because large designs require long simulation runs that produce enormous amounts of data.
Figure 1. Dumping every signal change for an entire simulation run to gain full visibility consumes large amounts of disk space and simulation time.
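A back-of-the-envelope estimate shows how quickly a full dump grows. The sketch below is only an illustration: the signal count, cycle count, toggle rate, and bytes per recorded change are all assumed values, and real dump formats compress the data.

```python
def full_dump_bytes(num_signals: int, num_cycles: int,
                    toggle_rate: float, bytes_per_change: float) -> float:
    """Estimate the uncompressed size of a dump that records every value change."""
    return num_signals * num_cycles * toggle_rate * bytes_per_change


TB = 10 ** 12

# Assumed, illustrative inputs (not measurements):
size = full_dump_bytes(
    num_signals=20_000_000,  # tens of millions of signals
    num_cycles=50_000_000,   # tens of millions of cycles
    toggle_rate=0.1,         # fraction of signals that change in a given cycle
    bytes_per_change=4.0,    # time stamp plus value, before compression
)

print(f"{size / TB:.0f} TB")  # prints "400 TB"
```

Doubling either the design size or the run length doubles the estimate, which is why full dumps stop being practical for large designs.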
Clearly, both verification speed and debug data size are adversely impacted by large designs. Added to this challenge are the hundreds, or even thousands, of test sets that are typically run, with the net result of longer verification cycles. Not only are development schedules in jeopardy, but, even more critically, teams must determine how much verification is feasible given ever-shrinking market windows. This directly influences how much functional testing is completed before tape-out.
Addressing simulation performance
Software-only simulation is a compute-hungry process. With exponentially increasing verification demands, the requisite compute resources must also grow exponentially. However, the historical CPU speed roadmap shows only incremental growth, which cannot keep up with the exponential growth in verification demands. Worse yet, simulation speed is also limited by memory-cache misses, which become the dominant factor in overall performance as design size grows. As a result, simulation is often partitioned into multiple testbenches that are executed on multiple host computers: the compute farm. The farm’s performance is limited by the performance of those same CPUs, so the size of the farm, the number of simulation licenses, and the attendant costs must grow exponentially. In other words, conventional simulation has reached the point of diminishing returns. Consequently, some verification teams turn to hardware acceleration, where they encounter a new set of challenges.
The biggest problem with conventional FPGA-based hardware accelerator technology lies in its fundamental lack of compatibility with software simulators, particularly where testbenches and assertions are concerned. The accelerator works well with synthesizable logic, but not with behavioral logic. A conventional hardware accelerator achieves higher performance by actually implementing the design, which is partitioned and mapped to either FPGA- or processor-based hardware resources. However, testbenches rely heavily on behavioral logic, performing behavioral verification through complex assertions and memory checks. Such checks are not always mappable onto fixed hardware, so they must reside outside the accelerator and communicate with it via vector-level data. The consequent increase in data volume can reduce the accelerator’s performance gains. It also limits the verification team’s visibility into functional behavior. As a result, much of the time gained in acceleration is lost in debug. Vector-level checks are therefore often curtailed, and much of the power of testbenches and assertions is lost. This fundamental incompatibility is also painfully apparent in the compile and set-up time necessary to interface the software simulator to the accelerator, which can be on the order of weeks to months.
Processor-based approaches mitigate the partitioning and compilation difficulties suffered by FPGA-based acceleration. However, the fundamental problem remains – the incompatibility with software simulators. Like a conventional compute farm, the accelerator’s cost increases with its processing capacity. While the processing capacity is initially impressive, its growth potential is just as incremental as that of a compute farm. And, there is still no guarantee that the desired verification coverage can be achieved. The performance growth potential of accelerators is also limited physically – the accelerator can deploy no more hardware than is permitted by size and thermal constraints. Consequently, accelerators suffer a capacity crunch at around 100M gates. The next evolutionary steps might be to consider accelerator clusters, and then potentially accelerator farms, and ultimately the same diminishing returns faced by simulation farms. Because of these bottlenecks, accelerators are generally restricted to post-RTL regression testing. This approach detects bugs late in the design flow, essentially de-valuing the early bug detection benefit of simulation.
Dealing with large files
The problem of large data sets for debug is getting worse, but is not new. Chip design and verification teams use creative approaches to reduce the size of data collected from simulation and to reduce the amount of time spent in simulation. All the approaches employed to that end share a common theme: they all dump a small portion of the data available at the price of reducing visibility into design behavior.
One approach is to decide in advance which portions of the chip to target for debug and to dump verification data only from those portions. If the unexpected behavior and its root cause are enclosed within that region of the chip, the selective dump can save a lot of time and space. If, however, the bug is outside the visible area, the effort is wasted and must be repeated. Another approach is to randomly sample signals across the entire chip in the hope that visibility is gained in the right places. Again, should the bug be somewhere else, that effort too is wasted and must be repeated and refined.
Figure 2. Current approaches may require several iterations of simulation.
Yet others have tackled the data management problem by dumping the signals from the entire design, but for only a limited slice of the simulation time. Unlike the other two approaches, this one makes the entire chip visible, but only for a short simulation period. If the bug occurs outside that specific region of time, the effort will have to be repeated. Many verification teams have built elaborate environments that use heuristics to “best guess” the minimal data required. These ad hoc approaches produce only a subset of the required data across the time and space domains. More importantly, because they rely on intuition and design know-how, ad hoc approaches introduce a level of uncertainty, which leads to unpredictability in verification coverage, costs and results.
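The trade-off behind these partial dumps can be pictured as a window in space (which modules are dumped) and time (which cycles are dumped). The following sketch is a simplified model with hypothetical module names and cycle numbers; it counts how many simulation runs are needed before some dump plan actually captures the root cause.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class DumpPlan:
    modules: Optional[FrozenSet[str]]  # None means the whole chip is dumped
    window: Optional[Tuple[int, int]]  # None means the whole run, else [start, end)

    def captures(self, module: str, cycle: int) -> bool:
        """True if a root cause in `module` at `cycle` is visible in this dump."""
        in_space = self.modules is None or module in self.modules
        in_time = self.window is None or self.window[0] <= cycle < self.window[1]
        return in_space and in_time


def runs_until_visible(plans: List[DumpPlan], module: str, cycle: int) -> Optional[int]:
    """Number of simulation runs before a plan captures the root cause, or None."""
    for run_number, plan in enumerate(plans, start=1):
        if plan.captures(module, cycle):
            return run_number
    return None


# Hypothetical debug session: each plan costs one more simulation run.
plans = [
    DumpPlan(modules=frozenset({"cpu"}), window=None),                    # guessed region
    DumpPlan(modules=None, window=(0, 1_000_000)),                        # guessed time slice
    DumpPlan(modules=frozenset({"dma"}), window=(4_000_000, 5_000_000)),  # refined guess
]

print(runs_until_visible(plans, module="dma", cycle=4_200_000))  # prints 3
```

In this example two complete simulation runs are spent on guesses that miss the bug in either space or time before the third one finds it.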
Hybrid simulation – boosting your simulator
Clearly, a solution aimed at delivering faster, more efficient verification must:
- Eliminate the need to implement the design.
- Boost simulation speed by deploying a processing technology that eliminates both the performance limitations of standard CPU architectures and the cache memory miss problem.
- Be completely transparent to the simulation software in order that all attributes of software simulation may be used without limitation.
- Scale affordably with design size.
Hybrid simulation performs digital logic simulation by combining hardware and software techniques for simultaneous execution on two main processors, maximizing overall execution performance. It boosts software simulator performance by an order of magnitude or more. This is achieved with a VLIW coprocessor architecture that employs multiple sub-processing elements accessing gigabytes of memory to virtually eliminate the memory-cache miss problem. Hybrid simulation also preserves current debug methods and simulator use models, so that large designs can be verified from multi-block to system level with existing simulation environments.
In physical terms, hybrid simulation can be achieved by plugging a simple coprocessor card into your desktop or server computer. Since hybrid simulation is essentially still a simulation technology, the design is compiled into instructions that execute from memory, which supports acceleration of not only synthesizable but also behavioral code. Long setup and compile times are thereby eliminated. Moreover, this technology can accommodate huge design sizes on a single coprocessor card, since design capacity is now determined by the amount of standard off-the-shelf memory on board, and not by the logic gate capacity of several FPGAs. The cost savings compared to traditional acceleration technologies are tremendous.
Visibility enhancement – getting more for less
Just like astronomers who extrapolate the properties of the universe from a small sample of observation data, verification teams must now consider the value of a more systematic, algorithmic approach to providing full signal visibility while minimizing file size. Such a solution involves several steps working hand-in-hand to provide full chip visibility at a lower cost.
First is the pre-analysis step to determine the minimal Essential Signal (ES) data from simulation that is guaranteed to provide full visibility later. A signal in the chip is added to the ES list because it is determined by the algorithm that this signal is essential and can’t be deduced or inferred later from other signals. On the other hand, signals that are considered non-essential and could be deduced are left off the list. During simulation, only values for signals that are on the ES list are dumped. The simulation is much faster and the size of the data set is much smaller.
Figure 3. Essential signal analysis combined with other visibility enhancement techniques reduces the size of the data set, shortens the simulation time and provides full visibility.
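The article does not describe how the pre-analysis decides which signals are essential. As a rough illustration only, the sketch below assumes a simple rule: primary inputs and register outputs cannot be recomputed from other signals in the same cycle, so they are essential, while combinational nets can be deduced later and are left off the list. The netlist format and signal names are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy netlist: signal name -> (kind, fan-in signal names).
# kind is "input", "reg" (register output), or "comb" (combinational net).
Netlist = Dict[str, Tuple[str, List[str]]]


def essential_signals(netlist: Netlist) -> List[str]:
    """Assumed rule: keep inputs and register outputs, drop combinational nets."""
    return sorted(
        name for name, (kind, _fanin) in netlist.items()
        if kind in ("input", "reg")
    )


netlist: Netlist = {
    "a": ("input", []),
    "b": ("input", []),
    "q": ("reg", ["n2"]),        # register fed by n2
    "n1": ("comb", ["a", "b"]),  # deducible from a and b
    "n2": ("comb", ["n1", "q"]), # deducible from n1 and q
}

es_list = essential_signals(netlist)
print(es_list)  # prints ['a', 'b', 'q']
print(f"dumping {len(es_list)} of {len(netlist)} signals")
```

Only the signals on the resulting list would be dumped during simulation; a production algorithm would presumably be considerably more selective than this rule.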
The next step in the process is to expand the dumped data for chip debug. Using the essential signal information, Data Expansion (DE) algorithms calculate the values of chip signals for which data was not collected. Obviously, it would not be practical to expand the ES data for the entire chip all at once, as the size of the data and the amount of time required would negate the savings initially gained by the ES step. Instead, the DE algorithm is optimized to act and execute locally, in real-time, during debug operations. With this approach, there is virtually no impact on user interactivity.
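One way to picture local, on-demand expansion is a lazy evaluator that recomputes a missing signal only when the debugger asks for it, and only for the requested cycle. The sketch below is an assumption about how such a step might look, not the actual DE algorithm; the signal names, gate functions, and dumped values are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

Gate = Tuple[Callable[..., int], List[str]]  # (logic function, fan-in signal names)


class Expander:
    """Recomputes non-dumped combinational signals from dumped values on request."""

    def __init__(self, dumped: Dict[str, List[int]], gates: Dict[str, Gate]) -> None:
        self.dumped = dumped  # dumped signal -> one value per cycle
        self.gates = gates    # non-dumped signal -> gate that drives it
        self.cache: Dict[Tuple[str, int], int] = {}

    def value(self, signal: str, cycle: int) -> int:
        if signal in self.dumped:
            return self.dumped[signal][cycle]
        key = (signal, cycle)
        if key not in self.cache:
            func, fanin = self.gates[signal]
            # Only the cone of logic feeding this signal is evaluated,
            # and only for the requested cycle.
            self.cache[key] = func(*(self.value(name, cycle) for name in fanin))
        return self.cache[key]


# Hypothetical dumped data for three cycles.
dumped = {"a": [0, 1, 1], "b": [1, 1, 0], "q": [0, 0, 1]}
gates: Dict[str, Gate] = {
    "n1": (lambda x, y: x & y, ["a", "b"]),
    "n2": (lambda x, y: x ^ y, ["n1", "q"]),
}

expander = Expander(dumped, gates)
print(expander.value("n2", 1))  # prints 1
print(len(expander.cache))      # prints 2
```

Nothing is computed for cycles or signals the user never inspects, which is what keeps the expansion cost small relative to expanding the whole chip.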
Figure 4. Hybrid simulation and visibility enhancement.
By applying the two technologies of hybrid simulation and visibility enhancement to the verification of large designs, the combined benefits are:
- Faster simulation with smaller, more manageable dump files.
- Signal data expanded on-demand to provide full design visibility during debug.
This hybrid approach reduces the need to replace long simulation runs with several shorter runs and removes the associated overhead. For instance, the task of dividing an original long run is usually complicated by the need to ensure that the start conditions of each shorter run are set from the terminal conditions of the previous run in the sequence (or, alternatively, obsolete initial conditions are assumed). This is no longer necessary.
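The bookkeeping involved in splitting a run can be shown with a toy model. In the sketch below, a 16-bit accumulator stands in for the design state (an assumption made purely for illustration). Chaining each segment from the previous segment's terminal state reproduces the long run, while starting a later segment from a stale reset state does not.

```python
from typing import Iterable, List


def simulate(state: int, stimulus: Iterable[int]) -> int:
    """Toy design: a 16-bit accumulator standing in for real design state."""
    for value in stimulus:
        state = (state * 31 + value) & 0xFFFF
    return state


def run_in_segments(initial: int, stimulus: List[int], segment_len: int) -> int:
    """Split one long run into shorter runs, passing the terminal state along."""
    state = initial
    for start in range(0, len(stimulus), segment_len):
        state = simulate(state, stimulus[start:start + segment_len])
    return state


stimulus = list(range(1, 101))

long_run = simulate(0, stimulus)
chained = run_in_segments(0, stimulus, segment_len=25)
stale = simulate(0, stimulus[75:])  # last segment started from an assumed reset state

print(long_run == chained)  # prints True
print(long_run == stale)    # prints False
```

In a real flow the terminal state is the full design state rather than a single integer, which is what makes the hand-off between segments costly to get right.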
In closing, it is accurate to say that simulation speed and debug-driven visibility requirements have traditionally been opposing forces as far as verification efficiency is concerned. Fortunately, this conundrum can now be overcome. A hybrid simulation-based approach coupled with advanced visibility enhancement techniques offers the scalable performance and data management benefits needed to verify and debug today’s large designs, and the even larger, more complex designs of the future.
By Bindesh Patel and Raj Kumar Mathur
Bindesh Patel is Technical Marketing Manager at Novas Software, Inc. where he is responsible for defining future products. His previous experience includes design and applications engineering at LSI Logic, Zycad, and Atrenta. Bindesh has a degree in computer engineering from the University of California, Santa Cruz.
Raj Mathur is Director, Technical Marketing for Liga Systems. His broad expertise stems from more than 17 years of technical and strategic marketing positions, and engineering roles in EDA. Prior to joining Liga Systems in April 2005, Raj served as senior technical marketing manager at Altera Corp. Prior to Altera, he held technical marketing and applications engineering positions at Stanford Research Systems and Analog Devices. He holds a bachelor of science degree in electrical engineering from the Engineering Academy of Denmark.
Go to the Novas Software, Inc. website to learn more.