November 2, 2009 -- Because of shrinking feature sizes and the decreasing faithfulness of the manufacturing process to design features, process variation has been one of the constant themes of IC designers as new process nodes are introduced. This article reviews the problem and proposes a "probabilistic" approach as a solution to analysis and management of variability.
Process variation may be new in the digital design framework, but it has long been the principle worry of analog designers, known as mismatch. Regardless of its causes, variation can be global, where every chip from a lot can be effected in the same way, or quasi-global, where wafers or dies may show different electrical characteristics. Such global variation has been relatively easy to model, especially if process modeling people have been able to characterize it with a single "sigma" parameter. Timing analyzers need to analyze a design under both worst case and best case timing conditions. Usually, two extreme conditions of "sigma" sufficed to provide these two conditions. With the new process nodes , however, not only it is necessary to have several variational parameters, but individual device characteristics on a chip could differ independently, known as on-chip variation (OCV).
At the device level, process variation is modeled by a set of "random" parameters which modify the geometric parameters of the device and its model equations. Depending on the nature of the variation, these may effect all devices on the chip, or certain types of devices, or they may be specific to each instance of the device. Because of this OCV, it is important that correlation between various variational parameters be accounted for. For example, the same physical effect is likely to change the length and width of a device simultaneously. If this is ignored, we may be looking at very pessimistic variation scenarios.
There are some statistical methods which try to capture correlations and reduce them to a few independent variables. Some fabs use use parameters related to device geometries and model parameters. The number of such parameters may range from a few to tens, depending on the device. If one considers global and local variations, the number of variables quickly can get out of hand. Variation is statistically modeled by a distribution function, usually Gaussian. Given the value of a variational parameter, and a delta-interval around it, one can calculate the probability that the device/ process will be in that interval and will have specific electrical characteristics for that condition. Instead of having a specific value for a performance parameter such as delay, it will have a range of values with specific probabilities depending on the variational parameters.
To analyze the performance of digital designs, two approaches have emerged: statistical static timing analysis (SSTA) and multi-corner static timing analysis. SSTA tries to generate a probability distribution for a signal path from delay distributions of individual standard cells in the path. This is usually implemented by using variation-aware libraries, which contain a sampling of cell timing at various discrete values of the variational parameters. Because of the dependence on a discrete library, this approach is practically limited to only few global systematic variables, with a very coarse sampling of the variation space. Since it is a distribution-based analysis, it depends on the shape of primary variables. It is generally assumed these are Gaussian, but there is no reason to assume this. In fact, most process models may not even be centered. In addition, it becomes difficult to do input slope dependent-delay calculation. Assumptions and simplifications could quickly make this approach drift from the goal. Since it has the probability distributions, one can report a confidence level about a timing violation. Implicit in this approach is the assumption that any path has a finite probability of being critical.
Multi-corner timing analysis is kind of Monte-Carlo in disguise, and has been gaining popularity as a brute-force method. Someone who knows what he/she is doing decides on a set of extreme corner conditions. These are instances of process variables, and cell libraries are generated for these conditions. Timing analysis is performed using these libraries. The number of libraries may be 10 to 20 or more. Naturally, this approach is still limited to few global variational parameters. It is also difficult to ascertain the reliability of timing analysis, in terms of yield. The only way to increase the confidence level is by building more libraries and repeating the analysis with them. This process increases verification and analysis time, but does not guarantee coverage.
What we propose instead, is probabilistic timing analysis. It can address both global and local variations, and we can have a lower confidence limit on timing analysis results which can be controlled by the designer. This turns the problem upside down. Since timing analysis is interested in worst-case and best-case timing conditions of a chip, we ask the same question for individual cells making up a design. We want to find the best/ worst case timing condition of a cell. While doing this, we need to limit our search and design space. For example, the interval (-1,1) covers 68.268% of the area under the normal bell curve distribution. If we search this interval for sigma with maximum inverter delay and later use that value, we can only say that the probability that this value is the maximum delay is 0.68268. For the interval (-2,2), it is 0.95448. If we had searched a wider interval, our confidence level would go up even higher. If there were two process variables, and if we had searched (-1,1)(-1,1), our confidence would drop to 0.68268X0.68268, or 0.46605.
Although lower confidence limits are set by the initial search intervals, the actual probabilities may be much higher. If the maximum had occurred at extreme corners, one could expect that as the search interval expands, we might see new maximum conditions. On the other hand, if the maximum had occurred at a point away from the corners, most likely this is the absolute value. Typically, only one of the parameters, the one most tightly coupled to threshold voltage, for example, takes up the extreme values, and most others take intermediate values. In these cases it is effectively the same as if we searched the interval (-inf, +inf). This behavior is consistent with the traditional approach, where a single parameter is used to control best and worst timing corners.
One of the conceptual problem with our probabilistic approach is that each cell may have different sets of global variables, which contradicts the definition of such variables. A flip-flop may have different global variables than an inverter. Even inverters of different strengths may have different sets. They are typically close to each other, however. There may be some pessimism associated with this condition.
It is easy to establish confidence levels on critical path timing. If for example, global variables have a confidence level of 0.9, and local random variables have 0.95, the confidence level for a path of 10 cells is 0.9X0.95*10= 0.5349. Since local variations of each gate are independent of each other, intersection rule of probability should be followed, probability of having 0.95 coverage for two independent cells is 0.95X0.95, for three is 0.95X0.95X0.95, etc. In reality though, minimum and maximum conditions for local variations are clustered around the center, away from the interval end points, which brings confidence level to 0.9, confidence level for global variations. Alternatively, one can expand the search interval to cover more process space. Also keep in mind, the variation range of "real" random variables is much narrower than (-inf, +inf).
Library Technologies has implemented this probabilistic approach in its YieldOpt product. The user defines the confidence levels her/she would like to see, and identifies global and local random parameters for each device. Confidence levels are converted to variation intervals assuming a normal distribution. This is the only place we make an assumption about the shape of the distributions. As a result, our approach has a weak dependence on probability distribution. In the probabilistic approach, we view timing characteristics of a cell as functions of random process variables. For each variable, we define a search interval. The variables could be global and local random variables. Maximum and minimum timing conditions for each cell are determined for typical loads and input slopes. Two libraries are generated for each condition. Normally, we couple worst process condition with high temperature, low voltage; and best process condition with low temperature and high voltage.
Timing analysis flow is the traditional flow, and depending on the number of random variables, searching for extreme conditions becomes a very demanding task. We have developed methods and tools which can achieve this task in a deterministic way. The YieldOpt product determines appropriate process conditions for each cell and passes it over for characterization and library generation. Determining worst/best case conditions may add about 0.1X to 2X overhead on top of characterization.
By Mehmet Cirit.
Mehmet Cirit is the founder and president of Library Technologies, Inc. (LTI). LTI develops and markets tools for design re-optimization for speed and low power with on-the-fly cell library creation, cell/ memory characterization and modeling, circuit optimization, and process variation analysis tools such as YieldOpt.
Go to the Library Technologies, Inc. website to learn more.