July 11, 2005 -- Increasingly, designers begin with high-level models to partition and verify system functionality. Best-practice teams reuse these models within testbenches to verify the resulting RTL designs. However, the cost and brittleness of these testbenches limit the design team's ability to explore and optimize their RTL. Sequential equivalence checking technology reduces testbench requirements, improving productivity and giving designers more opportunities to optimize their RTL designs. This article traces the development of a DES encryption design to demonstrate the advantages of doing RTL verification without testbenches.
Relentlessly growing design complexity and time-to-market pressures are putting a lot of stress on the SoC development flow. Increasingly, designers are starting with a high-level model of their design to verify overall functionality and algorithm correctness, as well as to evaluate architectural choices and make early hardware-software partitioning decisions.
Often, a hardware task is modeled and verified at the untimed functional level using Matlab, C, or SystemC descriptions. The verified functionality, often with only anecdotal interface and performance specifications, is passed on to an RTL design engineer for implementation.
Best-practice development teams leverage their high-level models during RTL verification, incorporating them as reference models in simulation testbenches.
Unfortunately, the cost of developing a good testbench frequently exceeds the cost of the RTL design it is intended to verify! Because of this, testbenches are usually created only at the chip or major subsystem level, with these consequences:
- Thorough verification is delayed until enough design blocks are available to assemble within an available testbench.
- As a simulation-based approach, testbenches do not cover the entire design functionality, and require complex and long-running regression suites.
- At the testbench level of integration, random sequences, limited controllability and poor observability make detecting and then diagnosing bugs more difficult.
- With many resources tied up in verification, block interface and performance changes, especially those which impact the testbench, are discouraged.
The time and cost associated with verification prohibit a reasonable exploration of the RTL implementation space. Often, a poor architectural choice won't manifest itself until late in the design flow, and with advanced manufacturing processes, the risks greatly increase that a poor architectural choice will fail to achieve acceptable design closure during physical synthesis.
Many of the disadvantages of testbenches can be avoided by using equivalence checking techniques to verify individual block functionality early on in the design flow. RTL designs can be compared with corresponding high-level models. While the RTL designs and high-level models have matching functionality, they typically have differences in timing and interfacing.
Sequential equivalence checking tools, such as Calypto's SLEC, target system-level to RTL and RTL to RTL verification in the presence of timing and interface differences. Existing combinational equivalence checkers, used for RTL to gate verification, cannot handle these sequential differences.
To demonstrate the advantages of sequential equivalence checking, this paper traces the development of a DES encryption circuit, verifying RTL functionality using sequential equivalence checking instead of waiting for testbenches.
The Data Encryption Standard (DES) was originally adopted by the US National Institute of Standards and Technology (NIST) in 1977. The core DES algorithm takes a single 64-bit data value and a 64-bit key value and produces a 64-bit encrypted data result. Figure 1 shows an overview of the algorithm. After initial permutations of data and key, 16 rounds of computation encrypt the data, and a final permutation produces the encrypted result. The algorithm is symmetric, so the same key which encrypts the data can be used to decrypt and recover the original data block.
Figure 1. Algorithm/compute block.
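The round structure described above can be sketched in C as a generic 16-round Feistel network. This is a hypothetical sketch only: the round function `f` below is a simple placeholder, not the real DES f-function, and the initial/final permutations and the key schedule are omitted. What it does capture is the structural property that running the same rounds with the subkeys reversed decrypts the data, which is why DES is symmetric.

```c
#include <stdint.h>

/* Placeholder round function standing in for the DES f-function
   (illustrative only; real DES uses expansion, S-boxes, and a permutation). */
static uint32_t f(uint32_t half, uint32_t subkey) {
    return (half * 0x9E3779B9u) ^ subkey;
}

/* 16 Feistel rounds over a 64-bit block, split into 32-bit halves.
   The final half-swap makes the network its own inverse under a
   reversed subkey order. */
uint64_t feistel16(uint64_t block, const uint32_t subkeys[16]) {
    uint32_t L = (uint32_t)(block >> 32);
    uint32_t R = (uint32_t)block;
    for (int i = 0; i < 16; i++) {
        uint32_t t = L ^ f(R, subkeys[i]);
        L = R;
        R = t;
    }
    return ((uint64_t)R << 32) | L;  /* swap halves on output */
}
```

Calling `feistel16` a second time with the subkey array reversed recovers the original block, regardless of what `f` computes.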
At the system level, the core DES algorithm can be modeled in C. The model is untimed, and because the algorithm uses various bit widths internally, SystemC datatypes are used to unambiguously specify the arithmetic operations. For simulation, the design can be wrapped in a simple SystemC harness. Figure 2 shows part of this design.
Figure 2. Portion of C/SystemC DES model.
NIST publishes some test data which can be used to validate the algorithm.
The core DES algorithm is used within a larger module to encrypt entire message streams. Additional modes of operation, such as electronic codebook (ECB) or cipher block chaining (CBC), are layered on top of the algorithm to process full message streams. Because the original encryption algorithm is becoming vulnerable to brute-force attacks, triple-DES combinations are recommended for better security.
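As an illustration of how a mode of operation layers on top of the core algorithm, CBC XORs each plaintext block with the previous ciphertext block (the IV for the first block) before encrypting it. The sketch below uses a trivial placeholder in place of the real core DES block, purely to show the chaining structure:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for the core DES block cipher
   (64-bit block, 64-bit key); real DES is 16 Feistel rounds. */
static uint64_t des_encrypt(uint64_t block, uint64_t key) {
    return block ^ key;
}

/* CBC mode: chain each plaintext block with the previous ciphertext
   block before encryption, so identical plaintext blocks encrypt
   differently. */
void cbc_encrypt(const uint64_t *pt, uint64_t *ct, size_t n,
                 uint64_t key, uint64_t iv) {
    uint64_t prev = iv;
    for (size_t i = 0; i < n; i++) {
        ct[i] = des_encrypt(pt[i] ^ prev, key);
        prev = ct[i];
    }
}
```

The message-stream module described above wraps the core DES block in exactly this kind of chaining loop.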
Given the costs and time required for testbench development, it is likely that a complete testbench will only be developed for the integrated message stream module. Any core DES implementation would be verified within this larger context. Figure 3 shows an example testbench setup. The testbench contains a reference model including the core C/SystemC DES model along with a high-level model of the message encryption streamer. This is compared with the corresponding implementation modules, wrapped with both input and output transactors to account for the differences in timing and interface protocols.
Figure 3. Extended testbench/DUT.
Simulation of the high-level portion of the testbench gives the following results:
Because of the level of integration required for the testbench, any RTL implementation of the core DES algorithm will not be thoroughly verified until all other blocks are available.
There are a variety of power and performance options possible for the DES RTL design. Because of the time required to develop a testbench, implementation timing and interfaces are chosen early, yet functionality is verified late. If functional verification requirements could be removed from the testbench and shifted earlier in the design process, more optimal RTL implementations could be explored.
Testbench not required
Sequential equivalence checking, embodied in tools such as SLEC, enables functional verification of RTL designs to be done without a testbench.
Two designs are compared for functional equivalence. Typically, the specification design is an abstract system level or register transfer (RTL) model, while the implementation design is a more concrete RTL design. Equivalence checkers provide:
- Functional verification without test vectors
- Corner case bug detection and counterexamples when differences are found
However, unlike existing combinational tools, equivalence checkers with sequential and data abstraction capabilities can verify designs with common functionality despite differences in:
- State representation
- Interface timing and protocols
- Resource scheduling and allocation
Sequential analysis capabilities and the ability to bridge abstraction gaps enable these equivalence checkers to be used far earlier in the design process, resulting in:
- Early detection of system-level and RTL functional bugs, without the need for block-level testbenches.
- Complete functional verification of the RTL with respect to a system-level or other RTL reference models.
Sequential design variations
Throughout the RTL design process, sequential equivalence checkers can verify all design refinements which do not change the functional behavior of the design. Some common RTL design refinements include:
- Pipelining - pipelines are often added to a design to meet throughput requirements. Pipeline refinements include inserting or modifying the number of pipeline stages in data and control paths. A common scenario might be adding a pipeline stage to a key datapath, increasing the implementation latency by one.
- Resource Scheduling - resources are allocated and scheduled to meet cost and performance targets. A single-cycle computation in a design specification may become multi-cycled in the implementation, changing the sharing and timing of required resources.
- Register Retiming - register retiming is a common RTL optimization used to balance the amount of logic between flip-flops. Although the state of the two RTL models is different, the interface behavior of the two designs remains identical.
- Interface Refinements - as designs are refined, block interfaces are changed from abstract data types, such as integers, to bit-accurate busses. While preserving the core functionality, interface protocols, timings, and data sizes may be changed, for example, from full parallel to byte-serial or bit-serial interfaces.
- State Recoding - state machine encodings may be changed to optimize implementation area, timing, and/or dynamic power. A typical recoding might change from a binary-encoded machine to a one-hot implementation.
- Additional Operating Modes - additional modes of operation may be present in an RTL implementation. Scan path, for example, is often added to an RTL implementation. High-level behaviors can be verified by constraining the RTL so that the additional modes of operation are disabled.
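To make the state-recoding case concrete, the two fragments below implement the same toy sequence detector (it flags a second consecutive 1 on the input) with a binary-encoded and a one-hot-encoded state register. This is a hypothetical example, not taken from the DES design; the point is that a sequential equivalence check must show the cycle-by-cycle output behavior matches even though the state bits differ:

```c
#include <stdint.h>

typedef struct { uint8_t s; } fsm_bin;   /* states 0, 1, 2      */
typedef struct { uint8_t s; } fsm_1hot;  /* states 001, 010, 100 */

/* Binary encoding: output high when a 1 arrives with at least one 1 seen. */
int step_bin(fsm_bin *m, int in) {
    switch (m->s) {
    case 0:  m->s = in ? 1 : 0; return 0;
    case 1:  m->s = in ? 2 : 0; return in;
    default: m->s = in ? 2 : 0; return in;
    }
}

/* One-hot encoding of the identical state machine. */
int step_1hot(fsm_1hot *m, int in) {
    uint8_t s = m->s;
    if (s & 0x1) { m->s = in ? 0x2 : 0x1; return 0; }
    if (s & 0x2) { m->s = in ? 0x4 : 0x1; return in; }
    /* s & 0x4 */  m->s = in ? 0x4 : 0x1; return in;
}
```

Stepping both machines from their reset states on any input sequence produces identical output streams, which is precisely the interface-level equivalence the checker establishes.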
Serial DES implementation
To implement the core DES algorithm, the first RTL design is a straightforward, minimal hardware, serial-compute approach, written in Verilog. Figure 4 shows a schematic and timing diagram for the implementation. Each compute round takes one cycle.
Figure 4. Serial DES schematic and timing diagram.
To functionally verify the RTL implementation with the system-level description, the reset state, throughput, and output latency of the two designs must be specified. Since the system-level model is untimed and stateless, there is no reset needed and the circuit can be considered combinational. The RTL design accepts new inputs every 16 cycles and outputs a result every 16 cycles, so the throughput and output latency are both 16 cycles. Running an equivalence check with this information results in the following:
When a functional difference is found, a counterexample is generated which demonstrates the difference. Typically, the counterexample will be the shortest sequence from reset of the two designs to a state which demonstrates an output difference. Figure 5 shows the two waveform sequences.
Figure 5. Counterexample VCDs.
There is only a single-bit difference in the output. Considering all the exclusive-ors and permutations inside the compute modules, the bug is likely in the final output stage. The NIST specification numbers the bit locations starting at 1, counting left to right, while the Verilog implementation numbers bits from 0, counting right to left. One of the permutations ended up off by one. Figure 6 shows the source of the bug.
Figure 6. DES design correction.
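The two numbering conventions at the root of the bug can be captured directly: the NIST specification numbers the 64 block bits 1 through 64 starting from the most significant bit, while a Verilog vector declared `[63:0]` gives the most significant bit index 63. A small sketch of the index mapping (the helper names here are illustrative, not from the actual design):

```c
#include <stdint.h>

/* Map a NIST bit number (1..64, 1 = MSB) to a Verilog [63:0] index. */
static inline int nist_to_verilog(int nist_bit) {
    return 64 - nist_bit;   /* NIST bit 1 -> Verilog index 63 */
}

/* Select a bit from a 64-bit block by its NIST position. */
static inline int get_nist_bit(uint64_t block, int nist_bit) {
    return (int)((block >> (64 - nist_bit)) & 1u);
}
```

Dropping the `64 -` adjustment when transcribing a permutation table produces exactly the kind of off-by-one the counterexample exposed.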
Fixing the error and rerunning SLEC achieves full functional equivalence:
Without equivalence checking, functional verification must often be delayed until all RTL blocks are available. Rerunning the sample message with the initial design yields:
Diagnosis of the difference, if detected, is more difficult because the differences must be traced through more layers of logic before the bug is reached. Additionally, the stimulus, which often is random, will contain extraneous vectors which do not contribute to the design difference.
Assertions, such as OVL or formal property specifications, can be added to the designs to help observe and localize errors. Unlike the high-level model, these assertions need to be checked and rewritten with each implementation change. They are incomplete and introduce an additional error source.
Alternative Sequential Designs
In addition to the serial implementation, three other alternatives were implemented as shown in figures 7a, 7b, and 7c.
The first design, in figure 7a, inserts a pipeline stage every 4 compute rounds. The output latency remains 16 cycles. The compute hardware is replicated 4 times, and new encryption inputs (throughput) now occur every 4 cycles.
Figure 7a. Quarter pipelined design.
The second design, in figure 7b, is a fully pipelined implementation. The compute rounds are fully unrolled, and new encryption inputs occur every cycle.
Figure 7b. Fully pipelined design.
The third design, in figure 7c, is super-pipelined, with a pipeline stage inserted in each compute round. The output latency increases to 32 cycles. Registers are added, but since there is less work done each cycle, either a higher clock frequency could be used, or the power could be reduced at the same clock frequency.
Figure 7c. Double pipelined design.
In all cases, sequential equivalence checking demonstrates functional equivalence with the original high-level design. Since the designs all start in the same reset state, throughput and output latency values are the only setup changes needed between verification runs.
Using SLEC, four different DES implementations were designed and proven functionally equivalent without testbenches. The implementations have different throughputs and areas as shown in figure 8.
Figure 8. RTL design metrics.
Sequential equivalence checking tools such as SLEC enable designers to verify their blocks earlier in the design process without having to wait for testbenches or other blocks to be ready for verification.
Full functional verification can be completed without test vectors. With a testbench approach, even after the initial testbenches are developed, stimulus generation must be continually analyzed and extended to uncover as many bugs as possible.
When SLEC uncovers bugs, counterexamples are typically the shortest sequences possible to excite the bugs and propagate them to the outputs. Conversely, random-stimulus testbenches have long, often extraneous, sequences and require assertions to improve the chances of observing bugs.
Although this paper focused on block-level equivalence checking, sequential equivalence checkers can also improve functional testing at the integration level. At whichever level testbenches are used, their functional testing burdens are greatly reduced in combination with sequential equivalence checkers.
Sequential equivalence checkers, such as Calypto's SLEC, put early functional verification back in the hands of the designer. This capability enables designers to explore more RTL variations during design. The flexibility to evaluate and productively verify sequential changes throughout the design process improves the quality of the RTL signed off for subsequent physical implementation.
Go to the Calypto Design Systems, Inc. website to learn more.