May 20, 2010 -- Clock trees are an integral part of any chip, and simply making them do their basic job is now far less than what designers expect when they build them. Traditionally, clock trees are built to distribute clocks from the clock generator or clock port to the flip-flops or sink elements in the most efficient way; i.e., with minimal delay and without degrading the clock characteristics. But now, the demands on clock trees are entirely different. They need not only to deliver the clock efficiently from source to sink points, but also to use fewer cells, consume less power and help the datapath meet timing. Add to this the need to meet basic latency and skew requirements in multiple operating corners and across multiple operating modes.
Since there's a lot of buzz around skew-based timing closure, I'm going to focus on using clock tree synthesis (CTS) to help meet datapath timing by tuning the arrival or required time of the clock tree. Some companies call this clock optimization or CTS-based optimization; I will refer to it as "useful" skew. While most of the commercial place-and-route tools on the market support useful skew, they only do it after the fact; i.e., after the clock tree is built. More recently, some companies have claimed to do this during CTS, which can enable post-CTS timing closure. I started looking into this more closely and realized that it is still a half-hearted solution.
If you're trying to use useful skew to achieve timing closure, you need to do it throughout the entire flow. Doing it only during CTS, or at any other single step, is a half-baked approach, since you are not taking advantage of all the engines. Also, if the other engines in your tool are not aware of the changes you're attempting during CTS, it may backfire and give you garbage results. If you're planning to allow useful skew in the design, you should have the freedom to do it right from the placement step. The placement and optimization engine that runs before CTS should be aware of useful skew, and should automatically come up with the budgets to allocate to it on paths whose timing is hard to meet even with an ideal clock. Doing this relieves the placer and optimizer of needlessly cranking on hard-to-meet paths that will never achieve timing closure without help from the clock tree.
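To make the budgeting idea concrete, here is a minimal sketch in Python of how a pre-CTS budget might be derived for a single flip-flop. It is an illustration only, not how any particular tool works: the names `FlopSlack`, `skew_budget` and `apply_budget` are hypothetical, only setup slack is considered (hold is ignored), and each flop is treated in isolation, whereas a real engine would have to solve for all flops together.

```python
from dataclasses import dataclass


@dataclass
class FlopSlack:
    """Worst setup slacks around one flop, in ps, measured with an ideal clock."""
    name: str
    slack_in: float   # worst slack of paths that END at this flop
    slack_out: float  # worst slack of paths that START at this flop


def skew_budget(flop: FlopSlack) -> float:
    """Clock offset (ps) to budget for this flop; positive means delay its clock.

    Delaying the flop's clock by d adds d to slack_in and removes d from
    slack_out, so d = (slack_out - slack_in) / 2 balances the two sides.
    """
    if flop.slack_in >= 0 and flop.slack_out >= 0:
        return 0.0  # both sides already meet timing; leave the clock alone
    return (flop.slack_out - flop.slack_in) / 2


def apply_budget(flop: FlopSlack, offset: float) -> tuple[float, float]:
    """Return (slack_in, slack_out) after shifting the flop's clock by offset."""
    return flop.slack_in + offset, flop.slack_out - offset


# Assumed example: the incoming path fails by 200 ps, the outgoing path has 500 ps to spare.
reg = FlopSlack("u_core/reg_a", slack_in=-200.0, slack_out=500.0)
offset = skew_budget(reg)
print(reg.name, offset, apply_budget(reg, offset))
```

In this assumed example the failing path borrows time from the stage after it, so the optimizer no longer has to keep working on a path it cannot fix by itself. If the two slacks sum to a negative number, balancing still leaves both sides failing, which tells you the datapath itself needs work.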
Now, when doing CTS, the tool should be able to build the clock tree while considering the budgets set by the placement-based optimization engine (place_opt). This helps build a tree that is correct by construction and useful-skew-aware. Post-CTS, there has to be one more optimization step that is aware of this tree and performs the necessary datapath optimization as well as local useful skew to achieve timing closure. Claiming to meet datapath timing while building the clock tree is marketing stuff; in reality, you need a clean-up step to get an accurate post-CTS timing-closed design.
But it doesn’t end here, as most companies would like it to. Routing and post-route optimization have to be an integral part of this as well. The router needs to understand the exact topology for routing the clock tree in order to really achieve the skew/latency claimed while building the clock tree, and the post-route optimization steps also need to do useful skew-based optimization to achieve complete timing closure. Without this crucial step, you can achieve very good results during and after CTS, but once you route the design, routing topology changes and signal integrity (SI) effects can degrade your datapath timing by more than place_opt can predict. The only way to achieve true timing closure is to allow post-route optimization to make further changes to the clock tree. This again requires accurate and tightly coupled engines that can make the right changes, or else one small change to the clock tree can result in tens or possibly hundreds of timing violations.
Now add to this MCMM (multi-corner and multi-mode)-based useful skew. This complicates the question of how much useful skew can be used, because skew applied in one operating corner has implications for the others. The only way to address this is with an engine that understands the effect of applying useful skew in one corner on the other corners and makes the necessary adjustments.
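As a rough illustration of the cross-corner problem, the following Python sketch finds the range of clock offsets for one flop that keeps setup timing met in every scenario at once. Everything here is an assumption for illustration: the `Scenario` fields, the `offset_window` helper, the idea of a single `delay_scale` factor per scenario for the inserted clock delay, and the numbers. Hold checks are left out to keep it short.

```python
from typing import NamedTuple, Optional, Sequence


class Scenario(NamedTuple):
    """One mode/corner combination for a single flop; slacks in ps."""
    name: str
    slack_in: float     # worst setup slack of paths ending at the flop
    slack_out: float    # worst setup slack of paths starting at the flop
    delay_scale: float  # inserted clock delay in this scenario / nominal delay (> 0)


def offset_window(scenarios: Sequence[Scenario]) -> Optional[tuple[float, float]]:
    """Return (min, max) nominal clock offset that works in all scenarios, or None.

    In each scenario a nominal offset d becomes d * delay_scale, and we need
    slack_in + d * delay_scale >= 0 and slack_out - d * delay_scale >= 0.
    """
    lowest = max(-s.slack_in / s.delay_scale for s in scenarios)
    highest = min(s.slack_out / s.delay_scale for s in scenarios)
    return (lowest, highest) if lowest <= highest else None


# Assumed numbers: the slow corner needs at least 200 ps of extra clock delay,
# while the fast corner can tolerate at most 240 ps of nominal offset.
scenarios = [
    Scenario("func_slow", slack_in=-200.0, slack_out=500.0, delay_scale=1.0),
    Scenario("func_fast", slack_in=50.0, slack_out=120.0, delay_scale=0.5),
]
print(offset_window(scenarios))
```

An offset chosen by looking at the slow corner alone could be anywhere up to 500 ps here; taking the fast corner into account shrinks the usable range considerably, and with slightly different numbers the range disappears altogether.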
To summarize, unless a tool understands and applies useful skew to the full place-and-route flow, you may not be taking full advantage of useful skew. Please leave your comments and let me know how you liked this article.
By Alpesh Kothari.
Alpesh Kothari is with ATopTech, Inc.
Go to the ATopTech, Inc. website to learn more.