Streamlining TNS Data Collection for ML-Based RTL QoR Prediction
Abstract
Chip designs must meet several requirements before they are ready for fabrication. One of these requirements is achieving timing (frequency) convergence. Meeting this requirement is a time-consuming task for chip designers in industry for two reasons. First, the standard approach to obtaining this metric involves running logic synthesis and placement, both of which can take hours to weeks on larger RTL designs. Second, since the timing requirement is rarely met after one design iteration, these processes must be rerun multiple times to recalculate the metric and ultimately converge on the design's requirements. A critical measure of timing convergence is the total negative slack, commonly referred to by its acronym TNS. It is the sum of the timing margins of all "negative slack" paths, i.e., paths that fail to meet the target clock cycle time. To expedite design convergence, our research team previously presented a machine learning-based approach to estimating TNS values for chip designs expressed in the Verilog hardware description language. This technique was orders of magnitude faster than running logic synthesis and placement on those same chips. In this work, we build on the previous approach by improving the initial data generation process. Obtaining "true" TNS values for training the machine learning models involves running logic synthesis and placement with hundreds of synthesis recipes for each design, resulting in tens of thousands of synthesis and placement runs. Because new designs will be continuously added to the RTL developer's set of training designs, and a rich training data set must be maintained as they arrive, it is essential to reduce the number of synthesis and placement runs required to generate machine learning (ML) training data.
By taking advantage of similarities in the distributions of TNS values across chip designs, the number of required synthesis and placement runs for n Verilog RTL designs and m unique synthesis recipes can be reduced from O(nm) to O(n+m) without meaningfully compromising the integrity of the training data or the accuracy of ML predictions. We present two methods for achieving this, both of which involve finding the common TNS distribution, then normalizing the data and imputing the missing values in the data set. The discoveries made by our research team have the potential to drastically reduce the time to market for a variety of semiconductor computing products, including but not limited to graphics processors, motherboards, and flash memory.
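To illustrate the flavor of the O(n+m) reduction, the following is a minimal, hypothetical sketch (not the thesis's actual method): it assumes each design's TNS values are approximately an affine transform of a shared, recipe-wise distribution. All m recipes are run on one reference design (m runs), only a few "anchor" recipes are run on every other design (O(n) runs), and the remaining entries of the n-by-m TNS matrix are imputed by fitting a per-design line against the reference. All names, the synthetic data, and the affine assumption are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n designs, m synthesis recipes.
n, m = 5, 8

# Synthetic "ground truth" TNS matrix, built so that every design's TNS
# is an affine transform of a shared recipe-wise effect (the assumption
# under which this sketch works). Used here only to validate the imputation.
recipe_effect = rng.normal(0.0, 1.0, size=m)       # shared TNS shape per recipe
design_scale = rng.uniform(0.5, 2.0, size=n)
design_shift = rng.normal(-10.0, 2.0, size=n)
full = design_shift[:, None] + design_scale[:, None] * recipe_effect[None, :]

# O(n+m) sampling: all m recipes on one reference design,
# plus a constant number of anchor recipes per remaining design.
ref = 0
anchors = [0, 1]                                   # hypothetical anchor recipes
ref_tns = full[ref]                                # m reference values

# Per-design affine fit against the reference using only the anchors,
# then impute the design's TNS for every recipe.
est = np.empty_like(full)
for d in range(n):
    a, b = np.polyfit(ref_tns[anchors], full[d, anchors], 1)
    est[d] = a * ref_tns + b

# Under the affine assumption the imputation is exact (up to float error).
print(np.max(np.abs(est - full)))
```

Real TNS data would not follow the affine model exactly, so the imputation error would be nonzero and the anchor recipes would be fit by least squares over more than two points; the sketch only shows how m + O(n) runs can stand in for all n*m combinations when the per-design distributions share a common shape.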
Keywords
Machine Learning, Verilog RTL, Total Negative Slack, Logic Synthesis and Placement