Published on Oct 02, 2016
A code sign methodology incorporates timing speculation into a low-power microprocessor pipeline and shaves energy levels far below the point permitted by worst-case computation paths.
An old adage says, "If you're not failing some of the time, you're not trying hard enough." To address the power challenges that current on-chip densities pose, we adapted this precept to circuit design. Razor,(1) a voltage-scaling technology based on dynamic detection and correction of circuit timing errors, permits design optimizations that tune the energy in a microprocessor pipeline to typical circuit operational levels. This eliminates the voltage margins that traditional worst-case design methodologies require and allows digital systems to run correctly and robustly at the edge of minimum power consumption.
Occasional heavy weight computations may fail and require additional time and energy for recovery, but the overall computation in the optimized pipeline requires significantly less energy than traditional designs.
Razor supports timing speculation through a combination of architectural and circuit techniques, which we have implemented in a prototype Razor pipeline in 0.18-micrometer technology. Simulation results of the SPEC 2000 benchmarks showed energy savings for every benchmark, up to a 64 percent savings with less than 3 percent performance impact for error recovery.
Both circuit speed and energy dissipation depend on voltage. The speed or clock frequency, f of a digital circuit
is proportional to the supply voltage, Vdd:
f ? Vdd
The energy E necessary to operate a digital circuit for a time duration T is the sum of two energy components:
E = SCV2dd + VddIleakT
where the first term models the dynamic power lost from charging and discharging the capacitive loads within the circuit and the second term models the static power lost in passive leakage current-that is, the small amount of current that leaks through transistors even when they are turned off.
The dynamic power loss depends on the total number of signal transitions, S, the total capacitance load of the circuit wire and gates, C, and the square of the supply voltage. The static power loss depends on the supply voltage, the rate of current leakage through the circuit, Ileak , and the duration of operation during which leakage occurs, T.
Dynamic voltage scaling has emerged as a powerful technique to reduce circuit energy demands. In a DVS system, the application or operating system identifies periods of low processor utilization that can tolerate reduced frequency. With reduced frequency, similar reductions are possible in the supply voltage. Since dynamic power scales quadratically with supply voltage, DVS technology can significantly reduce energy consumption with little impact on perceived system performance.
Razor is an error-tolerant DVS technology. Its error- tolerance mechanisms eliminate the need for voltage margins that designing for "always correct" circuit operations requires. The improbability of the worst-case conditions that drive traditional circuit design underlies the technology.
• Noise margin
• Ambient margin
• Process margin
• Critical voltage
Fig.1 shows margins for factors that can affect the voltage required to reliably operate a processor’s underlying circuitry for a given frequency setting. First, of course, the voltage must be sufficiently high to fully evaluate the longest circuit computation path in a single clock cycle. Circuit designers typically use static circuit-level timing analysis to identify this critical voltage. To the critical voltage, they add the following voltage margins to ensure that all circuits operate correctly even in the worst-case operating environment:
• Process margins ensure that performance uncertainties resulting from manufacturing variations in transistor dimensions and composition do not prevent slower devices from completing evaluation within a clock cycle. Designers find the margin necessary to accommodate slow devices by using pessimistically slow devices to evaluate the critical path’s latency.
• Ambient margins accommodate slower circuit operations at high temperatures. The margin ensures correct operation at the worst-case temperature, which is typically 85-95°C.
• Noise margins safeguard against a variety of noise sources that introduce uncertainty in supply and signal voltage levels, such as di/dt noise in the supply voltage and cross-coupling noise in logic signals. The sum of these voltages defines the minimum supply voltage that ensures correct circuit operation in even the most adverse conditions.
Given the improbability of worst-case operating conditions, an opportunity exists to reduce voltage commensurate with typical operating conditions. The processor pipeline must, however, incorporate a timing-error detection and recovery mechanism to handle the rare cases that require a higher voltage.
A pipeline recovery mechanism guarantees that any timing failures that do occur will not corrupt the register and memory state with an incorrect value. We have developed two approaches to recovering pipeline state.1 The first is a simple method based on clock gating, while the second is a more scalable technique based on counter flow pipelining.3
Fig.5 illustrates pipeline recovery using a global clock-gating approach.
In the event that any stage detects a timing error, pipeline control logic stalls the entire pipeline for one cycle by gating the next global clock edge. The additional clock period allows every stage to re-compute its result using the Razor shadow latch as input. Consequently, recovery logic replaces any previously forwarded errant values with the correct value from the shadow latch.
Because all stages reevaluate their result with the Razor shadow latch input, a Razor flip-flop can tolerate any number of errant values in a single cycle and still guarantee forward progress. If all stages fail each cycle, the pipeline will continue to run but at half the normal speed. In aggressively clocked designs, implementing global clock gating can significantly impact processor cycle time. Consequently, we have designed and implemented a fully pipelined recovery mechanism based on counterblow pipelining techniques.
Figure 6 illustrates this approach, which places negligible timing constraints on the baseline pipeline design at the expense of extending pipeline recovery over a few cycles. When a Razor flip-flop generates an error signal, pipeline recovery logic must take two specific actions.
First, it generates a bubble signal to nullify the computation in the following stage. This signal indicates to the next and subsequent stages that the pipeline slot is empty. Second, recovery logic triggers the flush train by asserting the ID of the stage generating the error signal. In the following cycle, the Razor flip-flop injects the correct value from the shadow latch data back into the pipeline, allowing the errant instruction to continue with its correct inputs.
To obtain a realistic prediction of the power overhead for detecting and correcting circuit timing errors, we implemented Razor in a simplified 64-bit Alpha pipeline design, using Taiwan Semiconductor Manufacturing Co. 0.18-micrometer technology to produce the layout.1 In addition to gate- and circuit-level power analysis on the error-detection-and-recovery design, we performed architectural simulations to analyze the overall throughput and power characteristics of Razor based voltage reduction for different benchmark test programs. The benchmark studies demonstrated that, on average, Razor reduced simulated power consumption by nearly a factor of two—a greater than 40 percent reduction—compared to traditional design-time dynamic voltage scaling and delay chain-based approaches.