# A Robust Pulsed Flip-flop and its use in Enhanced Scan Design

Rajesh Kumar\* Kalyana C. Bollapalli\* Rajesh Garg<sup>‡</sup>
Tarun Soni\* Sunil P. Khatri\*

\* Department of ECE, Texas A&M University, College Station TX 77843.

‡ Intel Corporation, Hillsboro, OR 97124

Abstract: Delay faults are frequently encountered in nanometer technologies. Therefore, it is critical to detect these faults during factory test. Testing for a delay fault requires the at-speed application of a pair of test vectors. To maximize the delay fault detection capability, the two vectors in this pair should be independent. Independent vector pairs cannot always be applied to a circuit implemented with standard scan design approaches. However, this can be achieved by using enhanced scan flip-flops, which store two bits of data. This paper has two contributions. First, we develop a pulsed flip-flop (PFF) design. Second, we present an enhanced scan flip-flop design based on our PFF circuit. We have compared the performance of our pulse based flip-flop with recently published pulse based flip-flop designs, as well as a traditional master-slave D flip-flop. Our PFF shows significant improvements in power and timing compared to the other designs. Our pulse based enhanced scan flip-flop (PESFF) has 13% lower power dissipation and 26% better timing than a conventional D flip-flop based enhanced scan flip-flop (DESFF). The layout area of our PESFF is 5.2% smaller than that of the DESFF. Monte Carlo simulations demonstrate that our design is more robust to process variations than the DESFF.

## I. INTRODUCTION

The constant advances in VLSI design and the increased number of battery based applications have set a goal of higher performance with lower power consumption [1]. Flip-flops and latches are fundamental building blocks of sequential digital circuits. The timing of a design depends significantly on the speed of these flip-flops, particularly in heavily pipelined designs. Flip-flops also contribute substantially to the total power consumption of the design [2]. Traditionally, flip-flops are made up of a master-slave latch pair, with data being latched at the master at the latching edge of the clock and delivered to the slave at the releasing edge of the clock. Such an implementation has a positive setup time, and the sum of the clock to Q delay  $(T_{cq})$  and the setup time  $(T_{su})$  is high. This sum is the figure of merit for a flip-flop, since these two delays, added to the combinational logic delay, determine the operating frequency of a design. The desire to reduce this figure of merit  $(T_{cq} + T_{su})$  motivates the development of a pulse based flip-flop. A pulsed flip-flop consists of a pulse generator circuit and a latch, as shown in Figure 1.



Fig. 1. Block diagram of a pulsed flip-flop

The latch is transparent for the duration in which the pulse is high. The pulse is derived from the input clock edge and hence is generated *after* the clock edge. This allows the data to arrive even later than the clock edge, making  $T_{su}$  negative. This helps to reduce  $T_{su} + T_{cq}$ . The pulse generator circuit can be shared across several flip-flops, amortizing its area and power cost.

Modern VLSI circuits routinely contain hundreds of millions of transistors operating in the gigahertz range. Deep sub-micron technologies exhibit significant inter-die and intra-die process variations. Hence, in order to ensure correct logical and temporal functionality, semiconductor manufacturers need to carry out both functional and timing checks on the fabricated chip. The semiconductor industry relies heavily on scan based design for testing digital circuits. In scan based design, all the storage elements are replaced with scan cells. Figure 2 shows the block diagram of a scan flip-flop, along with the D flip-flop it replaces. It consists of a multiplexer and a D flip-flop. D and SI are the inputs of the multiplexer and SE acts as the select signal. The SE signal is low during normal operation of the scan flip-flop. In scan mode, SE is high and the D input of the flip-flop is driven by the scan-in signal SI. The SI of scan flip-flop i is driven by the Q signal of scan flip-flop i-1. The Q signal of scan flip-flop i drives the SI of scan flip-flop i+1. In this way, scan cells are connected to form one or more shift register chains called scan chains, which can be accessed through the IO pads. With external access, one can control the internal state of a digital circuit by simply shifting a test vector into the scan chain. After driving the vector to the combinational logic, one can observe the test response by shifting out the data from the scan chain.



Fig. 2. Scan flip-flop

Figure 3 shows a scan based design in which 3 scan flip-flops are connected to form a scan chain. This design can be operated in functional or test mode using the *SE* signal. When *SE* is low, a scan cell operates as a normal D flip-flop. When *SE* is high, the scan cells are connected to form a shift register. During test mode, the test vectors can be scanned into the scan chain using the *SCANIN* signal (typically a chip-level I/O port).

Fig. 3. Scan design

Figure 4 shows the corresponding timing waveforms of the CLK, SE and  $Q_{0-2}$  signals. Initially, SE is asserted high and, in the first three clock cycles, test bits are scanned in through the SCANIN port. At this point, the response of the combinational logic is available at the D inputs of the flip-flops. This response is captured into the scan chain by driving the SE signal low in the next clock cycle (such a cycle is called a *capture* cycle). In order to shift the response of the circuit out of the scan chain, SE is again made high and the data is shifted out through the SCANOUT port in the next three clock cycles.



Fig. 4. Timing waveforms for scan design shown in Figure 3
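The scan-in, capture and scan-out sequence described above can be illustrated with a small behavioral sketch. It is not a model of the actual circuit: the function names, the list representation of the chain and the three-bit `example_logic` function are assumptions made purely for illustration.

```python
from typing import Callable, List


def shift(chain: List[int], scan_in: int) -> int:
    """One clock with SE high: shift the chain by one position.

    chain[0] is the flip-flop nearest SCANIN. The bit leaving the last
    flip-flop appears on SCANOUT and is returned.
    """
    scan_out = chain[-1]
    chain[1:] = chain[:-1]
    chain[0] = scan_in
    return scan_out


def scan_test(vector: List[int],
              logic: Callable[[List[int]], List[int]]) -> List[int]:
    """Scan a vector in, run one capture cycle, and scan the response out."""
    chain = [0] * len(vector)
    # Scan in (SE high): feed the last bit first so vector[i] lands in chain[i].
    for bit in reversed(vector):
        shift(chain, bit)
    # Capture cycle (SE low): every flip-flop loads its D input.
    chain[:] = logic(chain)
    # Scan out (SE high): the last flip-flop's bit emerges first.
    observed = [shift(chain, 0) for _ in range(len(chain))]
    return observed[::-1]


def example_logic(q: List[int]) -> List[int]:
    """Hypothetical combinational logic between Q outputs and D inputs."""
    return [q[0] ^ q[1], q[1] & q[2], 1 - q[2]]


if __name__ == "__main__":
    print(scan_test([1, 1, 0], example_logic))
```

For a chain of length n, the sketch uses n shift cycles, one capture cycle and n further shift cycles, matching the three-plus-one-plus-three cycle sequence of Figure 4.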

It is commonly accepted that it is not enough to perform only stuck-at fault tests for a design. Increasingly, at-speed tests are performed for higher delay fault coverage [3]. At-speed testing is used to identify the *delay faults* in the circuit. At-speed testing requires two test vectors,  $V_1$  and  $V_2$ .  $V_1$  is used to initialize the inputs of the combinational logic, before test vector  $V_2$  is applied at speed. After initialization,  $V_2$  is launched into the combinational logic, and the corresponding transitions are driven to the circuit under test (CUT). The propagated transitions are then captured back into the scan chain during a capture cycle. The second vector  $V_2$  can be applied in two ways: launch-on-shift (LOS) and launch-on-capture (LOC). For a LOS delay test, the second vector  $V_2$  is a one-bit shifted version of the first vector  $V_1$ . For a LOC delay test, the second vector  $V_2$  is the CUT's response to the vector  $V_1$ .
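The two ways of deriving  $V_2$  can be written down directly. The following is a minimal sketch in which vectors are lists of bits, index 0 is the flip-flop nearest the scan input, and `logic` stands for a hypothetical CUT function; none of these names come from the paper.

```python
from typing import Callable, List


def los_second_vector(v1: List[int], scan_in_bit: int) -> List[int]:
    """Launch-on-shift: V2 is V1 shifted one position along the scan chain.

    The only free choice is the single bit entering at the scan input.
    """
    return [scan_in_bit] + v1[:-1]


def loc_second_vector(v1: List[int],
                      logic: Callable[[List[int]], List[int]]) -> List[int]:
    """Launch-on-capture: V2 is the CUT's response to V1 (no free choice)."""
    return list(logic(v1))


if __name__ == "__main__":
    v1 = [1, 0, 1, 1]
    print(los_second_vector(v1, 0))
    print(loc_second_vector(v1, lambda q: [b ^ 1 for b in q]))
```

In both cases  $V_2$  is almost entirely determined once  $V_1$  is chosen, which is the dependence that limits the set of vector pairs a standard scan design can apply.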

The timing waveforms corresponding to the LOC delay test are shown in Figure 5. In LOC, SE must be held high while the first vector  $V_1$  is scanned into the scan chain. This is done by using a slow clock. SE is then made low and enough time is allowed for the transitions to stabilize throughout the combinational logic. At this point, the second vector  $V_2$ , which is the CUT's response to vector  $V_1$ , is available at the D inputs of the flip-flops. After this, two fast clock pulses are applied to launch the vector  $V_2$  and to capture the CUT's response to the  $V_1 \rightarrow V_2$  input change. The circuit operation can be described in two steps. At the first fast clock edge (launch edge), the vector  $V_2$  is captured into the flip-flops. This new input vector at the outputs of the scan flip-flops propagates through the CUT. The second fast clock edge is then used to capture the response of the CUT to the  $V_1 \rightarrow V_2$  transition into the scan chain. The fast clock period is the same as that of the functional clock. The captured test results are finally scanned out with the slow scan clock, while SE is held high.



Fig. 5. Timing Diagram for LOC

In the LOS delay test, the second delay test vector  $V_2$  is a one-bit shifted version of  $V_1$ . The timing waveforms for LOS are shown in Figure 6. The SE signal must remain high until after the first (launch) fast clock edge. After the first fast clock edge, vector  $V_2$  is present at the outputs of the flip-flops. SE must then be switched low (to functional mode) so that the CUT's response to  $V_2$  can be captured back into the scan chain at the second (capture) fast clock edge. The SE signal must therefore switch within one fast clock period. Since SE is a global signal shared among many scan flip-flops, it is typically hard to meet this tight timing constraint, which makes the SE signal a timing critical signal for LOS.



Fig. 6. Timing Diagram for LOS

In both LOC and LOS, there is a restriction on vector  $V_2$ . In the case of LOC, the vector  $V_2$  is the response of the CUT to  $V_1$ . In the case of LOS, the vector  $V_2$  is the one-bit shifted version of vector  $V_1$ . This restriction limits the transition delay fault coverage. The above mentioned limitations on vector  $V_2$  can be overcome using enhanced scan design. In enhanced scan design, one additional flip-flop is interleaved with each of the functional flip-flops in the design. The vectors  $V_1$  and  $V_2$  can now be *simultaneously* scanned in and loaded into the scan chain, in an interleaved manner. At the SCAN\_IN stage of the test, bits of the vector  $V_1$  are loaded in the functional flip-flops, while bits of vector  $V_2$  are loaded in the corresponding additional flip-flops. Since the bits in the additional flip-flops can be chosen arbitrarily, we can achieve any combination of  $V_1 \rightarrow V_2$ , and the delay test can be applied in LOS mode. Enhanced scan design thus addresses the problem of low delay fault coverage by removing the restrictions on vector  $V_2$ .
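A short sketch makes the interleaving and the resulting freedom in choosing  $V_2$  concrete. It assumes that each cell contributes two consecutive chain positions, with the additional flip-flop placed before the functional one along the scan path; this ordering and all function names are illustrative assumptions rather than details taken from the design.

```python
from typing import List


def interleaved_chain_state(v1: List[int], v2: List[int]) -> List[int]:
    """Target contents of the doubled scan chain after scan-in.

    Assumed order per cell: additional flip-flop (V2 bit) first, then the
    functional flip-flop (V1 bit).
    """
    state: List[int] = []
    for b1, b2 in zip(v1, v2):
        state += [b2, b1]
    return state


def scan_in_order(v1: List[int], v2: List[int]) -> List[int]:
    """Order in which bits must be presented at the scan input.

    The bit destined for the far end of the chain is shifted in first.
    """
    return list(reversed(interleaved_chain_state(v1, v2)))


def reachable_v2_count(n_bits: int, scheme: str) -> int:
    """Number of distinct V2 vectors that can follow a fixed n-bit V1."""
    if scheme == "LOC":
        return 1            # V2 is fixed by the CUT's response to V1
    if scheme == "LOS":
        return 2            # only the bit entering the chain is free
    if scheme == "enhanced":
        return 2 ** n_bits  # every bit of V2 is independent of V1
    raise ValueError(f"unknown scheme: {scheme}")
```

The doubled chain length is the price paid for the exponential growth in reachable  $(V_1, V_2)$  pairs.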

The key contributions of this paper are:

- A pulse generator circuit is presented.
- The proposed PFF operates at higher speed and consumes less power as compared to existing pulsed flip-flop designs.
- We utilize a special circuit which simplifies the distribution of the otherwise timing critical scan enable (SE) signal of an LOS based pulsed enhanced scan flip-flop.
- We propose a pulsed flip-flop based enhanced scan cell (PESFF) which dissipates 13% less power and has 26% better timing than a conventional D flip-flop based enhanced scan flip-flop (DESFF).
- Our PESFF has 5.2% lower area overhead than a DESFF.
- Monte Carlo simulations demonstrate that the proposed PESFF is more robust to process variations than the DESFF.

## II. PREVIOUS WORK

Several flip-flop designs have been proposed over the years, which aim to minimize power dissipation and increase the speed of operation. The transmission gate based master-slave flip-flop [4] is one of the simplest implementations. In order to reduce the delay, area and power of the flip-flop, the pulsed flip-flop was introduced. In most pulsed flip-flops, a separate pulse generator and latch are used. In [5], the authors present a dual pulse generator circuit and a NAND keeper latch. However, their design occupies a larger area and also consumes higher power. In [6], the authors proposed an explicit pulsed flip-flop. Their latch circuit is clocked using a single transistor and the pulse generator circuit simply delays the clock and inverts it before ANDing the inverted delayed clock with the original clock for pulse generation. The pulse generator circuit is very simple but the  $T_{cq}$  of the flip-flop is high which leads to a higher value of the figure of merit.

The pulsed flip-flop proposed in [7] has a dynamic pulse generator circuit and a static latch. By using a dynamic pulse generator, the authors achieved a better setup time. Also,  $T_{cq} + T_{su}$  is low, which means that their circuit can operate at higher speed. However, their layout area is large and their power consumption is high, as we will show in the sequel.

In [8], the pulsed flip-flop has a dynamic master stage and a static slave stage. In the dynamic stage, precharge and discharge occur alternately every clock cycle. This happens regardless of the output transition, resulting in unnecessary power consumption. In [9], the authors proposed an improved hybrid latch flip-flop with reduced power consumption. By modifying the dynamic master stage of the hybrid latch flip-flop, they reduce the power consumption significantly. However, due to its higher  $T_{cq}$ , their design operates at a lower speed. In contrast to all these designs, our proposed circuit has a significantly lower  $T_{cq} + T_{su}$  and still consumes very low power. Our design also occupies less active area and is very robust, as evidenced by Monte Carlo simulations. We re-implemented [7] and [9], and compared our approach with these designs as well as with a master-slave flip-flop (all of which were implemented in a 100nm process technology).

Various enhanced scan approaches have been introduced to remove the restrictions on the  $V_2$  vector, thereby allowing arbitrary  $(V_1, V_2)$  combinations for higher coverage in delay testing. The simplest enhanced scan scheme would be to include an additional flip-flop with each of the functional flip-flops in the design (thus doubling the scan chain length). The  $V_1$  and  $V_2$  vectors are scanned in an interleaved manner, so that the bits of vector  $V_1$  are stored in the functional flip-flops and the bits of vector  $V_2$  are stored in the redundant flip-flops. Delay testing can be performed using LOS, since the vector  $V_2$  can be chosen completely independently of vector  $V_1$ . This technique has a high cost due to the duplication of all the flip-flops. An alternative technique is presented in [10], where flip-flops are shared between different state machines to reduce the number of flip-flops. Here, an extra hold latch is implemented in parallel with the slave latch of the scan flip-flop by using transmission gates to demultiplex the signal paths. The drawback of this method is that the test session time increases by a large amount. Another technique called First Level Hold [11] uses supply gating at the first level of logic gates to hold the state of a combinational circuit, instead of using an extra latch as in the other enhanced scan methods. This method claims to have a lower area overhead, but the amount of logic added actually depends on the number of first level gates connected to the flip-flops. It also slows down the logic gates considerably, leading to additional delay in the combinational logic.

Although enhanced scan techniques have been around for a long time, they have rarely been used in practice so far because of the prohibitive area overhead. However, recent interest in achieving high delay test coverage from scan based tests (beyond what is possible from traditional LOC tests) to detect small delay defects and possibly also to avoid the need for at-speed functional tests, has revived interest in such schemes [12]. A technique for realizing most of the achievable transition delay fault coverage gains from enhanced scan (at minimal cost by implementing partial enhanced scan) is presented in [13]. The enhanced scan design approach needs a high speed SE signal for LOS, which would increase the cost considerably.

In this paper, we propose a novel approach using a pulsed flip-flop to create an enhanced scan cell design. This design has a lower area and timing overhead over a normal D flip-flop based enhanced scan flip-flop, and also removes the need for a high speed scan *SE* signal. Our design has better area, timing, power and reliability characteristics compared to a D flip-flop based enhanced scan cell design.

## III. PROPOSED PULSED FLIP-FLOP (PFF)

A pulsed flip-flop consists of a pulse generator circuit and a latch. The pulse generated by the pulse generator circuit determines many of the important characteristics of the pulsed flip-flop, such as setup time  $(T_{su})$ , hold time  $(T_h)$  and clock to Q delay  $(T_{cq})$ . The figure of merit of a flip-flop is  $T_{su} + T_{cq}$ . We first explain why  $T_{su} + T_{cq}$  is a useful figure of merit for a flip-flop. Consider the circuit shown in Figure 7. Let D be the maximum delay of the combinational logic between the two flip-flops.  $T_{su}$  is the setup time of the flip-flop and  $T_{cq}$  is the clock to Q delay. If T is the clock period, then  $T > T_{su} + T_{cq} + D$  is required for the data to be sampled correctly. Since D is circuit dependent, the figure of merit for a flip-flop is  $T_{su} + T_{cq}$ . The lower this quantity, the higher the speed of operation, and vice versa. Note that we could potentially include the effect of hold time  $(T_h)$  as well. The governing equation is  $\Delta + T_h < T_{cq} + d$ , where  $\Delta$  is the clock jitter and d is the minimum circuit delay. In this case the flip-flop is most tolerant to jitter if  $T_{cq} - T_h$  is maximized. However, it is common practice to avoid short paths by introducing dummy delays (effectively increasing d), thereby eliminating the need to consider  $T_h$ . Therefore, we do not include  $T_h$  in our figure of merit.



Fig. 7. A sequential circuit
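The two timing constraints above can be checked numerically. The sketch below uses the nominal  $T_{cq}$  and  $T_{su}$  values reported in Table I; the combinational delay of 500 ps is a hypothetical value chosen only to show how a lower figure of merit translates into a shorter clock period.

```python
def min_clock_period(t_su: float, t_cq: float, d_max: float) -> float:
    """Lower bound on the clock period from T > T_su + T_cq + D (all in ps)."""
    return t_su + t_cq + d_max


def hold_ok(t_h: float, t_cq: float, d_min: float, jitter: float) -> bool:
    """Hold constraint: jitter + T_h < T_cq + d (all in ps)."""
    return jitter + t_h < t_cq + d_min


if __name__ == "__main__":
    D_MAX = 500.0  # hypothetical combinational delay, not from the paper
    # Nominal (T_su, T_cq) values from Table I.
    pff = min_clock_period(t_su=-57.6, t_cq=128.6, d_max=D_MAX)
    msff = min_clock_period(t_su=26.93, t_cq=76.5, d_max=D_MAX)
    print(f"PFF bound:          {pff:.2f} ps")
    print(f"Master-slave bound: {msff:.2f} ps")
```

Because D is common to both flip-flops, the difference between the two bounds equals the difference between their figures of merit, regardless of the value assumed for D.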

Keeping this in mind, we use the pulse generator circuit shown in Figure 8. The pulse is generated by ANDing a delayed clock with an inverted clock signal. Figure 9 shows the timing waveforms for the *CLK*, *CLKB* and *PULSE* signals. In this case, the *PULSE* coincides with the rising edge of *CLK*. The pulse width can be easily controlled by appropriately sizing the inverters. For our latch, the required pulse width was obtained by slowing down the second inverter in the inverter chain using long channel devices.



Fig. 8. Proposed pulse generator
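The pulse generation principle can be illustrated with a discrete-time sketch. It assumes the common arrangement in which PULSE is the AND of the clock with a delayed, inverted copy of the clock, so that the pulse starts at the rising clock edge and its width equals the inverter chain delay. The gate-level structure of Figure 8 is not reproduced; the sample-based waveform and the `delay` parameter are illustrative assumptions.

```python
from typing import List


def pulse_waveform(clk: List[int], delay: int) -> List[int]:
    """Return PULSE samples for a sampled clock waveform.

    clk   : clock samples (0 or 1), one per time step
    delay : inverter-chain delay in time steps (sets the pulse width)
    """
    pulse = []
    for t, c in enumerate(clk):
        # Delayed clock; before any history exists, assume the initial level.
        delayed = clk[t - delay] if t >= delay else clk[0]
        clkb = 1 - delayed        # delayed and inverted clock
        pulse.append(c & clkb)    # high only just after a rising edge
    return pulse


if __name__ == "__main__":
    clk = [0, 0, 0, 0, 1, 1, 1, 1] * 2
    print(pulse_waveform(clk, delay=2))
```

Increasing `delay` in the sketch widens the pulse, which mirrors the effect of slowing down an inverter in the chain.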

Fig. 9. Waveforms obtained at various nodes in pulse generator circuit

The other component of a pulsed flip-flop is a latch. We have used the latch shown in Figure 10. This latch is transparent when the pulse is high. The latch circuit is a tristate inverter with a static keeper. The pulse signal is fed to the lower NMOS transistor (N2), while its complement is fed to the upper PMOS transistor (P1). The input D is fed to the gates of transistors P2 and N1 as shown in Figure 10. The keeper circuit consists of two back-to-back inverters. The feedback inverter uses long channel devices. Since we have a negative setup time and a smaller delay from D to Q, we also have a better  $T_{su} + T_{cq}$ , and hence a higher speed of operation. Compared to master-slave flip-flops, pulsed flip-flops require only one latch per flip-flop. However, the pulsed flip-flop structure uses a pulse generator, which consumes a considerable amount of power. We therefore share the pulse generator among ten flip-flops. In this way, the additional area and power overhead is reduced significantly. This makes our pulsed flip-flop ideally suited for low power and high speed applications. Additionally, the devices were carefully sized for robust operation under process, voltage and temperature (PVT) variations. Monte Carlo simulations were performed to test the robustness of the design under PVT variations.



Fig. 10. Latch structure
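At the behavioral level, the latch can be sketched as a storage element that follows D while PULSE is high and holds its value otherwise. The sketch below ignores transistor-level detail and signal polarity (it assumes Q ends up equal to D), and its sample-based waveforms are illustrative assumptions.

```python
from typing import List


def pulsed_latch(d: List[int], pulse: List[int], q0: int = 0) -> List[int]:
    """Behavioral model: transparent while PULSE is high, holding otherwise."""
    q = q0
    out = []
    for d_t, p_t in zip(d, pulse):
        if p_t:
            q = d_t      # tristate inverter enabled: storage node follows D
        out.append(q)    # PULSE low: the keeper retains the stored value
    return out


if __name__ == "__main__":
    # The clock edge is at t = 2 (start of the pulse); D changes at t = 3.
    pulse = [0, 0, 1, 1, 0, 0]
    d     = [0, 0, 0, 1, 1, 0]
    print(pulsed_latch(d, pulse))
```

In the example, D changes one time step after the start of the pulse and is still captured, which is the behavior that corresponds to a negative setup time.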

## IV. PROPOSED PULSE BASED ENHANCED SCAN FLIP-FLOP (PESFF)

Figure 11 shows a conventional D flip-flop based enhanced scan cell (DESFF). It consists of two D flip-flops and a 2-input multiplexer. D, SCANIN, CLK, SE and Q are the external pins. The multiplexer has two inputs, D and SI, and one select signal, SE. This enhanced scan flip-flop can store two test bits (one bit from each of the vectors  $V_1$  and  $V_2$ ). In test mode, SE is high and two bits of the independent test vectors  $V_1$  and  $V_2$  are scanned into the scan cell. In functional mode, SE goes low, the D flip-flop on the right acts as a normal D flip-flop and the left D flip-flop is not used. An alternative design [14] utilizes 3 latches per enhanced scan cell. However, it has a significantly higher  $T_{cq}$ , and hence we compare our PESFF with the DESFF of Figure 11.



Fig. 11. Block diagram of a conventional enhanced scan cell

Figure 12 shows our proposed pulsed enhanced scan flip-flop. It consists of two tristate inverter based latches (with outputs SI and Q), a pulse generator and three transmission gates. The two latches each store a bit of the vectors  $V_1$  and  $V_2$ . The pulse generator used for the enhanced scan design is the same circuit discussed in Section III (Figure 8), and is not shown in Figure 12. The two transmission gates on the right in Figure 12 are used to implement a 2-input multiplexer. SI and D are the two inputs of the MUX, and SCAN is the select signal of the multiplexer. The input SCANIN of the first tristate inverter is selected only in test mode. The purpose of using a transmission gate (left of Figure 12) in the first latch is to reduce unnecessary toggling power. In a scan based design, the input SCANIN of a flip-flop is connected to the output of the previous flip-flop in the scan chain. During normal mode, the output of a flip-flop may toggle at each active clock edge, which would result in unnecessary power dissipation if the first latch (left of Figure 12) were transparent.



Fig. 12. The proposed pulsed enhanced scan flip-flop

Note that we use a local signal (SCAN), instead of the global signal SE, to control the transmission gates. The SCAN signal is generated from SE and PULSE using the circuit shown in the top part of Figure 12. The SCAN signal goes high whenever SE goes high. However, SCAN does not go low with the falling edge of SE. Instead, SCAN goes low at the first rising edge of PULSE that occurs after the falling edge of SE. This extra circuitry avoids the need for a high speed SE signal, which is otherwise necessary for LOS.
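The behavior of the SCAN signal can be captured in a small state-machine sketch. It models only the input-output behavior stated above (set by SE, cleared by the first rising edge of PULSE after SE falls) and not the actual circuit of Figure 12. In hardware SCAN falls some gate delays after the PULSE edge, whereas the sketch collapses this delay into a single time step; the sampled waveforms are illustrative assumptions.

```python
from typing import List


def scan_signal(se: List[int], pulse: List[int]) -> List[int]:
    """Derive the local SCAN signal from sampled SE and PULSE waveforms."""
    scan = []
    state = 0
    prev_pulse = 0
    for se_t, p_t in zip(se, pulse):
        if se_t:
            state = 1                    # SCAN is high whenever SE is high
        elif p_t and not prev_pulse:
            state = 0                    # rising PULSE edge after SE has fallen
        scan.append(state)
        prev_pulse = p_t
    return scan


if __name__ == "__main__":
    se    = [1, 1, 1, 0, 0, 0, 0, 0]    # SE falls slowly, well before the edge
    pulse = [0, 1, 0, 0, 0, 1, 0, 0]
    print(scan_signal(se, pulse))
```

In the example SE falls two time steps before the next pulse, yet SCAN stays high until that pulse arrives, so the falling edge of SE no longer has to be timed within one fast clock period.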

The timing waveforms for the CLK, PULSE, SE and SCAN signals are shown in Figure 13. In test mode, SE is kept high and test bits of vectors  $V_1$  and  $V_2$  are scanned in alternately. PULSE is generated at every rising edge of CLK. The two tristate inverters become transparent when PULSE goes high. After all the test bits have been scanned in, vector  $V_1$  is applied at the inputs of the combinational logic, and the bits of vector  $V_2$  are available at the SI nodes of the scan flip-flops. Vector  $V_2$  is launched at the first fast clock edge (launch), with the SCAN signal high. To capture the data at the next fast clock edge (capture), SCAN needs to go low.



Fig. 13. Timing diagram of proposed enhanced scan flip-flop

## V. EXPERIMENTAL RESULTS

We simulated the proposed PFF and PESFF in HSPICE [15], using a 100nm BSIM [16] model card. The performance of our PFF design was compared with the flip-flop designs of [9] and [7]. We also compared the results of our design with a traditional master-slave D flip-flop. We compared the performance of our PESFF with a conventional D flip-flop based enhanced scan design. Monte Carlo simulations were performed in order to verify the robustness of our design against process variations. The Monte Carlo simulations were performed for variations in L (channel length) and  $V_t$  (threshold voltage) of the transistors, and in VDD (supply voltage). A total of 500 simulations were run, with the  $3\sigma$  variation of each parameter set to 10% of its nominal value.
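The parameter sampling for such a Monte Carlo run can be sketched as follows. The sketch assumes independent Gaussian variations (the distribution is not stated above), and the nominal values for L,  $V_t$  and VDD are hypothetical placeholders; only the run count of 500 and the  $3\sigma$  = 10% setting are taken from the text. Each sample would then be passed to a circuit simulator, which is outside the scope of the sketch.

```python
import random
from typing import Dict, List


def sample_parameters(nominal: Dict[str, float],
                      three_sigma_fraction: float = 0.10,
                      runs: int = 500,
                      seed: int = 0) -> List[Dict[str, float]]:
    """Draw one parameter set per Monte Carlo run.

    Each parameter is Gaussian (an assumption), centred on its nominal
    value, with 3*sigma equal to three_sigma_fraction of the nominal value.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(runs):
        samples.append({
            name: rng.gauss(value, three_sigma_fraction * value / 3.0)
            for name, value in nominal.items()
        })
    return samples


if __name__ == "__main__":
    # Hypothetical nominal values, for illustration only.
    nominal = {"L": 100e-9, "Vt": 0.3, "VDD": 1.2}
    runs = sample_parameters(nominal)
    print(len(runs), runs[0])
```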

Table I and Table II show the experimental results for the PFF. The experimental results for the enhanced scan designs are shown in Table III and Table IV. When computing the area of our PFF (PESFF), the pulse generator is assumed to be shared between 10 PFFs (PESFFs). We found that using 10 PFFs (PESFFs) per pulse generator minimized the dynamic power per PFF (PESFF). The proposed PFF is 31% faster than a conventional D flip-flop. Our PFF has 18% better timing than the Explicit PFF, and 3.3% better timing than the Hybrid flip-flop. The proposed design consumes 45.4% lower power than the Explicit PFF, 28% lower power than the Hybrid flip-flop and 18% lower power than the conventional D flip-flop. The  $\sigma$  value of the delay of the proposed design is 50% lower than that of the Explicit flip-flop, 20% higher than that of the conventional D flip-flop and 37.6% lower than that of the Hybrid flip-flop.

Our PESFF is 26% faster than a conventional D flip-flop based enhanced scan flip-flop. The power dissipation of our PESFF is 13% lower than that of the reference design. We also compared our design with a conventional enhanced scan design in terms of layout area. Our proposed PESFF occupies an area of  $53.2\mu m^2$ . In contrast, a D flip-flop based enhanced scan cell occupies an area of  $55.9\mu m^2$ . The layouts of the conventional enhanced scan cell and the proposed PESFF are shown in Figure 14 and Figure 15 respectively. Figure 15 illustrates the pulse generator (top) and the PESFF structure (bottom). Note that the pulse generator is shared between 10 PESFFs. Table IV indicates that the mean ( $\sigma$ ) of the figure of merit of the PESFF is 26.0% (14.80%) better than that of the DESFF. The  $\mu + 3\sigma$  of the figure of merit of the PESFF is 27.0% better than that of the DESFF.

| Flip-Flops               | $T_{cq}$ (ps) | $T_{su}$ (ps) | $T_{cq} + T_{su}$ (ps) | Power (µW) |
|--------------------------|---------------|---------------|------------------------|------------|
| Our PFF                  | 128.6         | -57.6         | 71.0                   | 13.2       |
| Hybrid PFF [9]           | 75.8          | -2.15         | 73.45                  | 18.4       |
| Explicit PFF [7]         | 110.9         | -24.8         | 86.12                  | 24.2       |
| Master-Slave D Flip-Flop | 76.5          | 26.93         | 103.4                  | 16.2       |

TABLE I
Nominal simulation results for different flip-flop designs
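The relative improvements quoted in the text follow from the entries of Table I. The sketch below recomputes them from the tabulated  $T_{cq} + T_{su}$  and power columns; the dictionary layout and function name are illustrative choices.

```python
# (T_cq + T_su in ps, power in uW), copied from Table I.
TABLE_I = {
    "Our PFF": (71.0, 13.2),
    "Hybrid PFF [9]": (73.45, 18.4),
    "Explicit PFF [7]": (86.12, 24.2),
    "Master-Slave D Flip-Flop": (103.4, 16.2),
}


def percent_lower(ours: float, reference: float) -> float:
    """How much lower 'ours' is than 'reference', in percent of 'reference'."""
    return 100.0 * (reference - ours) / reference


if __name__ == "__main__":
    our_fom, our_power = TABLE_I["Our PFF"]
    for name, (fom, power) in TABLE_I.items():
        if name == "Our PFF":
            continue
        print(f"{name}: "
              f"timing {percent_lower(our_fom, fom):.1f}% better, "
              f"power {percent_lower(our_power, power):.1f}% lower")
```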

| Flip-Flops               | Mean of $T_{cq} + T_{su}$ (ps) | Sigma of $T_{cq} + T_{su}$ (ps) |
|--------------------------|--------------------------------|---------------------------------|
| Our PFF                  | 71.2                           | 4.01                            |
| Hybrid PFF [9]           | 74.13                          | 6.41                            |
| Explicit PFF [7]         | 87.6                           | 7.98                            |
| Master-Slave D Flip-Flop | 105.5                          | 3.31                            |

TABLE II
Monte Carlo simulation results for different flip-flop designs

| Flip-Flops | $T_{cq}$ (ps) | $T_{su}$ (ps) | $T_{cq} + T_{su}$ (ps) | Power (µW) | Layout Area (µm²) |
|------------|---------------|---------------|------------------------|------------|-------------------|
| PESFF      | 149.6         | -56.2         | 93.4                   | 14.6       | 159.5             |
| DESFF      | 77.8          | 48.1          | 126.0                  | 16.8       | 167.7             |

TABLE III
NOMINAL SIMULATION RESULTS FOR PESFF AND DESFF

| Flip-Flops     | Mean of $T_{cq} + T_{su}$ (ps) | Sigma of $T_{cq} + T_{su}$ (ps) |
|----------------|--------------------------------|---------------------------------|
| PFF based ESFF | 92.0                           | 5.2                             |
| DFF based ESFF | 129.36                         | 6.11                            |

TABLE IV
Monte Carlo simulation results for PESFF and DESFF
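The  $\mu + 3\sigma$  comparison can be reproduced from the mean and sigma values of Table IV. The sketch below is a plain arithmetic check; the dictionary layout and function names are illustrative.

```python
# (mean, sigma) of T_cq + T_su in ps, copied from Table IV.
TABLE_IV = {
    "PESFF": (92.0, 5.2),
    "DESFF": (129.36, 6.11),
}


def worst_case(mean: float, sigma: float, k: float = 3.0) -> float:
    """Mean plus k standard deviations of the figure of merit (ps)."""
    return mean + k * sigma


def percent_lower(ours: float, reference: float) -> float:
    """How much lower 'ours' is than 'reference', in percent of 'reference'."""
    return 100.0 * (reference - ours) / reference


if __name__ == "__main__":
    pesff = worst_case(*TABLE_IV["PESFF"])
    desff = worst_case(*TABLE_IV["DESFF"])
    print(f"PESFF mu+3sigma: {pesff:.2f} ps")
    print(f"DESFF mu+3sigma: {desff:.2f} ps")
    print(f"Improvement:     {percent_lower(pesff, desff):.1f}%")
```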

## VI. CONCLUSIONS

In this paper, we present a pulsed flip-flop, along with an enhanced scan flip-flop designed using the proposed pulsed flip-flop. Our pulsed flip-flop consumes less power and occupies less area while achieving higher performance than existing pulsed flip-flops. We compared our PESFF design against a traditional DESFF. The robustness of our design was verified by performing Monte Carlo simulations. Our PESFF has a 26% lower  $T_{cq} + T_{su}$  delay and consumes 13% less power than the DESFF. Our design also has a 5.2% lower layout area than the DESFF. This area can be further reduced by using the partial enhanced scan design technique [13].



Fig. 14. DESFF layout



Fig. 15. Pulse generator (top) and PESFF (bottom) layout

## REFERENCES

- [1] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low power CMOS digital design," *IEEE Journal of Solid State Circuits*, vol. 27, pp. 473–484, 1995.
- [2] J. Rabaey and M. Pedram, Low Power Design Methodologies. Kluwer Academic Publishers, Norwell, MA, 1996.
- [3] J. Savir, "Skewed-load transition test: Part 1, calculus," pp. 705–713, 1992.
- [4] V. Stojanovic and V. G. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high performance and low power systems," *IEEE Journal of Solid State Circuits*, pp. 536–548, April 1999.
- [5] W. Kim, D. Shin, H. Yun, J. Kim, and S. Min, "Performance comparison of dynamic voltage scaling algorithms for hard real-time systems," in *Proc. of IEEE Real-Time and Embedded Technology and Applications Symposium*, pp. 219–228, 2002.
- [6] D. T. Zhao, P. and M. Bayoumi, "Low power and high speed explicit pulsed flip-flops," *The 45th midwest Symp. on Circuits and Systems, Tulsa, OK, USA*, Aug 2002.
- [7] W. M.W.Phyu and K.S.Yeo, "Low-power/high performance explicitpulsed flip-flop using static latch and dynamic pulse generator," *IEE Proc. Circuits Devices and Syst.*, vol. 153, June 2006.
- [8] H. Partovi et al., "Flow through latch and edge triggered flip-flop hybrid elements," *International Solid State Circuits Conference Digest of Technical Papers*, pp. 138–139, 1996.
- [9] S. Goel and M. Bayoumi, "Improved Hybrid Latch flip-flop for low-power VLSI systems," tech. rep., Electronics Research Lab, VLSI Research Lab, Centre for Advanced Computer Studies, University of Lousiana at Lafayette.
- [10] J. P. Hurst and N. Kanopoulos, "Flip-flop sharing in standard scan path to enhance delay fault testing of sequential circuits," in ATS '95: Proceedings of the 4th Asian Test Symposium, (Washington, DC, USA), p. 346, IEEE Computer Society, 1995.
- [11] S. Bhunia, H. Mahmoodi, A. Raychowdhury, and K. Roy, "A novel low-overhead delay testing technique for arbitrary two-pattern test application," in *Design Automation and Test in Europe (DATE)*, March 2005
- [12] I. Pomeranz and S. M. Reddy, "Improving the transition fault coverage of functional broadside tests by observation point insertion," vol. 16, (Piscataway, NJ, USA), pp. 931–936, IEEE Educational Activities Department, 2008.
- [13] G. Xu and A. D. Singh, "Low cost launch-on-shift delay test with slow scan enable," in ETS '06: Proceedings of the Eleventh IEEE European Test Symposium, 2006.
- [14] L.-T. Wang, C. E. Stroud, and N. A. Touba, System-on-Chip Test Architectures. Morgan Kaufmann Publishers, Burlington, MA, 2007.
- [15] I. Meta-Software, "HSPICE user's manual," Campbell, CA.
- [16] "BSIM3 Homepage." http://www-device.eecs.berkeley.edu/~bsim3/intro.html.