ICCD 2004 Final Program



Sunday, October 10

Registration : 8:00AM - 4:00 PM

Special Conference Workshop :  Tribute to Professor Edward McCluskey, Stanford University. Santa Clara/San Jose Conference Rooms

 

9:00 - 10:30: Digital Testing

Session Chair : Dr. Tom Williams, Synopsys

Dr. Bernd Koenemann, Cadence Design Systems, "ELF + MURPHY = Leprechaun?"

Dr. Rohit Kapur, Synopsys, "Evaluating Test Compression Methods"

Prof. Jacob Abraham, University of Texas-Austin, "Quadruple Time Redundancy: Efficient Error Correction for Datapaths"

11:00 - 12:00: Logic Synthesis

Session Chair : Prof. Giovanni De Micheli, Stanford University

Dr. Antun Domic, Synopsys, "Synthesis of Tomorrow"

Prof. Bob Brayton, University of California-Berkeley, "Computing and Using Flexibility in Logic Networks"

12:00 - 1:30: Lunch : Special Tribute to Professor Edward McCluskey, Donner Room

Session Chair : Dr. LaNae Avra, Cadence Design Systems

Speakers : Dr. Don Chamberlin, IBM;  Prof. Michael Flynn, Stanford;  Prof. John Hayes, U. of Michigan;  Prof. Kishor Trivedi, Duke University;  Prof. Wayne Wolf, Princeton;  Prof. Janak Patel, University of Illinois;   Dr. John Shedletsky, IBM;  Prof. Melvin Breuer, USC;  Dr. John Shen, Intel;  Dr. Miron Abramovici, DAFCA;  Dr. Rob Roy, Zenasis;  Prof. Rulin Mangir, CSU Long Beach

2:00 - 3:30: Reliable Computing

Session Chair : Dr. Robert Horst

Prof. Algirdas Avizienis, UCLA, "The Information Infrastructure needs a Fault-Tolerance Infrastructure of its Own"

Prof. Dan Siewiorek, Carnegie Mellon University

Prof. Ravi Iyer, University of Illinois at Urbana-Champaign

4:00 - 5:00: Panel Discussion : Computer Engineering and Professor McCluskey’s Impact

Moderator : Dr. Bill Bottoms, Third Millennium Test Solutions

Panelists : Prof. Kent Fuchs, Cornell University;  Dr. Ed Eichelberger, IBM (retired);   Prof. Steve Szygenda, Southern Methodist University;  Prof. John Brzozowski, University of Waterloo

6:00 - 8:00: Dinner and Plenary Session, Monterey/Carmel Room

Plenary Session Chair: Prof. Dan Siewiorek, Carnegie Mellon University

 

Note that attendees must register separately for the workshop.


Monday, October 11

Registration: 8 AM - 5 PM

Convocation session 9 AM - 11:45 AM, Donner Room

9:00-9:30

    Welcome from Tom Dillinger, ICCD 2004 General Chair, and Ed Grochowski, ICCD 2004 Technical Program Chair


9:30-10:15

Keynote Address:  Gigascale System Design -- Challenges and Opportunities


  Dr. Shekhar Borkar, Intel Corp.


Abstract


VLSI system performance increased by five orders of magnitude in the last three decades, made possible by continued technology scaling, improving transistor performance to increase frequency, increasing integration capacity to realize complex architectures, and reducing energy consumed per logic operation to keep power dissipation within limits. The technology treadmill will continue, providing integration capacity of billions of transistors; however, power and energy consumption will be the barriers. Performance at any cost will not be an option in the future; system architectures will have to emphasize performance delivered in a given power envelope, with complexity limited by energy efficiency. This talk will discuss potential solutions in process technology, circuits, and microarchitectures to exploit future gigascale integration capacity. The concept of system on a chip (SOC) will help integrate diverse functional blocks, providing valued performance. The talk will conclude with recommendations to the chip and system designers on how to exploit these emerging paradigms.

Biography

Shekhar Borkar graduated with an MS in Physics from the University of Bombay and an MSEE from the University of Notre Dame in 1981, and joined Intel Corporation. He worked on the 8051 family of microcontrollers, the iWarp multicomputer project, and subsequently on Intel's supercomputers. He is an Intel Fellow and director of Circuit Research. His research interests are high performance and low power digital circuits, and high-speed signaling. Shekhar is an adjunct faculty member at Oregon Graduate Institute, and teaches VLSI design.


10:15-11:00

Keynote Address:  Error-Tolerance

    Professor Melvin Breuer, USC

Abstract

Because of trends in scaling, in the near future every high performance die will contain a massive number of defects and process-aggravated noise and performance problems. In an attempt to obtain useful yields, designers and test engineers will need to adopt a qualitatively different approach to their work. They will need to learn, enhance and deploy techniques such as fault- and defect-tolerance. For some applications, they may even apply error-tolerance, a somewhat controversial emerging paradigm. A circuit is error-tolerant (ET) with respect to an application, if (1) it contains defects that cause internal and may cause external errors, and (2) the system that incorporates this circuit produces acceptable results. In this presentation we illustrate and give quantitative bounds on several factors that will shape the future of digital design. We compare and contrast defect and fault-tolerant schemes with that of error-tolerance. We discuss how yield can be optimized by appropriately selecting the granularity of spares in light of defect densities and interconnect complexity. Finally, we show that several large classes of consumer electronic applications are resilient to errors, and how error-tolerance can then be used to significantly enhance effective yield.

Biography

Melvin A. Breuer received his Ph.D. in electrical engineering from the University of California, Berkeley, and is the Charles Lee Powell Professor of Electrical Engineering and Computer Science at the University of Southern California. He was Chairman of the Department of Electrical Engineering-Systems from 1991-1994, and again from 2000-2003. He was Chair of the Faculty of the School of Engineering, USC, for the 1997-98 academic year. His main interests are in the area of computer-aided design of digital systems, design-for-test and built-in self-test, and VLSI circuits.


11:00-11:45

Keynote Address: Digital Integrated Circuit Testing for Art Historians and Test Experts

    Professor E.J. McCluskey, Stanford University Center for Reliable Computing

Abstract

This talk is my attempt to identify the basic concerns in digital IC production testing. The details, too often, crowd in and prevent us from understanding why we are having difficulties: why testing is too costly or why too many bad chips escape (pass) the test despite our best efforts. Then I want to explore some of the myths representing the common wisdom about testing. My opinions will be supported by the results of experiments - not simulations, but real-world tests - carried out on actual chips from various technologies.

Biography

Professor McCluskey worked on electronic switching systems at the Bell Telephone Laboratories from 1955 to 1959. In 1959, he moved to Princeton University, where he was Professor of Electrical Engineering and Director of the University Computer Center. In 1966, he joined Stanford University, where he is Professor of Electrical Engineering and Computer Science, as well as Director of the Center for Reliable Computing. He founded the Stanford Digital Systems Laboratory (now the Computer Systems Laboratory) in 1969 and the Stanford Computer Engineering Program (now the Computer Science MS Degree Program) in 1970. The Stanford Computer Forum (an Industrial Affiliates Program) was started by Dr. McCluskey and two colleagues in 1970 and he was its Director until 1978.

Professor McCluskey developed the first algorithm for designing combinational circuits - the Quine-McCluskey logic minimization procedure - as a doctoral student at MIT. At Bell Labs and Princeton, he developed the modern theory of transients (hazards) in logic networks and formulated the concept of operating modes of sequential circuits. His Stanford research focuses on logic testing, synthesis, design for testability, and fault-tolerant computing. Prof. McCluskey and his students at the Center for Reliable Computing worked out many key ideas for fault equivalence, probabilistic modelling of logic networks, pseudo-exhaustive testing, and watchdog processors. He collaborated with Signetics researchers in developing one of the first practical multivalued logic implementations and then worked out a design technique for such circuitry.

Dr. McCluskey served as the first President of the IEEE Computer Society. He is the recipient of the 1996 IEEE Emanuel R. Piore Award. He is a Fellow of the IEEE, AAAS, and ACM; and a member of the NAE. He has honorary doctorates from the University of Grenoble and Bowdoin College. He has published several books including two widely used texts.

11:45-1:00 

Lunch, Siskiyou Room


1:00-3:00 

Session 1.1: High-Speed and Energy-Efficient Circuit Design, Santa Clara Room

 

Session Chair :  Nestoras Tzartzanis, Fujitsu Laboratories of America

This session presents circuit techniques for microprocessors and digital signal processors. The first paper describes a new ternary CAM architecture that results in significant area savings and better energy-efficiency compared to standard designs. The second paper introduces an encoding technique that reduces crosstalk between adjacent wires in long buses. Simulations indicate that the proposed technique achieves up to 10% energy savings compared to shielding. The next three papers discuss new techniques that advance the design of multipliers. The first of these papers proposes an asynchronous Booth multiplier for energy- and area-critical applications. Each Booth iteration requires 1.08ns in a 0.18um, 1.8V CMOS process. The next paper presents an iterative decimal multiplier that uses a new decimal representation for intermediate partial products. A standard-cell implementation of the proposed multiplier is estimated to operate at 2GHz in a 0.13um CMOS process. The last paper of the session introduces a twin-precision multiplier that is capable of performing one N-bit, one N/2-bit, or two N/2-bit multiplications in parallel without significant changes or overhead compared to a conventional multiplier.

1.1.1.  PCAM: A Ternary CAM Optimized for Longest Prefix Matching Tasks
       
Mohammad J. Akhbarizadeh, Mehrdad Nourani, Deepak-Sarathi V., and Poras Balsara, University of Texas at Dallas

1.1.2.  Area and Energy-Efficient Crosstalk Avoidance Codes for On-Chip Buses

        Srinivasa R. Sridhara, Arshad Ahmed, and Naresh R. Shanbhag, University of Illinois at Urbana-Champaign

1.1.3.  An Area- and Energy-Efficient Asynchronous Booth Multiplier for Mobile Devices

        Justin Hensley, Montek Singh and Anselmo Lastra, University of North Carolina at Chapel Hill

1.1.4.  A High Frequency Decimal Multiplier (short)

        Robert D. Kenney, Michael J. Schulte, University of Wisconsin at Madison ; Mark Erle, IBM  

1.1.5.  An Efficient Twin-Precision Multiplier (short)

       Magnus Själander, Henrik Eriksson, and Per Larsson-Edefors, Chalmers University of Technology, Goteborg, Sweden


Session 1.2 : Energy-Efficient Processor Microarchitecture (1), San Jose Room

Session chair:  Peter-Michael Seidel, Southern Methodist University

Processor micro-architectures are increasingly becoming power-aware and energy-efficient. The first paper proposes a new dynamic scheduler design that reduces scheduler critical path latency and reduces power consumption with minimal performance impact. The second paper proposes a power-aware cache block allocation algorithm. The third paper proposes techniques for clustered micro-architecture that reduce both average and peak temperatures. The last paper of this session discusses a dynamic power-aware issue queue design for multimedia applications.
 

1.2.1.  Defining Wakeup Width for Efficient Dynamic Scheduling

        Aneesh Aggarwal, Oguz Ergin, Binghamton University ; Manoj Franklin, University of Maryland

1.2.2.   Power-aware deterministic block allocation for low-power way-selective cache structure

        Jung-Wook Park, Shin-Dug Kim, Yonsei University, Seoul, Korea ; Gi-Ho Park, Sung-Bae Park, Samsung Electronics

1.2.3.   Thermal-Aware Clustered Microarchitectures

         Pedro Chaparro, José González, Antonio González, Intel Labs, Barcelona

1.2.4.   Reducing Issue Queue Power for Multimedia Applications using a Feedback Control Algorithm

         Yu Bai and R. Iris Bahar, Brown University

      


Session 1.3: Scan Design and Test, Carmel Room

Session chair:  Prab Varma, Veritable

This session presents several new techniques for scan design and test. The first paper proposes a power supply gating technique to reduce test power in portable devices employing periodic self-test. The second paper introduces a new scan control technique for low area overhead. The third paper presents a testability analysis and DfT insertion methodology for end-to-end mixed-signal paths in which the DfT insertion problem is formulated as a min-cost set cover problem. The fourth paper examines creating functional scan chains at the register-transfer level to improve timing and test data compression. The final paper evaluates a transparent-scan approach that allows very aggressive test compaction without increasing the number of undetectable faults.

1.3.1.   A Novel Low-Power Scan Design Technique Using Supply Gating

         S. Bhunia, H. Mahmoodi, D. Ghosh, S. Mukhopadhyay, and K. Roy, Purdue University     

1.3.2.   Asynchronous Scan-Latch controller for Low Area Overhead DFT

         Masayuki Tsukisaka, Masashi Imai, Takashi Nanya, University of Tokyo

1.3.3.   End-to-end testability analysis and DfT insertion for mixed-signal paths

         Sule Ozev, Duke University ; Alex Orailoglu, University of California at San Diego

1.3.4.  Functional Illinois Scan Design at RTL  (short)

         Ho Fai Ko and Nicola Nicolici, McMaster University

1.3.5.   On Undetectable Faults in Partial Scan Circuits Using Transparent-Scan  (short)

         Irith Pomeranz, Purdue University ; Sudhakar M. Reddy, University of Iowa


 
3:30-5:30 

Session 2.1:  Routing and Floorplanning, Santa Clara

Session chair:   Hung-Ming Chen, NCTU, Taiwan

 

This session presents new formulations and techniques for routing and floorplanning. The first paper presents a method for incremental routing with a view to limiting perturbations of existing nets using a constraint-satisfying depth-first search approach. The second paper presents a tile-based global routing technique that allocates shields and buffers to each net in each tile to reduce crosstalk and satisfy resource constraints. The third paper presents a channel-based routing algorithm for bus interconnects on high-speed boards with minimum and maximum delay constraints. The final paper develops an area-efficient floorplanner that satisfies constraints that provide guaranteed yields.

2.1.1.   A Depth-First-Search Controlled Gridless Incremental Routing Algorithm for VLSI Circuits

      Hasan Arslan and Shantanu Dutt, University of Illinois at Chicago

2.1.2.  Simultaneous Shield and Buffer Insertion for Crosstalk Noise Reduction in Global Routing

      Tianpei Zhang and Sachin S. Sapatnekar, University of Minnesota   

2.1.3.   A Two-Layer Bus Routing Algorithm for High-Speed Boards

      Muhammet Mustafa Ozdal, Martin D. F. Wong, University of Illinois at Urbana-Champaign     

2.1.4.  Reticle Floorplanning with Guaranteed Yield for Multi-Project Wafers

      Andrew B. Kahng and Sherief Reda, University of California at San Diego  


Session 2.2 :  Formal Verification  (Embedded Tutorial), San Jose

Session chair: Alan Hu, University of British Columbia

This session begins with a talk on partitioning large combinational logic cones into smaller pieces through intermediate variables. These intermediate variables, together with the state variables, are treated as atoms in abstraction refinement. The second talk compares different methods for evaluation of formulas expressing microprocessor correctness in the logic of Equality with Uninterpreted Functions and Memories (EUFM) by translation to propositional logic, given recently developed efficient Boolean-to-CNF translations, in order to identify the best overall translation strategy from EUFM to CNF.

2.2.1.  Fine-Grain Abstraction and Sequential Don't Cares for Large Scale Model Checking
       
Chao Wang, Gary D. Hachtel, and Fabio Somenzi, University of Colorado at Boulder

2.2.2.    Comparative Study of Strategies for Formal Verification of High-Level Processors (tutorial)

       Miroslav Velev, Reservoir Labs       


Session 2.3:  Signal Integrity and Leakage, Carmel

Session chair:  Azeez Bhavnagarwala, IBM

This session addresses two increasingly important circuit design considerations: signal integrity and leakage dissipation. The first paper analyzes the effect of soft errors in static CMOS circuits and proposes a new technique to reduce their rate by 70% with a 4% area overhead. The second paper presents a cost-effective method to test signal integrity in PCB buses. The proposed method requires laboratory tests at moderate frequencies (i.e., 50-100 MHz). The next paper introduces a new method for leakage reduction in repeaters. A mixture of low- and high-threshold voltage transistors is used in the repeater pull-down and pull-up networks to adjust their effective threshold voltage. Circuit simulations indicate that total power is reduced by up to 38% for a 0.13um, 1.2V CMOS process. The last paper discusses leakage reduction techniques for SRAM-based FPGAs.

 

2.3.1.   A Highly-Efficient Technique for Reducing Soft Errors in Static CMOS Circuits

        Srivathsan Krishnamohan and Nihar R. Mahapatra, Michigan State University

2.3.2.   A Signal Integrity Test Bed for PCB Buses

        Jihong Ren and Mark R. Greenstreet, University of British Columbia

2.3.3.   A New Threshold Voltage Assignment Scheme for Runtime Leakage Reduction in On-Chip Repeaters

        Saumil Shah, Kanak Agarwal, Dennis Sylvester, University of Michigan  

2.3.4.   A General Post-Processing Approach to Leakage Current Reduction in SRAM-based FPGAs

        Jason Brandon, John Lach, Kevin Skadron, University of Virginia


Tuesday, October 12

Registration: 8 AM - 4 PM

8:30-10:30 


Session 3.1:  Special Session on High-Performance On-Chip Communication, Santa Clara

Session chair:  Kevin Rudd, Intel

High-performance on-chip communication has become a key design aspect of modern integrated circuits.

The first talk examines the challenges of high-speed communication in DSM technologies and surveys several solutions. The second paper presents a technique to pipeline wire delays transparently to the IP blocks being connected. The third paper describes a communication-centric design methodology using the globally asynchronous design paradigm. The final paper describes a global approach for building Network-on-Chip (NoC) designs based on an innovative switching fabric.

 
3.1.1.   Design Methodologies and Architecture Solutions for High-Performance Interconnects (invited)

       Davide Pandini, Cristiano Forzan, Livio Baldi, STMicroelectronics

3.1.2.   On-Chip Transparent Wire Pipelining (invited)

       Mario R. Casu and Luca Macchiarulo, Politecnico di Torino  

3.1.3.   Towards an Integrated Design Methodology for Fault-Tolerant, Multiple Clock/Voltage Integrated Systems (invited)

       Radu Marculescu, Diana Marculescu, Larry Pileggi, Carnegie Mellon University   

3.1.4.   Network-on-Chip: the intelligence is in the wire (invited)

       Gerard Mas, STMicroelectronics ; Philippe Martin, Arteris


Session 3.2 :  Test Generation and Characterization, San Jose

Session chair:  Jacob Abraham, University of Texas

Test generation and silicon characterization are becoming increasingly complex problems. The first talk presents a new encoding scheme that provides a second stage of compression after LFSR reseeding to significantly reduce test power and storage. The second talk presents an IP block for on-chip clock jitter measurement. The third talk presents a novel approach for hold time fault diagnosis. The next talk proposes two different methodologies for test cost reduction in scan-based designs. The final talk presents a set of methods, collectively known as testing knowledge, aimed at increasing the quality of automatically generated system-level test cases.

3.2.1.   Low Power Test Data Compression Based on LFSR Reseeding

        Jinkyu Lee and Nur A. Touba, University of Texas at Austin

3.2.2.   An Infrastructure IP for On-Chip Clock Jitter Measurement

        Jui-Jer Huang and Jiun-Lang Huang, National Taiwan University

3.2.3.   Diagnosis of Hold Time Defects

        Zhiyuan Wang, Malgorzata Marek-Sadowska, University of California at Santa Barbara ; Kun-Han Tsai, Janusz Rajski, Mentor Graphics

3.2.4.   Extending the Applicability of Parallel-Serial Scan Designs (short)

        Baris Arslan, Ozgur Sinanoglu and Alex Orailoglu, University of California at San Diego

3.2.5.   Quality Improvement Methods for System-level Stimuli Generation (short)

        Roy Emek, Itai Jaeger, Yoav Katz, Yehuda Naveh, IBM Research, Haifa


Session 3.3:  Physically-Aware Design Tools, Carmel

Session chair:  John Lach, University of Virginia

This session presents tools that estimate or optimize various physical metrics at different levels of the VLSI design flow. The first paper provides a tool for measuring the impact of crosstalk on path delays using a path-based (as opposed to the more pessimistic net-based) approach. The second paper discusses an efficient data structure for computing optimal buffer insertions in distributed RC-tree routings. The third paper presents a 0/1 ILP method for high-level synthesis that, in addition to the usual functions of scheduling and binding, also performs resource allocation to logic layers of a 3D chip to minimize the number of inter-strata vias under appropriate constraints. The fourth paper presents a technique for reordering the pins and transistors in dual Tox gates in order to reduce gate-oxide and subthreshold leakage currents under delay constraints.

3.3.1. XTalkDelay: A Crosstalk-aware Timing Analysis Tool for Chip-level Designs 

        Rajeev Murgai, Takashi Miyoshi, Fujitsu Laboratories of America ; Yinghua Li, University of California at Berkeley ; Ashwini Verma, Amdocs

3.3.2.   A flexible data structure for efficient buffer insertion

        Ruiming Chen, Hai Zhou, Northwestern University

3.3.3.   Simultaneous Scheduling, Binding and Layer Assignment for Synthesis of Vertically Integrated 3D Systems

        Madhubanti Mukherjee and Ranga Vemuri, University of Cincinnati  

3.3.4.   Transistor and Pin Reordering for Gate Oxide Leakage Reduction in Dual Tox Circuits

        Anup Kumar Sultania, Sachin S. Sapatnekar, University of Minnesota ; Dennis Sylvester, University of Michigan

         


11:00-12:30 

Session 4.1:  Energy-Efficient Processor Microarchitecture (2), Santa Clara

Session chair:   Steve Krueger, Texas Instruments

This session continues the theme of power-aware microprocessor designs. The first paper describes the tradeoff between latency and throughput performance, analyzing a set of four techniques for achieving variable energy per instruction. The second paper analyzes the power saving benefits of halting instruction fetching during critical long latency instructions, such as a critical load miss. The final paper presents a novel micro-architecture combining the benefits of both clustering and GALS (Globally Asynchronous Locally Synchronous) design techniques.

4.1.1.   Best of Both Latency and Throughput

        Ed Grochowski, Ronny Ronen, John Shen, Hong Wang, Intel Labs

4.1.2.   Fetch Halting on Critical Load Misses

        Brian Singer, Nikil Mehta, R. Iris Bahar, Brown University ; Michael Leuchtenburg, Richard Weiss, Hampshire College

4.1.3.   Frontend Frequency-Voltage Adaptation for Optimal Energy-Delay^2

        Grigorios Magklis, Jose Gonzalez, Antonio Gonzalez, Intel Barcelona Research Center
 

Session 4.2 : Power and Timing Optimization, San Jose

Session chair:  Sinan Kaptanoglu, Actel

Papers in this session present techniques for performance optimization in power and timing. The first paper is concerned with leakage power reduction, and examines the approach of gate-sizing and Vt assignment. The optimization is formulated as a mixed-integer linear programming problem. The second paper exploits clock skew to improve timing, and uses a metric known as potential slack to guide the optimization. A new linear-programming formulation yields significant speedup over previous approaches. The third paper is concerned with power optimization in the context of increasing process variability, and proposes a statistical method for gate sizing for this optimization problem.

4.2.1.   Gate Sizing and Vt Assignment for Active-Mode Leakage Power Reduction

        Feng Gao and John P. Hayes, University of Michigan

4.2.2.   Potential Slack Budgeting with Clock Skew Optimization

        Kai Wang and Malgorzata Marek-Sadowska, University of California at Santa Barbara

4.2.3.   A New Statistical Optimization Algorithm for Gate Sizing

        Murari Mani, Michael Orshansky, University of Texas at Austin


Session 4.3:  Novel Processor Design, Carmel

Session chair:  Ed Grochowski, Intel

This session begins with a general system architecture for searching, filtering, compression, encryption, and other operations on unstructured data streaming from a disk system. The second paper describes a prototype Itanium microprocessor written in the Bluespec hardware description language and synthesized into an FPGA. The third paper presents a novel method to reduce the memory bandwidth required for fetching texture image data through adaptive selection of a cache index.

  

4.3.1.   An Architecture for Fast Processing of Large Unstructured Data Sets

        Mark Franklin, Roger Chamberlain, Michael Henrichs, Berkley Shands, Jason White, Washington University

4.3.2.   In-System FPGA Prototyping of an Itanium Microarchitecture

        Roland E. Wunderlich and James C. Hoe, Carnegie Mellon University

4.3.3.   Adaptive Selection of an Index in a Texture Cache

        Chun-Ho Kim, Lee-Sup Kim, Korea Advanced Institute of Science and Technology (KAIST)

12:30-1:30 

Lunch, Donner Room


1:30-3:00 

Session 5.1: Emerging Technologies (Special Session), Santa Clara

Session chair:  AJ KleinOsowski, IBM Austin Research Laboratory

The emerging technologies session begins with a survey of nano-scale device technologies. The authors discuss how system-level research can be used to influence device development and propose a design methodology that addresses whether computationally interesting and buildable circuits are possible with Quantum-dot Cellular Automata (QCA). The second talk presents new techniques for modeling quantum circuits, including the design of an FPGA emulator for quantum algorithms. The final talk presents a 3D die stacking technology in which an IA-32 microprocessor is partitioned between two die in order to simultaneously improve performance and reduce power.

5.1.1.   Using Circuits and Systems-Level Research to Drive Nanotechnology (invited)

        Michael T. Niemier, Ramprasad Ravichandran, Georgia Institute of Technology ; Peter M. Kogge, University of Notre Dame

5.1.2.   FPGA Emulation of Quantum Circuits

        Ahmed Usman Khalid, Zeljko Zilic, McGill University ; Katarzyna Radecka, Concordia University

5.1.3.    3D Processing Technology and its Impact on iA32 Microprocessors (invited)

        Don Nelson, Clair Webb, Nick Samra, Bryan Black, Intel Corporation

        
Session 5.2 :  Cache Memory Design, Carmel

Session chair:  Srikanth Srinivasan, Intel

This session examines issues in cache memory design. The first talk presents a cache access time model which optimizes the memory array for minimum access and cycle times. The second talk explores tracking cache request history using a novel hardware monitoring framework to improve scheduling and performance of a simultaneous multithreaded processor. The final talk presents a CAM-based cache design that uses prediction to reduce energy consumption.

5.2.1.   Cache Array Architecture Optimization at Deep Submicron Technologies

        Annie Y. Zeng, Ken Rose, Ronald J. Gutmann, Rensselaer Polytechnic Institute

5.2.2.   Implementation of Fine-Grained Cache Monitoring for Improved SMT scheduling

        Joshua Kihm, Daniel Connors, University of Colorado

5.2.3.   Using Prediction to Reduce Energy Consumption in Highly-Associative Caches for Embedded Processors (short)

        Alex Veidenbaum, Dan Nicolaescu, University of California at Irvine


 
3:30-5:00 

Session 6.1:  Layout-Driven Circuit Optimization, Santa Clara

Session chair:  W. Rhett Davis, North Carolina State University

This session addresses CAD-related issues associated with circuit design. The first paper studies the effectiveness of via-configurable gate arrays for the implementation of circuits ranging from simple gates and memory cells to arithmetic units. The second paper proposes an algorithm that simplifies the netlist of hybrid structured clock trees and therefore significantly reduces the simulation time required for their analysis. The third paper presents a new formalism for layout-driven optimization of datapaths which has been integrated into a commercial Electronic Design Automation environment. The last paper discusses a high-level synthesis methodology for digital filters that is based on floorplan-aware complexity reduction.

6.1.1.  The Magic of a Via-Configurable Regular Fabric

        Yajun Ran and Malgorzata Marek-Sadowska, University of California at Santa Barbara

6.1.2.  A Fast Delay Analysis Algorithm for the Hybrid Structured Clock Network

        Yi Zou, Yici Cai, Qiang Zhou, Xianlong Hong, Tsinghua University ; Sheldon Tan, University of California at Riverside

6.1.3. Layout Driven Optimization of Datapath Circuits using Arithmetic Reasoning (short)

         Ingmar Neumann, Dominik Stoffel, Kolja Sulimma, Wolfgang Kunz, University of Kaiserslautern ; Michel Berkelaar, Magma Design Automation

6.1.4. Floorplan-aware High-level Synthesis for Digital Filters (short)

        Dongku Kang, Hunsoo Choo, Kaushik Roy, Purdue University

Session 6.2 :  Instruction-Level Parallelism (1), San Jose

Session chair:  Nihar Mahapatra, Michigan State University

This session presents techniques for faster instruction execution. The first paper shows the benefits of combining three different thread spawning policies for SMT, achieving a better overall performance gain than any of these alone. The second paper proposes a TriBank register file design with higher bandwidth and smaller access time compared to conventional large register file designs. The final paper exploits slack in instruction schedules to design dynamically scaled pipelines with multiple clock domains resulting in improved power-performance.

6.2.1.  A Minimal Dual-Core Speculative Multi-Threading Architecture

        Srikanth Srinivasan, Haitham Akkary, Tom Holman, Konrad Lai, Intel Corporation

6.2.2.   Exploiting Quiescent States in Register Lifetime

        Rama Sangireddy, University of Texas at Dallas ; Arun K. Somani, Iowa State University

6.2.3.    Evaluating Techniques for Exploiting Instruction Slack (short)

        Yau Chin, John Sheu, David Brooks, Harvard University


Session 6.3: Power Estimation and Minimization, Carmel

Session chair:  Yuan Xie, Pennsylvania State University

Papers in this session are concerned with power estimation and minimization at various levels of a system - from multiprocessors down to gates. The first paper uses continuous transition probability waveforms to represent gate delays as random variables to account for manufacturing process variations, and uses them to compute dynamic power dissipation efficiently and with low error. The second paper discusses approaches that use circuit topology to restructure an ILP over circuit states for optimization problems in order to improve the performance of the ILP. The third paper addresses power estimation of communication primitives in on-chip multiprocessors via software-interrupt calls to power estimators for the component functions of the primitives: memory access, bus access, and instruction execution. The final paper of the session presents a system-level power estimation tool for analog-to-digital converters (ADCs) based on a parameterized power estimation function that can be applied to basic ADC components.

6.3.1.   Static Transition Probability Analysis under Uncertainty

        Siddharth Garg, Siddharth Tata, Ravishankar Arunachalam, Indian Institute of Technology at Madras       

6.3.2.   Circuit-based Preprocessing of ILP and Its Applications in Leakage Minimization and Power Estimation

        Donald Chai, University of California, Berkeley ; Andreas Kuehlmann, Cadence Berkeley Labs

6.3.3.   Analyzing Power Consumption of Message Passing Primitives in a Single-chip Multiprocessor (short)

        Mirko Loghi, Massimo Poncino, Università di Verona ; Luca Benini, Università di Bologna

6.3.4.   An Architectural Power Estimator for Analog-to-Digital Converters (short)

        Zhaohui Huang, Peixin Zhong, Michigan State University

 

6:00-9:00 

 

Banquet:  Doubletree Hotel, Donner Room

Panel Discussion:  Grid Computing, Session Chair:  Tobin Lehman, IBM

 Panelists :  Inderpal Narang, IBM Research ;  Nader Shaterian, Marsys ;  David Epstein, Cross Link Capital ;  Steve Kishi, Hummer Winblad Venture Partners

 


Wednesday, October 13

Registration: 8:30 AM - 4 PM

9:00-10:30 


Session 7.1:  Formal Verification Techniques, Santa Clara

Session chair:  Miroslav Velev, Reservoir Labs

The first paper addresses the special challenges in the formal verification of high-performance circuits that involve redundant number representations and partial compressions, where symbolic consideration of number representations is not sufficient and additional properties beyond operand values need to be considered to show correct operation. The second paper proposes a simulation-friendly version of the Generalized Symbolic Trajectory Evaluation (GSTE) specification that is convertible into conventional GSTE to access the formal verification tool flow, as well as convertible into completely non-symbolic monitor circuits suitable for conventional dynamic verification. The final paper presents a graph automorphism-based algorithm for computing maximal sets of symmetric inputs of circuits that can be used to identify nonsymmetric inputs in a circuit and enhance the efficiency of input matching, technology mapping, and logic verification.


7.1.1. Formal Hardware Verification based on Signal Correlation Properties

        Nikhil Kikkeri, Peter-Michael Seidel, Southern Methodist University        

7.1.2. Generating Monitor Circuits for Simulation-Friendly GSTE Assertion Graphs

        Kelvin Ng, Alan J. Hu, Jin Yang, University of British Columbia

7.1.3. Graph Automorphism-based Algorithm for Determining Symmetric Inputs (short)

        Chen-Ling Chou, Geeng-Wei Lee, Jing-Yang Jou, National Chiao Tung University ; Chun-Yao Wang, National Tsing Hua University


Session 7.2 :   Networks on Chips, San Jose

Session chair:  Hai Zhou, Northwestern University

This session addresses problems in building network-on-chip (NoC) architectures, which have been proposed as a solution to complex on-chip interconnect problems. The first paper addresses the problem of virtualizing and placing logic processing units to achieve thermally balanced designs. The second paper formulates a mixed-integer linear programming (MILP) problem for the synthesis of low-power NoC architectures subject to performance constraints, and proposes heuristics to reduce the runtime of the optimization. The third paper tackles the problem of mapping cores onto an NoC architecture to minimize either energy consumption or congestion. The paradigm of many-to-many core-switch mapping is introduced, and the problem is formulated as MILP to take into account core placement, switches for each core, and network interfaces for communication flows.

7.2.1.    Linear Programming based Techniques for Synthesis of Network-on-Chip Architectures

        Krishnan Srinivasan, Karam S. Chatha, Goran Konjevod, Arizona State University

7.2.2.   Thermal-Aware IP Virtualization and Placement for Networks-on-Chip Architecture

        W. Hung, C. Addo-Quaye, T. Theocharides, Y. Xie, N. Vijaykrishnan, M. J. Irwin, Penn State University  

7.2.3.   Many-to-Many Core-Switch Mapping in 2-D Mesh NoC Architectures

         Chae-Eun Rhee, Samsung Electronics ; Han-You Jeong, Soonhoi Ha, Seoul National University


Session 7.3:  Novel Processor Architecture, Carmel

Session chair:  Pradeep Dubey, Intel

This session is devoted to novel processor architecture and micro-architecture features. The first paper proposes a reconfigurable SIMD DSP capable of instantly scaling a single instruction stream over multiple data streams (ISSIMD). The second paper presents an architectural technique called Runtime Execution Monitoring (REM) to detect program flow anomalies resulting from various worm and virus attacks. The final contribution of this session is about dynamic address compression schemes applied to on-chip address buses and the resulting cost, power, and performance benefits.

7.3.1.   An Embedded Reconfigurable SIMD DSP with Capability of Dimension-Controllable Vector Processing

       Liang Han, Jie Chen, Ying Li, Chaoxian Zhou, Xin Zhang, Zhibi Liu, Xiaoyun Wei, Baofeng Li, Institute of Microelectronics, Chinese Academy of Sciences

7.3.2.   Runtime Execution Monitoring (REM) to Detect and Prevent Malicious Code Execution

        A. Murat Fiskiran, Ruby B. Lee, Princeton University  

7.3.3.    Dynamic Address Compression Schemes: A Performance, Energy, and Cost Study

        Jiangjiang Liu, University at Buffalo ; Krishnan Sundaresan, Nihar R. Mahapatra, Michigan State University


11:00-12:30 

Session 8.1:  Instruction-Level Parallelism (2), Santa Clara

Session chair:  Nihar Mahapatra, Michigan State University

This session continues the theme of improved performance through ILP techniques. The first paper proposes a compile-time method to construct frames using profiling and static analysis to produce well-optimized frequently executed frames with minimum recovery penalty. The second paper discusses a technique that dynamically assigns ILP classification to load instructions and uses this class label to dynamically control cache associativity. The final paper discusses novel register management techniques aimed at large physical register files through early register de-allocation which result in improved performance.

8.1.1.   Compiler-Based Frame Formation for Static Optimization

        Feng Shi, Sobeeh Almukhaizim, Pey-Chang Lin, and Yiorgos Makris, Yale University

8.1.2.   IPC Driven Dynamic Associativity Management Cache Architecture for Low energy

        Sriram Nadathur, Akhilesh Tyagi, Iowa State University   

8.1.3.   Increasing Processor Performance through Early Register Release     

        Oguz Ergin, Deniz Balkan, Dmitry Ponomarev, Kanad Ghose, State University of New York at Binghamton


Session 8.2 :  Topics in Synthesis and Co-simulation, San Jose

Session chair:  Rajeev Murgai, Fujitsu

This session consists of papers on various topics in synthesis and co-simulation. The first paper proposes combined channel segmentation and buffer insertion to improve the routability of FPGA designs. The second paper presents a framework for modeling and analyzing heterogeneous networks, based on a tight integration of a network simulator with embedded software, middleware, and a real-time operating system. The third paper proposes a new architecture for a hardware-accelerated Boolean satisfiability solver; the design, intended for FPGA implementation, is modeled and simulated in SystemC. The last paper presents a dual-rail encoding method for creating a combinational logic network with a signal asserting the stability of all other outputs, to overcome timing problems resulting from the variability of combinational logic delays.

8.2.1.   Combined FPGA Channel Segmentation and Buffer Insertion for Simultaneous Routability and Performance Improvement

        Hu Huang, Joseph B. Bernstein, Martin Peckerar, Ji Luo, University of Maryland

8.2.2.   Software/Network Co-Simulation of Heterogeneous Industrial Networks Architectures

        S. Martini, G. Perbellini, F. Fummi, M. Poncino, M. Monguzzi, Università di Verona

8.2.3.   Hardware/Software Co-modeling of a SAT Solver Based on Distributed Computing Elements using SystemC (short)

        Jinwen Xi, Peixin Zhong, Michigan State University

8.2.4.  Coping with the variability of combinational logic delays (short)

       Jordi Cortadella, Univ. Politècnica Catalunya, Barcelona ;  Alex Kondratyev, Cadence Berkeley Labs ; Luciano Lavagno, Politecnico di Torino ; Christos Sotiriou, ICS-FORTH


Session 8.3 :   Low-Power Architecture, Carmel

Session chair:  Kee Sup Kim, Intel

This session examines architectural techniques for low power. The first paper presents a design methodology for power-aware networks in which communication links are turned on and off in response to bursts and dips in traffic. The second paper evaluates several hardware-based data prefetching techniques from an energy perspective, and explores their energy/performance tradeoffs. The final paper introduces an efficient Montgomery multiplier for the modular exponentiation operation, which is fundamental to numerous public key cryptosystems.

8.3.1.   Design-Space Exploration of Power-Aware On/Off Interconnection Networks

        Vassos Soteriou, Li-Shiuan Peh, Princeton University

8.3.2.   Energy Characterization of Hardware-Based Data Prefetching

        Yao Guo, Saurabh Chheda, Israel Koren, C. Mani Krishna, Csaba Andras Moritz, University of Massachusetts at Amherst

8.3.3.   Design and Implementation of Scalable Low-Power Montgomery Multiplier

        Hee-Kwan Son, Sang-Geun Oh, Samsung Electronics

 

12:30-1:30 

Lunch, Poolside Foyer


1:30-2:30 

Session 9.1:   Test Generation, Santa Clara

Session chair:  Nina Saxena, Intel

This session examines issues in test generation and diagnosis. The first paper introduces a new method for deterministic diagnosis of logic cores based on on-chip decompression and comparison of incompletely specified test patterns and test responses. The second paper proposes an automatic test pattern generation framework for combinational threshold networks, as employed in several emerging nanotechnologies. The final paper presents a BDD-based algorithm for memory repair problems that is not only perfect but also highly efficient.


9.1.1.   Compressed Embedded Diagnosis of Logic Cores

        Scott Ollivierre, Adam B. Kinsman, Nicola Nicolici, McMaster University  

9.1.2.   An Automatic Test Pattern Generation Framework For Combinational Threshold Networks

        Pallav Gupta, Rui Zhang, Niraj K. Jha, Princeton University

9.1.3.   An Efficient Algorithm for Reconfiguring Shared Spare RRAM (short)

        Hung-Yau Lin, Hong-Zu Chou, Fu-Min Yeh, Sy-Yen Kuo, National Taiwan University    

    


Session 9.2 :   Network Routing, San Jose

Session chair:  Tom Dillinger, IBM

This session begins with a proposal for a new analytical model to compute message latency in a general n-dimensional torus network with an arbitrary number of virtual channels per physical channel. The second paper proposes a solution that eliminates the requirement of sorting by prefix length in IP forwarding devices using Ternary Content Addressable Memories (TCAMs).

 

9.2.1.    An Accurate Combinatorial Model for Performance Prediction of Deterministic Wormhole Routing in Torus Multicomputer Systems

        Hashem Hashemi Najaf-abadi, Hamid Sarbazi-azad, Sharif University of Technology, Tehran

9.2.2.    Technique to Eliminate Sorting in IP Packet Forwarding Devices

        Raymond W. Baldwin, Enrico Ng, University of Illinois at Chicago


Session 9.3 :    Placement and Floorplanning, Carmel

Session chair:  Rajeev Murgai, Fujitsu

Papers in this session are concerned with techniques to improve the quality and performance of placement and floorplanning. The first paper addresses the problem of I/O placement in flip-chip designs, where I/O can be placed throughout the whole chip without long wires from the periphery. A clustering approach is proposed that considers design cost reduction and signal integrity. The second paper uses the B*-tree representation to capture alignment and performance constraints in placement. The last paper presents a new data structure called the adjacent constraint graph for floorplanning; the data structure consolidates the traditional adjacency and vertical/horizontal constraint graphs into one, and allows fast insert and swap operations for the exploration of the solution space.

9.3.1.  I/O Clustering in Design Cost and Performance Optimization for Flip-Chip Design

        Hung-Ming Chen, National Chiao Tung University ;  I-Min Liu, Cadence Design Systems ; Martin D.F. Wong, University of Illinois at Urbana-Champaign; Muzhou Shao, Synopsys ; Li-Da Huang, Texas Instruments

9.3.2.   Placement with Alignment and Performance Constraints Using the B*-tree Representation

        Meng-Chen Wu, National Chiao Tung University ; Yao-Wen Chang, National Taiwan University

9.3.3.   ACG-Adjacent Constraint Graph for General Floorplan

        Hai Zhou, Jia Wang, Northwestern University