ICCD 2004 Final Program
Registration : 8:00AM - 4:00 PM
9:00 - 10:30: Digital Testing
Session Chair : Dr. Tom Williams, Synopsys
Dr. Bernd Koenemann, Cadence Design Systems, "ELF + MURPHY = Leprechaun?"
Dr. Rohit Kapur, Synopsys, "Evaluating Test Compression Methods"
Prof. Jacob Abraham, University of Texas-Austin, "Quadruple Time Redundancy: Efficient Error Correction for Datapaths"
11:00 - 12:00: Logic Synthesis
Session Chair : Prof. Giovanni De Micheli, Stanford University
Dr. Antun Domic, Synopsys, "Synthesis of Tomorrow"
Prof. Bob Brayton, University of California-Berkeley, "Computing and Using Flexibility in Logic Networks"
12:00 - 1:30: Lunch : Special Tribute to Professor Edward McCluskey, Donner Room
Session Chair : Dr. LaNae Avra, Cadence Design Systems
Speakers : Dr. Don Chamberlin, IBM; Prof. Michael Flynn, Stanford; Prof. John Hayes, U. of Michigan; Prof. Kishor Trivedi, Duke University; Prof. Wayne Wolf, Princeton; Prof. Janak Patel, University of Illinois; Dr. John Shedletsky, IBM; Prof. Melvin Breuer, USC; Dr. John Shen, Intel; Dr. Miron Abramovici, DAFCA; Dr. Rob Roy, Zenasis; Prof. Rulin Mangir, CSU Long Beach
2:00 - 3:30: Reliable Computing
Session Chair : Dr. Robert Horst
Prof. Algirdas Avizienis, UCLA, "The Information Infrastructure needs a Fault-Tolerance Infrastructure of its Own"
Prof. Dan Siewiorek, Carnegie Mellon University
Prof. Ravi Iyer, University of Illinois at Urbana-Champaign
4:00 - 5:00: Panel Discussion : Computer Engineering and Professor McCluskey’s Impact
Moderator : Dr. Bill Bottoms, Third Millennium Test Solutions
Panelists : Prof. Kent Fuchs, Cornell University; Dr. Ed Eichelberger, IBM (retired); Prof. Steve Szygenda, Southern Methodist University; Prof. John Brzozowski, University of Waterloo
6:00 - 8:00: Dinner and Plenary Session, Monterey/Carmel Room
Plenary Session Chair: Prof. Dan Siewiorek, Carnegie Mellon University
Note that attendees must register separately for the workshop.
Registration: 8 AM - 5 PM
Welcome from Tom Dillinger, ICCD 2004 General Chair, and Ed Grochowski, ICCD 2004 Technical Program Chair
9:30-10:15
Abstract
VLSI system performance increased by five orders of magnitude
in the last three decades, made possible by continued technology scaling, improving
transistor performance to increase frequency, increasing integration capacity
to realize complex architectures, and reducing energy consumed per logic operation
to keep power dissipation within limits. The technology treadmill will continue,
providing integration capacity of billions of transistors; however, power and
energy consumption will be the barriers. Performance at any cost will not be
an option in the future; system architectures will have to emphasize performance
delivered in a given power envelope, with complexity limited by energy efficiency.
This talk will discuss potential solutions in process technology, circuits,
and microarchitectures to exploit future gigascale integration capacity. The
concept of system on a chip (SOC) will help integrate diverse functional blocks,
providing valued performance. The talk will conclude with recommendations to
the chip and system designers on how to exploit these emerging paradigms.
Biography
Shekhar Borkar received an MS in Physics from the University of Bombay and an
MSEE from the University of Notre Dame in 1981, and then joined Intel Corporation. He
worked on the 8051 family of microcontrollers, the iWarp multicomputer project,
and subsequently on Intel's supercomputers. He is
an Intel Fellow and director of Circuit Research. His research interests are
high performance and low power digital circuits, and high-speed signaling. Shekhar
is an adjunct faculty member at Oregon Graduate Institute, and teaches VLSI
design.
10:15-11:00
Professor Melvin Breuer, USC
Abstract
Because of trends in scaling, in the near future every high-performance die will contain
a massive number of defects and process-aggravated noise and performance problems.
In an attempt to obtain useful yields, designers and test engineers will need
to adopt a qualitatively different approach to their work. They will need to
learn, enhance and deploy techniques such as fault- and defect-tolerance. For
some applications, they may even apply error-tolerance, a somewhat controversial
emerging paradigm. A circuit is error-tolerant (ET)
with respect to an application if (1) it contains defects that cause internal
errors and may cause external errors, and (2) the system that incorporates this circuit
produces acceptable results. In this
presentation we illustrate and give quantitative bounds on several factors that
will shape the future of digital design. We compare and contrast defect- and
fault-tolerant schemes with error-tolerance. We discuss how yield can
be optimized by appropriately selecting the granularity of spares in light of
defect densities and interconnect complexity. Finally, we show that several
large classes of consumer electronic applications are resilient to errors, and
how error-tolerance can then be used to significantly enhance effective yield.
Biography
Melvin A. Breuer received his Ph.D. in electrical engineering from the University of California, Berkeley, and is the Charles Lee Powell Professor of Electrical Engineering and Computer Science at the University of Southern California. He was Chairman of the Department of Electrical Engineering-Systems from 1991 to 1994, and again from 2000 to 2003. He was Chair of the Faculty of the School of Engineering, USC, for the 1997-98 academic year. His main interests are in the area of computer-aided design of digital systems, design-for-test and built-in self-test, and VLSI circuits.
11:00-11:45
Keynote Address: Digital Integrated Circuit Testing for Art Historians and Test Experts
Professor E.J. McCluskey, Stanford University Center for Reliable Computing
Abstract
This talk is my attempt to identify the basic concerns in digital IC production testing. The details, too often, crowd in and prevent us from understanding why we are having difficulties: why testing is too costly or why too many bad chips escape (pass) the test despite our best efforts. Then I want to explore some of the myths representing the common wisdom about testing. My opinions will be supported by the results of experiments - not simulations, but real-world tests - carried out on actual chips from various technologies.
Biography
Professor McCluskey worked on electronic switching systems at the Bell Telephone Laboratories from 1955 to 1959. In 1959, he moved to Princeton University, where he was Professor of Electrical Engineering and Director of the University Computer Center. In 1966, he joined Stanford University, where he is Professor of Electrical Engineering and Computer Science, as well as Director of the Center for Reliable Computing. He founded the Stanford Digital Systems Laboratory (now the Computer Systems Laboratory) in 1969 and the Stanford Computer Engineering Program (now the Computer Science MS Degree Program) in 1970. The Stanford Computer Forum (an Industrial Affiliates Program) was started by Dr. McCluskey and two colleagues in 1970 and he was its Director until 1978.
Professor McCluskey developed the first algorithm for designing combinational circuits, the Quine-McCluskey logic minimization procedure, as a doctoral student at MIT. At Bell Labs and Princeton, he developed the modern theory of transients (hazards) in logic networks and formulated the concept of operating modes of sequential circuits. His Stanford research focuses on logic testing, synthesis, design for testability, and fault-tolerant computing. Prof. McCluskey and his students at the Center for Reliable Computing worked out many key ideas for fault equivalence, probabilistic modelling of logic networks, pseudo-exhaustive testing, and watchdog processors. He collaborated with Signetics researchers in developing one of the first practical multivalued logic implementations and then worked out a design technique for such circuitry.
Dr. McCluskey served as the first President of the IEEE Computer Society. He is the recipient of the 1996 IEEE Emanuel R. Piore Award. He is a Fellow of the IEEE, AAAS, and ACM; and a member of the NAE. He has honorary doctorates from the University of Grenoble and Bowdoin College. He has published several books including two widely used texts.
11:45-1:00
Lunch, Siskiyou Room
1:00-3:00
Session 1.1: High-Speed and Energy-Efficient Circuit Design, Santa Clara Room
Session Chair : Nestoras Tzartzanis, Fujitsu Laboratories of America
This session presents circuit
techniques for microprocessors and digital signal processors. The first paper
describes a new ternary CAM architecture that results in significant
area savings and better energy-efficiency compared to standard designs.
The second paper introduces an encoding technique that reduces crosstalk between
adjacent wires in long buses. Simulations indicate that the proposed technique
achieves up to 10% energy savings compared to shielding. The next three papers
discuss new techniques that advance the design of multipliers. The first of
these papers proposes an asynchronous Booth multiplier for energy- and area-critical
applications. Each Booth iteration requires 1.08ns in a 0.18um, 1.8V CMOS process.
The next paper presents an iterative decimal multiplier that uses a new decimal
representation for intermediate partial products. A standard-cell implementation
of the proposed multiplier is estimated to operate at 2GHz in a 0.13um CMOS
process. The last paper of the session introduces a twin-precision multiplier
that is capable of performing one N bit, one N/2 bit, or two N/2 bit multiplications
in parallel without significant changes or overhead compared to a conventional
multiplier.
1.1.1. PCAM: A Ternary CAM Optimized for Longest Prefix Matching Tasks
Mohammad J. Akhbarizadeh, Mehrdad Nourani, Deepak-Sarathi V., and Poras Balsara, University of Texas at Dallas
1.1.2. Area and Energy-Efficient Crosstalk Avoidance Codes for On-Chip Buses
Srinivasa R. Sridhara, Arshad Ahmed, and Naresh R. Shanbhag, University of Illinois at Urbana-Champaign
1.1.3. An Area- and Energy-Efficient Asynchronous Booth Multiplier for Mobile Devices
Justin Hensley, Montek Singh and Anselmo Lastra, University of North Carolina at Chapel Hill
1.1.4. A High Frequency Decimal Multiplier (short)
Robert D. Kenney, Michael J. Schulte, University of Wisconsin at Madison ; Mark Erle, IBM
1.1.5. An Efficient Twin-Precision Multiplier (short)
Magnus Själander, Henrik Eriksson, and Per Larsson-Edefors, Chalmers University of Technology, Goteborg, Sweden
Session 1.2 : Energy-Efficient Processor Microarchitecture (1), San Jose Room
Session chair: Peter-Michael Seidel, Southern Methodist University
Processor micro-architectures
are increasingly becoming power-aware and energy-efficient. The first paper
proposes a new dynamic scheduler design that reduces scheduler critical path
latency and reduces power consumption with minimal performance impact. The second
paper proposes a power-aware cache block allocation algorithm. The third paper
proposes techniques for clustered micro-architecture that reduce both average
and peak temperatures. The last paper of this session discusses a dynamic power-aware
issue queue design for multimedia applications.
1.2.1. Defining Wakeup Width for Efficient Dynamic Scheduling
Aneesh Aggarwal, Oguz Ergin, Binghamton University ; Manoj Franklin, University of Maryland
1.2.2. Power-aware deterministic block allocation for low-power way-selective cache structure
Jung-Wook Park, Shin-Dug Kim, Yonsei University, Seoul, Korea ; Gi-Ho Park, Sung-Bae Park, Samsung Electronics
1.2.3. Thermal-Aware Clustered Microarchitectures
Pedro Chaparro, José González, Antonio González, Intel Labs, Barcelona
1.2.4. Reducing Issue Queue Power for Multimedia Applications using a Feedback Control Algorithm
Yu Bai and R. Iris Bahar, Brown University
Session 1.3: Scan Design and Test, Carmel Room
Session chair: Prab Varma, Veritable
This session presents several new techniques for scan design and test. The first paper proposes a power supply gating technique to reduce test power in portable devices employing periodic self-test. The second paper introduces a new scan control technique for low area overhead. The third paper presents a testability analysis and DfT insertion methodology for end-to-end mixed-signal paths in which the DfT insertion problem is formulated as a min-cost set cover problem. The fourth paper examines creating functional scan chains at the register-transfer level to improve timing and test data compression. The final paper evaluates a transparent-scan approach that allows very aggressive test compaction without increasing the number of undetectable faults.
1.3.1. A Novel Low-Power Scan Design Technique Using Supply Gating
S. Bhunia, H. Mahmoodi, D. Ghosh, S. Mukhopadhyay, and K. Roy, Purdue University
1.3.2. Asynchronous Scan-Latch controller for Low Area Overhead DFT
Masayuki Tsukisaka, Masashi Imai, Takashi Nanya, University of Tokyo
1.3.3. End-to-end testability analysis and DfT insertion for mixed-signal paths
Sule Ozev, Duke University ; Alex Orailoglu, University of California at San Diego
1.3.4. Functional Illinois Scan Design at RTL (short)
Ho Fai Ko and Nicola Nicolici, McMaster University
1.3.5. On Undetectable Faults in Partial Scan Circuits Using Transparent-Scan (short)
Irith Pomeranz, Purdue University ; Sudhakar M. Reddy, University of Iowa
3:30-5:30
Session 2.1: Routing and Floorplanning, Santa Clara
Session chair: Hung-Ming Chen, NCTU, Taiwan
This session presents new formulations and techniques for routing and floorplanning. The first paper presents a method for incremental routing that aims to limit perturbations of existing nets using a constraint-satisfying depth-first search approach. The second paper presents a tile-based global routing technique that allocates shields and buffers to each net in each tile to reduce crosstalk and satisfy resource constraints. The third paper presents a channel-based routing algorithm for bus interconnects on high-speed boards with minimum and maximum delay constraints. The final paper develops an area-efficient floorplanner that satisfies constraints that provide guaranteed yields.
2.1.1. A Depth-First-Search Controlled Gridless Incremental Routing Algorithm for VLSI Circuits
Hasan Arslan and Shantanu Dutt, University of Illinois at Chicago
2.1.2. Simultaneous Shield and Buffer Insertion for Crosstalk Noise Reduction in Global Routing
Tianpei Zhang and Sachin S. Sapatnekar, University of Minnesota
2.1.3. A Two-Layer Bus Routing Algorithm for High-Speed Boards
Muhammet Mustafa Ozdal, Martin D. F. Wong, University of Illinois at Urbana-Champaign
2.1.4. Reticle Floorplanning with Guaranteed Yield for Multi-Project Wafers
Andrew B. Kahng and Sherief Reda, University of California at San Diego
Session 2.2 : Formal Verification (Embedded Tutorial), San Jose
Session chair: Alan Hu, University of British Columbia
This session begins with a talk on partitioning large combinational logic cones into smaller pieces through intermediate variables. These intermediate variables, together with the state variables, are treated as atoms in abstraction refinement. The second talk compares different methods for evaluation of formulas expressing microprocessor correctness in the logic of Equality with Uninterpreted Functions and Memories (EUFM) by translation to propositional logic, given recently developed efficient Boolean-to-CNF translations, in order to identify the best overall translation strategy from EUFM to CNF.
2.2.1. Fine-Grain Abstraction and Sequential Don't Cares for Large Scale Model Checking
Chao Wang, Gary D. Hachtel, and Fabio Somenzi, University of Colorado at Boulder
2.2.2. Comparative Study of Strategies for Formal Verification of High-Level Processors (tutorial)
Miroslav Velev, Reservoir Labs
Session 2.3: Signal Integrity and Leakage, Carmel
Session chair: Azeez Bhavnagarwala, IBM
This session addresses two increasingly important circuit design considerations: signal integrity and leakage dissipation. The first paper analyzes the effect of soft errors in static CMOS circuits and proposes a new technique to reduce their rate by 70% with a 4% area overhead. The second paper presents a cost-effective method to test signal integrity in PCB buses. The proposed method requires laboratory tests at moderate frequencies (i.e., 50-100 MHz). The next paper introduces a new method for leakage reduction in repeaters. A mixture of low- and high-threshold voltage transistors is used in the repeater pull-down and pull-up networks to adjust their effective threshold voltage. Circuit simulations indicate that total power is reduced by up to 38% for a 0.13um, 1.2V CMOS process. The last paper discusses leakage reduction techniques for SRAM-based FPGAs.
2.3.1. A Highly-Efficient Technique for Reducing Soft Errors in Static CMOS Circuits
Srivathsan Krishnamohan and Nihar R. Mahapatra, Michigan State University
2.3.2. A Signal Integrity Test Bed for PCB Buses
Jihong Ren and Mark R. Greenstreet, University of British Columbia
2.3.3. A New Threshold Voltage Assignment Scheme for Runtime Leakage Reduction in On-Chip Repeaters
Saumil Shah, Kanak Agarwal, Dennis Sylvester, University of Michigan
2.3.4. A General Post-Processing Approach to Leakage Current Reduction in SRAM-based FPGAs
Jason Brandon, John Lach, Kevin Skadron, University of Virginia
Registration: 8 AM - 4 PM
8:30-10:30
Session 3.1: Special Session on High-Performance On-Chip Communication, Santa Clara
Session chair: Kevin Rudd, Intel
High-performance on-chip communication has become a key design aspect of modern integrated circuits.
The first talk examines the challenges of high-speed communication in DSM technologies and surveys several solutions. The second paper presents a technique to pipeline wire delays transparently to the IP blocks being connected. The third paper describes a communication-centric design methodology using the globally asynchronous design paradigm. The final paper describes a global approach for building Network-on-Chip (NoC) designs based on an innovative switching fabric.
3.1.1. Design Methodologies and Architecture Solutions for High-Performance Interconnects (invited)
Davide Pandini, Cristiano Forzan, Livio Baldi, STMicroelectronics
3.1.2. On-Chip Transparent Wire Pipelining (invited)
Mario R. Casu and Luca Macchiarulo, Politecnico di Torino
3.1.3. Towards an Integrated Design Methodology for Fault-Tolerant, Multiple Clock/Voltage Integrated Systems (invited)
Radu Marculescu, Diana Marculescu, Larry Pileggi, Carnegie Mellon University
3.1.4. Network-on-Chip: the intelligence is in the wire (invited)
Gerard Mas, STMicroelectronics ; Philippe Martin, Arteris
Session 3.2 : Test Generation and Characterization, San Jose
Session chair: Jacob Abraham, University of Texas
Test generation and silicon characterization
are becoming increasingly complex problems. The first talk presents a new encoding
scheme that provides a second stage of compression after LFSR reseeding to significantly
reduce test power and storage. The second talk presents an IP block for on-chip
clock jitter measurement. The third talk presents a novel approach for hold
time fault diagnosis. The next talk proposes two different methodologies for
test cost reduction in scan-based designs. The final talk presents a set of
methods, collectively known as testing knowledge, aimed at increasing the quality
of automatically generated system-level test cases.
3.2.1. Low Power Test Data Compression Based on LFSR Reseeding
Jinkyu Lee and Nur A. Touba, University of Texas at Austin
3.2.2. An Infrastructure IP for On-Chip Clock Jitter Measurement
Jui-Jer Huang and Jiun-Lang Huang, National Taiwan University
3.2.3. Diagnosis of Hold Time Defects
Zhiyuan Wang, Malgorzata Marek-Sadowska, University of California at Santa Barbara ; Kun-Han Tsai, Janusz Rajski, Mentor Graphics
3.2.4. Extending the Applicability of Parallel-Serial Scan Designs (short)
Baris Arslan, Ozgur Sinanoglu and Alex Orailoglu, University of California at San Diego
3.2.5. Quality Improvement Methods for System-level Stimuli Generation (short)
Roy Emek, Itai Jaeger, Yoav Katz, Yehuda Naveh, IBM Research, Haifa
Session 3.3: Physically-Aware Design Tools, Carmel
Session chair: John Lach, University of Virginia
This session presents tools that estimate or optimize various physical metrics at different levels of the VLSI design flow. The first paper provides a tool for measuring the impact of crosstalk on path delays using a path-based (as opposed to the more pessimistic net-based) approach. The second paper discusses an efficient data structure for computing optimal buffer insertions in distributed RC-tree routings. The third paper presents a 0/1 ILP method for high-level synthesis that, in addition to the usual functions of scheduling and binding, also performs resource allocation to logic layers of a 3D chip to minimize the number of inter-strata vias under appropriate constraints. The fourth paper presents a technique for reordering the pins and transistors in dual Tox gates in order to reduce gate-oxide and subthreshold leakage currents under delay constraints.
3.3.1. XTalkDelay: A Crosstalk-aware Timing Analysis Tool for Chip-level Designs
Rajeev Murgai, Takashi Miyoshi, Fujitsu Laboratories of America ; Yinghua Li, University of California at Berkeley ; Ashwini Verma, Amdocs
3.3.2. A flexible data structure for efficient buffer insertion
Ruiming Chen, Hai Zhou, Northwestern University
3.3.3. Simultaneous Scheduling, Binding and Layer Assignment for Synthesis of Vertically Integrated 3D Systems
Madhubanti Mukherjee and Ranga Vemuri, University of Cincinnati
3.3.4. Transistor and Pin Reordering for Gate Oxide Leakage Reduction in Dual Tox Circuits
Anup Kumar Sultania, Sachin S. Sapatnekar, University of Minnesota ; Dennis Sylvester, University of Michigan
11:00-12:30
Session 4.1: Energy-Efficient Processor Microarchitecture (2), Santa Clara
Session chair: Steve Krueger, Texas Instruments
This session continues the theme
of power-aware microprocessor designs. The first paper describes the tradeoff
between latency and throughput performance, analyzing a set of four techniques
for achieving variable energy per instruction. The second paper analyzes the
power saving benefits of halting instruction fetching during critical long latency
instructions, such as a critical load miss. The final paper presents a novel
micro-architecture combining the benefits of both clustering and GALS (Globally
Asynchronous Locally Synchronous) design techniques.
4.1.1. Best of Both Latency and Throughput
Ed Grochowski, Ronny Ronen, John Shen, Hong Wang, Intel Labs
4.1.2. Fetch Halting on Critical Load Misses
Brian Singer, Nikil Mehta, R. Iris Bahar, Brown University ; Michael Leuchtenburg, Richard Weiss, Hampshire College
4.1.3. Frontend Frequency-Voltage Adaptation for Optimal Energy-Delay^2
Grigorios Magklis, Jose Gonzalez, Antonio Gonzalez, Intel Barcelona Research Center
Session 4.2 : Power and Timing Optimization, San Jose
Session chair: Sinan Kaptanoglu, Actel
Papers in this session present techniques for performance optimization in power and timing. The first paper is concerned with leakage power reduction, and examines the approach of gate-sizing and Vt assignment. The optimization is formulated as a mixed-integer linear programming problem. The second paper exploits clock skew to improve timing, and uses a metric known as potential slack to guide the optimization. A new linear-programming formulation yields significant speedup over previous approaches. The third paper is concerned with power optimization in the context of increasing process variability, and proposes a statistical method for gate sizing for this optimization problem.
4.2.1. Gate Sizing and Vt Assignment for Active-Mode Leakage Power Reduction
Feng Gao and John P. Hayes, University of Michigan
4.2.2. Potential Slack Budgeting with Clock Skew Optimization
Kai Wang and Malgorzata Marek-Sadowska, University of California at Santa Barbara
4.2.3. A New Statistical Optimization Algorithm for Gate Sizing
Murari Mani, Michael Orshansky, University of Texas at Austin
Session 4.3: Novel Processor Design, Carmel
Session chair: Ed Grochowski, Intel
This session begins with a general system architecture for searching, filtering, compression, encryption, and other operations on unstructured data streaming from a disk system. The second paper describes a prototype Itanium microprocessor written in the Bluespec hardware description language and synthesized into an FPGA. The third paper presents a novel method to reduce the memory bandwidth required for fetching texture image data through adaptive selection of a cache index.
4.3.1. An Architecture for Fast Processing of Large Unstructured Data Sets
Mark Franklin, Roger Chamberlain, Michael Henrichs, Berkley Shands, Jason White, Washington University
4.3.2. In-System FPGA Prototyping of an Itanium Microarchitecture
Roland E. Wunderlich and James C. Hoe, Carnegie Mellon University
4.3.3. Adaptive Selection of an Index in a Texture Cache
Chun-Ho Kim, Lee-Sup Kim, Korea Advanced Institute of Science and Technology (KAIST)
12:30-1:30
Lunch, Donner Room
1:30-3:00
Session 5.1: Emerging Technologies (Special Session), Santa Clara
Session chair: AJ KleinOsowski, IBM Austin Research Laboratory
The emerging technologies session
begins with a survey of nano-scale device technologies. The authors discuss
how system-level research can be used to influence device development and propose
a design methodology that addresses whether computationally interesting and
buildable circuits are possible with Quantum-dot Cellular Automata (QCA). The
second talk presents new techniques for modeling quantum circuits, including
the design of an FPGA emulator for quantum algorithms. The final talk presents
a 3D die stacking technology in which an IA-32 microprocessor is partitioned
between two die in order to simultaneously improve performance and reduce power.
5.1.1. Using Circuits and Systems-Level Research to Drive Nanotechnology (invited)
Michael T. Niemier, Ramprasad Ravichandran, Georgia Institute of Technology ; Peter M. Kogge, University of Notre Dame
5.1.2. FPGA Emulation of Quantum Circuits
Ahmed Usman Khalid, Zeljko Zilic, McGill University ; Katarzyna Radecka, Concordia University
5.1.3. 3D Processing Technology and its Impact on iA32 Microprocessors (invited)
Don Nelson, Clair Webb, Nick Samra, Bryan Black, Intel Corporation
Session 5.2 : Cache Memory Design, Carmel
Session chair: Srikanth Srinivasan, Intel
This session examines issues in
cache memory design. The first talk presents a cache access time model which
optimizes the memory array for minimum access and cycle times. The second talk
explores tracking cache request history using a novel hardware monitoring framework
to improve scheduling and performance of a simultaneous multithreaded processor.
The final talk presents a CAM-based cache design that uses prediction to reduce
energy consumption.
5.2.1. Cache Array Architecture Optimization at Deep Submicron Technologies
Annie Y. Zeng, Ken Rose, Ronald J. Gutmann, Rensselaer Polytechnic Institute
5.2.2. Implementation of Fine-Grained Cache Monitoring for Improved SMT scheduling
Joshua Kihm, Daniel Connors, University of Colorado
5.2.3. Using Prediction to Reduce Energy Consumption in Highly-Associative Caches for Embedded Processors (short)
Alex Veidenbaum, Dan Nicolaescu, University of California at Irvine
3:30-5:00
Session 6.1: Layout-Driven Circuit Optimization, Santa Clara
Session chair: W. Rhett Davis, North Carolina State University
This session addresses CAD-related
issues associated with circuit design. The first paper studies the effectiveness
of via-configurable gate arrays for the implementation of circuits ranging from
simple gates and memory cells to arithmetic units. The second paper proposes
an algorithm that simplifies the netlist of hybrid structured clock trees and
therefore significantly reduces the simulation time required for their analysis.
The third paper presents a new formalism for layout-driven optimization of datapaths
which has been integrated into a commercial Electronic Design Automation environment.
The last paper discusses a high-level synthesis methodology for digital filters
which is based on floorplan-aware complexity reduction.
6.1.1. The Magic of a Via-Configurable Regular Fabric
Yajun Ran and Malgorzata Marek-Sadowska, University of California at Santa Barbara
6.1.2. A Fast Delay Analysis Algorithm for the Hybrid Structured Clock Network
Yi Zou, Yici Cai, Qiang Zhou, Xianlong Hong, Tsinghua University ; Sheldon Tan, University of California at Riverside
6.1.3. Layout Driven Optimization of Datapath Circuits using Arithmetic Reasoning (short)
Ingmar Neumann, Dominik Stoffel, Kolja Sulimma, Wolfgang Kunz, University of Kaiserslautern ; Michel Berkelaar, Magma Design Automation
6.1.4. Floorplan-aware High-level Synthesis for Digital Filters (short)
Dongku Kang, Hunsoo Choo, Kaushik Roy, Purdue University
Session 6.2 : Instruction-Level Parallelism (1), San Jose
Session chair: Nihar Mahapatra, Michigan State University
This session presents techniques for faster instruction execution. The first paper shows the benefits of combining three different thread spawning policies for SMT, achieving a better overall performance gain than any one of them alone. The second paper proposes a TriBank register file design with higher bandwidth and smaller access time compared to conventional large register file designs. The final paper exploits slack in instruction schedules to design dynamically scaled pipelines with multiple clock domains, resulting in improved power-performance.
6.2.1. A Minimal Dual-Core Speculative Multi-Threading Architecture
Srikanth Srinivasan, Haitham Akkary, Tom Holman, Konrad Lai, Intel Corporation
6.2.2. Exploiting Quiescent States in Register Lifetime
Rama Sangireddy, University of Texas at Dallas ; Arun K. Somani, Iowa State University
6.2.3. Evaluating Techniques for Exploiting Instruction Slack (short)
Yau Chin, John Sheu, David Brooks, Harvard University
Session 6.3: Power Estimation and Minimization, Carmel
Session chair: Yuan Xie, Pennsylvania State University
Papers in this session are concerned with power estimation and minimization at various levels of a system - from multiprocessors down to gates. The first paper uses continuous transition probability waveforms to represent gate delays as random variables to account for manufacturing process variations, and uses them to compute dynamic power dissipation efficiently and with low error. The second paper discusses approaches that use circuit topology to restructure an ILP over circuit states, improving the performance of the ILP on optimization problems. The third paper addresses power estimation of communication primitives in on-chip multiprocessors via software-interrupt calls to power estimators for the component functions of the primitives: memory access, bus access, and instruction execution. The final paper of the session presents a system-level power estimation tool for analog-to-digital converters (ADCs) based on a parameterized power estimation function that can be applied to basic ADC components.
6.3.1. Static Transition Probability Analysis under Uncertainty
Siddharth Garg, Siddharth Tata, Ravishankar Arunachalam, Indian Institute of Technology at Madras
6.3.2. Circuit-based Preprocessing of ILP and Its Applications in Leakage Minimization and Power Estimation
Donald Chai, University of California, Berkeley ; Andreas Kuehlmann, Cadence Berkeley Labs
6.3.3. Analyzing Power Consumption of Message Passing Primitives in a Single-chip Multiprocessor (short)
Mirko Loghi, Massimo Poncino, Università di Verona ; Luca Benini, Università di Bologna
6.3.4. An Architectural Power Estimator for Analog-to-Digital Converters (short)
Zhaohui Huang, Peixin Zhong, Michigan State University
Banquet: Doubletree Hotel, Donner Room
Panel Discussion: Grid Computing, Session Chair: Tobin Lehman, IBM
Panelists : Inderpal Narang, IBM Research ; Nader Shaterian, Marsys ; David Epstein, Cross Link Capital ; Steve Kishi, Hummer Winblad Venture Partners
Registration: 8:30 AM - 4 PM
9:00-10:30
Session 7.1: Formal Verification Techniques, Santa Clara
Session chair: Miroslav Velev, Reservoir Labs
The first paper addresses the special challenges in the formal verification of high-performance circuits that involve redundant number representations and partial compressions, where symbolic consideration of number representations is not sufficient and additional properties beyond operand values need to be considered to show correct operation. The second paper proposes a simulation-friendly version of the Generalized Symbolic Trajectory Evaluation (GSTE) specification that is convertible into conventional GSTE to access the formal verification tool flow, as well as into completely non-symbolic monitor circuits suitable for conventional dynamic verification. The final paper presents a graph automorphism-based algorithm for computing maximal sets of symmetric inputs of circuits that can be used to identify nonsymmetric inputs in a circuit and enhance the efficiency of input matching, technology mapping, and logic verification.
7.1.1. Formal Hardware Verification based on Signal Correlation Properties
Nikhil Kikkeri, Peter-Michael Seidel, Southern Methodist University
7.1.2. Generating Monitor Circuits for Simulation-Friendly GSTE Assertion Graphs
Kelvin Ng, Alan J. Hu, Jin Yang, University of British Columbia
7.1.3. Graph Automorphism-based Algorithm for Determining Symmetric Inputs (short)
Chen-Ling Chou, Geeng-Wei Lee, Jing-Yang Jou, National Chiao Tung University ; Chun-Yao Wang, National Tsing Hua University
Session 7.2 : Networks on Chips, San Jose
Session chair: Hai Zhou, Northwestern University
This session addresses problems in building network-on-chip (NoC) architectures, which have been proposed as a solution to complex on-chip interconnect problems. The first paper addresses the problem of virtualizing and placing logic processing units to achieve thermally balanced designs. The second paper formulates a mixed-integer linear programming (MILP) problem for the synthesis of low-power NoC architectures subject to performance constraints, and proposes heuristics to reduce the runtime of the optimization. The third paper tackles the problem of mapping cores onto an NoC architecture to minimize either energy consumption or congestion. The paradigm of many-to-many core-switch mapping is introduced, and the problem is formulated as MILP to take into account core placement, switches for each core, and network interfaces for communication flows.
7.2.1. Linear Programming based Techniques for Synthesis of Network-on-Chip Architectures
Krishnan Srinivasan, Karam S. Chatha, Goran Konjevod, Arizona State University
7.2.2. Thermal-Aware IP Virtualization and Placement for Networks-on-Chip Architecture
W. Hung, C. Addo-Quaye, T. Theocharides, Y. Xie, N. Vijaykrishnan, M. J. Irwin, Penn State University
7.2.3. Many-to-Many Core-Switch Mapping in 2-D Mesh NoC Architectures
Chae-Eun Rhee, Samsung Electronics ; Han-You Jeong, Soonhoi Ha, Seoul National University
Session 7.3: Novel Processor Architecture, Carmel
Session chair: Pradeep Dubey, Intel
This session is devoted to novel processor architecture and micro-architecture features. The first paper proposes a reconfigurable SIMD DSP capable of instantly scaling a single instruction stream over multiple data streams (ISSIMD). The second paper presents an architectural technique called Runtime Execution Monitoring (REM) to detect program flow anomalies resulting from various worm and virus attacks. The final paper studies dynamic address compression schemes applied to on-chip address buses and the resulting cost, power, and performance benefits.
7.3.1. An Embedded Reconfigurable SIMD DSP with Capability of Dimension-Controllable Vector Processing
Liang Han, Jie Chen, Ying Li, Chaoxian Zhou, Xin Zhang, Zhibi Liu, Xiaoyun Wei, Baofeng Li, Institute of Microelectronics, Chinese Academy of Science
7.3.2. Runtime Execution Monitoring (REM) to Detect and Prevent Malicious Code Execution
A. Murat Fiskiran, Ruby B. Lee, Princeton University
7.3.3. Dynamic Address Compression Schemes: A Performance, Energy, and Cost Study
Jiangjiang Liu, University at Buffalo ; Krishnan Sundaresan, Nihar R. Mahapatra, Michigan State University
11:00-12:30
Session 8.1: Instruction-Level Parallelism (2), Santa Clara
Session chair: Nihar Mahapatra, Michigan State University
This session continues the theme of improved performance through ILP techniques. The first paper proposes a compile-time method to construct frames using profiling and static analysis to produce well-optimized frequently executed frames with minimum recovery penalty. The second paper discusses a technique that dynamically assigns ILP classification to load instructions and uses this class label to dynamically control cache associativity. The final paper discusses novel register management techniques aimed at large physical register files through early register de-allocation which result in improved performance.
8.1.1. Compiler-Based Frame Formation for Static Optimization
Feng Shi, Sobeeh Almukhaizim, Pey-Chang Lin, and Yiorgos Makris, Yale University
8.1.2. IPC Driven Dynamic Associativity Management Cache Architecture for Low energy
Sriram Nadathur, Akhilesh Tyagi, Iowa State University
8.1.3. Increasing Processor Performance through Early Register Release
Oguz Ergin, Deniz Balkan, Dmitry Ponomarev, Kanad Ghose, State University of New York at Binghamton
Session 8.2 : Topics in Synthesis and Co-simulation, San Jose
Session chair: Rajeev Murgai, Fujitsu
This session consists of papers on various topics in synthesis and co-simulation. The first paper proposes a combined channel segmentation and buffer insertion to improve the routability of FPGA designs. The second paper presents a framework for modeling and analyzing heterogeneous networks, based on a tight integration of a network simulator with embedded software, middleware, and a real-time operating system. The third paper proposes a new architecture for a hardware-accelerated Boolean satisfiability solver; the design, intended for FPGA implementation, is modeled and simulated in SystemC. The last paper presents a dual-rail encoding method for creating a combinational logic network with a signal asserting the stability of all other outputs, to overcome timing problems resulting from variability of combinational logic delays.
8.2.1. Combined FPGA Channel Segmentation and Buffer Insertion for Simultaneous Routability and Performance Improvement
Hu Huang, Joseph B. Bernstein, Martin Peckerar, Ji Luo, University of Maryland
8.2.2. Software/Network Co-Simulation of Heterogeneous Industrial Networks Architectures
S. Martini, G. Perbellini, F. Fummi, M. Poncino, M. Monguzzi, Università di Verona
8.2.3. Hardware/Software Co-modeling of a SAT Solver Based on Distributed Computing Elements using SystemC (short)
Jinwen Xi, Peixin Zhong, Michigan State University
8.2.4. Coping with the variability of combinational logic delays (short)
Jordi Cortadella, Univ. Politècnica Catalunya, Barcelona ; Alex Kondratyev, Cadence Berkeley Labs ; Luciano Lavagno, Politecnico di Torino ; Christos Sotiriou, ICS-FORTH
Session 8.3 : Low-Power Architecture, Carmel
Session chair: Kee Sup Kim, Intel
This session examines architectural techniques for low power. The first paper presents a design methodology for power-aware networks in which communication links are turned on and off in response to bursts and dips in traffic. The second paper evaluates several hardware-based data prefetching techniques from an energy perspective, and explores their energy/performance tradeoffs. The final paper introduces an efficient Montgomery multiplier for the modular exponentiation operation, which is fundamental to numerous public key cryptosystems.
8.3.1. Design-Space Exploration of Power-Aware On/Off Interconnection Networks
Vassos Soteriou, Li-Shiuan Peh, Princeton University
8.3.2. Energy Characterization of Hardware-Based Data Prefetching
Yao Guo, Saurabh Chheda, Israel Koren, C. Mani Krishna, Csaba Andras Moritz, University of Massachusetts at Amherst
8.3.3. Design and Implementation of Scalable Low-Power Montgomery Multiplier
Hee-Kwan Son, Sang-Geun Oh, Samsung Electronics
12:30-1:30
Lunch, Poolside Foyer
1:30-2:30
Session 9.1: Test Generation, Santa Clara
Session chair: Nina Saxena, Intel
This session examines issues in test generation and diagnosis. The first paper introduces a new method for deterministic diagnosis of logic cores based on on-chip decompression and comparison of incompletely specified test patterns and test responses. The second paper proposes an automatic test pattern generation framework for combinational threshold networks, as employed in several emerging nanotechnologies. The final paper presents a BDD-based algorithm for memory repair problems that is not only perfect but also highly efficient.
9.1.1. Compressed Embedded Diagnosis of Logic Cores
Scott Ollivierre, Adam B. Kinsman, Nicola Nicolici, McMaster University
9.1.2. An Automatic Test Pattern Generation Framework For Combinational Threshold Networks
Pallav Gupta, Rui Zhang, Niraj K. Jha, Princeton University
9.1.3. An Efficient Algorithm for Reconfiguring Shared Spare RRAM (short)
Hung-Yau Lin, Hong-Zu Chou, Fu-Min Yeh, Sy-Yen Kuo, National Taiwan University
Session 9.2 : Network Routing, San Jose
Session chair: Tom Dillinger, IBM
This session begins with a proposal for a new analytical model to compute message latency in a general n-dimensional torus network with an arbitrary number of virtual channels per physical channel. The second paper proposes a solution to eliminate the requirement of sorting by prefix length in IP forwarding devices using Ternary Content Addressable Memories (TCAMs).
9.2.1. An Accurate Combinatorial Model for Performance Prediction of Deterministic Wormhole Routing in Torus Multicomputer Systems
Hashem Hashemi Najaf-abadi, Hamid Sarbazi-azad, Sharif University of Technology, Tehran
9.2.2. Technique to Eliminate Sorting in IP Packet Forwarding Devices
Raymond W. Baldwin, Enrico Ng, University of Illinois at Chicago
Session 9.3 : Placement and Floorplanning, Carmel
Session chair: Rajeev Murgai, Fujitsu
Papers in this session are concerned with techniques to improve the quality and performance of placement and floorplanning. The first paper addresses the problem of I/O placement in flip-chip designs, where I/O can be placed throughout the whole chip without long wires from the periphery. A clustering approach is proposed which considers design cost reduction and signal integrity. The second paper uses the B*-tree representation to capture alignment and performance constraints in placement. The last paper presents a new data structure called the adjacent constraint graph for floorplanning; the data structure consolidates the traditional adjacency and vertical/horizontal constraint graphs into one, and allows fast insert and swap operations for exploring the solution space.
9.3.1. I/O Clustering in Design Cost and Performance Optimization for Flip-Chip Design
Hung-Ming Chen, National Chiao Tung University ; I-Min Liu, Cadence Design Systems ; Martin D.F. Wong, University of Illinois at Urbana-Champaign; Muzhou Shao, Synopsys ; Li-Da Huang, Texas Instruments
9.3.2. Placement with Alignment and Performance Constraints Using the B*-tree Representation
Meng-Chen Wu, National Chiao Tung University ; Yao-Wen Chang, National Taiwan University
9.3.3. ACG-Adjacent Constraint Graph for General Floorplan
Hai Zhou, Jia Wang, Northwestern University