Transcript Document

Computer Architecture Research Overview
Rajeev Balasubramonian
School of Computing, University of Utah
http://www.cs.utah.edu/~rajeev
What is Computer Architecture?
• If the Intel Pentium4 has a faster clock speed than the
IBM Power4, does it execute your programs faster?
[Figure: Case 1 and Case 2 plotted against time, showing clock ticks and completing instructions]
To a large extent, computer architecture determines:
• the number of instructions used to execute a program
• the time each instruction takes to execute
• the idle cycles when no work gets done
• the number of instructions that can execute in parallel
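These factors combine in the standard processor performance equation: execution time = instruction count × cycles per instruction (CPI) × clock cycle time. The sketch below uses made-up numbers (hypothetical, not measurements of the Pentium4 or Power4) to show how a chip with a faster clock can still run the same program more slowly.

```python
def execution_time_s(instructions: int, cpi: float, clock_hz: float) -> float:
    """Execution time = instruction count x cycles per instruction / clock rate."""
    return instructions * cpi / clock_hz

# Two hypothetical processors running the same hypothetical program.
fast_clock = execution_time_s(instructions=1_000_000_000, cpi=2.0, clock_hz=3.0e9)
slow_clock = execution_time_s(instructions=1_000_000_000, cpi=1.0, clock_hz=2.0e9)

print(f"3.0 GHz, CPI 2.0: {fast_clock:.3f} s")  # 0.667 s
print(f"2.0 GHz, CPI 1.0: {slow_clock:.3f} s")  # 0.500 s
```

Idle cycles raise the effective CPI, executing several instructions in parallel can push it below 1, and the instruction count depends on the instruction set and compiler.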
A Typical Microprocessor
[Figure: block diagram with branch predictor, L1 instruction cache, L2 cache, decode & rename, issue logic, four ALUs, register file, and L1 data cache]
Architecture Trends in the 90s
• Performance was the ultimate metric
• Transistors were a limiting factor
As more on-chip transistors became available in the 90s, more functionality
and more complex circuitry were added to boost performance; most of the
low-hanging fruit has now been picked
Hitting the Wall
We have now hit the following walls:
• Single core performance
• Memory
• Complexity
• Power, temperature
Hitting the Power Wall
[Figure from Shekhar Borkar, MICRO’99]
Power is as important a metric today as performance
The Advent of Multi-Core Chips
[Figure: multi-core chip layout of cores and cache banks]
• In the past, performance magically increased by 50% every year
• In the future, this improvement will be only ~20% every year
… unless … the application is multi-threaded!
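To see what the slowdown means, the sketch below compounds the two annual growth rates from the slide over a hypothetical five-year span. It then uses Amdahl's law (a standard model, not taken from the slide) to show why multi-threading matters: with n cores, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the program that runs in parallel.

```python
def compounded_gain(annual_rate: float, years: int) -> float:
    """Total performance gain after `years` of growth at `annual_rate` per year."""
    return (1.0 + annual_rate) ** years

def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: only the parallel fraction benefits from extra cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

years = 5  # hypothetical horizon
print(f"50%/year for {years} years: {compounded_gain(0.50, years):.2f}x")  # 7.59x
print(f"20%/year for {years} years: {compounded_gain(0.20, years):.2f}x")  # 2.49x

# Hypothetical 8-core chip: the gain depends on how much of the program is multi-threaded.
for p in (0.0, 0.5, 0.9):
    print(f"parallel fraction {p:.1f} on 8 cores: {amdahl_speedup(p, 8):.2f}x")
```

A single-threaded program (p = 0) sees no benefit from the extra cores, which is the point of the slide's "unless".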
Upcoming Architecture Challenges
• Improving single core performance
• Functionalities in multi-core chips
• Simplifying the programmer’s task
• Efficient interconnects
• Power and temperature-efficient designs
• Designs tolerant of errors
For publications, see http://www.cs.utah.edu/~rajeev/research.html
Interconnects as a Bottleneck
• In the past, on-chip data transmission on wires cost almost nothing
• Interconnect speed and power have been improving, but not at the
same rate as transistor speeds
Hence, relative to computation, communication is much more expensive
• In the near future, it will take a signal 100 cycles to travel across the chip
• 50% of chip power can be attributed to interconnects
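The number of cycles needed to cross a chip follows from the wire delay per millimeter, the distance traveled, and the clock frequency. All parameter values below are hypothetical placeholders chosen only to show the arithmetic; they are not figures from this talk.

```python
import math

def cycles_to_cross(distance_mm: float, delay_ps_per_mm: float, clock_ghz: float) -> int:
    """Clock cycles a signal needs to traverse `distance_mm` of wire (rounded up)."""
    wire_delay_ps = distance_mm * delay_ps_per_mm
    cycle_time_ps = 1000.0 / clock_ghz  # 1 GHz -> 1000 ps per cycle
    return math.ceil(wire_delay_ps / cycle_time_ps)

# Same hypothetical 20 mm wire with 250 ps/mm delay, clocked at two speeds.
print(cycles_to_cross(20, 250, 5.0))   # 25
print(cycles_to_cross(20, 250, 10.0))  # 50
```

If transistors get faster while wire delay stays the same, the clock period shrinks and the same wire costs more cycles, which is why communication grows more expensive relative to computation.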
Interconnects in Multi-Core Chips
[Figure: CPU 1, CPU 2, and CPU 3 with L1 caches, connected to the L2 cache and L2 control blocks by on-chip wires]
Not all Wires are Created Equal
Wire type    Relative latency    Relative area    Dynamic power (W/m)    Static power (W/m)
B-Wires      1x                  1x               2.65α                  1.02
L-Wires      0.5x                4x               1.46α                  0.57
W-Wires      1.6x                0.5x             2.9α                   1.16
PW-Wires     3.2x                0.5x             0.87α                  0.31
(α = switching activity factor)
Data Transfers have Varying Needs
• Example of a cache coherence transaction: a read-exclusive request
for a block in the shared state
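One way to act on the wire table is to match each coherence message to a wire class. The sketch below takes the relative latencies from the table; the message list, message sizes, criticality labels, the 24-bit cutoff, and the policy itself (short critical messages on L-Wires, non-critical messages on PW-Wires, everything else on B-Wires) are hypothetical illustrations, not the exact mapping proposed in this research.

```python
from dataclasses import dataclass

# Relative latency of each wire class, taken from the wire table.
WIRE_LATENCY = {"B": 1.0, "L": 0.5, "W": 1.6, "PW": 3.2}

@dataclass
class Message:
    name: str
    bits: int
    on_critical_path: bool

def choose_wire(msg: Message, narrow_limit_bits: int = 24) -> str:
    """Hypothetical policy: L-Wires use 4x area, so only a few fit; reserve them
    for short critical messages, send non-critical ones on low-power PW-Wires."""
    if not msg.on_critical_path:
        return "PW"
    if msg.bits <= narrow_limit_bits:
        return "L"
    return "B"

# Hypothetical messages in a read-exclusive transaction for a shared block.
transaction = [
    Message("read-exclusive request", bits=64, on_critical_path=True),
    Message("invalidate to sharer", bits=64, on_critical_path=True),
    Message("invalidation ack", bits=8, on_critical_path=True),
    Message("data reply", bits=512, on_critical_path=True),
    Message("completion notice to directory", bits=8, on_critical_path=False),
]
for m in transaction:
    wire = choose_wire(m)
    print(f"{m.name:32} -> {wire}-Wires (relative latency {WIRE_LATENCY[wire]}x)")
```

The design choice is that latency-critical traffic benefits most from fast wires, while messages off the critical path can tolerate slow wires in exchange for lower power.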
Other Interconnect Choices
• Optical interconnects: signals travel at the speed of light, but converting
between optical and electrical domains is costly
• 3D chips: shorter communication distances and low-cost vertical signal
transmission, but higher power density
3D Layouts
[Figure: clusters and cache banks on Die 0 and Die 1, connected by intra-die horizontal wires and inter-die vertical wires, in three layouts: (a) Arch-1 (cache-on-cluster), (b) Arch-2 (cluster-on-cluster), (c) Arch-3 (staggered)]
Upcoming Architecture Challenges
• Improving single core performance
  Clustered architectures: relatively low complexity, scalable solution, easily handles multiple threads
• Functionalities in multi-core chips
• Simplifying the programmer’s task
• Efficient interconnects
• Power and temperature-efficient designs
• Designs tolerant of errors
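In a clustered design, each instruction is steered to one of several small execution clusters. The sketch below shows one simple, hypothetical steering heuristic (not necessarily the policy used in this research): place an instruction in a cluster that produced one of its source operands, to avoid inter-cluster communication, and otherwise in the least-loaded cluster. The register names and program are made up.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, List[str]]  # (destination register, source registers)

def steer(instrs: List[Instr], num_clusters: int) -> List[int]:
    """Assign each instruction to a cluster; returns the cluster index per instruction."""
    producer: Dict[str, int] = {}  # register -> cluster that last wrote it
    load = [0] * num_clusters       # instructions assigned to each cluster so far
    assignment = []
    for dest, srcs in instrs:
        # Prefer a cluster that already holds a source operand (avoids a wire transfer).
        candidates = [producer[s] for s in srcs if s in producer]
        if candidates:
            cluster = min(candidates, key=lambda c: load[c])
        else:
            cluster = min(range(num_clusters), key=lambda c: load[c])
        assignment.append(cluster)
        load[cluster] += 1
        producer[dest] = cluster
    return assignment

program = [
    ("r1", []),            # independent
    ("r2", []),            # independent
    ("r3", ["r1"]),        # depends on r1
    ("r4", ["r2"]),        # depends on r2
    ("r5", ["r3", "r4"]),  # depends on both
]
print(steer(program, num_clusters=2))  # [0, 1, 0, 1, 0]
```

The two independent dependence chains end up on separate clusters, so they execute in parallel while each chain's values stay local.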
Upcoming Architecture Challenges
• Improving single core performance
• Functionalities in multi-core chips
  Heterogeneous perf/power
  Cores that execute the OS
  Cores that verify results
• Simplifying the programmer’s task
• Efficient interconnects
• Power and temperature-efficient designs
• Designs tolerant of errors
Upcoming Architecture Challenges
• Improving single core performance
• Functionalities in multi-core chips
• Simplifying the programmer’s task
  Hardware to support transactional memory
• Efficient interconnects
• Power and temperature-efficient designs
• Designs tolerant of errors
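Transactional memory lets a programmer mark a block of code as atomic; the hardware tracks each transaction's read set and write set and aborts one transaction when they conflict. The sketch below shows only that conflict rule, with hypothetical addresses; real hardware proposals differ in when they detect conflicts (eagerly or lazily) and in how they buffer speculative data.

```python
from dataclasses import dataclass, field
from typing import Set

@dataclass
class Transaction:
    name: str
    read_set: Set[int] = field(default_factory=set)
    write_set: Set[int] = field(default_factory=set)

    def read(self, addr: int) -> None:
        self.read_set.add(addr)

    def write(self, addr: int) -> None:
        self.write_set.add(addr)

def conflicts(a: Transaction, b: Transaction) -> bool:
    """Two transactions conflict if either one writes an address the other reads or writes."""
    return bool(a.write_set & (b.read_set | b.write_set)
                or b.write_set & (a.read_set | a.write_set))

t1 = Transaction("T1")
t1.read(0x100)
t1.write(0x200)
t2 = Transaction("T2")
t2.read(0x300)
print(conflicts(t1, t2))  # False: disjoint addresses
t2.read(0x200)
print(conflicts(t1, t2))  # True: T2 reads what T1 writes
```

Two transactions that only read the same address do not conflict, so read-mostly code can run concurrently without locks.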
Upcoming Architecture Challenges
• Improving single core performance
• Functionalities in multi-core chips
• Simplifying the programmer’s task
• Efficient interconnects
• Power and temperature-efficient designs
• Designs tolerant of errors
  Faults can be caused by high-energy particles that deposit
  enough charge to toggle bits
  Variations in operating conditions may prevent a circuit
  from producing its result in time
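A common first line of defense against particle-induced bit flips is to store a parity bit with each word: a single flipped bit changes the parity and is detected, though not corrected, and an even number of flips goes unnoticed. Error-correcting codes extend this idea. The data word and bit positions below are arbitrary examples.

```python
from typing import Tuple

def parity(word: int) -> int:
    """Even-parity bit: 1 if `word` has an odd number of 1 bits."""
    return bin(word).count("1") % 2

def store(word: int) -> Tuple[int, int]:
    """Keep the data word together with its parity bit."""
    return word, parity(word)

def check(stored: Tuple[int, int]) -> bool:
    """True if the parity still matches, i.e. no odd number of bits flipped."""
    word, p = stored
    return parity(word) == p

data, p = store(0b1011_0010)
print(check((data, p)))    # True: no error
struck = data ^ (1 << 5)   # a particle strike toggles bit 5
print(check((struck, p)))  # False: single-bit error detected
double = data ^ (1 << 5) ^ (1 << 1)
print(check((double, p)))  # True: a double-bit error goes undetected
```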
Research Methodologies
It’s all about the simulators!
• SimpleScalar, Wattch, and HotSpot: about 10,000 lines of
C code that models the flow of instructions through a
modern processor
• Inputs: a configuration file that specifies processor
parameters, and a benchmark program (say, gzip)
• Outputs: how long the program runs on the simulated
processor (SimpleScalar), how much power it consumes
(Wattch), and the peak temperature (HotSpot)
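At its core, a timing simulator tracks when each instruction's operands are ready and when its result is produced. The sketch below is a toy model, not SimpleScalar's code or configuration format: a hypothetical configuration dictionary sets the issue width and per-operation latencies, and the model reports total cycles for a short hypothetical instruction trace. It issues instructions in program order, up to `issue_width` per cycle, and stalls an instruction until its source registers are ready.

```python
from typing import Dict, List, Tuple

# Hypothetical processor parameters (a stand-in for a simulator configuration file).
CONFIG = {"issue_width": 2, "latency": {"add": 1, "mul": 3, "load": 4}}

Instr = Tuple[str, str, List[str]]  # (operation, destination, sources)

def simulate(trace: List[Instr], config: Dict) -> int:
    """Return the cycle in which the last instruction's result becomes available."""
    ready_at: Dict[str, int] = {}  # register -> cycle its value is available
    cycle = 0                      # current issue cycle
    issued_this_cycle = 0
    finish = 0
    for op, dest, srcs in trace:
        # In-order issue: wait until all source operands are ready.
        earliest = max([ready_at.get(s, 0) for s in srcs], default=0)
        if earliest > cycle:
            cycle, issued_this_cycle = earliest, 0
        # Respect the issue width: spill to the next cycle when full.
        if issued_this_cycle == config["issue_width"]:
            cycle, issued_this_cycle = cycle + 1, 0
        issued_this_cycle += 1
        done = cycle + config["latency"][op]
        ready_at[dest] = done
        finish = max(finish, done)
    return finish

trace = [
    ("load", "r1", []),
    ("load", "r2", []),
    ("mul", "r3", ["r1", "r2"]),
    ("add", "r4", ["r3"]),
    ("add", "r5", []),
]
print(simulate(trace, CONFIG))                        # 8
print(simulate(trace, {**CONFIG, "issue_width": 1}))  # 10
```

Changing one configuration parameter and rerunning the same trace is, in miniature, how a new idea is evaluated: the difference in cycles estimates the benefit.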
Evaluating a New Idea
• Lots of reading (it’s better than waiting for divine inspiration)
• Identify bottlenecks, identify problems, develop an idea, repeatedly
question that idea
• Understand the simulator
• Engineer a solution, modify simulator code (perhaps writing fewer than
1,000 lines of C code)
• Analyze data (things never work the first time), engineer/optimize/debug
your solution
• Write papers
• Implement in silicon?
To Learn More…
• CS/EE 3810: Computer Organization
• CS/EE 6810: Computer Architecture
• CS/EE 7810: Advanced Computer Architecture
• CS/EE 7820: Parallel Computer Architecture
• CS 7937 / 7940: Architecture Reading Seminar