Introduction to Systems Research at SFU Dr. Alexandra Fedorova August 2007

Introduction
• Systems: software systems, hardware systems, the
interaction between them
• New research area at SFU: before December 2006, there
were no faculty members at SFU doing systems research
(not counting networking)
• Research opportunities at the undergraduate and graduate
levels:
– Undergraduate honours thesis
– CMPT 415
– Paid research assistantships
– Master’s and Ph.D.
CMPT 401 Summer 2007 © A. Fedorova
What is Systems Research?
• System – a collection of software and hardware components
that accomplish a certain goal
• Usually this excludes applications but includes system
software:
– The operating system
– System libraries
• Systems research is concerned with building these components
and structuring their interaction
Systems Research at SFU
• System software design for chip multithreading processors
• Computer architecture
• Distributed systems
System Software Design for Chip Multithreading Processors
• What is chip multithreading?
• Why is this research relevant?
• What research problems are we addressing?
Chip Multithreading (CMT)
• Conventional processor: one software thread runs on a
chip at a given instant
• CMT processors: multiple threads run on the same
chip simultaneously
[Figure: a chip with a Level-1 cache and a Level-2 cache]
CMT: The Dominant Architecture
• Most new processors are CMT:
– Intel: 100% of new server processors and 90% of high-performance desktop processors will be CMT by the end of 2007
• All major hardware vendors are in the CMT business:
– Sun Microsystems Niagara (32 threads on the chip)
– IBM Power4, Power5, Power6
– Intel Hyper-threaded Xeon (servers, desktops)
– Intel Core Duo (desktops and laptops)
– Dell quad-core systems (2x Intel dual-core processors)
– AMD quad-core (expected in fall 2007)
Why CMT?
• Running one thread per chip is inefficient
• Due to the nature of modern applications, computational hardware is
underutilized
– Modern applications spend 50-60% of their CPU time accessing
memory
– While memory is being accessed, the CPU pipeline is stalled: it is idle,
doing nothing useful
– But while it is stalled, the CPU still consumes power
– So power is wasted with no benefit
• The idea behind CMT: while one thread stalls the pipeline, let another
thread use it
– Similar to overlapping I/O and computation, but at the micro
level
CMT: More Efficient CPU Utilization
[Figure: timeline of thread 0 and thread 1 sharing one pipeline. While one thread's load from memory stalls the pipeline, the other thread's instructions (add, subtract, load from cache) keep the pipeline busy.]
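The interleaving in the figure can be sketched as a toy simulation. This is a minimal sketch, not a model of any real processor: it assumes a single-issue pipeline, a hypothetical fixed memory latency (`MEM_LATENCY`), and a simple round-robin policy that issues from the next thread that is not stalled. The names `Thread` and `simulate` and the two instruction kinds `"alu"` and `"mem"` are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical: cycles a thread must wait after issuing a memory load.
MEM_LATENCY = 4


@dataclass
class Thread:
    name: str
    program: list          # sequence of "alu" or "mem" instructions
    pc: int = 0            # index of the next instruction to issue
    ready_at: int = 0      # first cycle at which this thread may issue again

    def done(self) -> bool:
        return self.pc >= len(self.program)


def simulate(threads):
    """Return (busy_cycles, total_cycles) for a single-issue pipeline.

    Each cycle, the pipeline issues one instruction from the next thread
    (round-robin) that is not waiting on memory; if every thread is stalled,
    the cycle is wasted.
    """
    cycle = busy = rr = 0
    n = len(threads)
    while not all(t.done() for t in threads):
        for i in range(n):
            t = threads[(rr + i) % n]
            if not t.done() and t.ready_at <= cycle:
                op = t.program[t.pc]
                t.pc += 1
                stall = MEM_LATENCY if op == "mem" else 0
                t.ready_at = cycle + 1 + stall
                busy += 1
                rr = (rr + i + 1) % n  # next cycle starts with the following thread
                break
        cycle += 1
    # Wait for the last outstanding instruction (e.g. a trailing load) to finish.
    total = max(t.ready_at for t in threads)
    return busy, total


PROGRAM = ["mem", "alu", "alu", "alu"]

for k in (1, 2):
    threads = [Thread(f"t{i}", list(PROGRAM)) for i in range(k)]
    busy, total = simulate(threads)
    print(f"{k} thread(s): pipeline busy {busy}/{total} cycles")
```

With these assumed numbers, one thread keeps the pipeline busy 4 of 8 cycles, while two threads keep it busy 8 of 11 cycles. The exact gain depends entirely on the chosen latency and instruction mix.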
How to Enable CMT?
• How do we enable multiple threads to run on the same
chip?
– Hardware multithreading
– Multicore processing
– Combination of the two
Hardware Multithreading
[Figure: a chip with a Level-1 cache and a Level-2 cache]
• Run at least two threads on the same
processing core
• Some hardware is duplicated, some is
shared
• Shared hardware:
– Pipeline: i.e., functional units, register
files, queues
– Caches: Level-1 (L1) instruction and
data caches, Level-2 (L2) unified cache
– Interconnects
• Multithreaded processors:
– Intel Hyper-threaded Xeon
– IBM Power5, Power6, Cell
– Sun Microsystems Niagara
Multicore Processing
[Figure: a chip with two cores, each with its own L1 cache, sharing an L2 cache]
• Multiple processing cores on
the same chip
• Threads share the L2 cache
(and other lower-level caches) and
the interconnects
• Multicore processors:
– Intel Core Duo
– AMD Quad Core
– IBM Power4, 5, 6
– Sun Microsystems Niagara
Multicore + Multithreading
[Figure: a chip with two cores, each with its own L1 cache, sharing an L2 cache]
• A multicore processor
• Each core is multithreaded
• Multicore and multithreaded
processors:
– Sun Microsystems Niagara
– IBM Power5, Power6
Research on CMT Processors
• Computer architecture research:
– How to design a CMT processor to achieve a good combination of:
CPU utilization, application performance, power efficiency
• System software research:
– How to design system software, i.e., the operating system, that
enables applications to perform well on these processors?
OS Design for CMT Processors
• Operating systems are traditionally responsible for the
allocation of hardware resources
• On CMT processors, on-chip resources are shared among
threads that run simultaneously
• How you allocate those resources among threads
determines the performance that those threads will
achieve
• Let’s look at a few examples…
Constructing Optimal Co-schedules
[Figure: a chip with two cores, each with its own L1 cache, sharing an L2 cache]
• Blue suffers when it does not have
enough L1 cache
• Red uses lots of L1 cache
• Green does not use much L1 cache
• Yellow does not suffer when it does
not have much L1 cache
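To make the co-scheduling problem concrete, here is a minimal sketch that assumes each thread can be summarized by two hypothetical scores: how much L1 cache it uses and how sensitive it is to losing L1 space. It pairs threads two per core and exhaustively picks the pairing with the lowest estimated interference. The scores, the `pair_cost` formula, and all function names are illustrative assumptions; a model this simple is unlikely to be accurate on real hardware.

```python
from typing import Iterator, List, NamedTuple, Tuple


class App(NamedTuple):
    name: str
    l1_use: float          # hypothetical: how much L1 cache the thread occupies (0..1)
    l1_sensitivity: float  # hypothetical: how much it slows down when it loses L1 space (0..1)


def pair_cost(a: App, b: App) -> float:
    # Assumed interference model: each thread's slowdown is its sensitivity
    # times the cache pressure created by its co-runner on the same core.
    return a.l1_sensitivity * b.l1_use + b.l1_sensitivity * a.l1_use


def pairings(apps: List[App]) -> Iterator[List[Tuple[App, App]]]:
    """Yield every way to split an even-length list of threads into pairs."""
    if not apps:
        yield []
        return
    first, rest = apps[0], apps[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def best_coschedule(apps: List[App]) -> List[Tuple[App, App]]:
    # Exhaustive search: fine for a handful of threads, exponential in general.
    return min(pairings(apps), key=lambda p: sum(pair_cost(a, b) for a, b in p))


apps = [
    App("Blue", l1_use=0.5, l1_sensitivity=0.9),
    App("Red", l1_use=0.9, l1_sensitivity=0.5),
    App("Green", l1_use=0.1, l1_sensitivity=0.5),
    App("Yellow", l1_use=0.5, l1_sensitivity=0.1),
]
for core, (a, b) in enumerate(best_coschedule(apps)):
    print(f"core {core}: {a.name} + {b.name}")
```

With these made-up scores, the search pairs Blue with Green and Red with Yellow, which matches the intuition in the bullets above: the cache-sensitive thread goes next to a light cache user, and the heavy cache user goes next to an insensitive one.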
Constructing Optimal Co-schedules (cont.)
• How do we find out applications’ cache behaviour?
– It turns out that you need to consider memory access patterns, which
are not trivial to measure
• How do you model interactions among applications?
– How do you know if one application’s cache usage patterns are
incompatible with another’s?
• These patterns/relationships cannot be measured directly
• Can they be modeled?
– Simple models are inaccurate
– Complex models are too inefficient to use inside an operating
system scheduler
• Approach of my group: use learning methods and feedback-directed
scheduling
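The slide does not specify which learning method is used. As one hypothetical illustration of feedback-directed scheduling, the sketch below uses a plain epsilon-greedy strategy, swapped in here for simplicity: it tries each candidate co-schedule once, measures the resulting performance, and then mostly keeps choosing the best one observed so far while occasionally exploring. The `measure` callback, the candidate labels, and the parameter values are all assumptions.

```python
import random
from typing import Callable, Hashable, Sequence


def feedback_directed_schedule(
    candidates: Sequence[Hashable],
    measure: Callable[[Hashable], float],
    rounds: int,
    epsilon: float = 0.1,
    seed: int = 0,
) -> Hashable:
    """Epsilon-greedy choice among candidate co-schedules.

    `measure(candidate)` is a hypothetical callback that runs the candidate for
    one scheduling interval and returns observed performance (higher is better).
    Returns the candidate with the best average measurement.
    """
    if rounds < len(candidates):
        raise ValueError("need at least one round per candidate")
    rng = random.Random(seed)
    totals = {c: 0.0 for c in candidates}
    counts = {c: 0 for c in candidates}

    def mean(c):
        return totals[c] / counts[c]

    for r in range(rounds):
        if r < len(candidates):
            choice = candidates[r]              # try every candidate once
        elif rng.random() < epsilon:
            choice = rng.choice(candidates)     # occasionally explore
        else:
            choice = max(candidates, key=mean)  # otherwise exploit the best so far
        totals[choice] += measure(choice)
        counts[choice] += 1
    return max(candidates, key=mean)


# Hypothetical throughput of each pairing, observed with some measurement noise.
true_perf = {"Blue+Red | Green+Yellow": 1.0,
             "Blue+Green | Red+Yellow": 1.6,
             "Blue+Yellow | Red+Green": 1.3}
noise = random.Random(42)
chosen = feedback_directed_schedule(
    list(true_perf), lambda c: true_perf[c] + noise.gauss(0, 0.05), rounds=50)
print(chosen)
```

The appeal of this style of approach is that it never needs an explicit model of cache interference: the scheduler only observes outcomes. The cost is the time spent running poor co-schedules while exploring.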
Heterogeneous Multicore Systems
• One size does not fit all
– Application class A runs best on a
core with feature set X
– Application class B runs best on a
core with feature set Y
• Rather than designing a
homogeneous multicore system
that attempts to satisfy
everyone but satisfies no one,
design a heterogeneous
multicore system (HMC)
[Figure: a chip with two cores, each with its own L1 cache, sharing an L2 cache]
Scheduling On HMC Systems
[Figure: a chip with Core 1 and Core 2, each with its own L1 cache, sharing an L2 cache. Set A: threads that want to run on Core 1. Set B: threads that want to run on Core 2.]
Scheduling On HMC Systems
• If you schedule all threads in Set A on their preferred core,
those threads will suffer from:
– A small share of CPU time
– High response time
• This is because there is high demand for that core, and they would
have to share it with the others
• So you might want to schedule threads on their non-preferred core once in a while
• How do you balance performance, fair CPU
allocation, and good response time?
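One way to reason about this trade-off is a greedy placement that weighs a thread's speed on each core against how crowded that core already is. The sketch below is a hypothetical heuristic, not the method from the talk. It assumes a thread runs at full speed on its preferred core and at a fixed fraction `other_core_speed` of that speed on any other core, and that a core's CPU time is split evenly among the threads placed on it. Response time is only captured indirectly, through queue length.

```python
from typing import Dict, List, Sequence, Tuple


def expected_rate(core: str, preferred: str, queue_len: int, other_core_speed: float) -> float:
    # Assumed model: full speed on the preferred core, `other_core_speed` times
    # full speed elsewhere, and the core's time divided evenly among its threads.
    speed = 1.0 if core == preferred else other_core_speed
    return speed / (queue_len + 1)


def place_threads(
    threads: Sequence[Tuple[str, str]],  # (thread name, preferred core)
    cores: Sequence[str],
    other_core_speed: float,
) -> Dict[str, List[str]]:
    """Greedily put each thread on the core where it expects the most progress."""
    queues: Dict[str, List[str]] = {c: [] for c in cores}
    for name, preferred in threads:
        best = max(
            cores,
            key=lambda c: expected_rate(c, preferred, len(queues[c]), other_core_speed),
        )
        queues[best].append(name)
    return queues


set_a = [(f"a{i}", "core1") for i in range(4)]  # four Set A threads, all preferring Core 1
print(place_threads(set_a, ["core1", "core2"], other_core_speed=0.6))
```

With `other_core_speed=0.6`, one of the four Set A threads is placed on Core 2, trading some per-thread speed for a larger CPU share. A smaller value (a bigger penalty for the non-preferred core) makes the heuristic more reluctant to move threads off Core 1.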
Summary
• CMT systems are new and cool, yet prevalent enough for
people to care about them
• Companies are desperate to hire students with experience
in CMT systems
• If you are thinking about an academic career: this is a new and hot
research area
– Many problems
– Many opportunities to publish
• Talk to me if you are interested in research opportunities
• Tell your friends who might be interested