Multi-Core Computing - Western Michigan University


Ahmad Aljebaly
Department of Computer Science
Western Michigan University







Introduction
Motivation for Multi-Core
What is a multi-core processor?
Properties of Multi-core systems
Applications that benefit from multi-core
Multiprocessor memory types
Multi-core design
 Symmetric multi-core processor
 Asymmetric multi-core processor
 Advantages & disadvantages of multi-core

First Microprocessor (1970s)
 Intel 4004

PCs spread across the world (1980s)
 Microprocessors of up to 32 bits
 AMD followed Intel’s technology

Flood of Computer Tasks (1990s)
 Growing number of computer users
 Server management → building databases
▪ PCs and servers needed better performance.
→ These demands accelerated the development of
microprocessors.

Emergence of the Multi-core Processor (2000s)
 Single-core improvement reached its limits
 A change of approach to improving microprocessor technology
▪ Put multiple execution cores on one die

Exploits reduced feature size and increased density

Increases functional units per chip (spatial
efficiency)

Limits energy consumption per operation

Constrains growth in processor complexity

A multi-core processor is a processing system
composed of two or more independent cores (or
CPUs). The cores are typically integrated onto a
single integrated circuit die (known as a chip
multiprocessor or CMP), or they may be integrated
onto multiple dies in a single chip package.

A many-core processor is one in which the number
of cores is large enough that traditional multiprocessor techniques are no longer efficient - this
threshold is somewhere in the range of several tens
of cores - and likely requires a network on chip.


A dual-core processor contains two independent
microprocessors.
A dual-core set-up is somewhat comparable to
having multiple, separate processors installed
in the same computer, but because the two
processors are actually plugged into the same
socket, the connection between them is faster.
Ideally, a dual-core processor is nearly twice
as powerful as a single-core processor. In
practice, performance gains are said to be
about fifty percent: a dual-core processor is
likely to be about one-and-a-half times as
powerful as a single-core processor.
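
One way to read the gap between the ideal factor of two and the roughly one-and-a-half reported in practice is Amdahl's law. The slides do not name it, so it is brought in here as an assumption. The sketch below computes the predicted speedup for a program whose parallelizable fraction is known; the function name and the sample fractions are illustrative only.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed-size workload."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time; the parallel part is divided among the cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative fractions, not measurements.
    for p in (0.5, 2 / 3, 0.9, 1.0):
        print(f"parallel fraction {p:.2f}: {amdahl_speedup(p, 2):.2f}x on 2 cores")
```

Under this model, a speedup of about 1.5 on two cores corresponds to a program in which roughly two thirds of the work can run in parallel.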


A multi-core processor implements multiprocessing in a single
physical package. Cores in a multi-core device may be coupled
together tightly or loosely. For example, cores may or may not
share caches, and they may implement message passing or
shared memory inter-core communication methods. Common
network topologies to interconnect cores include: bus, ring, 2-dimensional mesh, and crossbar.
All cores are identical in symmetric multi-core systems and they
are not identical in asymmetric multi-core systems. Just as with
single-processor systems, cores in multi-core systems may
implement architectures such as superscalar, vector processing,
or multithreading.
Multi-core processors are widely used across many
application domains including: general-purpose, embedded,
network, digital signal processing, and graphics.
 The performance gained by using a multi-core processor depends strongly on the software
algorithms and implementation.
 Multi-core processing is a growing industry trend as single
core processors rapidly reach the physical limits of possible
complexity and speed.
 Companies that have produced or are working on multi-core products include AMD, ARM, Broadcom, Intel, and
VIA.





With a shared on-chip cache memory,
communication events can be reduced to just a
handful of processor cycles.
With such low latencies, communication
delays have a much smaller impact on overall
performance.
Threads can also be much smaller and still be
effective.
Automatic parallelization becomes more feasible.

Cores will be shared with a wide range of other
applications dynamically.

Load can no longer be considered symmetric across
the cores.

Cores will likely not be symmetric as accelerators
become common for scientific hardware.

Source code will often be unavailable, preventing
compilation against the specific hardware
configuration.






Database servers
Web servers
Telecommunication markets
Multimedia applications
Scientific applications
In general, applications with Thread-level
parallelism (as opposed to instruction-level
parallelism)

Replicate multiple processor cores on a single die.
 The cores fit on a single processor socket.
[Figure: four cores, numbered 1 to 4, on one die, each running several threads]

Programmers must use threads or processes.

Spread the workload across multiple cores.

Write parallel algorithms.

The OS maps threads/processes to cores.
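
The steps on this slide (split the workload, hand the pieces to workers, combine the results) can be sketched as follows. This is a minimal illustration using Python's standard `concurrent.futures` module; the function names and the sum-of-squares task are hypothetical and not from the slides.

```python
import os
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def sum_of_squares(chunk: Sequence[int]) -> int:
    """Work done by one worker on its own slice of the data."""
    return sum(x * x for x in chunk)


def split(data: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Cut data into at most `parts` contiguous chunks of similar size."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(
    data: Sequence[int],
    workers: Optional[int] = None,
    executor_factory: Callable[..., Executor] = ThreadPoolExecutor,
) -> int:
    """Spread the workload over workers, then combine the partial results."""
    workers = workers or os.cpu_count() or 1
    chunks = split(data, workers)
    with executor_factory(max_workers=workers) as pool:
        # Each chunk is processed by whichever worker is free.
        return sum(pool.map(sum_of_squares, chunks))
```

One caveat specific to this language choice: in the standard CPython interpreter, threads take turns executing Python bytecode, so CPU-bound work like this gains little from threads alone. Passing `concurrent.futures.ProcessPoolExecutor` as `executor_factory` (with the call placed under an `if __name__ == "__main__":` guard) uses separate processes, which the OS can schedule on different cores.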

Most major operating systems support multi-core today.

The OS perceives each core as a separate
processor.

The OS scheduler maps threads/processes
to different cores.
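
A program can ask the OS how many processors it sees. The sketch below uses Python's standard `os` module; the two function names are hypothetical. Note that the count is of logical processors, so on hardware with simultaneous multithreading it can exceed the number of physical cores.

```python
import os


def cores_seen_by_os() -> int:
    """Number of logical processors the OS reports (1 if it cannot be determined)."""
    return os.cpu_count() or 1


def cores_available_to_this_process() -> int:
    """Logical processors this process is allowed to be scheduled on."""
    # sched_getaffinity exists only on some Unix platforms such as Linux.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return cores_seen_by_os()


if __name__ == "__main__":
    print("logical processors:", cores_seen_by_os())
    print("usable by this process:", cores_available_to_this_process())
```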

Editing a photo while recording a TV show through a
digital video recorder.

Downloading software while running an anti-virus
program.

“Anything that can be threaded today will map
efficiently to multi-core”.

BUT: some applications are difficult to parallelize.
 Better performance
▪ For multitasking
▪ e.g. burning a CD while doing graphics work at the
same time
 Lower power consumption and heat generation
▪ Both had been driven up by rising CPU clock speeds

Saves space on the motherboard
▪ Two single cores → one die
▪ The freed space can be used more efficiently
 Simplicity
▪ Several single-core processors need additional systems
to control them
 Economic efficiency
▪ A dual-core processor is much cheaper than two single-core
processors

Shared memory:
In this model, there is one (large) common
shared memory for all processors.

Distributed memory:
In this model, each processor has its own
(small) local memory, and its content is not
replicated anywhere else.
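
The two models lead to two programming styles, sketched below with Python's standard `threading` and `queue` modules. This is only an imitation: threads in one process always share memory physically, so the second function follows the distributed style by convention (workers touch only their own data and communicate through messages). All names are hypothetical.

```python
import queue
import threading
from typing import List


def _run_all(threads: List[threading.Thread]) -> None:
    """Start every thread, then wait for all of them to finish."""
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def shared_memory_total(chunks: List[List[int]]) -> int:
    """Shared-memory style: all workers update one common variable."""
    total = 0
    lock = threading.Lock()

    def worker(chunk: List[int]) -> None:
        nonlocal total
        partial = sum(chunk)
        with lock:  # only one worker may update the shared total at a time
            total += partial

    _run_all([threading.Thread(target=worker, args=(c,)) for c in chunks])
    return total


def message_passing_total(chunks: List[List[int]]) -> int:
    """Distributed-memory style: workers keep data local and send results as messages."""
    mailbox: "queue.Queue[int]" = queue.Queue()

    def worker(chunk: List[int]) -> None:
        local_data = list(chunk)        # private copy, never touched by other workers
        mailbox.put(sum(local_data))    # the message is the only thing exchanged

    _run_all([threading.Thread(target=worker, args=(c,)) for c in chunks])
    return sum(mailbox.get() for _ in chunks)
```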

Taking the idea of superscalar operations to
the next level, it is possible to put multiple
microprocessor cores onto a single chip, and
have the cores operate in parallel with one
another.
 A symmetric multi-core processor is one that has multiple
cores on a single chip, and all of those cores are identical.
▪ Example: Intel Core 2:
▪ The Intel Core 2 is an example of a symmetric multi-core processor. The Core 2 can have either 2 cores on
chip ("Core 2 Duo") or 4 cores on chip ("Core 2
Quad"). The cores in the Core 2 chip are symmetrical
and can function independently of one another. It
requires a mixture of scheduling software and hardware
to farm tasks out to each core.
All cores on the die are
exactly identical.


A symmetric multi-core processor is a
processor which has multiple cores that are all
exactly the same. Every single core has the
same architecture and the same capabilities.
Because every core has the same capabilities, an
arbitration unit is required to assign each core
a specific task. Software that uses
techniques like multithreading makes the
best use of a multi-core processor like the Intel
Core 2.

Applications
 Personal Computers
 Server / Super Computer
 An asymmetric multi-core processor is one that has
multiple cores on a single chip, but those cores might be
different designs. For instance, there could be 2 general
purpose cores and 2 vector cores on a single chip.
▪ Example: Cell Processor:
▪ IBM's Cell processor, used in the Sony PlayStation 3 video game
console, is an asymmetric multi-core processor. The Cell has 9
processor cores on board: one general-purpose processor and 8
data-processing cores. The general-purpose core, known as the
Power Processor Element (PPE), controls the communication
between the other cores and distributes computing tasks to
them for processing. The other 8 cores are known as
Synergistic Processor Elements (SPEs) and are specially designed
to have high floating-point throughput, especially with vector
operations.
▪ In an asymmetric multi-core processor, the chip has
multiple cores onboard, but the cores might be
different designs.
▪ Each core will have different capabilities.

An example of an asymmetric multi-core processor is the IBM Cell
processor.

The IBM Cell processor has 1 Power Processor Element
(PPE) that controls the chip, and 8 Synergistic Processor
Elements (SPEs) that are designed for high mathematical
throughput. The IBM Cell processor is designed as follows:
Notice how the SPE cores only connect to the PPE, and not to
each other. Notice also that the PPE core is much larger than the
individual SPE cores.
• Applications
 Super Computing:
▪ IBM's latest supercomputer,
IBM Roadrunner, is a hybrid
of general-purpose CISC
Opteron processors and Cell
processors.
• Applications
 Home cinema
▪ Toshiba is considering
producing HDTVs using Cell.
They have already presented a
system to decode 48 standard
definition MPEG-2 streams.
This can enable a viewer to
choose a channel based on
dozens of thumbnail videos
displayed on the screen at the
same time.
• Applications
 Video Processing Card
▪ Some companies, such as
Leadtek, have plans to
release a PCI-E card based
upon the Cell to allow for
"faster than real time"
transcoding of H.264,
MPEG-2 and MPEG-4
video.
• Applications
 Console Video Games
▪ The first major commercial
application of Cell was in
Sony's PlayStation 3 game
console.
▪ This video game console
contains the first production
application of the Cell
processor, clocked at 3.2 GHz
with seven of its eight SPEs
operational, which allows Sony
to increase manufacturing yield.
Only six of the seven SPEs are
accessible to developers, as one
is reserved by the OS.

Future
 Based on its unique features, Cell can bridge the
gap between conventional desktop processors (such
as the Athlon 64 and Core 2 families) and more
specialized high-performance processors, such as
the NVIDIA and ATI graphics processors (GPUs).
 Cell will expand its intended use in current and
future digital distribution systems, as well as in
high-definition displays and recording equipment
and computer entertainment systems.

Future of the SMP (Compared with ASMP)
 Easy to put many cores on one
integrated circuit
 Easier to program than ASMP
▪ Because all the cores are identical
 Easy to sustain the pace of development
 Applicable to any type of system (general use)

Future of the SMP (Compared with ASMP)
 Not well suited to certain specialized systems
▪ Audio/video processing, data compression, and so on
 Wastes silicon and power
▪ Because it is made for general-purpose use
▪ Less efficient than ASMP
 These are the reasons why ASMP has emerged.

Relies on effective exploitation of multiple-thread parallelism
 Need for parallel computing model and parallel programming model

Aggravates memory wall
 Memory bandwidth
▪ Way to get data out of memory banks
▪ Way to get data into multi-core processor array
 Memory latency
 Fragments L3 cache

Pins become a choke point
▪ Rate of pin growth projected to slow and flatten
▪ Rate of bandwidth per pin (pair) projected to grow slowly

Requires mechanisms for efficient inter-processor coordination
 Synchronization
 Mutual exclusion
 Context switching
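
The first two mechanisms in this list can be illustrated with Python's standard `threading` module. In the sketch below, a barrier provides synchronization (no worker starts the second phase until all have finished the first) and a lock provides mutual exclusion (only one worker at a time updates a shared counter). The two-phase task and all names are hypothetical.

```python
import threading
from typing import List, Tuple


def two_phase(values: List[int]) -> Tuple[List[int], int]:
    """Phase 1 squares each value; phase 2 adds the right-hand neighbour's square."""
    n = len(values)
    if n == 0:
        return [], 0

    squares = [0] * n
    results = [0] * n
    completed = 0
    barrier = threading.Barrier(n)   # synchronization point for all n workers
    counter_lock = threading.Lock()  # guards the shared `completed` counter

    def worker(i: int) -> None:
        nonlocal completed
        squares[i] = values[i] ** 2
        barrier.wait()               # phase 2 needs every phase 1 result to be ready
        results[i] = squares[i] + squares[(i + 1) % n]
        with counter_lock:           # read-modify-write must not interleave
            completed += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, completed
```

Without the barrier, a worker could read a neighbour's square before it had been written; without the lock, two increments of the counter could overlap and one could be lost.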

Cache coherency circuitry can operate at a much higher clock rate
than is possible if the signals have to travel off-chip.

Because signals between different CPUs travel shorter distances, those
signals degrade less.

These higher quality signals allow more data to be sent in a given
time period since individual signals can be shorter and do not need
to be repeated as often.

A dual-core processor uses slightly less power than two coupled
single-core processors.

The ability of multi-core processors to increase application performance
depends on the use of multiple threads within applications.

Most current video games will run faster on a 3 GHz single-core
processor than on a 2 GHz dual-core processor (of the same core
architecture).

Two processing cores sharing the same system bus and memory
bandwidth limits the real-world performance advantage.

If a single core is close to being memory bandwidth limited, going
to dual-core might only give 30% to 70% improvement.

If memory bandwidth is not a problem, a 90% improvement can be
expected.
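
The 3 GHz single-core versus 2 GHz dual-core comparison above can be checked with rough arithmetic. The sketch assumes that performance scales linearly with clock speed within one architecture and that multi-core scaling follows Amdahl's law; both are simplifying assumptions not stated in the slides, and memory bandwidth limits are not modelled. The function names are hypothetical.

```python
def effective_ghz(clock_ghz: float, cores: int, parallel_fraction: float) -> float:
    """Single-core-equivalent clock rate under Amdahl's law (a rough model)."""
    speedup = 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)
    return clock_ghz * speedup


def break_even_parallel_fraction(single_ghz: float, multi_ghz: float, cores: int) -> float:
    """Parallel fraction at which the multi-core chip matches the single-core chip."""
    if cores < 2:
        raise ValueError("cores must be at least 2")
    # Solve multi_ghz / ((1 - p) + p / cores) == single_ghz for p.
    return (1.0 - multi_ghz / single_ghz) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    # A purely single-threaded program sees only the lower clock rate.
    print(effective_ghz(2.0, 2, 0.0))
    # Fraction of the work that must be parallel before 2 GHz x 2 catches 3 GHz x 1.
    print(break_even_parallel_fraction(3.0, 2.0, 2))
```

Under these assumptions, a single-threaded game effectively runs at 2 GHz on the dual-core part, and about two thirds of its work would have to run in parallel before the dual-core chip matched the 3 GHz single core.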


All computers are now parallel computers!
Multi-core processors represent an important new trend in computer
architecture.
 Decreased power consumption and heat generation.
 Minimized wire lengths and interconnect latencies.
They enable true thread-level parallelism with great energy
efficiency and scalability.
 To utilize their full potential, applications will need to move from a
single to a multi-threaded model.

 Parallel programming techniques are likely to gain importance.
• The difficult problem is not building multi-core hardware, but programming
it in a way that lets mainstream applications benefit from the continued
exponential growth in CPU performance.

The software industry needs to get back into the state where existing
applications run faster on new hardware.





Question
 Give a definition and an example for each of:
1. A symmetric multi-core processor
2. An asymmetric multi-core processor

Answer:
 A symmetric multi-core processor is one that has multiple cores
on a single chip, and all of those cores are identical.
▪ Example: Intel Core 2
 An asymmetric multi-core processor is one that has
multiple cores on a single chip, but those cores might be
different designs.
▪ Example: Cell Processor.