Floating Point Numbers & Parallel Computing


Outline

• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing

Fixed-point Numbers

• How to represent rational numbers in binary?
• One way: define a binary "point" between integer and fraction
• Analogous to the point between integer and fraction for decimal numbers:

6.75   (integer part 6, point, fraction part .75)

Fixed-point Numbers

• Point's position is static (cannot be changed)
• E.g., point goes between 3rd and 4th bits of a byte:

0110.1100   (4 bits for integer component, 4 bits for fraction component)

Fixed-point Numbers

• Integer component: binary interpreted as before; LSB is 2^0

0110.1100: integer part 0110 = 2^2 + 2^1 = 4 + 2 = 6

Fixed-point Numbers

• Fraction component: binary interpreted slightly differently; MSB is 2^-1

0110.1100: fraction part .1100 = 2^-1 + 2^-2 = 0.5 + 0.25 = 0.75

Fixed-point Numbers

0110.1100: integer part 0110 = 2^2 + 2^1 = 6
           fraction part .1100 = 2^-1 + 2^-2 = 0.75

0110.1100 = 6.75
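This 4.4 interpretation is easy to check in code. A minimal C sketch (the fixed44_to_double helper and the 4.4 split are illustrative assumptions, not from the slides):

```c
#include <stdio.h>
#include <stdint.h>

/* Interpret an unsigned byte as 4.4 fixed point:
 * upper 4 bits = integer part, lower 4 bits = fraction part. */
double fixed44_to_double(uint8_t x) {
    /* moving the binary point 4 places left == dividing by 2^4 */
    return x / 16.0;
}

int main(void) {
    printf("%f\n", fixed44_to_double(0x6C)); /* 0110.1100 -> 6.750000 */
    return 0;
}
```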

Fixed-point Numbers

• How to represent negative numbers?
• 2's complement notation:

-2.375 = 1101.1010

Fixed-point Numbers

To interpret a negative fixed-point number:

1. Invert bits
2. Add 1
3. Convert to fixed-point decimal
4. Multiply by -1

1101.1010 → invert bits → 0010.0101 → add 1 → 0010.0110
0010.0110: integer part = 2^1 = 2; fraction part = 2^-2 + 2^-3 = 0.25 + 0.125 = 0.375
So 1101.1010 = -(2 + 0.375) = -2.375
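In C, the same reading falls out of a signed cast, since int8_t is itself 2's complement. A minimal sketch (the helper name is illustrative):

```c
#include <stdio.h>
#include <stdint.h>

/* Interpret a byte as signed 4.4 fixed point.
 * The cast to int8_t applies the 2's complement reading,
 * so steps 1-4 above are done by the hardware. */
double signed_fixed44_to_double(uint8_t x) {
    return (int8_t)x / 16.0;
}

int main(void) {
    printf("%f\n", signed_fixed44_to_double(0xDA)); /* 1101.1010 -> -2.375000 */
    return 0;
}
```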

Outline

• Fixed-point Numbers
• Floating Point Numbers ← this section
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing

Floating Point Numbers

• Analogous to scientific notation, e.g., 4.1 × 10^3 = 4100
• Gets around limitations of constant integer and fraction sizes
• Allows representation of very small and very large numbers

Floating Point Numbers

• Just like scientific notation, floating point numbers have:
  • sign (±)
  • mantissa (M)
  • base (B)
  • exponent (E)

4.1 × 10^3 = 4100:  M = 4.1, B = 10, E = 3

Floating Point Numbers

• Floating point numbers in binary (32 bits):

sign: 1 bit | exponent: 8 bits | mantissa: 23 bits

Floating Point Numbers

• Example: convert 228 to floating point

228 = 1110 0100 = 1.1100100 × 2^7

sign = positive; exponent = 7; mantissa = 1.1100100; base = 2 (implicit)

Floating Point Numbers

228 = 1110 0100 = 1.1100100 × 2^7

sign = positive (0); exponent = 7; mantissa = 1.1100100; base = 2 (implicit)

0 0000 0111 11100100000000000000000

Floating Point Numbers

• In binary floating point, the MSB of the mantissa is always 1
• No need to store the MSB of the mantissa (the 1 is implied)
• Called the "implicit leading 1"

0 0000 0111 11100100000000000000000
→ 0 0000 0111 11001000000000000000000

Floating Point Numbers

• Exponent must represent both positive and negative numbers
• Floating point uses a biased exponent: the original exponent plus a constant bias
• 32-bit floating point uses bias 127
• E.g., exponent -4 (2^-4) would be -4 + 127 = 123 = 0111 1011
• E.g., exponent 7 (2^7) would be 7 + 127 = 134 = 1000 0110

0 0000 0111 11001000000000000000000
→ 0 1000 0110 11001000000000000000000

Floating Point Numbers

• E.g., 228 in floating point binary (IEEE 754 standard):

0 1000 0110 11001000000000000000000

sign bit = 0 (positive)
8-bit biased exponent: E = number - bias = 134 - 127 = 7
23-bit mantissa, stored without the implicit leading 1
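These fields can be pulled out of a real float at runtime. A minimal C sketch, assuming the platform uses the IEEE 754 single-precision layout shown above (memcpy is the well-defined way to reinterpret the bits):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float f = 228.0f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits); /* view the float's raw bits */

    uint32_t sign     = bits >> 31;          /* 1 sign bit */
    uint32_t exponent = (bits >> 23) & 0xFF; /* 8 biased exponent bits */
    uint32_t mantissa = bits & 0x7FFFFF;     /* 23 mantissa bits, no leading 1 */

    printf("sign=%u biased exponent=%u (E=%d) mantissa=0x%06X\n",
           (unsigned)sign, (unsigned)exponent,
           (int)exponent - 127, (unsigned)mantissa);
    /* expected: sign=0 biased exponent=134 (E=7) mantissa=0x640000 */
    return 0;
}
```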

Floating Point Numbers

• Special cases: 0, ±∞, NaN

value | sign bit | exponent | mantissa
0     | N/A      | 00000000 | 00…000
+∞    | 0        | 11111111 | 00…000
-∞    | 1        | 11111111 | 00…000
NaN   | N/A      | 11111111 | non-zero
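These special encodings are exactly what the standard <math.h> classification macros detect. A small sketch (C99):

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    float pos_inf  = INFINITY; /* exponent all 1s, mantissa all 0s */
    float not_num  = NAN;      /* exponent all 1s, mantissa non-zero */
    float neg_zero = -0.0f;    /* all bits 0 except the sign bit */

    printf("isinf: %d\n", isinf(pos_inf));
    printf("isnan: %d\n", isnan(not_num));
    printf("0 == -0: %d\n", 0.0f == neg_zero); /* prints 1 */
    return 0;
}
```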

Floating Point Numbers

• Single versus double precision
• Single: 32-bit float
  Range: ±1.175494 × 10^-38 to ±3.402824 × 10^38
• Double: 64-bit double
  Range: ±2.22507385850720 × 10^-308 to ±1.79769313486232 × 10^308

type   | # bits (total) | # sign bits | # exponent bits | # mantissa bits
float  | 32             | 1           | 8               | 23
double | 64             | 1           | 11              | 52
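The quoted ranges correspond to the constants in <float.h>, which can be printed directly:

```c
#include <stdio.h>
#include <float.h>

int main(void) {
    /* smallest normalized and largest finite magnitudes for each type */
    printf("float:  %e to %e\n", FLT_MIN, FLT_MAX);
    printf("double: %e to %e\n", DBL_MIN, DBL_MAX);
    return 0;
}
```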

Outline

• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors ← this section
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing

Superscalar Processors

• Multiple hardwired copies of the datapath
• Allows multiple instructions to execute simultaneously
• E.g., 2-way superscalar processor:
  • Fetches / executes 2 instructions per cycle
  • 2 ALUs
  • 2-port memory unit
  • 6-port register file (4 source, 2 write back)

Superscalar Processors

• Datapath for 2-way superscalar processor

(figure: 6-port register file, 2 ALUs, 2-port memory unit)

Superscalar Processors

• Pipeline for 2-way superscalar processor
• 2 instructions per cycle

Superscalar Processors

• Commercial processors can be 3-, 4-, or even 6-way superscalar
• Very difficult to manage dependencies and hazards

(figure: Intel Nehalem, 6-way superscalar)

Outline

• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading ← this section
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing

Multithreading (Terms)

• Process: program running on a computer
  • Can have multiple processes running at the same time
  • E.g., music player, web browser, anti-virus, word processor
• Thread: each process has one or more threads that can run simultaneously
  • E.g., word processor: threads to read input, print, spell check, auto-save

Multithreading (Terms)

• Instruction level parallelism (ILP): # of instructions that can be executed simultaneously for a given program / microarchitecture
  • Practical processors rarely achieve ILP greater than 2 or 3 (see the sketch below)
• Thread level parallelism (TLP): degree to which a process can be split into threads
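A rough illustration of ILP in C (an illustrative sketch, not from the slides): the first pair of adds is independent and could issue together on a 2-way superscalar core, while the second pair forms a dependency chain.

```c
#include <stdio.h>

int main(void) {
    int a = 1, b = 2, c, d;

    /* Independent adds: no data dependency, so a 2-way superscalar
     * core can issue both in the same cycle (ILP = 2 here). */
    c = a + 1;
    d = b + 1;

    /* Dependent adds: the second needs the first's result,
     * forcing in-order execution (ILP = 1 for this pair). */
    c = a + 1;
    d = c + 1;

    printf("%d %d\n", c, d);
    return 0;
}
```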

Multithreading

• Keeps a processor with many execution units busy
  • Even if ILP is low or the program is stalled (waiting for memory)
• For single-core processors, threads give the illusion of simultaneous execution
  • Threads take turns executing (according to the OS)
  • OS decides when a thread's turn begins / ends

Multithreading

• When one thread's turn ends:
  -- OS saves architectural state
  -- OS loads architectural state of another thread
  -- New thread begins executing
• This is called a context switch
• If the context switch is fast enough, the user perceives threads as running simultaneously (even on a single core)

Multithreading

• Multithreading does NOT improve ILP, but DOES improve processor throughput
  • Threads use resources that are otherwise idle
• Multithreading is relatively inexpensive
  • Only need to save PC and register file
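To make threads concrete, a minimal POSIX threads sketch (the worker function and the thread count of 4 are illustrative; compile with cc -pthread):

```c
#include <stdio.h>
#include <pthread.h>

/* Each thread runs this function. The OS schedules the threads:
 * context-switching on one core, or truly in parallel on several. */
void *worker(void *arg) {
    printf("thread %ld running\n", (long)arg);
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    for (long i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL); /* wait for each to finish */
    return 0;
}
```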

Outline

• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing ← this section
• Heterogeneous Multiprocessing

Homogeneous Multiprocessing

• AKA symmetric multiprocessing (SMP)
• 2 or more identical processors with a single shared memory
• Easier to design (than heterogeneous)
• Multiple cores on same (or different) chip(s)
• In 2005, architectures made the shift to SMP

Homogeneous Multiprocessing

• Multiple cores can execute threads concurrently
• True simultaneous execution
• Multi-threaded programming can be tricky… (see the sketch below)

(figure: threads with single-core vs. multi-core, cores #1-#4)
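On an SMP machine the OS can report how many identical cores it manages. A small sketch; note that _SC_NPROCESSORS_ONLN is a common extension on Linux and macOS rather than base POSIX:

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* number of cores currently online and available to threads */
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    printf("%ld cores available\n", cores);
    return 0;
}
```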

Outline

• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing ← this section

Heterogeneous Multiprocessing

• AKA asymmetric multiprocessing (AMP)
• 2 (or more) different processors
• Specialized processors used for specific tasks
  • E.g., graphics, floating point, FPGAs
• Adds complexity

(figure: Nvidia GPU)

Heterogeneous Multiprocessing

• Clustered: each processor has its own memory
• E.g., PCs connected on a network
• Memory not shared, must pass information between nodes… (see the message-passing sketch below)
• Can be costly
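Passing information between nodes is commonly done with a message-passing library such as MPI. A minimal sketch, assuming an MPI implementation is installed (built with mpicc): rank 0 sends one integer to rank 1.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* which node/process am I? */

    if (rank == 0) {
        int data = 42;
        /* explicit send: memory is not shared between nodes */
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int data;
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("node 1 received %d\n", data);
    }

    MPI_Finalize();
    return 0;
}
```

Run with, e.g., mpirun -np 2 ./a.out; each rank is a separate process with its own memory, possibly on a different machine.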