Floating Point Numbers & Parallel Computing
Outline
• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing
3.141592653589793238462643383…
Fixed-point Numbers
• How to represent rational numbers in binary?
• One way: define a binary “point” between the integer and fraction bits
• Analogous to the point between integer and fraction for decimal numbers:
6.75 (integer . fraction)
Fixed-point Numbers
• Point’s position is static (cannot be changed)
• E.g., point goes between the 3rd and 4th bits of a byte:
0110.1100
4 bits for the integer component, 4 bits for the fraction component
Fixed-point Numbers
• Integer component: binary interpreted as before
• LSB of the integer component is 2^0
0110.1100
0110 = 2^2 + 2^1 = 4 + 2 = 6
Fixed-point Numbers
• Fraction component: binary interpreted slightly differently
• MSB of the fraction component is 2^-1
0110.1100
.1100 = 2^-1 + 2^-2 = 0.5 + 0.25 = 0.75
Fixed-point Numbers
0110.1100
0110 = 2^2 + 2^1 = 4 + 2 = 6
.1100 = 2^-1 + 2^-2 = 0.5 + 0.25 = 0.75
6 + 0.75 = 6.75
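The 4.4 interpretation above can be checked in a few lines of Python (the helper name is illustrative, not from the slides):

```python
def fixed44_to_float(bits: int) -> float:
    """Interpret an unsigned 8-bit pattern as a 4.4 fixed-point number."""
    # Dividing by 2^4 shifts the binary point 4 places to the left.
    return bits / 2**4

print(fixed44_to_float(0b01101100))  # 0110.1100 -> 6.75
```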
Fixed-point Numbers
• How to represent negative numbers?
• 2’s complement notation
-2.375 → 1101.1010
Fixed-point Numbers
To read a negative 2’s complement fixed-point number:
1. Invert bits
2. Add 1
3. Convert to fixed-point decimal
4. Multiply by -1
E.g., 1101.1010:
1101.1010 → invert → 0010.0101 → add 1 → 0010.0110
0010 = 2^1 = 2
.0110 = 2^-2 + 2^-3 = 0.25 + 0.125 = 0.375
2 + 0.375 = 2.375, so the value is -2.375
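The four steps above can be sketched in Python; for an 8-bit pattern with its MSB set, subtracting 2^8 is equivalent to the invert-bits-and-add-1 procedure with the sign flipped (the helper name is illustrative):

```python
def fixed44_signed(bits: int) -> float:
    """Interpret an 8-bit pattern as a 2's-complement 4.4 fixed-point number."""
    if bits & 0x80:      # MSB set: the number is negative
        bits -= 0x100    # same result as invert, add 1, then negate
    return bits / 2**4   # shift the binary point 4 places left

print(fixed44_signed(0b11011010))  # 1101.1010 -> -2.375
```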
Outline
• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing
Floating Point Numbers
• Analogous to scientific notation
• E.g., 4.1 × 10^3 = 4100
• Gets around limitations of constant integer and fraction sizes
• Allows representation of very small and very large numbers
Floating Point Numbers
• Just like scientific notation, floating point numbers have:
sign (±), mantissa (M), base (B), exponent (E)
4.1 × 10^3 = 4100
M = 4.1, B = 10, E = 3
Floating Point Numbers
• Floating point numbers in binary: 32 bits total
sign: 1 bit | exponent: 8 bits | mantissa: 23 bits
Floating Point Numbers
• Example: convert 228 to floating point
228 = 1110 0100 = 1.1100100 × 2^7
sign = positive, exponent = 7, mantissa = 1.1100100, base = 2 (implicit)
Floating Point Numbers
228 = 1110 0100 = 1.1100100 × 2^7
sign = positive (0), exponent = 7, mantissa = 1.1100100, base = 2 (implicit)
0 0000 0111 11100100000000000000000
Floating Point Numbers
• In binary floating point, the MSB of the mantissa is always 1
• No need to store the MSB of the mantissa (the 1 is implied)
• Called the “implicit leading 1”
0 0000 0111 11100100000000000000000 → 0 0000 0111 11001000000000000000000
Floating Point Numbers
• Exponent must represent both positive and negative numbers
• Floating point uses a biased exponent: the original exponent plus a constant bias
• 32-bit floating point uses bias 127
• E.g., exponent -4 (2^-4) would be -4 + 127 = 123 = 0111 1011
• E.g., exponent 7 (2^7) would be 7 + 127 = 134 = 1000 0110
0 0000 0111 11001000000000000000000 → 0 1000 0110 11001000000000000000000
Floating Point Numbers
• E.g., 228 in floating point binary (IEEE 754 standard):
0 1000 0110 11001000000000000000000
sign bit = 0 (positive)
8-bit biased exponent: E = number − bias = 134 − 127 = 7
23-bit mantissa without the implicit leading 1
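This encoding can be verified with Python’s struct module, which exposes the raw IEEE 754 single-precision bit pattern (float_to_bits is an illustrative helper, not a standard function):

```python
import struct

def float_to_bits(x: float) -> str:
    """Return the IEEE 754 single-precision bit pattern of x as
    'sign exponent mantissa'."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret as uint32
    bits = f"{raw:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"

print(float_to_bits(228.0))  # 0 10000110 11001000000000000000000
```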
Floating Point Numbers
• Special cases: 0, ±∞, NaN

value | sign bit | exponent | mantissa
0     | N/A      | 00000000 | 00…000
+∞    | 0        | 11111111 | 00…000
−∞    | 1        | 11111111 | 00…000
NaN   | N/A      | 11111111 | non-zero
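The special cases can be inspected with the same struct-based trick (a sketch; float_to_bits is an illustrative helper):

```python
import math
import struct

def float_to_bits(x: float) -> str:
    """Return the IEEE 754 single-precision bit pattern of x."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{raw:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"

print(float_to_bits(0.0))        # 0 00000000 00...0
print(float_to_bits(math.inf))   # 0 11111111 00...0
print(float_to_bits(-math.inf))  # 1 11111111 00...0
print(float_to_bits(math.nan))   # exponent all 1s, non-zero mantissa
```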
Floating Point Numbers
• Single versus double precision
• Single: 32-bit float; range ±1.175494 × 10^-38 to ±3.402824 × 10^38
• Double: 64-bit double; range ±2.22507385850720 × 10^-308 to ±1.79769313486232 × 10^308

                | float | double
# bits (total)  | 32    | 64
# sign bits     | 1     | 1
# exponent bits | 8     | 11
# mantissa bits | 23    | 52
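Python’s built-in float is the 64-bit double from this table, so its limits can be read off directly (a quick check, not part of the slides):

```python
import sys

# sys.float_info reports the IEEE 754 double-precision limits.
print(sys.float_info.max)       # largest finite double, ~1.7976931348623157e+308
print(sys.float_info.min)       # smallest positive normal double, ~2.2250738585072014e-308
print(sys.float_info.mant_dig)  # 53 = 52 stored mantissa bits + implicit leading 1
```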
Outline
• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing
Superscalar Processors
• Multiple hardwired copies of the datapath
• Allows multiple instructions to execute simultaneously
• E.g., 2-way superscalar processor:
  - Fetches / executes 2 instructions per cycle
  - 2 ALUs
  - 2-port memory unit
  - 6-port register file (4 source, 2 write back)
Superscalar Processors
• Datapath for 2-way superscalar processor
[Figure: 6-port register file, 2 ALUs, 2-port memory unit]
Superscalar Processors
• Pipeline for 2-way superscalar processor
• 2 instructions per cycle
Superscalar Processors
• Commercial processors can be 3-, 4-, or even 6-way superscalar
• Very difficult to manage dependencies and hazards
[Figure: Intel Nehalem (6-way superscalar)]
Outline
• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing
Multithreading (Terms)
• Process: program running on a computer
  - Can have multiple processes running at the same time
  - E.g., music player, web browser, anti-virus, word processor
• Thread: each process has one or more threads that can run simultaneously
  - E.g., word processor: threads to read input, print, spell check, auto-save
Multithreading (Terms)
• Instruction level parallelism (ILP): # of instructions that can be executed simultaneously for a program / microarchitecture
  - Practical processors rarely achieve ILP greater than 2 or 3
• Thread level parallelism (TLP): degree to which a process can be split into threads
Multithreading
• Keeps a processor with many execution units busy
  - Even if ILP is low or the program is stalled (waiting for memory)
• For single-core processors, threads give the illusion of simultaneous execution
  - Threads take turns executing (according to the OS)
  - OS decides when a thread’s turn begins / ends
Multithreading
• When one thread’s turn ends:
  1. OS saves architectural state
  2. OS loads architectural state of another thread
  3. New thread begins executing
• This is called a context switch
• If the context switch is fast enough, the user perceives threads as running simultaneously (even on a single core)
Multithreading
• Multithreading does NOT improve ILP, but DOES improve processor throughput
  - Threads use resources that are otherwise idle
• Multithreading is relatively inexpensive
  - Only need to save the PC and register file
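The word-processor example can be sketched with Python’s threading module; the task names are hypothetical, and sleep stands in for real work such as I/O, during which another thread gets a turn:

```python
import threading
import time

results = []  # list.append is thread-safe in CPython

def task(name: str, delay: float) -> None:
    time.sleep(delay)      # simulate a stall (e.g., waiting for I/O)
    results.append(name)   # other threads run while this one sleeps

# Spawn one thread per hypothetical word-processor task.
threads = [threading.Thread(target=task, args=(n, 0.01))
           for n in ("spell-check", "auto-save", "print")]
for t in threads:
    t.start()
for t in threads:
    t.join()               # wait for all threads to finish

print(sorted(results))
```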
Outline
• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing
Homogeneous Multiprocessing
• AKA symmetric multiprocessing (SMP)
• 2 or more identical processors with a single shared memory
• Easier to design (than heterogeneous)
• Multiple cores on the same (or different) chip(s)
• In 2005, architectures made the shift to SMP
Homogeneous Multiprocessing
• Multiple cores can execute threads concurrently
• True simultaneous execution
• Multi-threaded programming can be tricky…
[Figure: threads taking turns on a single core vs. running truly in parallel on cores #1–#4]
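A minimal sketch of true simultaneous execution using Python’s multiprocessing module; each worker is a separate process, so on a multi-core SMP machine the work runs on different cores at the same time (the square workload is illustrative):

```python
from multiprocessing import Pool

def square(n: int) -> int:
    return n * n

if __name__ == "__main__":
    # The pool distributes the inputs across 4 worker processes.
    with Pool(processes=4) as pool:
        print(pool.map(square, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```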
Outline
• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing
Heterogeneous Multiprocessing
• AKA asymmetric multiprocessing (AMP)
• 2 (or more) different processors
• Specialized processors used for specific tasks
  - E.g., graphics, floating point, FPGAs
• Adds complexity
[Figure: Nvidia GPU]
Heterogeneous Multiprocessing
• Clustered: each processor has its own memory
  - E.g., PCs connected on a network
• Memory not shared; must pass information between nodes…
• Can be costly