
A next-generation many-core processor with reliability, fault tolerance and adaptive power management features optimized for embedded and high performance computing applications
Simon McIntosh-Smith, VP of Applications,
[email protected]
HPEC, September 2008
Copyright © 2008 ClearSpeed Technology Inc. All rights reserved.
www.clearspeed.com
The CSX700 Processor
• Includes dual MTAP cores:
– 96 GFLOPS peak (32 & 64-bit)
– 48 GMACS peak (16x16 → 32+64)
– 10W max power consumption
– 250MHz clock speed
– 192 Processing Elements (2x96)
– 8 spare PEs for resiliency
– ECC on all internal memories
• On-die temperature sensors
• Active power management
• Dual integrated 64-bit DDR2 memory controllers with ECC
• Integrated PCI Express x16
• CCBR chip-to-chip bridge port
• IBM 90nm process
• 266 million transistors
• Shipping to customers since June 2008
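The headline peak figures on this slide are consistent with simple arithmetic on the PE count and clock speed. The following is a minimal sketch, assuming each PE retires one multiply-accumulate per cycle and that a multiply-accumulate counts as two floating-point operations. The slide states neither, so both constants are assumptions, and the function names are illustrative only.

```python
# Peak-rate arithmetic for the CSX700, using figures quoted on the slide.
PROCESSING_ELEMENTS = 192   # 2 x 96 PEs
CLOCK_HZ = 250e6            # 250MHz clock speed
MAX_POWER_W = 10.0          # 10W max power consumption

# Assumptions (not stated in the source):
MACS_PER_PE_PER_CYCLE = 1   # each PE retires one multiply-accumulate per cycle
FLOPS_PER_MAC = 2           # one multiply plus one add


def peak_gmacs(pes, clock_hz, macs_per_cycle=MACS_PER_PE_PER_CYCLE):
    """Peak multiply-accumulates per second, in units of 1e9."""
    return pes * clock_hz * macs_per_cycle / 1e9


def peak_gflops(pes, clock_hz, flops_per_mac=FLOPS_PER_MAC):
    """Peak floating-point operations per second, in units of 1e9."""
    return peak_gmacs(pes, clock_hz) * flops_per_mac


if __name__ == "__main__":
    gmacs = peak_gmacs(PROCESSING_ELEMENTS, CLOCK_HZ)
    gflops = peak_gflops(PROCESSING_ELEMENTS, CLOCK_HZ)
    print(f"peak GMACS:  {gmacs:.0f}")
    print(f"peak GFLOPS: {gflops:.0f}")
    print(f"peak GFLOPS per watt at max power: {gflops / MAX_POWER_W:.1f}")
```

Under these assumptions the sketch reproduces the quoted 48 GMACS and 96 GFLOPS, and gives 9.6 peak GFLOPS per watt at the 10W maximum.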
The ClearSpeed Advance™ e710, e720 and CATS-700
• 96 GFLOPS e710 & e720 fit standard 1U & HP blade servers
– Low power consumption of 25W max, small, light, passively cooled
– Designed for high reliability (MTBF)
– All memory is error protected; no moving parts (e.g. fans) are required
• CATS-700 1U system
– 1.152 TFLOPS 32- and 64-bit floating point
– 96 GBytes/s memory bandwidth to 24 GB of ECC protected DDR2
– 300W typical power consumption
• Easy to use Software Development Kit
– ANSI C compiler, gdb-based debugger, advanced profiler
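The CATS-700 totals can be related back to the single-processor figures. The slide does not say how many processors a CATS-700 contains; the sketch below infers a count from the ratio of peak rates and then divides the system totals by it. The inferred count and the helper names are assumptions made for illustration.

```python
# System-level figures quoted on the slide for the CATS-700.
CATS_PEAK_GFLOPS = 1152.0      # 1.152 TFLOPS
CATS_BANDWIDTH_GBS = 96.0      # GBytes/s memory bandwidth
CATS_MEMORY_GB = 24.0          # ECC protected DDR2
CATS_TYPICAL_POWER_W = 300.0   # typical power consumption

# Per-processor peak quoted for the e710/e720 boards.
BOARD_PEAK_GFLOPS = 96.0


def implied_processor_count(system_gflops, board_gflops):
    """Number of processors implied by the ratio of peak rates (an inference)."""
    return round(system_gflops / board_gflops)


def per_processor(total, count):
    """Even share of a system-wide total across `count` processors."""
    return total / count


if __name__ == "__main__":
    n = implied_processor_count(CATS_PEAK_GFLOPS, BOARD_PEAK_GFLOPS)
    print(f"implied processors:      {n}")
    print(f"bandwidth per processor: {per_processor(CATS_BANDWIDTH_GBS, n):.0f} GBytes/s")
    print(f"memory per processor:    {per_processor(CATS_MEMORY_GB, n):.0f} GB")
    print(f"power per processor:     {per_processor(CATS_TYPICAL_POWER_W, n):.0f} W")
```

With the quoted numbers this gives 12 processors, each with 8 GBytes/s of bandwidth, 2 GB of memory and 25W of typical power. The last figure coincides with the 25W maximum quoted for the e710 and e720.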
CSX700 FFT performance and e710 power consumption
[Figure: CSX700 core power (W) versus core clock speed (50MHz to 250MHz); series labelled 128, 256, 512, 1024 and 2048]
1D FFT performance up to 20 GFLOPS, 2D FFT performance up to 16 GFLOPS
1D convolution performance up to 22 GFLOPS, ~3 GFLOPS/watt on FFTs
[Figure: 1D convolutions per second (logarithmic axis, 10,000 to 10,000,000) versus core clock speed (50MHz to 250MHz); series labelled 128, 256, 512, 1024 and 2048; data labels range from 16,455 to 2,267,810]
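A sustained GFLOPS figure for FFTs can be turned into transforms per second once a flop-count convention is chosen. The sketch below assumes the common nominal count of 5 N log2 N floating-point operations per complex N-point FFT. The slide does not say which convention or which transform size lies behind its GFLOPS figures, so the convention, the example size of 1024 and the function names are all assumptions.

```python
import math


def fft_flops(n):
    """Nominal flop count for one complex n-point FFT (5 n log2 n convention)."""
    return 5 * n * math.log2(n)


def ffts_per_second(gflops, n):
    """Transforms per second at a sustained rate of `gflops` GFLOPS."""
    return gflops * 1e9 / fft_flops(n)


def joules_per_fft(n, gflops_per_watt):
    """Energy per transform at a given efficiency in GFLOPS/watt."""
    return fft_flops(n) / (gflops_per_watt * 1e9)


if __name__ == "__main__":
    n = 1024                 # hypothetical transform size
    rate = 20.0              # "1D FFT performance up to 20 GFLOPS"
    efficiency = 3.0         # "~3 GFLOPS/watt on FFTs"
    print(f"nominal flops per {n}-point FFT: {fft_flops(n):.0f}")
    print(f"FFTs per second at {rate:.0f} GFLOPS: {ffts_per_second(rate, n):.0f}")
    print(f"microjoules per FFT: {joules_per_fft(n, efficiency) * 1e6:.1f}")
```

Under this convention a 1024-point FFT costs 51,200 nominal flops, so 20 GFLOPS would correspond to 390,625 transforms per second.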
CSX700 and beyond
• The CSX700 is much more power efficient than Cell and GPUs for embedded processing.
– E.g. for single precision complex 1024x1024 2D FFT:
– Cell (8 SPE):       38 GFLOPS   40W    0.95 GFLOPS/watt
– S870 (Tesla) GPU:   50 GFLOPS   170W   0.07 GFLOPS/watt
– x86 core:           3 GFLOPS    25W    0.12 GFLOPS/watt
– CSX700:             20 GFLOPS   7W     2.86 GFLOPS/watt
• Next generation processor “Carnac” in design now
– Focusing on 1D and 2D FFT performance
– Design goal is 100 GFLOPS/watt sustained on 2D FFTs
• ClearSpeed Federal Systems launched to support defense programs
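The efficiency column in the 2D FFT comparison is GFLOPS divided by watts, which can be recomputed from the other two columns. The sketch below does that for each row as printed and compares the result with the printed efficiency. The tolerance and helper names are assumptions chosen for illustration.

```python
# (name, GFLOPS, watts, GFLOPS/watt as printed on the slide)
ROWS = [
    ("Cell (8 SPE)", 38.0, 40.0, 0.95),
    ("S870 (Tesla) GPU", 50.0, 170.0, 0.07),
    ("x86 core", 3.0, 25.0, 0.12),
    ("CSX700", 20.0, 7.0, 2.86),
]

CARNAC_GOAL_GFLOPS_PER_WATT = 100.0   # design goal quoted for 2D FFTs


def gflops_per_watt(gflops, watts):
    """Efficiency as throughput divided by power."""
    return gflops / watts


def check_rows(rows, tolerance=0.005):
    """Return (name, computed, printed, matches) for every row."""
    results = []
    for name, gflops, watts, printed in rows:
        computed = gflops_per_watt(gflops, watts)
        matches = abs(computed - printed) <= tolerance + 1e-9
        results.append((name, computed, printed, matches))
    return results


if __name__ == "__main__":
    for name, computed, printed, matches in check_rows(ROWS):
        flag = "ok" if matches else "differs"
        print(f"{name:18s} computed {computed:.2f}  printed {printed:.2f}  {flag}")
    csx700 = gflops_per_watt(20.0, 7.0)
    print(f"Carnac goal / CSX700: {CARNAC_GOAL_GFLOPS_PER_WATT / csx700:.0f}x")
```

The Cell, x86 and CSX700 rows reproduce to two decimal places. The S870 row does not: 50 GFLOPS at 170W works out to about 0.29 GFLOPS/watt rather than the printed 0.07, and the source gives no indication of which of the three numbers is off. Measured against the CSX700 row, the 100 GFLOPS/watt design goal is roughly a 35x improvement.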