Performance Model for Future Multicore Processor Designs
Yipkei Kwok
02/06/2008
A Non-Work-Conserving Operating System Scheduler for SMT Processors
• Authors: A. Fedorova et al.
• Calculate the optimal level of parallelism for SMT processors at run time
• Analytical model
• Estimate the workload’s IPC for a given degree of concurrency
• First, identify the performance bottleneck
• Suppressing L2 misses improves performance the most
A Non-Work-Conserving Operating System Scheduler for SMT Processors
• Factors
– N
– perf_cache_CPI(N)
– L2_RMR
– L2_WMR
– L2_WBR_R
– L2_WBR_W
– WSC
– L2_MCOST
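A rough way to see how these factors could combine into an IPC estimate for a given N is sketched below in Python. This is not the paper's model. It is a simplified illustration that assumes miss rates are L2 misses per instruction, that every L2 miss stalls its thread for L2_MCOST cycles, and that aggregate IPC is N divided by per-thread CPI. The write-back rates (L2_WBR_R, L2_WBR_W) and WSC are left out, and the function names and toy inputs are hypothetical.

```python
from typing import Callable


def estimate_ipc(
    n: int,
    perf_cache_cpi: Callable[[int], float],
    l2_read_miss_rate: Callable[[int], float],
    l2_write_miss_rate: Callable[[int], float],
    l2_miss_cost: float,
) -> float:
    """Hypothetical aggregate IPC for n concurrently running threads.

    Assumed (not from the source): miss rates are L2 misses per instruction,
    each miss stalls its thread for l2_miss_cost cycles, perf_cache_cpi(n)
    already includes pipeline contention among the n threads, and the
    threads' stalls overlap, so aggregate IPC = n / per-thread CPI.
    """
    misses_per_instr = l2_read_miss_rate(n) + l2_write_miss_rate(n)
    cpi_per_thread = perf_cache_cpi(n) + misses_per_instr * l2_miss_cost
    return n / cpi_per_thread


def best_thread_count(max_n: int, **model) -> int:
    """Return the n in 1..max_n with the highest estimated IPC."""
    return max(range(1, max_n + 1), key=lambda n: estimate_ipc(n, **model))


# Toy inputs invented for illustration: per-thread CPI stays at 1.0 up to
# 4 threads and then grows, while the L2 miss rate grows with n (cache sharing).
toy_model = dict(
    perf_cache_cpi=lambda n: max(1.0, n / 4),
    l2_read_miss_rate=lambda n: 0.0002 * n * n,
    l2_write_miss_rate=lambda n: 0.0,
    l2_miss_cost=100.0,
)

print(best_thread_count(8, **toy_model))  # prints 4
```

Picking the N with the highest estimate captures the non-work-conserving idea: with these toy inputs, running more than 4 threads adds more L2 stall cycles than it adds issue capacity.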
A Non-Work-Conserving Operating System Scheduler for SMT Processors
• Two-phase scheduling
– Preparation phase
• Collect model inputs while running at full parallelism
• Using hardware counters
• Until 100 million instructions have retired
– Optimization phase
• Estimate the optimal N
• Enforce it
• Until …
– New locality phase
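The two phases can be pictured as a small controller. The sketch below is hypothetical: the sample format, the estimate_optimal_n callback, and the test for a new locality phase (a relative change in L2 miss rate beyond a threshold) are assumptions, since the slide does not say how the optimization phase ends. It also assumes that a new locality phase sends the scheduler back to the preparation phase.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

PREPARATION = "preparation"
OPTIMIZATION = "optimization"


@dataclass
class Sample:
    """One hardware-counter reading (hypothetical format)."""
    instructions: int     # instructions retired during the interval
    l2_miss_rate: float   # L2 misses per instruction during the interval


class TwoPhaseScheduler:
    """Sketch of a preparation/optimization controller for the thread count."""

    def __init__(
        self,
        max_threads: int,
        estimate_optimal_n: Callable[[List[Sample]], int],
        prep_instructions: int = 100_000_000,
        phase_change_threshold: float = 0.5,
    ):
        self.max_threads = max_threads
        self.estimate_optimal_n = estimate_optimal_n
        self.prep_instructions = prep_instructions
        self.threshold = phase_change_threshold
        self._start_preparation()

    def _start_preparation(self) -> None:
        self.phase = PREPARATION
        self.allowed_threads = self.max_threads  # measure under full parallelism
        self.samples: List[Sample] = []
        self.retired = 0
        self.reference_miss_rate: Optional[float] = None

    def _locality_changed(self, miss_rate: float) -> bool:
        # Assumed heuristic: a large relative change in the L2 miss rate
        # signals a new locality phase.
        ref = self.reference_miss_rate
        if ref == 0.0:
            return miss_rate > 0.0
        return abs(miss_rate - ref) / ref > self.threshold

    def observe(self, sample: Sample) -> int:
        """Consume one sample; return the number of threads allowed to run."""
        if self.phase == PREPARATION:
            self.samples.append(sample)
            self.retired += sample.instructions
            if self.retired >= self.prep_instructions:
                # Optimization phase: estimate the optimal N and enforce it.
                self.allowed_threads = self.estimate_optimal_n(self.samples)
                self.phase = OPTIMIZATION
        elif self.reference_miss_rate is None:
            # First reading at the enforced N becomes the reference, so the
            # drop in misses caused by running fewer threads is not mistaken
            # for a phase change.
            self.reference_miss_rate = sample.l2_miss_rate
        elif self._locality_changed(sample.l2_miss_rate):
            # Re-measure at full parallelism for the new locality phase.
            self._start_preparation()
        return self.allowed_threads
```

The default prep_instructions matches the slide's 100 million instructions; a smaller value is convenient for trying the controller on a short synthetic trace.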
Limitations
• 3-56% improvement, but …
• Empirical model based on the UltraSPARC T1
• SMT only
– But extensible with, hopefully, reasonable effort
• Once extended, usable for performance prediction
• What is needed?
– Extra factors?
What new factors?
• Depends on the systems to be modeled
• Shared-memory machines
• Threaded parallel workloads
• SMP of CMPs
• SMT per core
What new factors?
• Architecture
– Homogeneous/heterogeneous cores
• Differences in speed or functionality
– Level of cache sharing
– Interconnects
What new factors?
• Parameters
– Number of cores
– Cache size
– Degree of set-associativity
– Number of cores sharing a cache
– Interconnect topology: bus, ring, crossbar, tiny network
– Switching and flow-control mechanisms
– Routing algorithms
– Fault-tolerance techniques
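To extend the model, these parameters would need to become explicit inputs. A minimal, hypothetical configuration record is sketched below; the field names, the Interconnect values, and the two consistency checks are illustrative assumptions, and switching, flow control, routing, and fault tolerance are left out.

```python
from dataclasses import dataclass
from enum import Enum


class Interconnect(Enum):
    BUS = "bus"
    RING = "ring"
    CROSSBAR = "crossbar"
    NETWORK = "network"  # the slide's "tiny network"


@dataclass(frozen=True)
class MulticoreConfig:
    """Hypothetical design point for an extended performance model."""
    num_cores: int
    cache_size_bytes: int
    associativity: int           # ways per set
    line_size_bytes: int
    cores_per_shared_cache: int
    interconnect: Interconnect

    def __post_init__(self) -> None:
        # Reject design points that cannot be built as described.
        if self.num_cores % self.cores_per_shared_cache != 0:
            raise ValueError("cores must divide evenly among shared caches")
        if self.cache_size_bytes % (self.associativity * self.line_size_bytes) != 0:
            raise ValueError("cache size must be a multiple of ways * line size")

    @property
    def num_shared_caches(self) -> int:
        return self.num_cores // self.cores_per_shared_cache

    @property
    def num_sets(self) -> int:
        return self.cache_size_bytes // (self.associativity * self.line_size_bytes)


# Example values chosen only for illustration.
example = MulticoreConfig(
    num_cores=16,
    cache_size_bytes=4 * 1024 * 1024,
    associativity=16,
    line_size_bytes=64,
    cores_per_shared_cache=4,
    interconnect=Interconnect.RING,
)
print(example.num_shared_caches, example.num_sets)  # prints 4 4096
```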
What new factors?
• Protocols
– Cache coherence protocol for dedicated/semi-shared caches
• Algorithms
– Block replacement algorithm
– Algorithms for cache coherence and data consistency protocols
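Replacement policy and set-associativity both shape the L2 miss rates the model depends on. As an illustration only, the sketch below simulates one set-associative cache with LRU replacement, a common policy picked here as an example; the class and its parameters are hypothetical, and coherence between caches is not modeled.

```python
from collections import OrderedDict
from typing import List


class SetAssociativeCache:
    """Single set-associative cache with LRU replacement (no coherence)."""

    def __init__(self, num_sets: int, ways: int, line_size: int):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One OrderedDict per set; key order tracks recency (oldest first).
        self.sets: List[OrderedDict] = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Access one byte address; return True on a hit."""
        block = address // self.line_size
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used block
        lines[tag] = None
        return False

    @property
    def miss_rate(self) -> float:
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


# A 2-way cache with a single set: the access to address 128 evicts block 1
# (least recently used), so the final access to address 64 misses again.
cache = SetAssociativeCache(num_sets=1, ways=2, line_size=64)
print([cache.access(a) for a in (0, 64, 0, 128, 0, 64)])
# prints [False, False, True, False, True, False]
```

Swapping in a different replacement policy only changes the hit and eviction lines, which is what would make the policy a pluggable factor in an extended model.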
Potential uses
• Performance prediction for future processors
• Scheduler
Does similar work exist?
• Multi2Sim (2007)
– Framework that simulates the system working as a whole
– Yet, it is an application-only simulator
– Evaluates multicore, multithreaded processors
– Three major components are simulated:
• Core
• Cache hierarchy
• Interconnect
– Note: source code is available
Enough?
• Limitations
– Homogeneous cores
– Topology
• Bus only
• Though with variable bus width