Dissertation Proposal


ISERD ICETM 2015 Bangkok, Thailand

“Early Estimation of Cache Properties for Multicore Embedded Processors”

May 16, 2015


Presenter:

Dr. Abu Asaduzzaman, Assistant Professor

Prepared by:

Mr. Kishore K. Chidella, PhD Student

Computer Architecture and Parallel Programming Laboratory (CAPPLab)

Department of Electrical Engineering and Computer Science (EECS)

Wichita State University (WSU), USA

Outline

Introduction

• Embedded systems with multicore processors
• Pros and cons due to cache

Background and Motivation

• Impact of cache on performance and power consumption
• Optimized cache improves the performance-to-power ratio

Proposed Cache Modeling Strategy

• Multicore architecture for embedded systems
• Work-flow diagram

Experimental Results

Discussion

QUESTIONS? Any time!


Authors

• Kishore K. Chidella, PhD Student
  EECS Department, Wichita State University (WSU), USA

• Muhammad F. Mridha, Assistant Professor
  CSE Department, University of Asia Pacific (UAP), Bangladesh

• Abu Asaduzzaman, Assistant Professor
  EECS Department, Wichita State University (WSU), USA
  Director, Computer Architecture and Parallel Programming Laboratory (CAPPLab)


Introduction

Multicore Embedded Systems

• Future embedded systems should have multicore processors.
• Currently available single-core simulation techniques are not adequate for designing multicore embedded systems [1-4].
• Software applications use more and more threads to take advantage of the available cores [5-8].
• Multicore processors are frequently deployed with multilevel cache memories [9].
• Achieving the best performance from parallel thread execution in such a multicore system is difficult because of cache sharing.
• Complex embedded systems design methodology needs support from early estimation techniques.


Background and Motivation

Some Early Work

• The technical challenges associated with integrating homogeneous and heterogeneous multiple cores in embedded systems are elucidated in [1].
• However, that work does not provide a viable way to make early estimates for future embedded systems designs.
• According to the experimental results published in [4], cache parameters and application code size affect total power consumption and mean delay per task.
• That approach does not focus on designing embedded systems and does not cover cache locking.


Background and Motivation (+)

Some Early Work

• Issues related to cache locking at level-1 and level-2 caches are discussed in [11, 12].
• In [14], various algorithms for selecting the set of instructions to lock in the cache are compared.
• Cache locking may improve performance.
• Locking the entire level-1 cache (100% of the cache size) is not efficient for some applications, especially when the data to be locked is small compared with the cache size.
• Worst-case performance with locked caches may degrade with large cache lines due to cache pollution [12].


Background and Motivation (+)

Some Early Work

• These techniques were developed for single-core systems and are not suitable for contemporary multicore embedded systems.
• Also, these techniques are not useful for estimating power consumption, a crucial design factor for embedded systems.
• Therefore, an early estimation technique for evaluating the cache properties of multicore embedded systems is required.


Proposed Cache Modeling Strategy

Multicore Cache Organization

• Level-1
  - Private
  - Split into I1 and D1
• Level-2
  - Private or shared
  - Unified
• Level-3
  - Optional (or shared)
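The organization above can be written down as a small data structure. The following is only an illustrative sketch, not the VisualSim model used in the work: the class `CacheLevel`, the function `build_hierarchy`, and the cache labels are hypothetical names, and treating the optional level-3 cache as shared and unified is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CacheLevel:
    name: str      # hypothetical label, e.g. "core0.I1" or "CL2"
    level: int     # 1, 2, or 3
    shared: bool   # True if all cores use the same cache
    unified: bool  # True if instructions and data share the cache


def build_hierarchy(n_cores: int, l2_shared: bool = True,
                    with_l3: bool = False) -> List[CacheLevel]:
    """List every cache in an n-core system following the slide's organization."""
    caches: List[CacheLevel] = []
    # Level-1: private to each core, split into instruction (I1) and data (D1).
    for core in range(n_cores):
        caches.append(CacheLevel(f"core{core}.I1", 1, shared=False, unified=False))
        caches.append(CacheLevel(f"core{core}.D1", 1, shared=False, unified=False))
    # Level-2: unified, either one shared cache or one private cache per core.
    if l2_shared:
        caches.append(CacheLevel("CL2", 2, shared=True, unified=True))
    else:
        for core in range(n_cores):
            caches.append(CacheLevel(f"core{core}.CL2", 2, shared=False, unified=True))
    # Level-3: optional; modeled here as shared and unified (an assumption).
    if with_l3:
        caches.append(CacheLevel("CL3", 3, shared=True, unified=True))
    return caches


# Quad-core system with a shared CL2 and no level-3 cache.
quad_core = build_hierarchy(n_cores=4)
```

For a quad-core system with a shared CL2 this yields eight level-1 caches (I1 and D1 per core) plus one shared level-2 cache.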


Proposed Cache Modeling Strategy (+)

Cache Locking

• Private first-level cache?
• Shared last-level cache?
• Entire locking or partial (way) locking?


Proposed Cache Modeling Strategy (+)

Work-Flow

• Master core
  - Select jobs
  - Assign jobs
  - Pre-load cache memory
  - Mean delay; total power
• Core x
  - Select cache size
  - Lock? (Yes or No)
  - Assign task
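As a rough illustration of the work-flow above, the sketch below has a master routine assign jobs to cores and then report the two outputs, mean delay per task and total power. It is a toy under stated assumptions, not the simulation model: `Job`, `run_workflow`, and `cost_fn` are hypothetical names, the round-robin assignment is assumed, and the per-task delay and power must come from a caller-supplied cost function because the slides do not give a cost model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Job:
    name: str
    code_size_kb: float


# cost_fn(job, cache_kb, locked) -> (delay, power); supplied by the caller.
CostFn = Callable[[Job, int, bool], Tuple[float, float]]


def run_workflow(jobs: List[Job], n_cores: int, cache_kb: int,
                 locked: bool, cost_fn: CostFn) -> Tuple[float, float]:
    """Return (mean delay per task, total power) for one configuration."""
    # Master core: select jobs and assign them to cores (round-robin assumed).
    per_core: List[List[Job]] = [[] for _ in range(n_cores)]
    for i, job in enumerate(jobs):
        per_core[i % n_cores].append(job)

    delays: List[float] = []
    total_power = 0.0
    # Core x: use the selected cache size and lock decision for each task.
    for core_jobs in per_core:
        for job in core_jobs:
            delay, power = cost_fn(job, cache_kb, locked)
            delays.append(delay)
            total_power += power

    # Master core: collect the two outputs.
    mean_delay = sum(delays) / len(delays) if delays else 0.0
    return mean_delay, total_power


def placeholder_cost(job: Job, cache_kb: int, locked: bool) -> Tuple[float, float]:
    # Placeholder only: delay equals code size, one power unit per task.
    return job.code_size_kb, 1.0


jobs = [Job("FFT", 2.34), Job("MPEG-4", 91.83)]
mean_delay, total_power = run_workflow(jobs, n_cores=4, cache_kb=128,
                                       locked=True, cost_fn=placeholder_cost)
```

The placeholder cost function exists only to make the sketch runnable; its numbers carry no meaning for the real workloads.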


Simulation

Simulation Tool

• VisualSim tool to develop the modeling platform

Applications to Run the Simulation Program

• FFT (Fast Fourier Transform)
• GIF (Graphics Interchange Format)
• JPEG (Joint Photographic Experts Group)
• MPEG-3 (Moving Picture Experts Group)
• MPEG-4

Here, FFT is the smallest application (code size 2.34 KB) and MPEG-4 is the largest (code size 91.83 KB).


Input / Output Parameters

Inputs

• Number of cores: 4 (fixed)
• I1 / D1 size (KB): 2 / 2 (fixed)
• Line size (bytes): 128 (fixed)
• Associativity level (n-way): 8 (fixed)
• CL2 cache size (KB): 32, 64, 128, 256, or 512
• Locked CL2 cache size (%): 0.0, 12.5, 25.0, 37.5, 50.0

Outputs

• Mean delay per task
• Total power consumption
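The input parameters above define a grid of 5 CL2 sizes by 5 locking percentages. The sketch below enumerates that grid and derives a few quantities from the fixed line size and associativity. It is a minimal illustration: the function name `describe_config` is hypothetical, and expressing the locked fraction as a number of locked ways is an assumption (the slides only raise way locking as an option).

```python
from itertools import product

LINE_SIZE_BYTES = 128          # fixed input
ASSOCIATIVITY = 8              # fixed input (8-way)
CL2_SIZES_KB = [32, 64, 128, 256, 512]
LOCKED_PERCENTS = [0.0, 12.5, 25.0, 37.5, 50.0]


def describe_config(cl2_kb: int, locked_percent: float) -> dict:
    """Derive basic geometry for one (CL2 size, locked %) combination."""
    total_lines = cl2_kb * 1024 // LINE_SIZE_BYTES
    sets = total_lines // ASSOCIATIVITY
    return {
        "cl2_kb": cl2_kb,
        "locked_percent": locked_percent,
        "total_lines": total_lines,
        "sets": sets,
        "locked_kb": cl2_kb * locked_percent / 100.0,
        # Assumes locking is applied per way; 12.5% of 8 ways is 1 way.
        "locked_ways": ASSOCIATIVITY * locked_percent / 100.0,
    }


configs = [describe_config(size, pct)
           for size, pct in product(CL2_SIZES_KB, LOCKED_PERCENTS)]
```

With 8-way associativity, the locking steps of 12.5% correspond to whole ways (1, 2, 3, and 4 ways for 12.5%, 25%, 37.5%, and 50%), which is one plausible reading of why those percentages were chosen.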


Experimental Results

Shared L2 Cache Size

• JPEG behaves almost like GIF, and MPEG-3 behaves almost like MPEG-4.
• For CL2 cache sizes from 32 KB to 128 KB, mean delay per task and total power consumption for MPEG-4 decrease significantly when the cache size is increased and/or locking is raised from 0% to 25%.
• The impact of shared CL2 on power consumption is more significant than its impact on delay.


Experimental Results (+)

Shared L2 Cache Size

• For GIF, mean delay per task and total power consumption decrease with 25% locking only at a CL2 cache size of 32 KB.
• However, CL2 cache size and locking have no positive impact on mean delay per task or total power consumption for FFT.
• Increasing the CL2 size beyond 128 KB has no positive impact (it consumes more power without reducing the delay).


Experimental Results (+)

Shared L2 Cache Locking

• Cache locking at shared CL2 has a more significant impact on mean delay per task and total power consumption for large applications (like MPEG-4) than for small applications (like FFT).
• According to the shared CL2 cache locking results, the optimal performance (delay)/power ratio is obtained at 25% cache locking for all the workloads.


Conclusions

• A simulation methodology is presented for early estimation of effective cache properties (parameters and locked cache size) for multicore embedded systems.
• A quad-core system with shared CL2 is simulated using FFT, GIF, JPEG, MPEG-3, and MPEG-4 workloads.
• Although both mean delay per task and total power consumption decrease when the shared CL2 cache size is increased and/or cache locking is applied, the impact of shared CL2 on power consumption is more significant than that on delay.


QUESTIONS?

Contact: Abu Asaduzzaman

E-mail: [email protected]

Phone: +1-316-978-5261

CAPPLab: http://www.cs.wichita.edu/~capplab/

Thank You!