Developing high performance applications with .NET Compact Framework
Deepak Gulati
ISV Developer Evangelist
Microsoft
[Diagram: Microsoft platform overview spanning device building, clients, and the server side. Areas shown: hardware and drivers (OEM/IHV supplied; BSPs for ARM, SH4, MIPS; Windows XP DDK), device building tools (Windows Embedded Studio, Platform Builder), data (EDB, SQL Server 2005 Mobile Edition, SQL Server 2005 Express Edition, SQL Server 2005), programming model (Win32, MFC 8.0, ATL 8.0, .NET Compact Framework, ASP.NET Mobile Controls, .NET Framework, ASP.NET), multimedia (Windows Media, DirectX), location services (MapPoint), development tools (Visual Studio 2005), communications and messaging (Internet Security and Acceleration Server, Exchange Server, Live Communications Server, Speech Server), and management tools (Device Update Agent, Image Update, Software Update Services, Systems Management Server, Microsoft Operations Manager).]
Measuring Performance
Overview
Basic technique involves:
Find start time
Find end time
Calculate delta
Start and end times can be measured in various ways
GetTickCount, a Win32 API function
Environment.TickCount is its managed code equivalent
Both return a 32-bit count of the milliseconds that have passed since the device was booted (a DWORD from GetTickCount, an int from Environment.TickCount)
You can also use System.DateTime and get a System.TimeSpan by subtracting the start value from the end value
There can be issues with these techniques:
On a device that has been on for a long time, TickCount wraps around and goes negative (Environment.TickCount does so after about 24.9 days)
Not great for measuring ‘short’ operations; readings can vary by up to 500 ms
System.DateTime also suffers from accuracy issues
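The wraparound issue can be worked around for coarse timings. The following is a minimal sketch, assuming the measured interval is shorter than about 24.8 days; the names TickTimer, TimedAction, ElapsedMs, and Measure are illustrative and not part of any framework.

```csharp
using System;

// Hypothetical delegate type for the code being timed.
public delegate void TimedAction();

public static class TickTimer
{
    // Milliseconds between two Environment.TickCount readings.
    // Unchecked subtraction wraps the same way the counter does, so the
    // result stays correct even if the counter rolled over in between.
    public static int ElapsedMs(int startTicks, int endTicks)
    {
        return unchecked(endTicks - startTicks);
    }

    // Times one call: start time, work, end time, delta.
    public static int Measure(TimedAction action)
    {
        int start = Environment.TickCount;
        action();
        return ElapsedMs(start, Environment.TickCount);
    }
}
```

This does nothing for the resolution problem, so it is only suitable for operations that run far longer than the timer's variation.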
QueryPerformanceCounter/QueryPerformanceFrequency to the rescue!
High-resolution timer with an OEM-specific implementation
Falls back to GetTickCount if no high-resolution timer is available
No managed implementation is available for QueryPerformanceCounter or QueryPerformanceFrequency
P/Invoke QueryPerformanceFrequency to get the counter frequency of the device in counts per second
Divide by 1000 to get the frequency in counts per ms
P/Invoke QueryPerformanceCounter before your call. Make your call. P/Invoke QueryPerformanceCounter again
(End - Start) / (frequency per ms) gives the time taken by your call in ms
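The steps above can be sketched as a small wrapper. This is an illustrative sketch rather than the demo code from the talk: the class and method names are made up, and it assumes the two functions are exported from coredll.dll as on Windows CE (on desktop Windows they live in kernel32.dll).

```csharp
using System;
using System.Runtime.InteropServices;

public static class PerfTimer
{
    // Assumption: Windows CE exports these from coredll.dll.
    [DllImport("coredll.dll")]
    private static extern int QueryPerformanceCounter(out long count);

    [DllImport("coredll.dll")]
    private static extern int QueryPerformanceFrequency(out long countsPerSecond);

    // Current counter value, in OEM-specific counts.
    public static long Now()
    {
        long count;
        QueryPerformanceCounter(out count);
        return count;
    }

    // Counter frequency in counts per second.
    public static long Frequency()
    {
        long countsPerSecond;
        QueryPerformanceFrequency(out countsPerSecond);
        return countsPerSecond;
    }

    // (end - start) / (counts per ms), written so no precision is lost
    // when the frequency is not a multiple of 1000.
    public static double ToMilliseconds(long start, long end, long countsPerSecond)
    {
        return (end - start) * 1000.0 / countsPerSecond;
    }
}
```

Typical use is to call PerfTimer.Now() before and after the code under test and pass both values, together with PerfTimer.Frequency(), to ToMilliseconds.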
Demo
Using QueryPerformanceCounter
Common Language Runtime
Garbage Collector
Allocation rate
[Chart: allocation rate (iter/sec, axis 0 to 160,000) versus object size (bytes, up to 80,000)]
Common Language Runtime
Garbage Collector
Allocation throughput
[Chart: allocation throughput (Mb/sec, axis 0 to 90) versus object size (bytes, 8 to 80,000)]
Common Language Runtime
Where does garbage come from?
Unnecessary string copies
Strings are immutable
String manipulations (Concat(), etc.) cause copies
Use StringBuilder
.stat (run time 173 sec)
Managed String Objects Allocated: 20040
Garbage Collections (GC): 4912
Bytes of String Objects Allocated: 5,800,480,574
Bytes Collected By GC: 5,918,699,036
GC latency: 107128 ms
.stat (run time 0.1 sec)
Managed String Objects Allocated: 56
Bytes of String Objects Allocated: 2097718
Garbage Collections (GC): 2
Bytes Collected By GC: 1081620
GC latency: 21 ms
Last notes on StringBuilder
Remember it's all about reducing memory traffic
If you roughly know the expected length of your final string, allocate that much beforehand (StringBuilder constructor)
Getting the string out of a StringBuilder doesn't cause a new allocation; the existing buffer is converted into a string
http://weblogs.asp.net/ricom/archive/2003/12/02/40778.aspx
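To make the difference concrete, here is a minimal sketch that builds the same comma-separated string both ways. The class name, method names, and the capacity estimate of six characters per entry are illustrative assumptions.

```csharp
using System;
using System.Text;

public static class CsvJoin
{
    // Each += allocates a new, longer string and copies the old contents,
    // so earlier strings become garbage immediately.
    public static string WithConcat(int n)
    {
        string s = "";
        for (int i = 0; i < n; i++)
        {
            s += i.ToString();
            s += ",";
        }
        return s;
    }

    // Appends into one buffer that is sized up front from a rough estimate.
    public static string WithBuilder(int n)
    {
        StringBuilder sb = new StringBuilder(n * 6);
        for (int i = 0; i < n; i++)
        {
            sb.Append(i);
            sb.Append(',');
        }
        return sb.ToString();
    }
}
```

Both methods return the same text; only the amount of intermediate garbage differs.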
Common Language Runtime
Where does garbage come from?
Unnecessary boxing
Value types are allocated on the stack (fast to allocate)
Boxing causes a heap allocation and a copy
Use strongly typed arrays and collections (the non-generic framework collections, such as ArrayList, are NOT strongly typed)
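As an illustration, the sketch below sums the same integers from a non-generic ArrayList and from a strongly typed List<int>. The class and method names are made up for this example; List<int> assumes the generics support in .NET Compact Framework v2.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    // ArrayList stores object references, so every int is boxed on Add
    // (a heap allocation plus a copy) and unboxed by the cast on read.
    public static int SumBoxed(int n)
    {
        ArrayList list = new ArrayList(n);
        for (int i = 0; i < n; i++)
        {
            list.Add(i);
        }
        int sum = 0;
        for (int i = 0; i < list.Count; i++)
        {
            sum += (int)list[i];
        }
        return sum;
    }

    // List<int> stores the ints directly: no boxing and no casts.
    public static int SumTyped(int n)
    {
        List<int> list = new List<int>(n);
        for (int i = 0; i < n; i++)
        {
            list.Add(i);
        }
        int sum = 0;
        for (int i = 0; i < list.Count; i++)
        {
            sum += list[i];
        }
        return sum;
    }
}
```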
Demo
String vs. StringBuilder
Common Language Runtime
Generics
Fully specialized implementation in .NET Compact Framework v2
Pros
Strongly typed
No unnecessary boxing and type casts
Specialized code is more efficient than shared code
Cons
Internal execution engine data structures and JIT-compiled code aren't shared between instantiations such as List<int>, List<string>, List<MyType>
http://blogs.msdn.com/romanbat/archive/2005/01/06/348114.aspx
Common Language Runtime
Finalization and Dispose
Cost of finalizers
Non-deterministic cleanup
Extends lifetime of object
In general, rely on the GC for automatic memory cleanup
The exceptions to the rule…
If your object contains an unmanaged resource that the GC is unaware of, you need to implement a finalizer
Also implement the Dispose pattern to release the unmanaged resource in a deterministic manner
The Dispose method should suppress finalization
If the object you are using implements Dispose, call it when you are done with the object
Assumes an unmanaged resource in the object chain
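The guidance above can be sketched as follows. This is a generic illustration of the Dispose pattern, not the sample code shown in the talk; NativeBuffer and ReleaseHandle are hypothetical names, and the release method is an empty placeholder where a real class would P/Invoke the native free function.

```csharp
using System;

public class NativeBuffer : IDisposable
{
    private IntPtr handle;   // stands in for an unmanaged resource
    private bool disposed;

    public NativeBuffer(IntPtr handle)
    {
        this.handle = handle;
    }

    public bool IsDisposed
    {
        get { return disposed; }
    }

    // Deterministic cleanup, called by the user of the object.
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);   // cleanup is done, so skip the finalizer
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed)
        {
            return;                  // safe to call more than once
        }
        // When disposing is true, managed members could be disposed here.
        ReleaseHandle(handle);
        handle = IntPtr.Zero;
        disposed = true;
    }

    // Safety net: runs only if Dispose was never called.
    ~NativeBuffer()
    {
        Dispose(false);
    }

    // Hypothetical placeholder for the native release call.
    private static void ReleaseHandle(IntPtr h)
    {
    }
}
```

Callers would normally wrap the object in a using statement so that Dispose runs even if an exception is thrown.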
Common Language Runtime
Sample Code: Finalization and Dispose
Common Language Runtime
Exceptions
Exceptions are cheap…until you throw
Throw exceptions in exceptional circumstances
Do not use exceptions for normal flow control
Use performance counters to track the number of exceptions thrown
Replace “On Error/Goto” with “Try/Catch/Finally” in Microsoft Visual Basic® .NET
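As a small illustration of keeping exceptions out of normal flow control, the sketch below looks up a key that may legitimately be missing. The class name, method names, and the -1 sentinel are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;

public static class PriceLookup
{
    // Exception-driven: every missing key pays for a throw and a catch.
    public static int WithException(Dictionary<string, int> prices, string key)
    {
        try
        {
            return prices[key];
        }
        catch (KeyNotFoundException)
        {
            return -1;
        }
    }

    // Test-first: a missing key is an ordinary outcome and nothing is thrown.
    public static int WithTryGetValue(Dictionary<string, int> prices, string key)
    {
        int price;
        if (prices.TryGetValue(key, out price))
        {
            return price;
        }
        return -1;
    }
}
```

Both methods return the same values; the second avoids the cost of throwing when missing keys are common.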
Common Language Runtime
Reflection
Reflection can be expensive
Reflection performance cost
Type comparisons (for example: typeof())
Member enumerations (for example: Type.GetFields())
Member access (for example: Type.InvokeMember())
Think ~10-100x slower
Working set cost
Runtime data structures
Think ~100 bytes per loaded type, ~80 bytes per loaded method
Be aware of APIs that use reflection as a side effect
Override Object.ToString()
Override GetHashCode() and Equals() (for value types)
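The override advice can be sketched with a small value type. Point2D is a hypothetical struct; the point is that explicit overrides compare and hash the fields directly, so the inherited ValueType implementations, which can fall back to reflection over the fields, are never used.

```csharp
using System;

public struct Point2D
{
    public readonly int X;
    public readonly int Y;

    public Point2D(int x, int y)
    {
        X = x;
        Y = y;
    }

    // Strongly typed comparison: no boxing and no reflection.
    public bool Equals(Point2D other)
    {
        return X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj)
    {
        return obj is Point2D && Equals((Point2D)obj);
    }

    // Combines the fields arithmetically; 397 is an arbitrary odd multiplier.
    public override int GetHashCode()
    {
        return unchecked((X * 397) ^ Y);
    }

    public override string ToString()
    {
        return "(" + X.ToString() + ", " + Y.ToString() + ")";
    }
}
```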
Common Language Runtime
Building a Cost Model for Managed Math
Math performance
32-bit integers: similar to native math
64-bit integers: ~5-10x the cost of native math
Floating point: similar to native math
The ARM processors in these devices typically do not have an FPU, so floating point is emulated in software for native and managed code alike
.NET Compact Framework
[Diagram: .NET Compact Framework architecture. Components shown include: redistributable setup (MSI setup via ActiveSync, per-device CAB install via SMS, etc.); framework assemblies (mscorlib, System, System.Data, System.Xml, System.Reflection, System.Globalization, System.Cryptography, System.IO.File, System.IO.Ports, System.Net.Sockets, System.Net.Http*, System.WebServices, System.Drawing, Windows.Forms, Microsoft.VisualBasic, Microsoft.Win32.Registry, DirectX.DirectD3DM); CLR services (JIT compiler and GC, class loader, assembly cache, app domain loader, process loader, managed loader, native interop, memory and threading, verification, debugger via ICorDbg and the Visual Studio debug engine); globalization data (calendar, culture, sorting, encodings, casing); the host; and the underlying Windows CE facilities (file I/O, registry, sockets, SSL, NTLM, Crypto API, cert/security, file mapping, common controls, GDI/GWES, D3DM).]
Base Class Library
Collections
Pre-size collection classes appropriately
Resizing creates unnecessary copies
Beware of foreach overhead; use the indexer when available
will be compiled into:
…
…
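The two collection tips can be illustrated with a short sketch. It does not reproduce the elided compiler output from the slide; the names are made up, and List<int> is used only as a convenient collection with both an indexer and an enumerator.

```csharp
using System;
using System.Collections.Generic;

public static class CollectionTips
{
    // Passing the expected size to the constructor allocates the backing
    // array once, instead of growing and copying it as items are added.
    public static List<int> BuildPresized(int n)
    {
        List<int> list = new List<int>(n);
        for (int i = 0; i < n; i++)
        {
            list.Add(i);
        }
        return list;
    }

    // Indexer loop: direct element access with no enumerator involved.
    public static int SumWithIndexer(List<int> list)
    {
        int sum = 0;
        for (int i = 0; i < list.Count; i++)
        {
            sum += list[i];
        }
        return sum;
    }

    // foreach loop: obtains an enumerator and calls MoveNext/Current
    // for every element.
    public static int SumWithForeach(List<int> list)
    {
        int sum = 0;
        foreach (int value in list)
        {
            sum += value;
        }
        return sum;
    }
}
```

How much the two loops differ depends on the collection type, so it is worth measuring on the target device.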
Windows Forms
Best Practices
Load and cache Forms in the background
Populate data separate from Form.Show()
Pre-populate data, or
Load data async to Form.Show()
Use BeginUpdate/EndUpdate when it is available (e.g. ListView, TreeView)
Use SuspendLayout/ResumeLayout when repositioning controls
Keep event handling code tight
Process bigger operations asynchronously
Blocking in event handlers will affect UI responsiveness
Form load performance
Reduce the number of method calls during initialization
Graphics And Games
Best Practices
Compose to off-screen buffers to minimize direct-to-screen blitting
Approximately 50% faster
Avoid transparent blitting in areas that require performance
Approximately 1/3 the speed of normal blitting
Consider using pre-rendered images versus using System.Drawing rendering primitives
Need to measure on a case-by-case basis
XML
Best Practices for Managing Large XML Data Files
Use XmlTextReader/XmlTextWriter
Smaller memory footprint than using XmlDocument
XmlTextReader is a pull model parser which only reads a “window” of the data
XmlDocument builds a generic, untyped object model using a tree
Type stored as string
OK to use with smaller documents (64K XML: ~0.25s)
Optimize the structure of the XML document
Use elements to group
Allows use of Skip() in XmlReader
Use attributes to reduce size; processing attribute-centric documents is faster
Keep it short! (attribute and element names)
Avoid gratuitous use of white space
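A minimal sketch of the pull model and of Skip() follows. The element names history and item, and the OrderScanner class, are invented for this example; the reader walks the document once and jumps over a grouped subtree it does not need.

```csharp
using System;
using System.IO;
using System.Xml;

public static class OrderScanner
{
    // Counts <item> elements, skipping everything inside <history> groups.
    public static int CountItems(TextReader xml)
    {
        int count = 0;
        XmlTextReader reader = new XmlTextReader(xml);
        try
        {
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "history")
                {
                    // Skip() moves past the whole subtree and leaves the reader
                    // on the following node, so do not call Read() again here.
                    reader.Skip();
                    continue;
                }
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                {
                    count++;
                }
                reader.Read();
            }
        }
        finally
        {
            reader.Close();
        }
        return count;
    }
}
```

Because related data is grouped under one element, a single Skip() call discards it without the caller visiting each child.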
XML
Creating optimized Reader/Writer
In v2 use the XmlReader/XmlWriter factory methods (Create) to create an optimized reader or writer
Applying proper XmlReaderSettings can improve performance
XmlReader reader = XmlReader.Create("my.xml", settings);
Up to 30% performance increase when IgnoreWhitespace = true is specified (depends on document format)
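The effect of IgnoreWhitespace can be seen by counting the nodes the reader reports. This is an illustrative sketch with made-up names; it reads from a string rather than a file so that it is self-contained.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WhitespaceDemo
{
    // Returns how many nodes the reader reports for the given document.
    public static int CountNodes(string xml, bool ignoreWhitespace)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = ignoreWhitespace;

        int nodes = 0;
        XmlReader reader = XmlReader.Create(new StringReader(xml), settings);
        try
        {
            while (reader.Read())
            {
                nodes++;
            }
        }
        finally
        {
            reader.Close();
        }
        return nodes;
    }
}
```

With an indented document, the reader configured to ignore whitespace reports fewer nodes, which means fewer iterations in the parsing loop.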
Demo
XmlDocument vs. XmlTextReader
XML
Reading local data with DataSet
DataSet is a database-independent container of relational data
Allows you to work with XML
ReadXml allows you to load XML data into a DataSet
Simple to use, but performs badly, especially with large XML files
If you must use DataSet.ReadXml, make sure that you first supply the schema
Use XmlReader wherever possible for traversing through your data
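One way to supply the schema first is to define the tables and columns in code before calling ReadXml, so the DataSet does not have to infer a schema from the data. The sketch below assumes a made-up orders document; loading an XSD with ReadXmlSchema would be an alternative way to achieve the same thing.

```csharp
using System;
using System.Data;
using System.IO;
using System.Xml;

public static class OrderLoader
{
    public static DataSet Load(TextReader xml)
    {
        // Schema is defined up front (hypothetical table and column names).
        DataSet ds = new DataSet("orders");
        DataTable order = ds.Tables.Add("order");
        order.Columns.Add("id", typeof(int));
        order.Columns.Add("customer", typeof(string));

        XmlTextReader reader = new XmlTextReader(xml);
        try
        {
            // IgnoreSchema: read data into the existing schema only,
            // without schema inference.
            ds.ReadXml(reader, XmlReadMode.IgnoreSchema);
        }
        finally
        {
            reader.Close();
        }
        return ds;
    }
}
```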
Demo
DataSet and .NET Compact Framework
Non-XML local data
Reading files locally
You may need to read a text file stored locally on the device
StreamReader and FileStream classes are typically employed
For large files (>100 KB), FileStream outperforms StreamReader
StreamReader specifically looks for line breaks; FileStream does not
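The two approaches look like this in code. This is a minimal sketch with illustrative names and an arbitrary 4 KB buffer: the FileStream version reads raw blocks of bytes, while the StreamReader version decodes characters and splits them into lines.

```csharp
using System;
using System.IO;

public static class FileScan
{
    // Reads the file in fixed-size blocks without interpreting the content.
    public static long CountBytesWithFileStream(string path)
    {
        byte[] buffer = new byte[4096];
        long total = 0;
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                total += read;
            }
        }
        return total;
    }

    // Reads the file line by line; each ReadLine scans for a line break
    // and allocates a new string.
    public static int CountLinesWithStreamReader(string path)
    {
        int lines = 0;
        using (StreamReader sr = new StreamReader(path))
        {
            while (sr.ReadLine() != null)
            {
                lines++;
            }
        }
        return lines;
    }
}
```

If the application needs lines of text anyway, the raw bytes still have to be decoded and split, so the comparison is best made on the real workload.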
Web Services
Where is the bottleneck?
Are you network bound or CPU bound?
Use perf counters: socket bytes sent / received
Do you come close to the network capacity?
If you are network bound, work on reducing the size of the message
Create a “canned” message and send it over HTTP
Compare its performance with the web service
If you are CPU bound, optimize the serialization scheme for speed
http://blogs.msdn.com/mikezintel/archive/2005/03/30/403941.aspx
Moving Forward
More tools
Live Remote Performance Counters
(new in v2)
Under construction:
Allocation profiler (CLR profiler)
Call profiler
Working set improvements
More speed
Summary
Make performance a requirement and measure
Understand the APIs
Isolate exactly what is being measured
Repeat tests several times and ignore the first run, which is affected by JIT compilation
Track the results for later comparison and review
Ensure you compare apples to apples
Use real code when possible
Test multiple designs and strategies, and understand the differences or variation
Avoid unnecessary object allocation and copies due to
String manipulations
Boxing
Collections that are not pre-sized
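The advice to repeat tests and ignore the first run can be captured in a small helper. This is a sketch with hypothetical names; it takes timings that were already collected (in ms) and averages all but the first, which includes JIT compilation time.

```csharp
using System;

public static class Bench
{
    // Average of all samples except the first (JIT-affected) one.
    public static double AverageExcludingFirst(long[] samplesMs)
    {
        if (samplesMs == null || samplesMs.Length < 2)
        {
            throw new ArgumentException("At least two runs are needed.");
        }
        long total = 0;
        for (int i = 1; i < samplesMs.Length; i++)
        {
            total += samplesMs[i];
        }
        return (double)total / (samplesMs.Length - 1);
    }
}
```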
Performance FAQ
http://blogs.msdn.com/netcfteam/archive/2005/05/04/414820.aspx