Developing high performance applications with .NET Compact Framework
Deepak Gulati, ISV Developer Evangelist, Microsoft

[Diagram: Microsoft platform overview, with native and managed options on devices and on the server side. Row labels recoverable from the slide: Hardware/Drivers, Device Building Tools, Data, Programming Model, Multimedia, Location Services, Development Tools, Communications & Messaging, Management Tools. Items shown include: OEM/IHV supplied BSP (ARM, SH4, MIPS), OEM hardware and standard drivers, standard PC hardware and drivers, Platform Builder, Windows Embedded Studio, Windows XP DDK, EDB, SQL Server 2005 Mobile Edition, SQL Server 2005 Express Edition, SQL Server 2005, Win32, MFC 8.0, ATL 8.0, .NET Compact Framework, ASP.NET Mobile Controls, .NET Framework, ASP.NET, Windows Media, DirectX, MapPoint, Visual Studio 2005, Internet Security and Acceleration Server, Exchange Server, Live Communications Server, Speech Server, Device Update Agent, Image Update, Software Update Services, Systems Management Server, Microsoft Operations Manager.]

Measuring Performance Overview
  The basic technique:
  - Find the start time
  - Find the end time
  - Calculate the delta

  Start and end times can be measured in various ways:
  - GetTickCount, a Win32 API function
  - Environment.TickCount, its managed code equivalent
  - Both return a 32-bit count of the milliseconds that have passed since the device was booted (GetTickCount as an unsigned DWORD, Environment.TickCount as a signed int)
  - You can also use System.DateTime and get a System.TimeSpan by subtracting the start value from the end value

  There can be issues with these techniques:
  - For a device that has been on for a long time, TickCount wraps around and goes negative
  - Not great for measuring "short" operations; there can be a variation of up to 500 ms
  - System.DateTime also suffers from accuracy issues

  QueryPerformanceCounter/QueryPerformanceFrequency to the rescue!
  - High-resolution timer with an OEM-specific implementation
  - Defaults to GetTickCount if not available
  - No managed implementation is available for QueryPerformanceCounter or QueryPerformanceFrequency
  - P/Invoke QueryPerformanceFrequency to get the counter frequency of the device in ticks per second; divide by 1000 to get ticks per millisecond
  - P/Invoke QueryPerformanceCounter before your call, make your call, then P/Invoke QueryPerformanceCounter again
  - (End - Start) / (ticks per millisecond) gives the time for your call in ms

Demo: Using QueryPerformanceCounter

Common Language Runtime: Garbage Collector
  [Chart: Allocation rate (iter/sec, axis 0 to 160000) vs. object size (bytes, up to 80000)]
  [Chart: Allocation throughput (Mb/sec, axis 0 to 90) vs. object size (bytes, 8 to 80000)]

Common Language Runtime: Where garbage comes from?
  Unnecessary string copies
  - Strings are immutable
  - String manipulations (Concat(), etc.) cause copies
  - Use StringBuilder

  .stat output from two runs (the slides do not label them; the figures suggest the first run used string concatenation and the second StringBuilder):

  Counter                              Run 1            Run 2
  Run time                             173 sec          0.1 sec
  Managed String Objects Allocated     20040            56
  Bytes of String Objects Allocated    5,800,480,574    2097718
  Garbage Collections (GC)             4912             2
  Bytes Collected By GC                5,918,699,036    1081620
  GC Latency                           107128 ms        21 ms

Last notes on StringBuilder
  - Remember it's all about reducing memory traffic
  - If you roughly know the expected length of your final string, allocate that much beforehand (StringBuilder constructor)
  - Getting the string out of a StringBuilder doesn't cause a new allocation; the existing buffer is converted into a string
  - See http://weblogs.asp.net/ricom/archive/2003/12/02/40778.aspx
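
The StringBuilder advice above can be illustrated with a minimal sketch. The class and method names (`StringBuildingSketch`, `JoinWithConcat`, `JoinWithBuilder`) are hypothetical and do not come from the talk, and the code assumes the input array contains no null entries. It contrasts repeated concatenation with a StringBuilder whose capacity is set up front from the known final length.

```csharp
using System;
using System.Text;

// Hypothetical helper class for illustration; not part of the original slides.
public static class StringBuildingSketch
{
    // Repeated concatenation: strings are immutable, so each += allocates a
    // new string and copies everything built so far.
    public static string JoinWithConcat(string[] parts)
    {
        string result = "";
        for (int i = 0; i < parts.Length; i++)
        {
            result += parts[i];
        }
        return result;
    }

    // StringBuilder sized from the known final length, so the internal
    // buffer does not need to grow while appending.
    // Assumes no element of 'parts' is null.
    public static string JoinWithBuilder(string[] parts)
    {
        int totalLength = 0;
        for (int i = 0; i < parts.Length; i++)
        {
            totalLength += parts[i].Length;
        }

        StringBuilder builder = new StringBuilder(totalLength);
        for (int i = 0; i < parts.Length; i++)
        {
            builder.Append(parts[i]);
        }
        return builder.ToString();
    }
}
```

Both methods return the same string; the difference is in how many intermediate strings are allocated along the way, which is what the .stat counters above measure.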

Common Language Runtime: Where garbage comes from?
  Unnecessary boxing
  - Value types are allocated on the stack (fast to allocate)
  - Boxing causes a heap allocation and a copy
  - Use strongly typed arrays and collections (the non-generic framework collections are NOT strongly typed)

Demo: String vs. StringBuilder

Common Language Runtime: Generics
  - Fully specialized implementation in .NET Compact Framework v2
  - Pros
    - Strongly typed
    - No unnecessary boxing and type casts
    - Specialized code is more efficient than shared code
  - Cons
    - Internal execution engine data structures and JIT-compiled code aren't shared (for example across List<int>, List<string>, List<MyType>)
  - See http://blogs.msdn.com/romanbat/archive/2005/01/06/348114.aspx

Common Language Runtime: Finalization and Dispose
  - Cost of finalizers
    - Non-deterministic cleanup
    - Extends the lifetime of the object
  - In general, rely on the GC for automatic memory cleanup
  - The exceptions to the rule:
    - If your object contains an unmanaged resource that the GC is unaware of, you need to implement a finalizer
    - Also implement the Dispose pattern to release the unmanaged resource in a deterministic manner
    - The Dispose method should suppress finalization
  - If the object you are using implements Dispose, call it when you are done with the object
    - Assumes an unmanaged resource in the object chain

Common Language Runtime: Sample Code: Finalization and Dispose
  [Two slides of sample code; the code was not captured in the transcript]

Common Language Runtime: Exceptions
  - Exceptions are cheap... until you throw
  - Throw exceptions in exceptional circumstances
  - Do not use exceptions for normal flow control
  - Use performance counters to track the number of exceptions thrown
  - Replace "On Error/Goto" with "Try/Catch/Finally" in Microsoft Visual Basic .NET

Common Language Runtime: Reflection
  - Reflection can be expensive
  - Reflection performance cost
    - Type comparisons (for example: typeof())
    - Member enumerations (for example: Type.GetFields())
    - Member access (for example: Type.InvokeMember())
    - Think ~10-100x slower
  - Working set cost
    - Runtime data structures
    - Think ~100 bytes per loaded type, ~80 bytes per loaded method
  - Be aware of
    APIs that use reflection as a side effect
  - Override Object.ToString(), GetHashCode() and Equals() (for value types)

Common Language Runtime: Building a Cost Model for Managed Math
  Math performance
  - 32-bit integers: similar to native math
  - 64-bit integers: ~5-10x the cost of native math
  - Floating point: similar to native math
    - ARM processors do not have an FPU

[Diagram: .NET Compact Framework architecture on Windows CE. Labels recoverable from the slide include: Redist FX, MSI Setup (ActiveSync), Per Device CAB Install (SMS, etc.); mscorlib, System, System.Data, System.Xml, System.Reflection, System.Globalization, Microsoft.VisualBasic, Microsoft.Win32.Registry, System.IO.File, System.IO.Ports, System.Net.Sockets, System.Net.Http*, Windows.Forms, System.Drawing, DirectX; JIT Compiler & GC, Class Loader, Assembly Cache, App Domain Loader, Process Loader, Managed Loader, Verification, Native Interop, Memory and Threading, Debugger, Calendar Data, Culture Data, Sorting, Encodings, Casing; File I/O, File Mapping, Registry, Sockets, SSL, NTLM, Crypto API, Cert/Security, Common Controls, GDI/GWES, D3DM; Visual Studio Debug Engine, ICorDbg, Host, CLR.]

Base Class Library: Collections
  - Pre-size collection classes appropriately
    - Resizing creates unnecessary copies
  - Beware of foreach overhead; use an indexer when available
    - [Code comparison ("... will be compiled into: ...") not captured in the transcript]

Windows Forms Best Practices
  - Load and cache Forms in the background
  - Populate data separately from Form.Show()
    - Pre-populate data, or
    - Load data asynchronously to Form.Show()
  - Use BeginUpdate/EndUpdate when it is available, e.g.
    ListView, TreeView
  - Use SuspendLayout/ResumeLayout when repositioning controls
  - Keep event handling code tight
    - Process bigger operations asynchronously
    - Blocking in event handlers will affect UI responsiveness
  - Form load performance
    - Reduce the number of method calls during initialization

Graphics And Games Best Practices
  - Compose to off-screen buffers to minimize direct-to-screen blitting
    - Approximately 50% faster
  - Avoid transparent blitting in areas that require performance
    - Approximately 1/3 the speed of normal blitting
  - Consider using pre-rendered images versus using System.Drawing rendering primitives
    - Need to measure on a case-by-case basis

XML: Best Practices for Managing Large XML Data Files
  - Use XmlTextReader/XmlTextWriter
    - Smaller memory footprint than using XmlDocument
    - XmlTextReader is a pull model parser which only reads a "window" of the data
    - XmlDocument builds a generic, untyped object model using a tree
      - Type stored as string
      - OK to use with smaller documents (64K XML: ~0.25s)
  - Optimize the structure of the XML document
    - Use elements to group
      - Allows use of Skip() in XmlReader
    - Use attributes to reduce size; processing attribute-centric documents is faster
    - Keep it short! (attribute and element names)
    - Avoid gratuitous use of white space

XML: Creating optimized Reader/Writer
  - In v2, use the XmlReader/XmlWriter factory classes to create an optimized reader or writer
    - Applying proper XmlReaderSettings can improve performance
      XmlReader reader = XmlReader.Create("my.xml", settings);
    - Up to 30% performance increase when IgnoreWhitespace = true is specified (depends on document format)

Demo: XmlDocument vs.
  XmlTextReader

XML: Reading local data with DataSet
  - DataSet is a database-independent container of relational data
  - Allows you to work with XML
  - ReadXml
    - Allows you to load XML data into a DataSet
    - Simple to use, but performs badly, especially with large XML files
    - If you must use DataSet.ReadXml, make sure that you first supply the schema
    - Use XmlReader wherever possible for traversing through your data

Demo: DataSet and .NET Compact Framework

Non-XML local data: Reading files locally
  - It might be required to read a text file stored locally on the device
  - The StreamReader and FileStream classes are typically employed
  - For large file sizes (>100 K), FileStream outperforms StreamReader
    - StreamReader specifically looks for line breaks; FileStream does not

Web Services: Where is the bottleneck?
  - Are you network bound or CPU bound?
    - Use perf counters: socket bytes sent / received
    - Do you come close to the network capacity?
  - If you are network bound, work on reducing the size of the message
    - Create a "canned" message and send it over HTTP
    - Compare performance with the web service
  - If you are CPU bound, optimize the serialization scheme for speed
  - See http://blogs.msdn.com/mikezintel/archive/2005/03/30/403941.aspx

Moving Forward
  - More tools
    - Live Remote Performance Counters (new in v2)
    - Under construction: Allocation profiler (CLR profiler), Call profiler
  - Working set improvements
  - More speed

Summary
  - Make performance a requirement and measure
  - Understand the APIs
  - Isolate exactly what is being measured
  - Repeat tests several times and ignore the first run, which is affected by JITting
  - Track the results for later comparisons and review
  - Ensure an apples-to-apples comparison
  - Use real code when possible
  - Test multiple designs and strategies; understand the differences or variation
  - Avoid unnecessary object allocation and copies due to
    - String manipulations
    - Boxing
    - Collections that are not pre-sized

Performance FAQ
  http://blogs.msdn.com/netcfteam/archive/2005/05/04/414820.aspx
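
The measurement advice in the summary (repeat the test several times and ignore the first, JIT-affected run) can be sketched as a small timing helper. This is only an illustration under stated assumptions: the names `Operation`, `TimingSketch`, `ElapsedMs` and `AverageMs` are hypothetical, and the sketch uses Environment.TickCount so that it stays fully managed. For short operations on a device, the QueryPerformanceCounter P/Invoke approach described earlier in the talk would be the better time source.

```csharp
using System;

// Hypothetical delegate type for the code under test (C# 2.0 compatible).
public delegate void Operation();

// Hypothetical helper class for illustration; not part of the original slides.
public static class TimingSketch
{
    // Delta between two Environment.TickCount readings. Unchecked int
    // subtraction stays correct across a single wraparound of the counter.
    public static int ElapsedMs(int startTicks, int endTicks)
    {
        return unchecked(endTicks - startTicks);
    }

    // Calls the operation once untimed so that JIT compilation is excluded,
    // then times 'runs' batches of 'iterationsPerRun' calls each and returns
    // the average number of milliseconds per batch.
    public static double AverageMs(Operation operation, int runs, int iterationsPerRun)
    {
        if (operation == null) throw new ArgumentNullException("operation");
        if (runs <= 0) throw new ArgumentOutOfRangeException("runs");
        if (iterationsPerRun <= 0) throw new ArgumentOutOfRangeException("iterationsPerRun");

        operation(); // warm-up call, not timed

        long totalMs = 0;
        for (int run = 0; run < runs; run++)
        {
            int start = Environment.TickCount;
            for (int i = 0; i < iterationsPerRun; i++)
            {
                operation();
            }
            totalMs += ElapsedMs(start, Environment.TickCount);
        }
        return (double)totalMs / runs;
    }
}
```

A call such as `TimingSketch.AverageMs(delegate { DoWork(); }, 5, 1000)`, where `DoWork` is a hypothetical method under test, runs the warm-up call plus five timed batches of 1000 calls. Batching many iterations per run is a design choice to reduce the effect of TickCount's coarse resolution.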