http://aka.ms/E2013Calc \Web Service(Default Web Site)\Current Connections \MSExchange Active Manager(_total)\Database Mounted.
http://aka.ms/ExOnlineLimits

Large Organization
Configuration
• 36 cores / 450 GB RAM per server
• Higher mailbox density
• Deployed Exchange 2013 in All-In-One configuration
• Hardware NLB configured for ‘Least Connections’
What Happened?
• Policy change required removal of local storage of email
• Outlook now required to run in “Online Mode”
Impact
• Increase in network traffic
• Users frequently disconnected during peak periods
• ~2 weeks to isolate problem
• ~2 weeks to get remediation changes in place

[Diagram: 40k users connect through the Network Load Balancer (Virtual IP, Exchange.cohovineyard.com) to the Exchange 2013 All-in-One servers; numbered connections 1-45.]
[Diagram: the same topology with additional numbered connections (46-63) and a warning marker (“!”) at the Virtual IP.]
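To see why ‘Least Connections’ can work against a server that has just joined the rotation, here is a minimal Python sketch. It is a toy model, not the load balancer's actual algorithm: the server names (EX1 to EX5), the connection counts, and the `max_share` cap used to imitate a slow-start ramp are all illustrative assumptions. Slow Start itself is listed later in the deck as part of the resolution.

```python
def assign_least_connections(active, new_connections, new_server=None, max_share=1.0):
    """Hand out new connections one at a time to the least-loaded server.

    active: dict of server name -> current connection count (hypothetical data).
    new_server / max_share: toy stand-in for slow start; the named server may
    receive at most max_share of the new connections.
    Returns a dict of server name -> number of new connections received.
    """
    counts = dict(active)
    received = {name: 0 for name in counts}
    cap = int(new_connections * max_share)
    for _ in range(new_connections):
        # The new server stays eligible only until it reaches its cap.
        eligible = [
            name for name in counts
            if name != new_server or received[name] < cap
        ]
        target = min(eligible, key=counts.get)
        counts[target] += 1
        received[target] += 1
    return received


# Four busy servers plus one that has just been added to the rotation.
pool = {"EX1": 10000, "EX2": 10000, "EX3": 10000, "EX4": 10000, "EX5": 0}

burst = assign_least_connections(pool, 2000)
ramped = assign_least_connections(pool, 2000, new_server="EX5", max_share=0.1)

print(burst)
print(ramped)
```

In this toy run the unthrottled pass sends all 2000 new connections to EX5, because it remains the least-loaded server throughout. With the cap, EX5 takes 200 and the remaining 1800 are spread evenly over the other four servers.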
[Diagram: 40k users connect through the hardware NLB (Virtual IP, Exchange.cohovineyard.com) to the Exchange 2013 All-in-One servers; numbered connections 1-36.]

[Diagram: Outlook Anywhere request path on one server.]
• IIS / RpcHttp / HttpProxy: /RPC, port 443, http.sys, MSExchangeRpcProxyFrontEndAppPool (W3WP). Looks up the active mailbox location.
• IIS / RpcHttp: /RPC, port 444, http.sys, MSExchangeRpcProxyAppPool (W3WP)
• RPC Client Access: port 6001, M.E.RpcClientAccess
• Store Worker: M.E.Store.Worker, MBxDB

[Diagram labels: IIS; Connection Manager; Request Router; W3WP Queue for /RPC:443 and for /RPC:444 (Max 65535 Requests); Managed Availability; MSExchangeRpcProxyFrontEndAppPool (W3WP) with Threads and System.Web Buffers; numbered requests 57-69.]

inetpub\logs\LogFiles\W3SVC1\u_exXXXXXX.log
• date: 2014-07-21
• time: 07:59:44
• s-ip: 192.168.1.1
• cs-method: RPC_IN_DATA
• cs-uri-stem: /rpc/rpcproxy.dll
• cs-uri-query: 8416409b-081e-4fe8-9200-7e54d8874d7c@cohovineyard.com:6001&RequestId=fc60c175-9c77-47d0-b435-ae3d04acea1b
• s-port: 443
• cs-username: COHOVINEYARD\SM_4f3083c2bd6a40d8b
• c-ip: 192.168.1.5
• cs(User-Agent): MSRPC
• cs(Referer): -
• sc-status: 200
• sc-substatus: 0
• sc-win32-status: 64
• time-taken: 29513

inetpub\logs\LogFiles\W3SVC1\httperrXXXXX.log
Fields: date time c-ip c-port s-ip s-port cs-version cs-method cs-uri sc-status s-siteid s-reason s-queuename
2014-07-21 07:59:44 192.168.1.5 16045 192.168.1.1 444 HTTP/1.1 RPC_IN_DATA /rpc/rpcproxy.dll?COHOEXCH.cohovineyard.com:6001 400 2 Connection_Dropped MSExchangeRpcProxyAppPool
2014-07-21 07:59:44 192.168.1.5 16045 192.168.1.1 443 HTTP/1.1 RPC_IN_DATA /rpc/rpcproxy.dll?8416409b-081e-4fe8-9200-7e54d8874d7c@COHOEXCH.cohovineyard.com:6001 - 1 Connection_Dropped_List_Full MSExchangeRpcProxyAppPool
IIS is indicating that it cannot hand off the connection because the queue is full.

Location, File Names and Perfmon Counter by component
IIS
  Location: inetpub\logs\LogFiles\W3SVC1
  File Names: u_exXXXXXX.log, httperrXXXXX.log
  Perfmon Counter: \Web Service(Default Web Site)\Current Connections
RpcHttp
  Location: Logging\RpcHttp\W3SVC1
  File Names: RpcHttpXXXXXXXXX.log
  Perfmon Counter: \RPC/HTTP Proxy\Current Number of Incoming RPC over HTTP Connections
HttpProxy
  Location: Logging\HttpProxy\RpcHttp
  File Names: HttpProxyXXXXXXXXXX-X.log
  Perfmon Counter: \MSExchange HttpProxy\Accepted Connection Count
IIS
  Location: Inetpub\logs\LogFiles\W3SVC2
  File Names: u_exXXXXXX.log, httperrXXXXX.log
  Perfmon Counter: \Web Service(Exchange Back End)\Current Connections
RpcHttp
  Location: Logging\RpcHttp\W3SVC2
  File Names: RpcHttpXXXXXXXXX.log
  Perfmon Counter: \RPC/HTTP Proxy\Current Number of Incoming RPC over HTTP Connections
RPC Client Access
  Location: Logging\RPC Client Access
  File Names: RCA_XXXXXXXXXXX.log
  Perfmon Counter: \MSExchange RPC ClientAccess\Current Connections

Network, CPU, Memory, Storage

Network (Requests)
• \Web Service(Default Web Site)\Current Connections
• \MSExchangeIS Store(*)\RPC Average Latency: < 100 ms
• \MSExchangeIS Client Type(*)\RPC Average Latency: < 100 ms
• \MSExchangeIS Store(*)\RPC Operation/Sec
• \MSExchangeIS Client Type(*)\RPC Operation/Sec
CAS Experience
• MoMT: \MSExchange RpcClientAccess\RPC Averaged Latency; \MSExchange RpcClientAccess\RPC Operations/sec
• EAS: \MSExchange ActiveSync\Requests/sec; \MSExchange ActiveSync\Current Requests
• EWS: \MSExchangeWS\Average Response Time; \MSExchangeWS\Requests/sec
• OWA: \MSExchange OWA\Average Response Time; \MSExchange OWA\Average Search Time; \MSExchange OWA\Requests/sec
• POP: \MSExchangePop3(*)\Average LDAP Latency; \MSExchangePop3(*)\Average RPC Latency; \MSExchangePop3(*)\Request Rate
• IMAP: \MSExchangeImap4(*)\Average LDAP Latency; \MSExchangeImap4(*)\Average RPC Latency; \MSExchangeImap4(*)\Request Rate
Management / Background Ops
• PS: \MSExchangeRemotePowershell\Current Connection Sessions; \MSExchangeRemotePowershell\Current Connected Unique Users
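When an HTTPERR log holds thousands of rows, tallying the reason column per queue makes a full-queue condition easy to spot. The sketch below is one possible way to do that in Python. It assumes the space-separated field order shown on the slide (real files declare their own layout in a `#Fields:` header line, so check it first), and the function names are hypothetical. The two sample rows are the entries from the slide.

```python
from collections import Counter

# Field order as listed on the slide; treat it as an assumption and compare
# it with the "#Fields:" header of the log you are actually reading.
HTTPERR_FIELDS = [
    "date", "time", "c-ip", "c-port", "s-ip", "s-port", "cs-version",
    "cs-method", "cs-uri", "sc-status", "s-siteid", "s-reason", "s-queuename",
]


def parse_httperr_line(line):
    """Return a field dict for one log row, or None for headers and odd rows."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    parts = line.split()
    if len(parts) != len(HTTPERR_FIELDS):
        return None
    return dict(zip(HTTPERR_FIELDS, parts))


def count_drop_reasons(lines):
    """Count rows per (queue name, reason) pair."""
    counts = Counter()
    for line in lines:
        record = parse_httperr_line(line)
        if record is not None:
            counts[(record["s-queuename"], record["s-reason"])] += 1
    return counts


sample = [
    "#Fields: " + " ".join(HTTPERR_FIELDS),
    "2014-07-21 07:59:44 192.168.1.5 16045 192.168.1.1 444 HTTP/1.1 RPC_IN_DATA "
    "/rpc/rpcproxy.dll?COHOEXCH.cohovineyard.com:6001 400 2 "
    "Connection_Dropped MSExchangeRpcProxyAppPool",
    "2014-07-21 07:59:44 192.168.1.5 16045 192.168.1.1 443 HTTP/1.1 RPC_IN_DATA "
    "/rpc/rpcproxy.dll?8416409b-081e-4fe8-9200-7e54d8874d7c@COHOEXCH.cohovineyard.com:6001 "
    "- 1 Connection_Dropped_List_Full MSExchangeRpcProxyAppPool",
]

reasons = count_drop_reasons(sample)
for (queue, reason), total in sorted(reasons.items()):
    print(queue, reason, total)
```

A rising count of `Connection_Dropped_List_Full` for one application pool queue is the signal described above: IIS could not hand the connection off because the queue was full.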
Overall RPC Average Latency is not impacted.

Memory (Exchange Process Usage)
• \Memory\% Committed Bytes in Use: < 80%
• \Memory\Available MBytes: > 5% of RAM
• \.NET CLR Memory(*)\% Time in GC: should be below 10% on average
• \.NET CLR Exceptions(*)\# of Exceps Thrown / sec: should be less than 5% of total requests per second (RPS) (\Web Service(_Total)\Connection Attempts/sec * .05)
• \.NET CLR Memory(*)\# Bytes in all Heaps
Memory (WorkstationGC to ServerGC)
• \.NET CLR Memory\Allocated Bytes/sec: sustained > 50 MB
• Only 30% bytes committed

Storage (Exchange I/O)
• \MSExchange Active Manager(_total)\Database Mounted: balanced across all MBX servers
• \MSExchange Database ==> Instances(*)\I/O Database Reads (Attached) Average Latency: < 20 ms
• \MSExchange Database ==> Instances(*)\I/O Database Writes (Attached) Average Latency: < 50 ms
• \MSExchange Database ==> Instances(*)\I/O Log Writes Average Latency: < 10 ms
• \MSExchange Database ==> Instances(*)\I/O Database Reads (Recovery) Average Latency: < 200 ms
• \MSExchange Database ==> Instances(*)\I/O Database Writes (Recovery) Average Latency: less than the read latency for the same instance (as above)
I/O is acceptable.

CPU (Exchange Processes)
• \Processor(_Total)\% Processor Time: should be less than 75% on average
• \Processor(_Total)\% Privileged Time (kernel): should be less than 75% on average
• \Processor(_Total)\% User Time: should be less than 75% on average
• \Process(*)\% Processor Time: <specific process>
• \System\Processor Queue Length (all instances): shouldn't be greater than 5 per processor
W3WP#3 is the MSExchangeRpcProxyFrontEndAppPool.
W3wp#3 high CPU.

Most Recent Usage
• Provides a periodic snapshot of executing code
• Used by developers to track “hot” code paths
• Requires source code to interpret
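The counter thresholds for memory, storage and CPU lend themselves to an automated first pass over collected samples. The following Python sketch shows one way to flag counters whose average reaches the stated limit. The limits are copied from the slides; the function name, the sample values, and the choice to compare simple averages are assumptions made for illustration, not part of any Exchange tool.

```python
from statistics import mean

# Upper limits from the slides: counter path -> value the average should stay below.
LIMITS = {
    r"\Processor(_Total)\% Processor Time": 75.0,
    r"\Processor(_Total)\% Privileged Time": 75.0,
    r"\Memory\% Committed Bytes in Use": 80.0,
    r"\.NET CLR Memory(*)\% Time in GC": 10.0,
    r"\MSExchange Database ==> Instances(*)\I/O Database Reads (Attached) Average Latency": 20.0,
    r"\MSExchange Database ==> Instances(*)\I/O Database Writes (Attached) Average Latency": 50.0,
    r"\MSExchange Database ==> Instances(*)\I/O Log Writes Average Latency": 10.0,
}


def find_breaches(samples, limits=LIMITS):
    """Return (counter, average, limit) for each counter averaging at or above its limit.

    samples: dict of counter path -> list of numeric readings.
    Counters without a known limit, or without readings, are skipped.
    """
    breaches = []
    for counter, values in samples.items():
        limit = limits.get(counter)
        if limit is None or not values:
            continue
        average = mean(values)
        if average >= limit:
            breaches.append((counter, average, limit))
    return breaches


# Hypothetical readings: busy CPU, healthy log write latency.
readings = {
    r"\Processor(_Total)\% Processor Time": [70.0, 85.0, 91.0],
    r"\MSExchange Database ==> Instances(*)\I/O Log Writes Average Latency": [4.0, 6.0, 5.0],
}

breaches = find_breaches(readings)
for counter, average, limit in breaches:
    print(f"{counter}: average {average:.1f} (limit {limit:.0f})")
```

Averages hide short spikes, so a check like this is only a starting point before looking at the raw counter graphs.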
Download: http://aka.ms/perfview
Start: http://channel9.msdn.com/Series/PerfView-Tutorial

ntdll!ZwWaitForMultipleObjects
KERNELBASE!WaitForMultipleObjectsEx
clr!WaitForMultipleObjectsEx_SO_TOLERANT
clr!Thread::DoAppropriateAptStateWait
clr!Thread::DoAppropriateWaitWorker
clr!Thread::DoAppropriateWait
clr!CLREventBase::WaitEx
clr!AwareLock::EnterEpilogHelper
clr!AwareLock::EnterEpilog
clr!AwareLock::Contention
clr!JITutil_MonContention
System_Web_ni!System.Web.BufferAllocator.GetBuffer()
System_Web_ni!System.Web.Hosting.RecyclableArrayHelper.GetIntPtrArray(Int32)
System_Web_ni!System.Web.Hosting.IIS7WorkerRequest.FlushCachedResponse(Boolean)
System_Web_ni!System.Web.HttpResponse.UpdateNativeResponse(Boolean)
System_Web_ni!System.Web.HttpResponse.Flush(Boolean, Boolean)
System_Web_ni!System.Web.HttpWriter.WriteFromStream(Byte[], Int32, Int32)
mscorlib_ni!System.IO.Stream.<BeginWriteInternal>b__11(System.Object)
mscorlib_ni!System.Threading.Tasks.Task`1[[System.Boolean, mscorlib]].InnerInvoke()
mscorlib_ni!System.Threading.Tasks.Task.Execute()
mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
mscorlib_ni!System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)
mscorlib_ni!System.Threading.Tasks.Task.ExecuteEntry(Boolean)
mscorlib_ni!System.Threading.ThreadPoolWorkQueue.Dispatch()
clr!CallDescrWorkerInternal
clr!CallDescrWorkerWithHandler
clr!MethodDescCallSite::CallTargetWorker
clr!MethodDescCallSite::Call_RetBool
clr!QueueUserWorkItemManagedCallback
clr!ManagedThreadBase_DispatchInner
clr!ManagedThreadBase_DispatchMiddle
clr!ManagedThreadBase_DispatchOuter
clr!ManagedThreadBase_DispatchInCorrectAD
clr!Thread::DoADCallBack
clr!ManagedThreadBase_DispatchInner
clr!ManagedThreadBase_DispatchMiddle
clr!ManagedThreadBase_DispatchOuter
clr!ManagedThreadBase_FullTransitionWithAD
clr!ManagedThreadBase::ThreadPool
clr!ManagedPerAppDomainTPCount::DispatchWorkItem
clr!ThreadpoolMgr::ExecuteWorkRequest
clr!ThreadpoolMgr::WorkerThreadStart
clr!Thread::intermediateThreadProc
kernel32!BaseThreadInitThunk
ntdll!RtlUserThreadStart

Source From: http://referencesource.microsoft.com/#System.Web/BufferAllocator.cs

Investigation
• ~4 weeks
• Large number of connections to server in short timeframe
• Preferred architecture not followed
• Customer scaled beyond tested configuration
• NLB algorithm not optimized for Exchange load profile
[Diagram labels: Network load balancer adds server to rotation; RpcProxy FrontEnd AppPool requests backlogged; Managed Availability probe fails; Managed Availability restarts service; Network load balancer takes server out of rotation.]
Resolution
• Least Connection / Slow Start on hardware LB
• Reduced cores (< 20)
• Scalability improvements coming in .NET 4.6 (in preview)

Large Organization
Configuration
• 16 cores / 92 GB RAM per server
• Deployed Exchange 2013 in All-In-One configuration
• NLB configured for ‘Round Robin’
What Happened?
• File writes failing, MA probe failures, MDB failovers
• Encountered bug with anti-virus
• Failed to deploy recommended fixes prior to migration
• Exposed new bug
Impact
• Users frequently disconnected during peak periods
• ~8 weeks to isolate problem
• ~3 weeks to get fix and configuration changes in place

[Diagram: request path IIS / RpcHttp / HttpProxy, IIS / RpcHttp, RPC Client Access, Store Worker, followed by the I/O stack: I/O Manager, File System Driver, Anti-Virus Filter Driver (“Is Valid File to Scan?”), Device Driver, Mini-Port Driver, MBxDB.]
• Stalled I/O delaying client responses (dump showed a 6 min lock).
• Continued delayed or stalled I/O forces MA to move databases.
Goals
• Bring Office365 capabilities on-premises
• Monitor based upon end user experience
• Focus on recovery oriented computing
Components
• Probes test components and user experience
• Monitors analyze probe(s) for Pass/Fail
• Responders take action based upon monitor results
Responders
• Restart
• BugCheck
• Reset AppPool
• Offline
• Failover MBX
• Escalate
Services
Monitors
• OutlookRpcCtpProbe
• OutlookProxyTestProbe
• OutlookRpcSelfTestProbe
When troubleshooting
• Monitor failures are a signal of a problem
• Consistent failures can force a bluescreen

Performance Counters
Event Logs

Storage
• Some database I/O latencies, but overall I/O is fairly healthy.
CPU
• The server appears to be busy, but it is uncertain whether this is normal or a bug…
• W3wp#11 CPU utilization running hot?
Memory
• Massive growth in memory footprint of the w3wp#11 process throughout the day.
• Private Bytes reached 10 GB+ before restarting
• W3WP Process ID = 62192

AppDomain
• Used to enable isolation within a process
• 3 AppDomains by default
• Normal W3WP for Exchange has 3-4 AppDomains
• Created as a result of config change
Exchange
• Leak in W3SVC/1 = MSExchangeRpcProxyFrontEndAppPool
Process Explorer
• View AppDomains and other .NET stats for running processes.

Outlook Anywhere
• Servicelets are used by Exchange for minor tasks
• RPCHTTPServicelet runs every 15 minutes
• RPCHTTPServicelet was writing an update to the Default Web Site/Rpc site, from “SSL” to “None”, on every run.
• What was causing this change to continually be updated?
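A back-of-the-envelope check helps judge whether a configuration write every 15 minutes could plausibly explain the memory growth. The Python sketch below is a rough estimate under loudly stated assumptions: that every servicelet run leaves exactly one extra AppDomain resident, that each one costs about 125 MB (the startup size labeled on the accompanying diagram), and that nothing else contributes. The function name is hypothetical.

```python
def hours_until(private_bytes_mb, appdomain_mb=125.0, interval_minutes=15.0):
    """Estimate hours for leaked AppDomains alone to add up to private_bytes_mb.

    Assumes one leaked AppDomain of appdomain_mb per interval (an assumption,
    not a measured value) and ignores the baseline size of the process.
    """
    leaked_domains = private_bytes_mb / appdomain_mb
    return leaked_domains * interval_minutes / 60.0


# 10 GB of Private Bytes, the level reported before the process restarted.
estimate = hours_until(10 * 1024)
print(f"about {estimate:.1f} hours")
```

Under these assumptions the 10 GB mark is reached after roughly 20 hours, which is at least in line with growth observed "throughout the day".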
[Diagram labels: Config; Binaries; Heaps; Connections; Front-End AppDomain (several instances); Back-End AppDomain; AppDomain (~125 MB at startup); Default AppDomain; System AppDomain; MSExchangeRPCAppPool; MSExchange Services Host (every 15 min: Set SSLOffloading = true); RPC Client Access; Store Worker Instance; MBxDB.]

Investigation
• ~10 weeks of investigation
• Many iterations of data collected and analyzed
Data Collection
Deployment Guidance Missteps
• NLB configuration set to Round Robin
• Most recent CU update + hotfixes
Resolution
• NLB configuration changed to Slow Start
• Most recent CU update + hotfixes installed
• Interim configuration change until KB2925281 hotfix release
• Final fix in Exchange 2013 Service Pack 1
Analysis

• Exchange Server 2013 Performance Recommendations
• Exchange 2013 Sizing and Configuration Recommendations
• Exchange 2013 Performance Counters for troubleshooting

• IIS Logs and Log Parser Studio Reports
• Exchange Performance Data Collection tool
• Exchange 2013 Performance Health Checker Script
• Windows Performance ToolKit (WPT)
• Performance Analysis of Logs (PAL) Tool
• Windows SysInternals

• BRK3131: Exchange Design Concepts and Best Practices
• BRK3197: Exchange Server Preferred Architecture
• BRK3178: Exchange on IaaS: Concerns, Tradeoffs, and Best Practices
• BRK3173: Experts Unplugged: Exchange Server Deployment and Architecture
• BRK3158: Experts Unplugged: Exchange Top Issues
• BRK3129: Deploying Exchange Server 2016
• BRK3102: Experts Unplugged: Exchange Server High Availability and Site Resilience

http://myignite.microsoft.com