TMG Data Packager Analysis

Part 1
By Suraj Singh
Security Support Escalation Engineer
Microsoft Forefront EDGE team
Scope
• Intro
• Scenario
• Analyzing the individual logs collected by the TMG data packager.
• Correlating the different logs to complete the picture and understand
what is going on.
INTRO
• The TMGpackage.cab file is generated when we collect data with the TMG data
packager tool while reproducing the issue.
• More info about the TMG data packager tool: http://blogs.technet.com/b/soorajsec/archive/2010/04/10/instructions-for-isa-data-packager-to-collect-datain-repro-mode.aspx
• This presentation explains how to analyze the data collected by this tool.
• The first step is to extract the cab file and then proceed as explained in the
following slides.
• The idea is to give you a starting point for analyzing the data yourself.
Logs collected
• The logs collected depend on the scenario, the issue, and the corresponding
template selected in the TMG data packager wizard.
• The logs collected also depend on the options we choose under
Modify Options.
For a Basic repro, the logs look like this after
extracting the contents of the cab file.
BPA Reports
This report can be opened using the ISABPA tool.
What we get from BPA reports
• Static configuration of the TMG
• Build
• EMS
• Array
• Monitoring, logging and reporting
• Firewall policy
• Networking (network configuration, ipconfig, routing table, NIC config, NLB,
etc.)
• Services, processes
• Regkeys
TMGLogs
Network Captures
ISAtracing
Tools
• Netmon (to read network traces)
• Excel (to read TMG logs)
• Text reader/editor tools for searching text
Scenario
• Issue: Outlook Anywhere users disconnect intermittently, and the admin
sees a huge number of failed connections in the TMG live logs with
different status codes, e.g. 10054, 64, 1460.
• Troubleshooting
For status codes 10054 and 64, the following blog post written by
Thomas Detzner was provided:
http://blogs.technet.com/b/isablog/archive/2007/06/25/rpc-overhttp-logging-wildness.aspx
But there was no explanation for 1460.
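As a quick reference, the three status codes can be mapped to their generic Windows/Winsock error names. This is only a lookup sketch: the names and messages are the standard system meanings and do not by themselves explain why TMG logged them. The helper name `describe` is made up for this example.

```python
# Generic Windows/Winsock meanings of the status codes seen in the TMG live logs.
STATUS_CODES = {
    10054: ("WSAECONNRESET",
            "An existing connection was forcibly closed by the remote host"),
    64: ("ERROR_NETNAME_DELETED",
         "The specified network name is no longer available"),
    1460: ("ERROR_TIMEOUT",
           "This operation returned because the timeout period expired"),
}


def describe(code):
    """Return a one-line description for a status code (hypothetical helper)."""
    name, text = STATUS_CODES.get(code, ("UNKNOWN", "not in this table"))
    return f"{code} {name}: {text}"


for code in (10054, 64, 1460):
    print(describe(code))
```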
Data Analysis
• In this case the data had already been collected by the admin, following
instructions from the previous engineer. It was taken as a Basic repro at the
time of the issue.
• We used the IP address of the client machine that was facing the issue to
filter the data.
• From the nature of the issue, it was clear that four logs were going to
help us: TMG live logs, network captures, ISA tracing (this log can only be
used by MS engineers because parsing it needs certain files; I am including
it here for reference for my peers as well), and ISA BPA (to look into the
config). We needed to understand why the issue was happening, and these
on-the-fly logs were our best bets.
Where to start with
• We can start with the BPA reports to understand the config: which IP
ranges are in the internal network, NIC and IP configs, NLB config, and any
obvious alerts.
• Then, in scenarios like this, we can move on to the TMG logs. The admin
provided the client machine IP, so it was used to filter all the traffic
that came from this client.
• I have masked the client IP in the next snapshot, but the idea is that we
can use Excel data filters to filter the live logs by client IP.
TMG Logs
After scrolling right in Excel, we looked for the Filter Info
column and the corresponding request ID.
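The same client-IP filtering can be scripted when the log is too large for comfortable use in Excel. The following is a minimal sketch, assuming the TMG log has been exported as tab-delimited text with a header row. The column names (`Client IP`, `Result Code`, `Filter Information`) and the sample rows are assumptions for illustration; check the header row of your own export.

```python
import csv
import io

# Hypothetical column names; adjust to the header row of your own export.
CLIENT_IP_COL = "Client IP"
RESULT_CODE_COL = "Result Code"
FILTER_INFO_COL = "Filter Information"


def rows_for_client(log_file, client_ip, delimiter="\t"):
    """Yield log rows whose client IP column equals client_ip."""
    reader = csv.DictReader(log_file, delimiter=delimiter)
    for row in reader:
        if (row.get(CLIENT_IP_COL) or "").strip() == client_ip:
            yield row


# Made-up sample standing in for an exported log file.
sample = io.StringIO(
    "Client IP\tResult Code\tFilter Information\n"
    "192.0.2.10\t1460\tReq ID: 0aaa0001\n"
    "192.0.2.99\t0\tReq ID: 0aaa0002\n"
    "192.0.2.10\t64\tReq ID: 0aaa0003\n"
)

matches = list(rows_for_client(sample, "192.0.2.10"))
for row in matches:
    print(row[RESULT_CODE_COL], row[FILTER_INFO_COL])
```

For a real export, replace `sample` with `open(path, newline="")` and note the request ID from the filter information of each failed row.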
ISATracing(for my MS peers only)
• Used a text editor/reader tool to filter the ISA tracing log using the
request ID found in the TMG live logs.
• Filtered the log and located the result code 1460 in it, as we can see in
the next slide.
Zoom in
Noise:WPPISAPUBLIC:(x.x.x.x:1234 ==> x.x.x.x:443) (x.x.x.x:56789 --x.x.x.x:443), 0 bytes, "<NULL>", 1460(ERROR_TIMEOUT)
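A line like the one above ends in a `number(NAME)` pair, so the result code can be pulled out with a small regular expression once the trace has been converted to text. This is a sketch under that assumption only: the function name `result_codes` is hypothetical, the second sample line is made up, and other trace lines may be shaped differently.

```python
import re

# Matches a trailing "<number>(<NAME>)" such as "1460(ERROR_TIMEOUT)".
RESULT_RE = re.compile(r"(\d+)\((\w+)\)\s*$")


def result_codes(lines, needle):
    """Return (code, name, line) for lines that contain needle and end in a result code."""
    hits = []
    for line in lines:
        if needle not in line:
            continue
        match = RESULT_RE.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2), line.rstrip()))
    return hits


sample = [
    'Noise:WPPISAPUBLIC:(x.x.x.x:1234 ==> x.x.x.x:443) '
    '(x.x.x.x:56789 --x.x.x.x:443), 0 bytes, "<NULL>", 1460(ERROR_TIMEOUT)',
    "Noise:WPPISAPUBLIC:some unrelated line",  # made-up filler line
]

hits = result_codes(sample, "x.x.x.x:1234")
for code, name, _ in hits:
    print(code, name)
```

The `needle` can be a request ID or an IP:port pair, whichever identifies the session in your trace.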
Network captures
• Filter the Network Monitor traces taken on the internal NIC with this
combination: 1.2.3.4:12345 --- 7.8.9.10:443 (sample IPs of the TMG internal
NIC and the CAS server, and sample ports, as I am masking the original IPs/ports).
• We used a filter like
tcp.port == 12345 and tcp.port == 443
because the port pair identifies a unique session, but we could not find the
traffic in the network capture: because of the huge volume of traffic, the
Netmon buffers ran out.
From the error it was clear that a timeout had occurred and the server did
not respond properly. Since we could not get information about this unique
session, the next best plan was to check how good the connectivity between
TMG and the CAS server was, so we looked at the traces once again with a
different display filter.
Conclusion and Action Plan
• After looking at three logs in this sequence, we were able to find why we were
getting 1460: it was due to a timeout (ISA tracing), and the network traces then
showed a really bad connection between TMG and the CAS server. To get the clear
picture, we told the admin that the current data shows really bad connectivity
between TMG and CAS, and that we should also see the picture from the CAS end
to determine how it responds to traffic sent by TMG at the time of the issue.
• The CAS server may not be responding because of a bad NIC, drivers, etc., or
the cause may be a networking device between TMG and CAS. So we suggested that
he collect simultaneous data from CAS, TMG and the client at the time of the issue:
  • CAS: Network Monitor captures
  • TMG server: TMG data packager with the web proxy and publishing template
  • Client machine: Network Monitor captures
Few Tricks with ISA Tracing using Text editor/reader tools apart from Request ID
• Keywords
Result code
Error, i.e. the keyword Error
Hexadecimal error that you get while reproducing the issue, e.g. 0x8007003a
IP address:port combination
Failed (keyword)
Status code, e.g. 1460
What to do when a keyword search returns results
• It is hard work converted to smart work.
• Locate your repro or issue traffic by searching for the IP address, port,
or user name of the user who experienced the issue. Once you have located
it, note down the context ID or request ID, then search for that ID in the
text analysis tool. You should get one single conversation; read through it
and check whether the issue shows up in it.
• If your initial search returned multiple request IDs, you might have to
repeat this for each of them to locate the problem.
• With practice this process becomes quicker: the more you practice, the
better and faster you become at reading this log.
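The two-pass search described above (keyword first, then one conversation per ID) can be sketched in a few lines. This is an illustration only: the `ReqId=` marker and the sample lines are made up and are not the real trace format, so `id_pattern` has to be replaced with a regular expression that matches how the request or context ID actually appears in your converted trace.

```python
import re
from collections import defaultdict


def conversations(lines, keyword, id_pattern):
    """Group trace lines by ID for every ID seen on a line containing keyword.

    lines must be a list (it is read twice); id_pattern is a regex with one
    capture group for the request/context ID.
    """
    id_re = re.compile(id_pattern)

    # Pass 1: collect the IDs on lines that mention the keyword.
    wanted = set()
    for line in lines:
        if keyword in line:
            match = id_re.search(line)
            if match:
                wanted.add(match.group(1))

    # Pass 2: gather every line that belongs to those IDs, in file order.
    grouped = defaultdict(list)
    for line in lines:
        match = id_re.search(line)
        if match and match.group(1) in wanted:
            grouped[match.group(1)].append(line.rstrip())
    return dict(grouped)


# Made-up lines; "ReqId=" is a hypothetical marker, not the real trace format.
sample = [
    "ReqId=aa01 connect 192.0.2.10:50000 -> 198.51.100.5:443",
    "ReqId=bb02 connect 192.0.2.77:50001 -> 198.51.100.5:443",
    "ReqId=aa01 waiting for server response",
    "ReqId=aa01 Failed 1460(ERROR_TIMEOUT)",
    "ReqId=bb02 completed 0",
]

result = conversations(sample, "1460", r"ReqId=(\w+)")
for req_id, conv in result.items():
    print(req_id, len(conv))
```

Each value in `result` is one conversation, which can then be read top to bottom to see where the failure occurs.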
Q&A