Data Connections -- Options, Functionality, and Performance

Download Report

Transcript Data Connections -- Options, Functionality, and Performance

Data Connections -Options, Functionality,
and Performance
Objective
• Use VBA code to extract and store data from web
pages when the underlying database is not
accessible.
• Step one – Accessing the web pages
• Step two – Extracting the data
Accessing the Web Pages
• Internet Explorer Object
• WinHttpRequest Object
• XMLHttpRequest object
Retrieving the Data
• Document Object Model (DOM)
• Regular Expressions
IE Object
Dim objIE As Object
Dim varTables As Variant
Set objIE = CreateObject("InternetExplorer.Application")
URL = "http://google.com"
objIE.Visible = False
objIE.navigate URL
Do Until Not objIE.Busy
DoEvents
Loop
While objIE.Document.ReadyState <> "complete"
Wend
Set varTables = objIE.Document.all.tags("TABLE")
IE Object
Advantages
•
•
•
•
View web page in action
Credentials
Background tasks
DOM
Disadvantages
• Hangs up
• Can interfere with other browsers
• More resources
WinHttpRequest Object
Dim winHttpReq As Object
Set winHttpReq = CreateObject("WinHttp.WinHttpRequest.5.1")
URL = "http://greensboro.usps.gov/Operations/SETIarea/SETI_ReProcessScans.cfm?
requesttimeout=5000&SDate=" & ProcDate
winHttpReq.SetTimeouts 6000000, 6000000, 6000000, 6000000
winHttpReq.Open "GET", URL, False
winHttpReq.SetCredentials “username", “password",
HTTPREQUEST_SETCREDENTIALS_FOR_SERVER
winHttpReq.Send
result = winHttpReq.responseText
Set winHttpReq = Nothing
WinHttpRequest Object
Advantages
• More timeout control
• Waits for webpage to complete
Disadvantages
• Cannot see what is returned
• May need Credentials
• No DOM
Time Trial
Using two different versions of the Scan Error Tracking program, we
ran three separate scans of 542 zip codes. Execution time is in seconds.
IE Object
WinHttpRequest Object
470
497
452
691
414
259
DOM vs RegEx
• The DOM is preferred when parsing web pages
where the data is in uniform locations, using tables
and rows.
• Regular Expressions work best when attempting to
find data on a page where the location of the
information, or the structure of the page is not
known in advance.