Enterprise Geocoding Workshop: Architecture and Issues

Enterprise Geocoding Workshop
Architecture and Issues
Craig Wolff, M.S. Eng
CA Environmental Health Tracking Program
Environmental Health Investigations Branch
CA Department of Health Services
Impact Assessment, Inc.
[email protected] http://ehib.org http://catracking.com
What is Enterprise Geocoding?
• An address broker that extracts geographic
coordinates (lon/lat) and region identifiers for multiple
users/applications across an enterprise
• An address broker provides address
standardization/verification, geocoding, and
region overlay
• Implicit in this are standards for geocoding and
application interoperability
Yeah? So who cares?
• Geocoding is almost always the first step to
linking environmental and health data
• Not all Tracking stakeholders have capacity
(expertise, data/application resources) or
interest/mandate to geocode
• Address and geocode quality can increase if
it’s done as close as possible to the time an
event is reported
The Transaction is Everything
• Enterprise Geocoding is handled by a unit of
interaction called a transaction
• Request to server, processing at server, response to
client
• A request can handle one or many addresses
• Server-side processing includes address
standardization, verification, geocoding against
multiple street centerline datasets, and region overlay
• Response is the result of the processing
Web Services
• XML provides an interoperability standard for
messaging over the Web
• SOAP provides an interface to methods that
use “serializable” objects. Client and server
implementations have no platform
restrictions (e.g., Microsoft talks to Java
talks to ESRI)
Serializable Objects
• Address – street (prefix, number, street name,
type, suffix, etc), zip, city, error (error codes from
CASS-certified standardizer/verifier)
• GeocodeOptions – Options for how you want
addresses geocoded in a session
• GeocodeRecord – Processed result of a geocoding
transaction
• RegionIDs – List of extracted regions for a single
geocode
GeocodeOptions
• boolean doStreetID
• boolean doStandardizedAddress
• boolean doRegionID
• boolean doZipAsZone
• boolean doCityAsZone
• boolean doFirstMatchingCoordOnly
• boolean doMultiServiceErrorMetrics
• boolean doResourceSpecificRegions
• int sideOffset
• int spellingSensitivity
• int minimumMatchScore
• int minimumCandidateScore
• String[] streetResources
• String[] standardizationResources
• String[] regionResources
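A minimal sketch of how such an options object might look as a Java bean, since SOAP toolkits typically serialize objects through a no-argument constructor and get/set accessors. Only four of the fields listed above are shown. The default values and the 0-100 range check on minimumMatchScore are assumptions made for illustration; the slides do not state them.

```java
import java.io.Serializable;

/** Trimmed stand-in for GeocodeOptions; only four of the slide's fields are shown. */
class GeocodeOptions implements Serializable {
    private static final long serialVersionUID = 1L;

    // Defaults are illustrative assumptions, not the workshop's values.
    private boolean doRegionID = true;
    private boolean doFirstMatchingCoordOnly = false;
    private int minimumMatchScore = 60;
    private String[] streetResources = new String[0];

    /** No-argument constructor, as bean-style serializers usually require. */
    public GeocodeOptions() { }

    public boolean isDoRegionID() { return doRegionID; }
    public void setDoRegionID(boolean value) { doRegionID = value; }

    public boolean isDoFirstMatchingCoordOnly() { return doFirstMatchingCoordOnly; }
    public void setDoFirstMatchingCoordOnly(boolean value) { doFirstMatchingCoordOnly = value; }

    public int getMinimumMatchScore() { return minimumMatchScore; }

    /** Assumes the threshold shares the 0-100 scale used for match scores. */
    public void setMinimumMatchScore(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("score must be in 0..100: " + value);
        }
        minimumMatchScore = value;
    }

    // Arrays are copied so callers cannot change the options behind the bean's back.
    public String[] getStreetResources() { return streetResources.clone(); }
    public void setStreetResources(String[] value) { streetResources = value.clone(); }
}
```

A client would fill in the bean once and hand it to setGeocodeOptions, so the same settings apply to every address in the session.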
GeocodeRecord
• String[] status (M/U/T)
• short[] score (0-100)
• String[] side (L or R)
• double[] x
• double[] y
• String[] streetID
• RegionIDs[] regionIDs
• String[] metadataID
• float[] averageError
• Address standardizedAddress
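Because the record holds parallel arrays, a client has to read them by index. The sketch below assumes that index i refers to the same result in every array, and that the status codes follow the usual ESRI convention (M = matched, U = unmatched, T = tied); the slides do not spell either point out. GeocodeRecordReader is a hypothetical helper, and the record is trimmed to the fields it needs.

```java
import java.util.ArrayList;
import java.util.List;

/** Trimmed stand-in for GeocodeRecord with only the fields used below. */
class GeocodeRecord {
    String[] status;  // "M", "U" or "T" per result
    short[] score;    // 0-100 per result
    double[] x;       // longitude per result
    double[] y;       // latitude per result
}

/** Hypothetical helper, not part of the workshop's API. */
final class GeocodeRecordReader {
    private GeocodeRecordReader() { }

    /** Returns {x, y} pairs for every index whose status is "M". */
    static List<double[]> matchedCoordinates(GeocodeRecord record) {
        List<double[]> out = new ArrayList<>();
        for (int i = 0; i < record.status.length; i++) {
            if ("M".equals(record.status[i])) {
                out.add(new double[] { record.x[i], record.y[i] });
            }
        }
        return out;
    }
}
```

Tied and unmatched entries are skipped rather than guessed at, which leaves them available for later interactive review.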
SOAP Methods
• public void initializeGeocode (String user, String password)
• public void setGeocodeOptions (GeocodeOptions options)
• public GeocodeRecord findAddress (Address address)
• public GeocodeRecord [] findAddresses (Address [] addresses)
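The four methods suggest a call order: initialize, set options once, then send one or many addresses. The sketch below expresses them as a Java interface and exercises that order against FakeGeocodeService, a hypothetical in-memory stand-in for the remote service. The trimmed data classes, the lookup key, the coordinates, and the rule that initializeGeocode must be called first are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Trimmed stand-ins for the serializable objects; the real ones carry more fields.
class Address {
    final String street;
    final String zip;
    Address(String street, String zip) { this.street = street; this.zip = zip; }
}
class GeocodeOptions { int minimumMatchScore = 60; }
class GeocodeRecord { String[] status; double[] x; double[] y; }

/** The four SOAP methods from the slide, written as a Java interface. */
interface GeocodeService {
    void initializeGeocode(String user, String password);
    void setGeocodeOptions(GeocodeOptions options);
    GeocodeRecord findAddress(Address address);
    GeocodeRecord[] findAddresses(Address[] addresses);
}

/** Hypothetical in-memory fake; it only illustrates the call sequence. */
class FakeGeocodeService implements GeocodeService {
    private final Map<String, double[]> known = new HashMap<>();
    private GeocodeOptions options = new GeocodeOptions();
    private boolean initialized;

    FakeGeocodeService() {
        // Made-up address and coordinates.
        known.put("100 MAIN ST|94000", new double[] { -122.0, 37.5 });
    }

    public void initializeGeocode(String user, String password) { initialized = true; }

    // Stored for the session; this fake does not consult the options.
    public void setGeocodeOptions(GeocodeOptions options) { this.options = options; }

    public GeocodeRecord findAddress(Address address) {
        if (!initialized) throw new IllegalStateException("call initializeGeocode first");
        String key = address.street.toUpperCase(Locale.ROOT) + "|" + address.zip;
        double[] xy = known.get(key);
        GeocodeRecord record = new GeocodeRecord();
        record.status = new String[] { xy == null ? "U" : "M" };
        record.x = new double[] { xy == null ? Double.NaN : xy[0] };
        record.y = new double[] { xy == null ? Double.NaN : xy[1] };
        return record;
    }

    public GeocodeRecord[] findAddresses(Address[] addresses) {
        GeocodeRecord[] out = new GeocodeRecord[addresses.length];
        for (int i = 0; i < addresses.length; i++) out[i] = findAddress(addresses[i]);
        return out;
    }
}
```

With a real Axis client the interface would be a generated stub, but the sequence of calls within one session would look the same.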
Technologies
• SQL Server and ArcSDE for Enterprise GIS
– Storage of street centerline and region data
– Geocoding engine
– Application server for GIS operations
– Java Client API for ArcSDE
– http://arcsdeonline.esri.com
• ZP4
– CASS-certified address standardization
– C API
– http://www.semaphorecorp.com/cgi/zp4.html
More Technologies
• Apache Axis
– Java-based client/server web services tool
– Exposes Java methods and objects on server-side
– http://ws.apache.org/axis/
• Apache Tomcat
– J2EE application server
– Also runs ArcSDE Client API & Axis
– http://jakarta.apache.org/tomcat/index.html
Even More Technologies
• Java Topology Suite (JTS)
– ArcSDE Client API bug workaround
– More robust spatial analysis methods/objects
– http://www.vividsolutions.com/jts/jtshome.htm
• Visual Studio .NET, C#
– For creating ZP4 web service
– For creating web service clients
Building a City Geocoding Index
• Update street centerline attributes with the Soundex
of the city (the ZIP code's post office name) on the
left and right sides
• Build the geocoding index on city Soundex left/right;
note: ArcCatalog will overwrite any previous
indexes built for the same streets, see
http://forums.esri.com/Thread.asp?c=2&f=59&t=96397#271863
for creating a locator using a custom
template and the command line interface
• Pass soundex(city name) from the address table
• Never accept candidates with tied scores
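The tie rule can be sketched as a small selection routine. Ties are plausible here because different cities can share a Soundex code, so identical street addresses in two such cities could score the same. CandidatePicker and its parameters are hypothetical names, and applying a minimum score alongside the tie check is an assumption.

```java
import java.util.OptionalInt;

/** Hypothetical helper, not part of ArcSDE or the workshop's code. */
final class CandidatePicker {
    private CandidatePicker() { }

    /**
     * Returns the index of the single highest-scoring candidate.
     * Returns empty when there are no candidates, when the top score is
     * shared by two or more candidates, or when it is below minimumMatchScore.
     */
    static OptionalInt pickUnique(int[] scores, int minimumMatchScore) {
        int best = -1;
        boolean tied = false;
        for (int i = 0; i < scores.length; i++) {
            if (best < 0 || scores[i] > scores[best]) {
                best = i;       // new top score, so any earlier tie no longer matters
                tied = false;
            } else if (scores[i] == scores[best]) {
                tied = true;    // another candidate shares the current top score
            }
        }
        if (best < 0 || tied || scores[best] < minimumMatchScore) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(best);
    }
}
```

For example, pickUnique(new int[] {80, 95, 70}, 60) selects index 1, while pickUnique(new int[] {95, 95, 70}, 60) returns empty so the address can be reviewed by hand.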
Soundex
• Phonetic coding of a word; geocoders use a 4-character
scheme
• Codes:
• First character in code is same as input
• Letters with codes of 0 are not included
• Words with fewer than 4 corresponding codes receive
trailing zeros
• Examples
Poppy: P110
Santa Clara: S532
Oxford: O213
Main: M500
Santa Clarita: S532
Los Angeles: L252
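The rules above can be sketched in a few lines of Java. This version follows standard American Soundex, using the usual letter table (1: B F P V; 2: C G J K Q S X Z; 3: D T; 4: L; 5: M N; 6: R; everything else 0), which is an assumption because the slide's own code table did not survive. It reproduces M500, S532, and L252 from the examples, but yields P100 for Poppy and O216 for Oxford rather than the P110 and O213 listed, so the geocoder may use a variant or the slide values may contain typos.

```java
import java.util.Locale;

/** Standard American Soundex; a sketch, not the geocoder's own routine. */
final class Soundex {
    // Code per letter, index 0 = 'A' ... 25 = 'Z'; '0' means "not coded".
    private static final String CODES = "01230120022455012623010202";

    private Soundex() { }

    static String encode(String word) {
        StringBuilder out = new StringBuilder();
        char previous = '0';
        for (char letter : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (letter < 'A' || letter > 'Z') continue;   // skip spaces and punctuation
            char code = CODES.charAt(letter - 'A');
            if (out.length() == 0) {
                out.append(letter);                       // first character is kept as-is
                previous = code;
            } else {
                // Append coded letters, collapsing runs of the same code.
                if (code != '0' && code != previous) out.append(code);
                // H and W do not separate equal codes; vowels do.
                if (letter != 'H' && letter != 'W') previous = code;
            }
            if (out.length() == 4) break;
        }
        if (out.length() == 0) return "";
        while (out.length() < 4) out.append('0');         // pad short codes with zeros
        return out.toString();
    }
}
```

Santa Clara and Santa Clarita both encode to S532, which shows how a city Soundex index can produce equal-scoring candidates in different cities.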
Geocoding in Java
• Use ArcSDE Java Client API to communicate with
ArcSDE
• Use ArcSDE’s Server Side Application (SSA)
construct
• See http://forums.esri.com/Attachments/6591.pdf
for an example
Useful Patterns
• ArcSDE connections are notoriously slow to
initialize → reuse connections from a Connection
Factory, and close connections after a timeout
• Lots of data sources, server names, passwords, etc.
→ store this info in a database table and create an
object that encapsulates data resources; never
hardcode
• Use Axis/Tomcat sessions to minimize redundant
parameter passing
– Server: ((HttpServletRequest) MessageContext.getCurrentContext().getProperty(HTTPConstants.MC_HTTP_SERVLETREQUEST)).getSession()
– Client: setMaintainSession(true)
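The connection-reuse pattern in the first bullet might be sketched as follows. The class is generic so it stays self-contained: the opener and closer callbacks stand in for whatever opens and closes an ArcSDE connection, and the injected clock exists only to make the timeout testable. All names here are hypothetical and the code is not the workshop's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Reuses idle connections and closes those left idle past a timeout. */
final class ConnectionFactory<C> {
    private static final class Idle<T> {
        final T connection;
        final long since;
        Idle(T connection, long since) { this.connection = connection; this.since = since; }
    }

    private final Deque<Idle<C>> idle = new ArrayDeque<>();
    private final Supplier<C> opener;      // slow call that opens a new connection
    private final Consumer<C> closer;      // call that closes a connection
    private final long idleTimeoutMillis;
    private final LongSupplier clock;      // e.g. System::currentTimeMillis

    ConnectionFactory(Supplier<C> opener, Consumer<C> closer,
                      long idleTimeoutMillis, LongSupplier clock) {
        this.opener = opener;
        this.closer = closer;
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.clock = clock;
    }

    /** Hands out the most recently released connection, or opens a new one. */
    synchronized C acquire() {
        closeExpired();
        Idle<C> entry = idle.pollFirst();
        return entry != null ? entry.connection : opener.get();
    }

    /** Returns a connection to the idle list for later reuse. */
    synchronized void release(C connection) {
        idle.addFirst(new Idle<>(connection, clock.getAsLong()));
    }

    /** Closes connections idle for at least the timeout; the oldest sit at the tail. */
    synchronized void closeExpired() {
        long now = clock.getAsLong();
        while (!idle.isEmpty() && now - idle.peekLast().since >= idleTimeoutMillis) {
            closer.accept(idle.pollLast().connection);
        }
    }
}
```

In a servlet container a single factory instance would typically be shared across requests, with System::currentTimeMillis as the clock.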
Client Implementations
• .NET thin desktop client
– Consumes centralized geocoding service and address
standardization service
– Input: text file, Access, or SQL Server table of addresses
– Requires Windows and .NET Framework
• Browser-based HTML thin client
– Better compatibility
– More effort in inputting addresses
– Easier to couple with environmental linkage services
Future Steps
• Tools developed thus far address automated
geocoding → still need tools for interactive
geocoding on a map with orientation layers
– Many map services (some free) from USGS, Google,
Microsoft that layer vectors and imagery in basemap
• Need quicker geocoding engine (commercial
service? Centrus?)
• Need less cumbersome address standardization
service (USPS?)