BCS Powerpoint template (arrows)


GeoSpatial “Unstructured Data”

Dan Rickman, GeoSpatial SG

Agenda

• What is geospatial data?
• What does “structured” geospatial data look like?
• General data modelling issues regarding geospatial data
• In search of the BLPU
• A brief history of OS maps: how structured are they (then and now)?
• Raster map data
• EDRM
• Geo-parsers/gazetteers/metadata
• Web-based systems
• Future directions?

What is Geospatial Information? - 1

• Spatial data which relates to the surface of the Earth
• A geodetic reference system as the base, e.g. WGS84, used by the Global Positioning System, which models the Earth as an ellipsoid; positions are given as latitude and longitude (simpler calculations may treat the Earth as a sphere)
• Ordnance Survey (GB) defines the National Grid, a projection onto a flat surface (NB: OS(NI) uses the Irish Grid)
• Spatial relationships are defined around the concept of neighbourhood, which relates to two “laws” of geography:
– Most things influence most other things in some way
– Nearby things are usually more similar than things which are far apart
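To make the “projection onto a flat surface” concrete, here is a minimal sketch of the Transverse Mercator series that Ordnance Survey publishes for the National Grid (Airy 1830 ellipsoid, true origin 49°N 2°W, false easting 400 000 m, false northing −100 000 m). It assumes the input latitude and longitude are already on the OSGB36 datum: raw GPS (WGS84) positions would first need a datum transformation (e.g. Helmert or OSTN), which is deliberately left out. The function name `osgb36_to_grid` is chosen for this sketch only.

```python
import math

# Airy 1830 ellipsoid and National Grid projection constants
A, B = 6377563.396, 6356256.909                    # semi-major / semi-minor axes (m)
F0 = 0.9996012717                                  # scale factor on the central meridian
LAT0, LON0 = math.radians(49.0), math.radians(-2.0)  # true origin: 49N, 2W
E0, N0 = 400000.0, -100000.0                       # grid co-ordinates of the true origin (m)


def osgb36_to_grid(lat_deg: float, lon_deg: float) -> tuple[float, float]:
    """Project OSGB36 latitude/longitude (degrees) to National Grid (easting, northing) in metres.

    Assumes OSGB36 input; WGS84 positions need a datum transformation first (not done here).
    """
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    e2 = (A * A - B * B) / (A * A)                 # eccentricity squared
    n = (A - B) / (A + B)
    sin_p, cos_p, tan_p = math.sin(phi), math.cos(phi), math.tan(phi)

    nu = A * F0 / math.sqrt(1 - e2 * sin_p**2)              # transverse radius of curvature
    rho = A * F0 * (1 - e2) / (1 - e2 * sin_p**2) ** 1.5    # meridional radius of curvature
    eta2 = nu / rho - 1

    # Meridional arc length from the latitude of the true origin
    dp, sp = phi - LAT0, phi + LAT0
    m = B * F0 * (
        (1 + n + 1.25 * n**2 + 1.25 * n**3) * dp
        - (3 * n + 3 * n**2 + 21 / 8 * n**3) * math.sin(dp) * math.cos(sp)
        + (15 / 8 * n**2 + 15 / 8 * n**3) * math.sin(2 * dp) * math.cos(2 * sp)
        - 35 / 24 * n**3 * math.sin(3 * dp) * math.cos(3 * sp)
    )

    # Series coefficients for northing (I..IIIA) and easting (IV..VI)
    i = m + N0
    ii = nu / 2 * sin_p * cos_p
    iii = nu / 24 * sin_p * cos_p**3 * (5 - tan_p**2 + 9 * eta2)
    iiia = nu / 720 * sin_p * cos_p**5 * (61 - 58 * tan_p**2 + tan_p**4)
    iv = nu * cos_p
    v = nu / 6 * cos_p**3 * (nu / rho - tan_p**2)
    vi = nu / 120 * cos_p**5 * (5 - 18 * tan_p**2 + tan_p**4 + 14 * eta2 - 58 * tan_p**2 * eta2)

    dl = lam - LON0                                # longitude difference from central meridian
    northing = i + ii * dl**2 + iii * dl**4 + iiia * dl**6
    easting = E0 + iv * dl + v * dl**3 + vi * dl**5
    return easting, northing
```

As a sanity check, `osgb36_to_grid(49.0, -2.0)` returns `(400000.0, -100000.0)`, the grid position of the true origin.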

What is Geospatial Information? - 2

• Unstructured: “spaghetti” data
• Topology: information structured as networks and polygons
• GeoSpatial information requires metadata, e.g. at minimum the map projection used
• GeoSpatial information may also require temporal modelling, e.g. farm subsidies vary as land utilisation and legislation change
• Field-based versus object-based models of space, e.g. rainfall versus the buildings on which rain falls
• GeoSpatial information requires an ontology
– What is the “real world”, and how is it classified?
– Relates to semantics
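The difference between “spaghetti” and topologically structured data can be sketched with a simple arc-node model, in which each boundary arc is stored once and records the polygons on its left and right. Neighbourhood (adjacency) then falls straight out of the data. The `Arc` class and the node and polygon identifiers are hypothetical, and the sketch does not validate the geometry.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One boundary edge in a simple arc-node topology (illustrative structure)."""
    start_node: str
    end_node: str
    left_polygon: str
    right_polygon: str


def polygon_neighbours(arcs: list[Arc]) -> dict[str, set[str]]:
    """Two polygons are neighbours if some arc has one on its left and the other on its right."""
    neighbours: dict[str, set[str]] = defaultdict(set)
    for arc in arcs:
        if arc.left_polygon != arc.right_polygon:
            neighbours[arc.left_polygon].add(arc.right_polygon)
            neighbours[arc.right_polygon].add(arc.left_polygon)
    return dict(neighbours)


# Hypothetical parcels P1 and P2 share the single arc n2->n3; "OUTSIDE" is the universe polygon.
arcs = [
    Arc("n1", "n2", "OUTSIDE", "P1"),
    Arc("n2", "n3", "P2", "P1"),
    Arc("n3", "n1", "OUTSIDE", "P1"),
    Arc("n2", "n4", "OUTSIDE", "P2"),
    Arc("n4", "n3", "OUTSIDE", "P2"),
]
```

In spaghetti data the shared boundary between P1 and P2 would typically be digitised as two unrelated lines, so adjacency could only be recovered by matching co-ordinates within some tolerance.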

What are GeoSpatial Systems?

• Also known as Geographic Information Systems or Spatial Information Systems
• Enable capture, modelling, storage, retrieval, sharing, manipulation and analysis of geographically referenced data
• The database is at the heart, as is “attribute” data
• The model is developing: perhaps GeoSpatial data is better seen as an “attribute” of alphanumeric business information
• Presentation does not have to be map-based in all cases
• A key element is spatial indexing, which uses different techniques from alphanumeric indexing
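Alphanumeric indexes such as B-trees order keys along one dimension, which does not suit two-dimensional questions like “everything in this rectangle”. Common spatial techniques include grid files, quadtrees and R-trees. The sketch below uses the simplest, a fixed grid: points are bucketed by cell, and a rectangle query only visits the cells it overlaps. `GridIndex`, the cell size and the co-ordinates are illustrative assumptions, not any particular product's index.

```python
import math
from collections import defaultdict


class GridIndex:
    """Minimal fixed-grid spatial index for points (illustrative, not production code)."""

    def __init__(self, cell_size: float):
        self.cell_size = cell_size
        # cell (column, row) -> list of (key, x, y) entries falling in that cell
        self.cells: dict[tuple[int, int], list[tuple[str, float, float]]] = defaultdict(list)

    def _cell(self, x: float, y: float) -> tuple[int, int]:
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, key: str, x: float, y: float) -> None:
        self.cells[self._cell(x, y)].append((key, x, y))

    def query_box(self, xmin: float, ymin: float, xmax: float, ymax: float) -> list[str]:
        """Keys of points inside the box, visiting only the grid cells the box overlaps."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # .get avoids creating empty buckets in the defaultdict
                for key, x, y in self.cells.get((cx, cy), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(key)
        return hits
```

A grid works well when data density is fairly even; quadtrees and R-trees adapt better to clustered data, which is why they are more common in spatial databases.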

Where used? Examples

• Central government: DEFRA, ODPM, Land Registry, ONS
• Local government: planning, highways authorities
• Utilities: physical and logical networks
• Insurance: flood plains
• Health: epidemiology
• Travel: multi-modal route planning
• More widespread use: addresses, postcode-based data against regional boundaries, infrastructure (“geographies” used to divide the country, catchment areas)
• Fiat boundaries versus “bona fide” boundaries: what is the “real world”, and how do we structure it?
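Attaching postcode-based data to regional boundaries usually comes down to a point-in-polygon test. A minimal sketch using the even-odd (ray casting) rule, assuming each postcode is represented by a single point and each region by a simple polygon in the same projected co-ordinates. The region names and shapes are made up, and points lying exactly on a boundary are not handled consistently.

```python
from typing import Optional


def point_in_polygon(x: float, y: float, polygon: list[tuple[float, float]]) -> bool:
    """Even-odd rule: cast a ray in the +x direction and count edge crossings."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_region(x: float, y: float,
                  regions: dict[str, list[tuple[float, float]]]) -> Optional[str]:
    """Name of the first region containing the point, or None if it falls in none."""
    for name, polygon in regions.items():
        if point_in_polygon(x, y, polygon):
            return name
    return None
```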

Structured geo-database: paradigm shift?

[Diagram: CRM, ERP and GIS applications; a relational database (attribute data) alongside spatial data in a proprietary format, versus a spatially extended RDBMS offering:]
– Complex data types for spatial data
– Computational geometry
– Spatial indexing
– DDL and DML extensions
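Spatially extended databases expose computational-geometry functions through SQL (the OGC Simple Features specification, for example, defines `ST_Area`). As an illustration of what such a function computes, here is planar polygon area via the shoelace formula, assuming projected co-ordinates in metres; it is a sketch, not how any specific RDBMS implements it.

```python
def polygon_area(ring: list[tuple[float, float]]) -> float:
    """Planar area of a simple polygon using the shoelace formula.

    Assumes projected co-ordinates (e.g. metres on a national grid), not latitude/longitude.
    The ring may be open or closed (first vertex repeated at the end), in either orientation.
    """
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]                      # drop the closing vertex
    twice_area = 0.0
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        twice_area += x1 * y2 - x2 * y1       # signed cross product of consecutive vertices
    return abs(twice_area) / 2.0
```

For a 20 m by 30 m plot, `polygon_area([(0, 0), (20, 0), (20, 30), (0, 30)])` gives 600 square metres.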

ROMANSE - Hampshire CC

Roadwork Information

Geospatial data modelling

• Field-based model versus object-based model
• Geographic Information Systems are object-based in practice
• The most common field-based information, e.g. a Digital Elevation Model (used in line-of-sight applications), is attached to objects
• Objects rely on a field-based model, i.e. spatial co-ordinates
• Initiatives such as the Digital National Framework encourage organisations to structure data as references to existing objects, rather than re-capturing and duplicating data
• This is the GeoSpatial equivalent of “referential integrity”
• Nevertheless, duplication and lack of (referential) integrity are commonplace and hard to eradicate
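The “referential integrity” idea can be sketched as business records that hold a reference (here a TOID-style identifier) to a topographic object instead of a copy of its geometry, plus a check for references that no longer resolve. The record layout, function name and identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BusinessRecord:
    record_id: str
    description: str
    toid: str  # reference to a topographic object, instead of a duplicated geometry


def dangling_references(records: list[BusinessRecord], known_toids: set[str]) -> list[str]:
    """IDs of records whose referenced object no longer exists in the reference data set."""
    return [r.record_id for r in records if r.toid not in known_toids]
```

Run after each update of the reference data, a check like this turns silent drift between data sets into a list of records to fix.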

In search of the BLPU

• Basic Land and Property Unit
• The “Holy grail” of the industry: no Da Vinci code produced yet!

• Example of Ordnance Survey MasterMap (OSMM):
– "St Mary's football stadium, Southampton" is one object
– A typical detached house and its plot of land, likewise
– Complex entities such as "Southampton railway station" are defined in terms of multiple objects: one for the main building, several for the platforms, and one more for the pedestrian bridge over the tracks (NB: see the Wikipedia article on TOID)
• Defining the candidate BLPUs, their lifecycles and their attribute data, and verifying that these are meaningful and practicable across the wide variety of business processes which apply to the BLPU and to the aggregate entities created from them
• Dependencies, so that data sets are based on the BLPU wherever possible, limited by business use, e.g. a change of field use looks quite different from a tenant/owner perspective
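A complex entity such as a railway station can be represented as an aggregate that only references its member objects. The sketch below, with hypothetical identifiers and local co-ordinates in metres, derives the aggregate's extent from its members' bounding boxes, so the aggregate never stores geometry of its own.

```python
from dataclasses import dataclass, field

BBox = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Aggregate:
    """A complex real-world entity defined only by references to its member objects."""
    name: str
    member_toids: list[str] = field(default_factory=list)


def aggregate_extent(agg: Aggregate, extents: dict[str, BBox]) -> BBox:
    """Union of the members' bounding boxes; raises KeyError if a member reference is missing."""
    boxes = [extents[t] for t in agg.member_toids]
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )
```

The `KeyError` on a missing member is where the lifecycle question surfaces: if a platform object is retired and replaced, every aggregate that referenced it has to be updated.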

Evolution of geographic information

[Figure: timeline from 1950 to 2010 (marks at 1950, 1970, 1990 and 2010) showing the progression from paper records and paper mapping to database records, digital records, digital mapping and geographic information]

Raster map data

• Scanned, ortho-rectified map or map-based data: the metadata is co-ordinates, projection and extent
• For example Google Maps/Google Earth, Microsoft Virtual Earth
• Traditionally stored outside the database as external files
• Can also be stored in the database, analogous to vector data storage, e.g. Oracle 10g GeoRaster: data stored as BLOBs, with metadata required on the number of bytes per pixel, compression algorithms and so on
• Benefits are limited, as the “intelligence” in a map requires interpretation
• There has still been limited progress on map-based pattern recognition, although there are semi-automated solutions from companies such as Laser Scan
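For a north-up ortho-rectified image, the co-ordinate metadata reduces to an origin and a pixel size, from which the grid position of any pixel and the image's extent follow; the raw storage size follows from the dimensions and bytes per pixel. This mirrors the affine “geotransform” idea used by raster libraries such as GDAL, but the class and values here are hypothetical and rotation terms are ignored.

```python
from dataclasses import dataclass


@dataclass
class RasterMetadata:
    origin_x: float          # map x of the top-left corner of the top-left pixel
    origin_y: float          # map y of that corner
    pixel_size: float        # ground size of one square pixel, in map units
    width: int               # number of columns
    height: int              # number of rows
    bands: int = 1
    bytes_per_pixel: int = 1  # per band, before any compression

    def pixel_to_map(self, col: float, row: float) -> tuple[float, float]:
        """Map co-ordinates of a pixel position; y decreases as the row number increases."""
        return (self.origin_x + col * self.pixel_size,
                self.origin_y - row * self.pixel_size)

    def extent(self) -> tuple[float, float, float, float]:
        """(xmin, ymin, xmax, ymax) covered by the whole image."""
        xmax, ymin = self.pixel_to_map(self.width, self.height)
        return (self.origin_x, ymin, xmax, self.origin_y)

    def uncompressed_bytes(self) -> int:
        return self.width * self.height * self.bands * self.bytes_per_pixel
```

For example, a 4000 × 4000 pixel, 3-band, 1-byte tile at 0.25 m resolution covers 1 km × 1 km and needs 48 000 000 bytes uncompressed, which is why compression metadata matters as much as the georeferencing.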

EDRM

• Electronic document and records management
• Increasing usage in local/central government due to the Freedom of Information Act
• Documents contain potentially significant geospatial data; the most common example is an address
• Requires capture of appropriate metadata, or appropriate pattern recognition, to identify addresses
• Requires gazetteers to provide a reference to spatial co-ordinates (NB: the most familiar gazetteer is the list of streets in A-Z maps)
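Identifying addresses in documents and resolving them through a gazetteer can be sketched with a postcode pattern and a lookup table. The regular expression is a simplified assumption (it does not enforce every rule of the UK postcode format), and the gazetteer entry and its co-ordinates are made up.

```python
import re

# Simplified UK postcode pattern: outward code, optional space, inward code.
# It does not enforce every letter/position rule of the real format.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9][A-Z]{2})\b")


def extract_postcodes(text: str) -> list[str]:
    """Find postcode-like strings and normalise them to 'OUTWARD INWARD' form."""
    return [f"{outward} {inward}" for outward, inward in POSTCODE_RE.findall(text.upper())]


def geocode(text: str,
            gazetteer: dict[str, tuple[float, float]]) -> dict[str, tuple[float, float]]:
    """Co-ordinates for each postcode found in the text; postcodes not in the gazetteer are skipped."""
    return {pc: gazetteer[pc] for pc in extract_postcodes(text) if pc in gazetteer}
```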

Geo-parsers/gazetteers/metadata

• Geo-parsers identify spatial tags (geo-tags) in data
• Context sensitivity and patterns of usage are required, e.g. Jordan (country) != Jordan (Katie Price)
• An example can be seen at: http://edina.ac.uk/projects/geoxwalk/geoparser.html
• Geo-parsers rely on, and populate, a gazetteer of associated names
• Emerging standards for geo-parsing, e.g. the Open GIS Consortium is looking at:
– Gazetteer service
– Geo-coder service
– Web services (WMS/WFS)
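The context-sensitivity problem (Jordan the country versus Jordan the person) can be illustrated with a deliberately crude rule: look at the words around a gazetteer match and count place-like cues against person-like cues. The gazetteer, cue lists and window size are all assumptions for illustration; real geo-parsers use much richer natural language processing and gazetteer statistics.

```python
import re

GAZETTEER = {"jordan", "southampton", "hampshire"}          # hypothetical, tiny
PLACE_CUES = {"in", "to", "from", "near", "visit", "visited", "border", "across"}
PERSON_CUES = {"said", "says", "model", "she", "he", "told", "interview"}


def geo_tags(text: str, window: int = 3) -> list[tuple[str, int]]:
    """(name, token position) for gazetteer names judged to be places.

    A match is tagged as a place when the surrounding words contain
    more place cues than person cues.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    tags = []
    for i, tok in enumerate(tokens):
        if tok not in GAZETTEER:
            continue
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        score = sum(w in PLACE_CUES for w in context) - sum(w in PERSON_CUES for w in context)
        if score > 0:
            tags.append((tok, i))
    return tags
```

Every tag found this way is also a candidate new entry for the gazetteer, which is the “relies on and populates” loop described above.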

Web-based systems: Google Earth meets Flickr

Web-based systems (metacarta, KML, mashup)

Web-based systems

• A world wide wild west of unstructured data
• Increasing use of systems to control, coordinate and make this data accessible
• The geo-enabled semantic web raises issues of ontology
• www.metacarta.com
– provides web-based Geographic Text Search (GTS), which can confine searches by geography, retrieve information it detects using the keywords, and then display this information geographically on a map interface (now working with Google Earth)
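The core of a geographic text search, confining a keyword search to an area, can be sketched once a geo-parser has attached co-ordinates to each document. This is not MetaCarta's API: `GeoDoc`, `geo_text_search` and the data are hypothetical, and the bounding box does not handle areas crossing the 180° meridian.

```python
from dataclasses import dataclass


@dataclass
class GeoDoc:
    doc_id: str
    text: str
    locations: list[tuple[float, float]]  # (lat, lon) pairs found by a geo-parser


def geo_text_search(docs: list[GeoDoc], keyword: str,
                    bbox: tuple[float, float, float, float]) -> list[tuple[str, tuple[float, float]]]:
    """(doc_id, location) hits for documents containing the keyword, keeping only
    locations inside bbox = (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    hits = []
    for doc in docs:
        if keyword.lower() not in doc.text.lower():
            continue                                  # keyword filter
        for lat, lon in doc.locations:
            if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
                hits.append((doc.doc_id, (lat, lon)))  # geographic filter
    return hits
```

Each hit pairs a document with a location, which is what a map interface needs in order to draw an icon linked back to the document.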

They know where you live

MetaCarta(R), Inc., a leading provider of geographic intelligence, announced today that it had won a one year contract with … the Department of Homeland Security [which] identifies and assesses current and future threats to the homeland, maps those threats against the nation's vulnerabilities, issues timely warnings and takes preventative and protective action… The product automatically identifies geographic references using advanced natural language processing (NLP) from any type of unstructured content in a customer's archives such as email, web pages, newswires or cables. It assigns a latitude and longitude to these references so that users can analyze their text archives using geographic maps, keywords and time as filters. The results of a query are displayed on a map with icons representing the locations found in the natural language text of the documents and as a text results list. Both the icons and text summaries are hyperlinked to the documents they represent. (Source: http://www.prnewswire.com/cgi-bin/stories.pl?ACCT=109&STORY=/www/story/03-14-2005/0003193909&EDATE=)

The future (and summary)

• Structured environments will contain more “unstructured” data
• The Web will continue to provide unstructured, distributed data
• The success of semantic-based approaches is yet to be determined; experience with geospatial data indicates there are significant complexities arising from our representations of the “real world”
• One issue is clear: there is increasingly less privacy. Location is already accessible through mobile phones, and linking it to other data can yield significant intelligence
• Also clear: data quality issues will persist
• They will still get it wrong!