Lopez - The Center for Geographic Analysis, Harvard University


Linking Open Data with Location: Gazetteers and the Semantic Web

Xavier Lopez, Director, Product Management

Overview

• Linked Open Data (LOD) Concepts
• Role of Gazetteers in LOD
• Interconnected Web of Content
• Towards Geospatial Knowledge Management

Linked Data

“A method of publishing structured data, so that it can be interlinked and become more useful”

Wikipedia

Linked Open Data – The Opportunity

• Manage relationships for massive collections of structured and unstructured data
• Flexible and extensible data model supports powerful search and end-user discovery of related content
• Enable users to define their social networks or communities based on common interests, subjects, image scenes, locations, etc.
• Rich platform for data integration, data repurposing, and better quality control and classification
• Tactical, non-invasive, iterative solution for strategic modernization

Semantic Aggregation & Navigation of Data

Simple Linked Data Architecture

User Domain & Task Ontologies

Query & results

Data Ontologies

(Reasoning/Inferencing) Engine

Data Sources

Linked Open Data Cloud (2008)

Linked Open Data Cloud (2010)

Resource Description Framework (RDF)

RDF is a general framework for describing a Web site's metadata, or the information about the information on the site. It provides interoperability between applications that exchange machine-understandable information on the Web.

W3C
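
An RDF description reduces to subject-predicate-object triples. The following is a minimal sketch in plain Python (no RDF library), using the FOAF vocabulary as the example. The people, the `example.org` namespace, and the helper names `match` and `names_known_by` are assumptions made for illustration only.

```python
# Plain-Python sketch: RDF statements as (subject, predicate, object) tuples.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/people/"  # hypothetical namespace for this sketch

# Hypothetical people described with FOAF terms.
triples = {
    (EX + "alice", RDF_TYPE, FOAF + "Person"),
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "bob", RDF_TYPE, FOAF + "Person"),
    (EX + "bob", FOAF + "name", "Bob"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def names_known_by(graph, person):
    """Follow foaf:knows links from `person`, then look up each foaf:name."""
    names = []
    for _, _, friend in match(graph, s=person, p=FOAF + "knows"):
        names.extend(o for _, _, o in match(graph, s=friend, p=FOAF + "name"))
    return sorted(names)


print(names_known_by(triples, EX + "alice"))
```

Because every statement has the same three-part shape, descriptions from different sources can be merged by taking the union of their triple sets.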

Modeling: A FOAF Example

Gazetteers and Linked Open Data Services

• Provide common terms (place names) to link across existing spatial data resources
• Enable a consolidated view across the map layers
• Reconcile differences in data semantics so that they can all “talk” and interoperate
• Resolve semantic discrepancies across databases, gazetteers, and applications
• Integrate the full breadth of the enterprise content continuum (structured, spatial, email, documents, web services)
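
The linking role described above can be sketched as follows: a gazetteer maps variant place names to one canonical identifier, and records from separate sources are grouped under that identifier. This is an illustrative sketch only. The identifiers (`place:1`, `place:2`), the record fields, and the function names are hypothetical and do not come from any particular gazetteer service.

```python
# Hypothetical gazetteer: canonical place ID -> known name variants.
GAZETTEER = {
    "place:1": {"Mumbai", "Bombay"},
    "place:2": {"Cologne", "Köln", "Koeln"},
}


def build_index(gazetteer):
    """Invert the gazetteer into a case-insensitive name -> place ID lookup."""
    return {
        name.casefold(): place_id
        for place_id, names in gazetteer.items()
        for name in names
    }


def link(records_a, records_b, index, key="place"):
    """Group records from two sources under their shared canonical place ID."""
    linked = {}
    for source, records in (("a", records_a), ("b", records_b)):
        for rec in records:
            place_id = index.get(rec[key].casefold())
            if place_id is None:
                continue  # name not in the gazetteer; left unlinked
            linked.setdefault(place_id, {"a": [], "b": []})[source].append(rec)
    return linked


# Two hypothetical sources that spell the same places differently.
source_a = [{"place": "Bombay", "layer": "roads"}, {"place": "Köln", "layer": "roads"}]
source_b = [{"place": "mumbai", "layer": "hospitals"}, {"place": "Atlantis", "layer": "hospitals"}]

linked = link(source_a, source_b, build_index(GAZETTEER))
print(sorted(linked))
```

Here "Bombay" and "mumbai" resolve to the same identifier, so their records can be viewed together, while the unknown name is simply left unlinked rather than guessed at.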

Modeling: Enterprise Integration

• Ordnance Survey maintains definitive mapping data of Great Britain, the world’s largest and most detailed Geo DB
• Semantic Web is used to integrate different, semantically diverse sources of data
• General ontologies already developed to bridge differences in terminology
• The data is queried efficiently via the ontology or RDF
• Advantages include efficient data integration, data repurposing, and better quality control and classification

Source: http://www.w3.org/2001/sw/sweo/public/UseCases/

From Linked Data to Knowledge Management

Knowledge Management Conditions

• Filtering search queries with “context”
• Discovery of data relationships across…
  • Structured data (database, apps, web services, RSS schemas)
  • Unstructured data (email, office documents)
• Queries are not defined in advance
• Schemas are continuously evolving
• Support Machine2Machine interaction
• Location can be common link, along with names, concepts, synonyms

What Can Linked Data Enable?

Mapping & Geotagging
Social Network Relations
Analysis of Complex Relations
Rule-based Reasoning

Ontology-driven Map Apps Application Ontologies Situational Awareness National Map Core Datasets RDF & OWL Data Spatial Data Geographic Names Raster Data

• Simple Features
• GeoRaster
• Topology
• Networks
• Gazetteers
• …

• Data Integration
• National Map schemas
• Geographic names
• Temporal
• Naïve Geography
• …

Theater Targeting

A “Simple” Knowledge Query

“Which hospitals within 30 mins of Alpine, CA provide burn treatment?”

• We need to associate a number of factors, including hospital type and facilities, accessibility after a disaster, and the staff available
• The query needs to be structured based on Concepts & Relationships that can be retrieved and then customized for the specific query
• Using this approach, a listing of the hospitals capable of dealing with a large number of burn cases is returned to the user, and information associated with the query is retrieved

“Typical” Analyst Query:

“Which hospitals within 30 mins of Alpine, CA provide burn treatment?”

Feature Reference Type?

What does this mean?

Buffer or proximity?

Driving or Flying?

Road Closures?

Definition?

Where is this?

Centroid or outline?
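
Each question above has to be answered before the query can run. The sketch below makes the simplest possible choices, all of them assumptions: the place is a single point (centroid, not outline), "within 30 mins" means driving at a constant assumed speed over the straight-line (haversine) distance, and road closures are ignored. The coordinates, hospital names, and speed are illustrative values, not real data.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def hospitals_within(origin, hospitals, treatment, max_minutes, speed_kmh=60.0):
    """Filter by treatment, then by estimated travel time at an assumed speed."""
    results = []
    for h in hospitals:
        if treatment not in h["treatments"]:
            continue
        km = haversine_km(origin[0], origin[1], h["lat"], h["lon"])
        minutes = km / speed_kmh * 60.0
        if minutes <= max_minutes:
            results.append((h["name"], round(minutes, 1)))
    return sorted(results, key=lambda r: r[1])


# Illustrative point standing in for the place; not a surveyed location.
ORIGIN = (32.83, -116.77)

# Hypothetical hospitals with hypothetical capabilities.
HOSPITALS = [
    {"name": "Hospital A", "lat": 32.80, "lon": -116.90, "treatments": {"burn", "A&E"}},
    {"name": "Hospital B", "lat": 33.80, "lon": -117.90, "treatments": {"burn"}},
    {"name": "Hospital C", "lat": 32.84, "lon": -116.78, "treatments": {"A&E"}},
]

matches = hospitals_within(ORIGIN, HOSPITALS, "burn", max_minutes=30)
print(matches)
```

Swapping any assumption (flying instead of driving, a road network instead of straight-line distance, an outline instead of a point) changes the answer, which is why the slides argue for making these concepts explicit in an ontology.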

Ontologies for Problem Solving

Burns Unit Burns Route Specialists Duty Rota Illness Treatment

Emergency Team

Skin Graft Location Hospital Weather Flood Hazards Vehicles Type A&E Helipad Roads Beds Obstructions Type Size

Oracle 11g RDF/OWL Graph Data Management

Storage & Loading

Native W3C RDF graph data store

Fast Bulk, batch & Incremental load

Query

SQL: SEM_MATCH graph pattern query

SPARQL: supported via Jena plug-in

Reasoning

RDF, OWL Prime, RDF++ semantic rules

Forward chaining inference model

User defined rule base

Scalability

Scales to billions of triples

Partitioning, RAC, Adv. Compression

Standards & Interoperability

Aligned with W3C specifications

Supported by leading semantic tools

Structured DBMS, Unstructured, Spatial, RSS, email, Documents

Conclusions

• Key semantic technologies are mature
• Semantic technologies are key enablers for enterprise and Web
• Reuse of existing authoritative gazetteers is needed
• Model the real world rather than data artifacts

Find out more...

oracle.com/database/spatial.html

oracle.com/technology/products/spatial

oracle.com/technology/products/spatial/htdocs/pro_oracle_spatial.html

Q & A

Information Explosion

• Structured data stores are growing in size
• Amount of semi-structured data is expanding (XML, RDF, Semantics, Spatial)
• Metric data, beacons, sensors supplying mega volumes
• Unstructured data is gathered at a staggering pace (email, documents, messages, streams, feeds)

Modeling Domain Information


Simple Transitive Reasoning

Asserted Facts

:partOf rdf:type owl:TransitiveProperty
:California :partOf :USA
:USA :partOf :NorthAmerica

Derived Facts

:California :partOf :NorthAmerica

Query: SELECT ?x ?y FROM … WHERE { ?x :partOf ?y }

Result:

?x            ?y
:California   :USA
:California   :NorthAmerica
:USA          :NorthAmerica
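
The derivation above can be reproduced with a small forward-chaining loop: keep applying the transitivity rule until no new fact appears, then answer the query over asserted plus derived facts. This is a plain-Python sketch of the idea, not how any particular triple store implements inference. The function name `forward_chain_transitive` is an assumption for illustration.

```python
# Asserted facts from the slide, as (subject, predicate, object) tuples.
asserted = {
    (":partOf", "rdf:type", "owl:TransitiveProperty"),
    (":California", ":partOf", ":USA"),
    (":USA", ":partOf", ":NorthAmerica"),
}


def forward_chain_transitive(facts):
    """Add (a p c) whenever (a p b) and (b p c) hold and p is declared transitive."""
    facts = set(facts)
    transitive = {
        s for s, p, o in facts
        if p == "rdf:type" and o == "owl:TransitiveProperty"
    }
    changed = True
    while changed:  # repeat until a full pass derives nothing new
        changed = False
        snapshot = list(facts)
        for a, p, b in snapshot:
            if p not in transitive:
                continue
            for b2, p2, c in snapshot:
                if p2 == p and b2 == b and (a, p, c) not in facts:
                    facts.add((a, p, c))
                    changed = True
    return facts


closure = forward_chain_transitive(asserted)
derived = closure - asserted

# Equivalent of: SELECT ?x ?y WHERE { ?x :partOf ?y }
rows = sorted((s, o) for s, p, o in closure if p == ":partOf")

print(derived)
for x, y in rows:
    print(x, y)
```

The query returns three rows even though only two `:partOf` facts were asserted, because the third is materialized by the rule before the query runs.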

Integrated Bioinformatics Networks

Source: Siderean Software

Text/Spatial Mining Workflow Web Resources Information Extraction

Categorization, Feature/term Extraction

RDF/OWL Processed Document Collection Ontology Engineering Modeling Process OWL Ontologies Domain Specific Knowledge Base News, Email, RSS Content Mgmt. Systems Spatial Data

Explore Browsing, Presentation, Reporting, Visualization, Query

Analyst