Transcript Title

James Reid, project manager
[email protected]
Eddie Boyle, software developer
[email protected]
EDINA
Context - EDINA
• a JISC National Data Centre, 1995 – hosted by Edinburgh University Data Library, 1984 -
• mission...
to enhance productivity of research, learning and teaching
in UK higher & further education
• major provider within the JISC Information Environment
– range of bibliographic resources
– multimedia and image services
– key geo-spatial data and geo-referenced information
  • UKBORDERS (1994 - ) boundary outlines & geo-reference database
  • Digimap (2000 - ) online source of Ordnance Survey mapping
– development projects - geoXwalk, Go-Geo!, e-MapScholar, Pathfinder...
• strategic move toward interoperability & shared services role
– adoption of appropriate standards (OGC,ISO)
Context The JISC Information Environment is…
• variously stated as …
– a national digital library... for UK higher and further education
– a managed collection of quality assured resources
– a distributed resource supporting learning and research in the UK
• definitely heterogeneous
– ‘words, numbers, pictures, sound’: including geo-spatial data
• for use by researchers, students, teachers & support staff
• based on an underlying functional model
– simplified to: search -> obtain -> use -> publish
– {discover/locate} {request/access} {view/copy/amend/combine} {publish}
• now to have location-based searching
– requiring geo-referencing of information objects
The geoXwalk project
• funded under JISC DNER Development Programme
– builds on Phase I scoping study
– aims to develop a demonstrator gazetteer service suitable for
extension to full service.
• time-frame: start 1 June 2002 for 1 year
• project partners: EDINA and UK Data Archive
• aim: to develop a ‘proof of concept’ demonstrator
JISC Information Environment -geoXwalk as ‘shared service’
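[Diagram: the JISC Information Environment as three layers. Provision layer: content providers. Fusion layer: broker/aggregator. Presentation layer: portals serving the end-user. Shared services alongside the layers: authentication, authorisation, geoXwalk, collection description, service description, resolver, institution profile.]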
Geo-referencing: that’s what’s special
about the spatial
• subject content most often referenced by topic …
… but much (80%?) can be referenced to specific geographic
places
• broad disciplinary base for more powerful geographic
searching
– across the social, life & physical sciences as well as the
humanities
– also from libraries, archives and museums
– now from digital libraries, service providers & data providers
• geo-referencing thus a way of viewing information content:
– subject, people, place and time
• geographic co-ordinates are persistent regardless of name,
political boundary or other changes
Why this is difficult...

• How to search ‘geographically’, given that e.g. a postcode, a placename and an administrative area are all valid geographies, and yet no information system can know about all the possible variations of what constitutes a ‘geography’?
• Problem compounded by inconsistency of use even in the ‘standards’, e.g. placenames evolve and have alternative names
• Long history in UK of boundary changes and changes in the geographies used to record things, e.g. electoral ward boundary changes …
The vision

• Make variations in definitions of ‘geography’ transparent
• Provide a means to ‘crosswalk’ geographies, i.e. translate one geography into another - hence the name
• ‘Geographic agnosticism’
How?


• A digital gazetteer that stores the different geographies and can implicitly resolve the relationships between them
• Provision as a shared service for use by other services
Gazetteer - A list of geographic features together with their
associated spatial location
Digital Gazetteer - An electronic list of geographic features
together with their associated spatial location
(An authority database of places (and features?))
Digital Gazetteer Service - A network-addressable middle-ware
server supporting geographic referencing and searching.
A shared ‘terminology’ service.
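As a rough illustration of how a gazetteer could ‘crosswalk’ geographies through co-ordinates alone, the sketch below stores each feature with a bounding-box footprint and treats two features as related when their boxes intersect. The `Feature` class, the `crosswalk` function and every footprint shown are hypothetical, chosen only for illustration; a real service would presumably use true polygon geometry and a spatial index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    geography: str    # e.g. "postcode", "placename", "admin-area"
    footprint: tuple  # (min_x, min_y, max_x, max_y), arbitrary units


def intersects(a, b):
    """True when two bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def crosswalk(feature, gazetteer, target_geography):
    """Names of features in target_geography whose footprint meets feature's."""
    return [f.name for f in gazetteer
            if f.geography == target_geography
            and intersects(feature.footprint, f.footprint)]


# Made-up entries: these are not real footprints.
GAZETTEER = [
    Feature("Postcode P1", "postcode", (10, 10, 12, 12)),
    Feature("Town T1", "placename", (8, 8, 20, 20)),
    Feature("Town T2", "placename", (30, 30, 40, 40)),
    Feature("District D1", "admin-area", (0, 0, 25, 25)),
]

print(crosswalk(GAZETTEER[0], GAZETTEER, "placename"))   # ['Town T1']
print(crosswalk(GAZETTEER[0], GAZETTEER, "admin-area"))  # ['District D1']
```

The caller never needs a lookup table from postcodes to towns or districts; the relationship is implied by the footprints, which is the sense of ‘implicitly resolve’ above.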
Why not just use hierarchical thesauri?
(part of the ‘Document Tradition’)
United Kingdom………………………… (nation)
England …………………………..(country)
Devon………………………….. (county)
Barton………………………………..
Comment:
✓ one type of simple relationship between entries is exploited
✓ entries ordered from very general to very specific (BT, NT)
✓ can efficiently determine what a given area contains
✓ normally structured to handle alternative names (SY)
X rigid structure, one view only, typically geo-political;
  entities can belong in many hierarchies and new relationships evolve
X names may not be unique
X cannot deal with spatial proximity / contiguity
X no way to relate to other geographies, e.g. postcodes
X lack of simple hierarchies in UK (and other ‘old’) geographies …
There is underlying complexity, such as
Multiple Geographies …
Uses of geoXwalk Digital Gazetteer Service
1. As ‘shared service’, enabling other information services to
support full range of spatial searching (query constraints)
– no need to hold all data (at service) to resolve spatial query
– uses co-ordinates and (implicit) spatial relationships to ‘crosswalk’ between geographies
– machine-to-machine (m2m) interaction to ‘shared service’
2. As reference facility for researchers, libraries & museums
– including means to resolve variant names etc.
3. As online facility to assist metadata creators and means to
semi-automatically geo-reference existing resources
geoXwalk Use Cases
[Diagram: information servers use the geoXwalk server for geo-parsing & indexing and for searching (1) and (2); users also consult it directly for reference use.]
Reference use, e.g.
• Where is Aberdour?
• On what river is Dundee situated?
• By what alternative names has York been known?
• List me all places ending with ‘kirk’
Task: Find resources about ‘Liverpool docks’
Search using a ‘traditional’ gazetteer might yield:
co-ordinates allow (near) co-located places
to be co-identified.
Using spatial proximity in an active
gazetteer, the search can be widened:
Place          County/UA
Liverpool      Liverpool
Bebington      Wirral
Birkenhead     Wirral
Bootle         Sefton
New Brighton   Wirral
Seacombe       Wirral
Seaforth       Wirral
Waterloo       Sefton
… that means more & better hits!
Supporting service searching:
“Photographs of towns along the River Tweed”
Place name - River Tweed
Feature Type: River
Relation: ‘near’
Distance: 1/2 km
Target type: towns
Places...
Image finder server
(Images indexed on place names)
Peebles
Innerleithen
Melrose
Kelso
Coldstream
Berwick upon Tweed
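The ‘near’ relation in the query above can be sketched as a distance test between point features (towns) and a line feature (the river). This is only an illustration under stated assumptions: the function names are invented here, the geometry is planar, and the co-ordinates are made-up values on an arbitrary metre grid, not real locations on the Tweed.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the parameter to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def near(line, points, max_distance):
    """Names of points within max_distance of any segment of line."""
    hits = []
    for name, p in points.items():
        d = min(point_segment_distance(p, a, b)
                for a, b in zip(line, line[1:]))
        if d <= max_distance:
            hits.append(name)
    return hits


# Hypothetical data in metres on an arbitrary grid.
river = [(0, 0), (1000, 0), (2000, 500)]
towns = {"Town A": (500, 300), "Town B": (1500, 2000), "Town C": (2000, 900)}

print(near(river, towns, 500))  # ['Town A', 'Town C']
```

The resulting names are what would be passed on to a service indexed only on place names, such as the image finder.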
Supporting cross searching:
geoXwalk in the Common Information Environment
Coordinate footprints - Dundee
(334995, 729203, 350609, 734710)
Places:
Barnhill
Broughty Ferry
Craigie
Douglas And Angus
Fintry
Lochee
Monifieth
West Ferry
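A footprint like the one above can be used directly as a search constraint. The sketch below assumes the four numbers are (min easting, min northing, max easting, max northing); that ordering is an assumption, and the candidate boxes are hypothetical test values rather than the footprints of the places listed.

```python
# Footprint quoted on the slide; the ordering of the values is assumed.
DUNDEE = (334995, 729203, 350609, 734710)


def within(inner, outer):
    """True when bounding box inner lies entirely inside outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


# Hypothetical candidate footprints, for illustration only.
candidates = {
    "feature-1": (340000, 730000, 341000, 731000),  # fully inside
    "feature-2": (349000, 733000, 352000, 736000),  # straddles the edge
    "feature-3": (300000, 700000, 301000, 701000),  # elsewhere
}

inside = [name for name, box in candidates.items() if within(box, DUNDEE)]
print(inside)  # ['feature-1']
```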
Supporting cross searching different services
‘Find resources for this postcode’
(NB postcode often used to geo-reference survey data files)
[Diagram: a portal service sends the postcode L34 0HS to the geoXwalk server, which returns equivalent geographies that the portal uses to query Content Providers A, B and C:]
• Coordinate footprint: 340900,392300 - 347217,397660
• Place name: Knowsley
• Parish name: BX003
As online facility to assist metadata creation
• Most of the extant resources in the JISC IE have some form of
spatial reference e.g. placename, county name, postcode
• A ‘geoparser’ has been developed which will assist in the
semi-automatic indexing of these resources by using the
gazetteer as reference.
• The results of the geoparsing can be used to update the
document's metadata, making it directly geographically
searchable.
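The slides do not describe how the geoparser works internally. As a minimal sketch of the general idea only, the code below scans free text for names held in a gazetteer and returns candidate geo-references. The `geoparse` function, the tiny gazetteer and its identifiers are all hypothetical; a real geoparser would also have to disambiguate names that are not unique.

```python
import re

# Tiny stand-in gazetteer: name -> identifier (identifiers are invented).
GAZETTEER = {"Berwick upon Tweed": "gaz:1", "Kelso": "gaz:2", "Fife": "gaz:3"}


def geoparse(text, gazetteer):
    """Return (name, identifier, offset) for each gazetteer name in text."""
    # Try longer names first so multi-word names win over shorter overlaps.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(1), gazetteer[m.group(1)], m.start())
            for m in pattern.finditer(text)]


doc = "Photographs taken between Kelso and Berwick upon Tweed in 1930."
matches = geoparse(doc, GAZETTEER)
for name, identifier, offset in matches:
    print(name, identifier, offset)
```

The matched identifiers are the kind of value that could be written back into a document's metadata record.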
Developments to Date
1. Creation & population of GB gazetteer database with:
   1. Enhanced OS 1:50,000 Placename Gazetteer
   2. Digital boundary data (UKBORDERS)
   3. Additional Place Name Variants (partial for Scotland and Wales)
   4. Derived multi-source data e.g. named woodlands and lakes based on hybrid 1:50K gazetteer and OS products
2. Development of spatial extensions to database to support
enhanced geographic search capabilities
3. Development of middleware to support m2m and interactive
searching
4. Use of ADL content standard, feature type thesaurus, query
protocol
• Testing of alternative query protocols - ADL / SOAP / Z39.50(?)
• Development of a geoparser to support semi-automatic
indexing
ADL Gazetteer Content Standard
Geographic Feature ID
Geographic Name
Variant Geographic Name (R)
Type of Geographic Feature (R)
Other Classification Terms (R)
Geographic Feature Code (R)
Spatial Location (R)
Street Address
Related Feature (R)
Description
Geographic Feature Data (R)
Link to Related Source of Information (R)
Supplemental Note
Metadata Information
• comprehensive description but with small set of core elements
• temporal aspects of names, footprints, relationships, …
• document source, spatial accuracy/scale of footprint
• does permit explicit relationship types!
http://www.alexandria.ucsb.edu/gazetteer
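To make the element list more concrete, here is a small sketch of how a subset of it might be held in memory. It assumes that (R) marks a repeatable element, which is why those fields are lists; the class name, field names and the choice of subset are illustrative and are not taken from the ADL schema itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]


@dataclass
class GazetteerEntry:
    """Illustrative subset of the content standard's elements."""
    feature_id: str                                             # Geographic Feature ID
    name: str                                                   # Geographic Name
    variant_names: List[str] = field(default_factory=list)      # (R)
    feature_types: List[str] = field(default_factory=list)      # (R)
    footprints: List[BBox] = field(default_factory=list)        # Spatial Location (R)
    related_features: List[str] = field(default_factory=list)   # (R)
    description: Optional[str] = None

    def all_names(self) -> List[str]:
        """Preferred name followed by any variants."""
        return [self.name] + self.variant_names


# Example record; the identifier and feature type are placeholders.
entry = GazetteerEntry(
    feature_id="example-1",
    name="Dundee",
    feature_types=["towns"],
    footprints=[(334995, 729203, 350609, 734710)],
)
print(entry.all_names())  # ['Dundee']
```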
XML query fragments

Query for a placename:

<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-service
    xmlns="http://www.alexandria.ucsb.edu/gazetteer"
    version="1.1">
  <query-request>
    <gazetteer-query>
      <name-query operator="equals" text="Fife"/>
    </gazetteer-query>
    <report-format>standard</report-format>
  </query-request>
</gazetteer-service>

Query by feature type and bounding box:

<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-service
    xmlns="http://www.alexandria.ucsb.edu/gazetteer"
    xmlns:gml="http://www.opengis.net/gml"
    version="1.1">
  <query-request>
    <gazetteer-query>
      <and>
        <class-query thesaurus="Edina FT Thesaurus" term="towns"/>
        <footprint-query operator="within">
          <gml:Box>
            <gml:coordinates>-0.02988,51.45753, 1.30798,52.07042</gml:coordinates>
          </gml:Box>
        </footprint-query>
      </and>
    </gazetteer-query>
    <report-format>standard</report-format>
  </query-request>
</gazetteer-service>
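A client would normally generate such requests rather than write them by hand. The sketch below builds the placename query with Python's standard `xml.etree.ElementTree`; the element and attribute names are copied from the fragment above, while the `name_query` helper is hypothetical and nothing here is checked against the ADL protocol schema.

```python
import xml.etree.ElementTree as ET

ADL_NS = "http://www.alexandria.ucsb.edu/gazetteer"

# Serialise the ADL namespace as the default namespace (no prefix).
ET.register_namespace("", ADL_NS)


def name_query(text, operator="equals", report_format="standard"):
    """Build a placename query and return it as an XML string."""
    root = ET.Element(f"{{{ADL_NS}}}gazetteer-service", {"version": "1.1"})
    request = ET.SubElement(root, f"{{{ADL_NS}}}query-request")
    query = ET.SubElement(request, f"{{{ADL_NS}}}gazetteer-query")
    ET.SubElement(query, f"{{{ADL_NS}}}name-query",
                  {"operator": operator, "text": text})
    ET.SubElement(request, f"{{{ADL_NS}}}report-format").text = report_format
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


xml_text = name_query("Fife")
print(xml_text)
```

Building the tree programmatically means attribute values are quoted and escaped consistently, whatever placename is supplied.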
Ongoing Work and Issues
• Merging geo-data from different scales & from different
sources
– how to accommodate historical data
– positional accuracy & expression of confidence?
– how to minimise effort in de-duplication of place(s)?
• places have multiple names, types, and footprints
• need to be able to identify duplicate entries for the same place
• Presenting geo-names on different occasions?
– many variant ‘proper’ names, what is preferred?
• what is the ‘name authority body’? - none in Scotland or the UK
• preferred name varies with location and use and culture
– there are language and character code set issues
– ‘standard’ codes for postal addresses and other geographies
• IPR issues in metadata; and hence terms & conditions of use
• Service performance issues and appropriate protocols
Contact details
• [email protected]
EDINA, Data Library, University of Edinburgh
telephone +44 (0)131 650 3302
• For information on geoXwalk project:
www.geoXwalk.ac.uk