
A Geo-spatial Perspective
or
What’s Special about the Spatial?
Peter Burnhill
Director
EDINA, UK National Data Centre
University of Edinburgh
CoSMiC Terminologies Day,
National Library for Scotland, Edinburgh
Preamble
This is based on two earlier presentations:
1. Workshop on ‘Digital Gazetteers’, made (by AC & JJ) at
ACM/IEEE-CS Joint Conference on Digital Libraries 2002,
Portland, Oregon, USA, 14-18 July 2002
2. ‘New Directions in Metadata’, made (by PB & DMS) at
OCLC/SCURL Pre-IFLA Conference, Edinburgh August 2002
So, acknowledgements to:
David Medyckyj-Scott, Andy Corbett & James Reid
(Research & Geo-data Services, EDINA)
Purpose & Overview
• Set context
• Internet & Digital Libraries & GIS
• EDINA & the JISC Information Environment
• What’s special about the spatial?
• Referencing & spatial co-ordinates
• gazetteer models (nominal & active)
• Progress towards digital gazetteer services
• geoXwalk & other projects
• Summary & Conclusions
Internet, Digital Libraries & GIS
1. Some Big Issues
– Metadata & Interoperability
– Naming, Identifiers & Authority Files
– Ontologies
– Shared services
2. Information Science
– Digital Library, as mix of document & computation traditions
Michael Buckland, ‘The Landscape Of Information Science’ JASIS
Special Issue "JASIS at 50", Wiley 1999. (‘Presidential Address’)
3. Subject-matter methodology
– Geographic Information Systems
• deconstructing the Map: both as database & as display device
• referencing: the cartographic trick
‘surface of sphere to flat paper/screen’
EDINA
• a JISC National Data Centre, 1995 – part of Edinburgh University Data Library, 1984 -
• mission...
to enhance productivity of research, learning and teaching
in UK higher & further education
• major provider within the JISC Information Environment
– range of bibliographic resources
– launching sound and picture studio
– key geo-spatial data and geo-referenced information
• UKBORDERS (1994 - ) boundary outlines & geo-reference database
• Digimap (2000 - ) online source of Ordnance Survey mapping
• strategic move toward interoperability & shared services role
– adoption of appropriate standards
The JISC Information Environment is…
• variously stated as …
– a national digital library... for UK higher and further education
– a managed collection of quality assured resources
– a distributed resource supporting learning and research in the UK
• definitely heterogeneous
– ‘words, numbers, pictures, sound’: including geo-spatial data
• for use by researchers, students, teachers & support staff
• based on an underlying functional model
– simplified to: search -> obtain -> use -> publish [digital soup]
– {discover/locate} {request/access} {view/copy/amend/combine} {publish}
• now to have location-based searching
– requiring geo-referencing of information objects
Q: What’s Special about the Spatial?
• subject content most often referenced by topic …
… but much (80%?) can be referenced to specific geographic places
• broad disciplinary base for more powerful geographic searching
– across the social, life & physical sciences as well as the humanities
– also from libraries, archives and museums
– now from digital libraries, service providers & data providers
• geo-referencing is thus a way of viewing information content:
– subject, people, place and time
A: Geo-referencing, that’s what
So, what is geo-referencing? What are geo-data?
Statistical Account of Scotland
NUMBER XIII.
PARISH OF CULLEN.
(COUNTY OF BANFF, SYNOD OF ABERDEEN,
PRESBYTERY OF FORDYCE.)
By the Rev. Mr. ROBERT GRANT.
Royalty, Extent, Climate, etc.
CULLEN, as appears from old charters, was originally
called Inverculan, because it stands upon the bank of
the Burn of Cullen, which, at the N. end of the town, falls
into the sea: but now it is known by the name of Cullen only. Cullen is a royal burgh, formerly a constabulary, of
which the Earl of Findlater was hereditary constable. The
set, as it is called, of the council, consists of 19, in which number are included the Earl of Findlater, hereditary preses, 3
bailies, a treasurer, a dean-of-guild, and 13 counsellors. The
parish extends from the sea southward, about 2 English miles
in length.
Geo-spatial data
“data that have some form of spatial or geographic reference that enables them to be
located in two- or three-dimensional space”
Models of Gazetteer(1): Place Name Vocabularies
• simple list of place names has many problems, e.g. non-uniqueness
• common form is {name, location}: "index" in atlas or "geographical dictionary"
• ‘location’ field often has name of larger area that ‘contains’ the place,
but even then the name may still not be unique
‘The Nominal Gazetteer’
Barrow Street
Barrow upon Soar
Barrow upon Trent
Barrowby
Barrowden
Barrowford
Barry
Barsby
Barthlow
Barton (8)
Barton Bendish
The eight entries for Barton, with ‘location’ field:
Barton, Cambs
Barton, Ches.
Barton, Devon
Barton, Glos
Barton, Lancs (2)
Barton, N. Yorks
Barton, Warks
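The non-uniqueness problem can be sketched in a few lines. This is a minimal illustration only, assuming the nominal gazetteer is held as a list of {name, location} pairs; the Barton entries are taken from this slide, while the names `ENTRIES`, `lookup` and `ambiguous_keys` are hypothetical.

```python
from collections import Counter

# {name, location} pairs from the slide; "Lancs (2)" is written out as two entries.
ENTRIES = [
    ("Barton", "Cambs"),
    ("Barton", "Ches."),
    ("Barton", "Devon"),
    ("Barton", "Glos"),
    ("Barton", "Lancs"),
    ("Barton", "Lancs"),
    ("Barton", "N. Yorks"),
    ("Barton", "Warks"),
]


def lookup(name, location=None):
    """Return every entry matching the name and, if given, the containing area."""
    return [
        (n, loc)
        for n, loc in ENTRIES
        if n == name and (location is None or loc == location)
    ]


def ambiguous_keys():
    """Return the {name, location} pairs that still occur more than once."""
    counts = Counter(ENTRIES)
    return sorted(key for key, count in counts.items() if count > 1)


print(len(lookup("Barton")))    # all entries that share the bare name
print(lookup("Barton", "Lancs"))  # the containing area does not always disambiguate
print(ambiguous_keys())
```

Adding the containing area narrows the result but, as the Lancs case shows, a purely nominal key can still point at more than one place.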
Example: Hierarchical Thesaurus
(part of the ‘Document Tradition’)
United Kingdom………………………… (nation)
England …………………………..(country)
Devon………………………….. (county)
Barton………………………………..
Comment:
✓ one type of simple relationship between entries is exploited
✓ entries ordered from very general to very specific (BT, NT)
✓ can efficiently determine what a given area contains
✓ normally structured to handle alternative names (SY)
X rigid structure, one view only, typically geo-political
entities can belong in many hierarchies and new relationships evolve
X names may still not be unique
X cannot deal with spatial proximity / contiguity
Fatal Flaw: no one single, simple hierarchy in Scotland
no way to relate to other, multiple (e.g. postcode) and ‘old’ geographies …
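A hierarchical thesaurus of this kind can be sketched as a table of broader-term (BT) links. The sketch below is illustrative only: it uses the four entries from the example above, and the names `BROADER`, `narrower` and `broader_chain` are hypothetical.

```python
# Broader-term (BT) links from the example; narrower terms (NT) are derived from them.
BROADER = {
    "England": "United Kingdom",
    "Devon": "England",
    "Barton": "Devon",
}


def narrower(term):
    """All entries below `term` in the hierarchy (transitive NT)."""
    result = []
    for child, parent in BROADER.items():
        if parent == term:
            result.append(child)
            result.extend(narrower(child))
    return result


def broader_chain(term):
    """Walk the BT links upwards from `term` to the top of the hierarchy."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


print(narrower("England"))      # what a given area contains
print(broader_chain("Barton"))  # from very specific to very general
```

The structure answers "what does this area contain?" efficiently, but each name has exactly one broader term here, so a second Barton, a second hierarchy (e.g. postcodes) or a question about neighbouring places cannot be expressed without changing the model.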
Boundaries in Fife, Scotland
Pause, to ponder the puzzle of place …
1. places can be defined in space (as an ‘area’, not a single ‘point’)
– a named feature, e.g. Lake Geneva
– a space taken for human settlement, e.g. Edinburgh
and those areas change over time, can be fuzzy, or even poetic
2. names of places are not unique, nor persistent, and have considerable cultural ‘baggage’
– a given place can have more than one proper name
• different languages
• alternative contemporary and historic names, even within a given language
Auchterderran, Fife, Scotland has 21 alternative names or name spellings
e.g. Auchterderay, Ochtirderay, Urchan, Hurkyndorath
➔ Paradox: geography is global, but naming is local
➔ Nevertheless, geo-referencing means more, requires more, than a controlled vocabulary of (place) names
Getting geographic (1): Being coordinated
• how should we geo-reference?
– with a co-ordinate system that can be related to a specific position or location on the earth's surface
• geographic co-ordinates allow places to be represented by the appropriate footprint
– settlements, lakes as areas; roads, rivers as lines; stations as points
• and offer persistence, regardless of name, political boundary or other changes, and a consistent framework for spatial queries
• geographic co-ordinates allow proximate places, those close to one another, to be identified
– appropriate geo-referencing thus ‘enriches’ textual description
• as ever, not everyone uses the same standard spatial coding scheme
– systems that relate to geographic (Cartesian) co-ordinates are the metadata of choice, providing opportunity for ‘cross-walk’
Getting geographic (2): Models of Gazetteers(2)
• Simple use of a geo-spatial reference in the location field: the National Grid
Barrow Street
Barrow upon Soar
Barrow upon Trent
Barrowby
Barrowden
Barrowford
Barry
Barsby
Barthlow
Barton (8)
Barton Bendish
The eight entries for Barton, with National Grid co-ordinates (easting, northing):
Barton (540620, 255780)
Barton (344880, 354210)
Barton (410080, 225320)
Barton (351580, 437670)
Barton (335223, 409318)
Barton (423170, 508880)
Barton (290950, 67220)
Barton (410849, 251111)
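Once each entry carries co-ordinates, entries that share a name can be told apart and proximity can be computed. A minimal sketch, assuming the eight Barton co-ordinates from this slide are National Grid eastings and northings in metres and that straight-line distance on the grid is good enough; the query point and the function names are hypothetical.

```python
import math

# The eight Barton entries (easting, northing in metres) from the slide.
BARTONS = [
    (540620, 255780),
    (344880, 354210),
    (410080, 225320),
    (351580, 437670),
    (335223, 409318),
    (423170, 508880),
    (290950, 67220),
    (410849, 251111),
]


def distance_m(a, b):
    """Straight-line distance in metres between two grid points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def nearest(point, candidates):
    """The candidate closest to `point`."""
    return min(candidates, key=lambda c: distance_m(point, c))


def within(point, candidates, radius_m):
    """All candidates no further than `radius_m` from `point`, in list order."""
    return [c for c in candidates if distance_m(point, c) <= radius_m]


# Hypothetical query point, chosen only for illustration.
query = (410000, 250000)
print(nearest(query, BARTONS))
print(within(query, BARTONS, 30_000))
```

The same name now resolves to different answers depending on where the question is asked from, which a nominal gazetteer cannot offer.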
Towards ‘The Active Gazetteer’
Task: Find resource about ‘Liverpool docks’
Search using a nominal gazetteer might yield:
Place          County/UA
Liverpool      Liverpool
Co-ordinates allow (near) co-located places to be co-identified.
Using spatial proximity in an active gazetteer, the search can be widened:
Place          County/UA
Bebbington     Wirral
Birkenhead     Wirral
Bootle         Sefton
New Brighton   Wirral
Seacombe       Wirral
Seaforth       Wirral
Waterloo       Sefton
… that means more & better hits!
‘Active’ Digital Gazetteer Services
➢ Gazetteer - A list of geographic features together with their associated spatial location
➢ Digital Gazetteer - An electronic list of geographic features together with their associated spatial location
An authority database of places (and features?)
An ‘Active Gazetteer’
➢ Digital Gazetteer Service - A network-addressable middleware server supporting geographic referencing and searching.
A shared ‘terminology’ service.
International Digital Gazetteer Initiatives
extant digital gazetteer services
• USGS Geographic Names Information System
• Canadian Geographical Names Data Base
• GEOnet Names Server (NIMA)
• Getty Thesaurus of Geographic Names
• Columbia Gazetteer of the World Online
projects from which digital gazetteer services might spring
• Open GIS Consortium Geospatial Fusion Services Testbed
Geocoder, Gazetteer and Geoparser services
• Alexandria Digital Library Gazetteer Development *
• Electronic Cultural Atlas Initiative - ECAI *
• geoXwalk, JISC-funded collaborative project *
* Presentation to Workshop on ‘Digital Gazetteers’ at ACM/IEEE-CS Joint Conference on Digital
Libraries 2002, Portland, Oregon, USA, 14-18 July 2002
The geoXwalk project
• funded under JISC DNER Development Programme
– builds on scoping study
– aims to develop a demonstrator gazetteer service suitable
for extension to full service.
• time-frame: 1 June 2002 - 31 May 2003
• project partners: EDINA and History Data Service
• similar to the ADL approach (Linda Hill et al)
– reviews the ADL Gazetteer Content Standard
– builds on, and adapts ADL geographic feature ‘ontology’
• ‘near-contemporary’ geography focus, linking back
into history
• geo-X-walk demonstrator due in 2003
X-walk as digital gazetteer service: use cases
[Diagram: the geoXwalk server linked to information servers, with three use cases labelled: geo-parsing & indexing, searching, reference use]
JISC Information Environment
[Diagram: layered architecture. Provision layer: content providers. Fusion layer: broker/aggregator. Presentation layer: portals. Shared services: authentication, authorisation, geoXwalk, collection description, service description, resolver, institution profile. End-user.]
Uses of ‘geo-X-walk’ Digital Gazetteer Service
1. As ‘shared service’, enabling other information services to
support full range of spatial searching (query constraints)
• no need to hold all data (at service) to resolve spatial query
• uses co-ordinates and (implicit) spatial relationships to ‘cross-walk’
between geographies
• machine-to-machine (m2m) interaction to ‘shared service’
2. As reference facility for researchers, libraries & museums
• including means to resolve variant names etc.
3. As online facility to assist metadata creators
Helping to make simple searching more effective
Find me documents on the 'Liverpool docks’
Search terms: subject = “docks”, place = “liverpool”
Using spatial proximity
place search terms become
Liverpool
Bebbington
Birkenhead
Bootle
New Brighton
Seacombe
Seaforth
Waterloo
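The widening of the place term can be sketched as a query-expansion step. This is an illustration only: the list of nearby places is the one shown on the slide, hard-coded here where a real service would obtain it from the gazetteer, and `NEARBY`, `expand_query` and `matches` are hypothetical names.

```python
# Places returned for "liverpool" by spatial proximity, as listed on the slide.
# Hard-coded for illustration; a gazetteer service would supply this list.
NEARBY = {
    "liverpool": [
        "Liverpool", "Bebbington", "Birkenhead", "Bootle",
        "New Brighton", "Seacombe", "Seaforth", "Waterloo",
    ],
}


def expand_query(subject, place):
    """Replace the single place term with the list of proximate places."""
    places = NEARBY.get(place.lower(), [place])
    return {"subject": subject, "place": places}


def matches(record, query):
    """True if the record's subject contains the term and its place is in the list."""
    return (
        query["subject"].lower() in record["subject"].lower()
        and record["place"] in query["place"]
    )


query = expand_query("docks", "liverpool")
# Hypothetical metadata record, indexed under a neighbouring place name.
record = {"subject": "Docks and shipping", "place": "Birkenhead"}
print(matches(record, query))
```

A record indexed under Birkenhead would be missed by the original place term but is found once the term has been expanded.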
Supporting cross searching different services
‘Find resources for this postcode’
(NB postcode often used to geo-reference survey data files)
Post code: L34 0HS?
Coordinate footprints: 340900,392300 - 347217,397660
[Diagram: a portal service, the geoXwalk server and Content Providers A, B and C; labels shown: Knowsley, Place names, BX003, Parish names]
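The cross-walk between geographies can be sketched as a footprint overlap test. A minimal sketch, assuming the footprint quoted above is a bounding box of (min easting, min northing) - (max easting, max northing); the provider records and their footprints are invented purely for illustration, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in grid co-ordinates (metres)."""
    min_e: int
    min_n: int
    max_e: int
    max_n: int

    def intersects(self, other):
        """True if the two boxes share any area or edge."""
        return (
            self.min_e <= other.max_e and other.min_e <= self.max_e
            and self.min_n <= other.max_n and other.min_n <= self.max_n
        )


# Footprint quoted on the slide for the postcode query.
QUERY_FOOTPRINT = BBox(340900, 392300, 347217, 397660)

# Hypothetical provider records, each keyed in its own geography.
# The keys and footprints below are invented for illustration only.
RECORDS = [
    {"provider": "A", "key": "place-1", "bbox": BBox(343000, 393000, 345000, 395000)},
    {"provider": "B", "key": "parish-1", "bbox": BBox(346000, 397000, 350000, 401000)},
    {"provider": "C", "key": "place-2", "bbox": BBox(300000, 500000, 301000, 501000)},
]


def cross_walk(footprint, records):
    """Keys of all records whose footprint overlaps the query footprint."""
    return [r["key"] for r in records if r["bbox"].intersects(footprint)]


print(cross_walk(QUERY_FOOTPRINT, RECORDS))
```

The portal never needs to know that one provider indexes by place name and another by parish: the co-ordinates carry the relationship.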
Supporting reference: the “where is?” type of question
Where is Aberdour?
What is the largest town in Aberdeenshire?
What is at grid ref. NY 305 573 ?
List me all places ending with ‘kirk’
What parishes fall within the Loch Lomond National Park?
Which Roman roads pass through Scotland?
On what river is Dundee situated?
By what alternative names has Edinburgh been known?
+ research use to resolve variant names etc.
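Answering "What is at grid ref. NY 305 573 ?" starts by turning the grid reference into numeric co-ordinates. The sketch below uses the commonly described letter-pair arithmetic for Ordnance Survey National Grid references; it is an illustration, not the geoXwalk implementation, and the function name is hypothetical.

```python
def grid_ref_to_en(grid_ref):
    """Convert a National Grid reference such as 'NY 305 573' to
    (easting, northing) in metres for the south-west corner of the square."""
    ref = grid_ref.replace(" ", "").upper()
    letters, digits = ref[:2], ref[2:]
    if len(letters) != 2 or not letters.isalpha() or "I" in letters:
        raise ValueError("expected two grid letters (the letter I is not used)")
    if not digits.isdigit() or len(digits) % 2 != 0 or len(digits) > 10:
        raise ValueError("expected an even number of digits, at most ten")

    # Letter indices A=0 ... Z=24, skipping I, which the grid does not use.
    l1 = ord(letters[0]) - ord("A")
    l2 = ord(letters[1]) - ord("A")
    if l1 > 7:
        l1 -= 1
    if l2 > 7:
        l2 -= 1

    # First letter selects a 500 km square, second letter a 100 km square within it.
    e100km = ((l1 - 2) % 5) * 5 + (l2 % 5)
    n100km = (19 - (l1 // 5) * 5) - (l2 // 5)

    # Pad the numeric part to metres: three digits each means 100 m resolution.
    half = len(digits) // 2
    scale = 10 ** (5 - half)
    easting = e100km * 100_000 + int(digits[:half]) * scale
    northing = n100km * 100_000 + int(digits[half:]) * scale
    return easting, northing


print(grid_ref_to_en("NY 305 573"))
```

The resulting pair can then be matched against gazetteer footprints in the same way as any other co-ordinate query.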
As online facility to assist metadata creators (1)
• Traditional use of ‘controlled vocabulary’ for ‘found’
place names
but, to be ‘found’, metadata records on objects must
have appropriate geo-referencing
• This is achieved using an ‘analytical’ (geo-coded)
gazetteer
e.g. BLGO
a facility devised by EDINA for the British Library for use in its
NOF-funded ‘A Sense of Place’ activity
– uses 1:50 000 Gazetteer licensed from Ordnance Survey
– presently ‘in test’ at the British Library by over 20 archival
staff
As online facility to assist metadata creators (2)
The task of indexing place names in documents
• Place Names within the digitised pages of the
Statistical Accounts of Scotland (1790 & 1840s)
can be recognised semi-automatically.
[ http://edina.ac.uk/statacc/ ]
• We call this geo-parsing ...
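A much simplified sketch of geo-parsing by gazetteer lookup follows. It is not the parser used for the Statistical Accounts: the sample text is taken from the Cullen entry quoted earlier, the short name list stands in for a gazetteer, and `geoparse` is a hypothetical helper. The output is a list of candidates for a person to confirm, in keeping with ‘semi-automatically’.

```python
import re

# Stand-in for a gazetteer: a handful of names relevant to the sample text.
GAZETTEER = ["Burn of Cullen", "Cullen", "Inverculan"]

TEXT = (
    "CULLEN, as appears from old charters, was originally called Inverculan, "
    "because it stands upon the bank of the Burn of Cullen, which, at the N. end "
    "of the town, falls into the sea"
)


def geoparse(text, names):
    """Return (offset, gazetteer name) for each candidate place name in the text."""
    # Longest names first, so 'Burn of Cullen' is preferred over 'Cullen'.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in ordered) + r")\b",
        re.IGNORECASE,
    )
    canonical = {n.lower(): n for n in names}
    return [(m.start(), canonical[m.group(1).lower()]) for m in pattern.finditer(text)]


for offset, name in geoparse(TEXT, GAZETTEER):
    print(offset, name)
```

Matching is case-insensitive because the source prints the parish name in capitals; variant spellings would need the gazetteer's alternative names as well.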
Some Success, but also ‘Current Challenges’
1. Merging geo-names from different scales & from different sources
– when place names differ, should all names be regarded as proper?
– do we trust positional accuracy & how do we express confidence?
– how to minimise effort in de-duplication of place(s)?
• places have multiple names, types, and footprints
• need to be able to identify duplicate entries for the same place
2. Presenting geo-names on different occasions?
– many variant ‘proper’ names, what is preferred?
• what is the ‘name authority body’? - none in Scotland or the UK
• preferred name varies with location and use and culture
– there are language and character code set issues
– standard codes for postal addresses and other geographies
3. There is IPR in metadata; and hence terms & conditions of use
4. There are always service performance issues
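One way to reduce the de-duplication effort in point 1 is to flag candidate duplicates automatically and leave the decision to a person. A minimal sketch, assuming two entries are candidates when their normalised names match and their point footprints lie within a tolerance; the entries, identifiers and the 500 m tolerance are invented for illustration.

```python
import math
from itertools import combinations

# Hypothetical entries merged from two sources; ids and the Auchterderran
# co-ordinates are illustrative only. The Barton pairs appear earlier in the talk.
ENTRIES = [
    {"id": "a1", "name": "Auchterderran", "e": 321500, "n": 696000},
    {"id": "b7", "name": "auchterderran ", "e": 321540, "n": 695980},
    {"id": "b8", "name": "Barton", "e": 540620, "n": 255780},
    {"id": "a2", "name": "Barton", "e": 344880, "n": 354210},
]


def normalise(name):
    """Lower-case the name and collapse surrounding and repeated whitespace."""
    return " ".join(name.lower().split())


def candidate_duplicates(entries, tolerance_m=500):
    """Pairs of ids with the same normalised name and nearby footprints."""
    pairs = []
    for a, b in combinations(entries, 2):
        same_name = normalise(a["name"]) == normalise(b["name"])
        close = math.hypot(a["e"] - b["e"], a["n"] - b["n"]) <= tolerance_m
        if same_name and close:
            pairs.append((a["id"], b["id"]))
    return pairs


print(candidate_duplicates(ENTRIES))
```

The two Bartons share a name but are far apart, so they are kept as distinct places; the question of how much positional difference to tolerate is exactly the confidence issue raised above.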
Summary Conclusions
1. geographic referencing is needed in the digital library
• for indexing information objects & for finding out what is where
2. names as words are not enough
• places need their co-ordinate numbers
3. ‘active’ digital gazetteer services can add value
• a few initiatives internationally, now beginning to collaborate
– common data model, protocols, sharing data & interoperability
• licensing and copyright are serious issues
– particularly if want wider, global access
• a variety of interesting technical challenges to tackle
4. EDINA has developed digital gazetteer services for Scotland
➔ global answer is geo-referencing by co-ordinates
➔ but to ‘act global’ we must also ‘think local’ …
➔ local to object, to location, to geographic vocabulary of user
Active Gazetteers offer more than Nominal Gazetteers
Contact details
• Authors contactable at:
[email protected] and [email protected]
• For EDINA services contact: http://edina.ac.uk
EDINA, Data Library, University of Edinburgh
[email protected] or telephone +44 (0)131 650 3302
• For information on geoXwalk project:
Dr David Medyckyj-Scott, Project Director
Cressida Chappel, Head of History Data Service
([email protected])
Review of ADL Gazetteer Content Standard
Geographic Feature ID
Geographic Name
Variant Geographic Name (R)
Type of Geographic Feature (R)
Other Classification Terms (R)
Geographic Feature Code (R)
Spatial Location (R)
Street Address
Related Feature (R)
Description
Geographic Feature Data (R)
Link to Related Source of Information (R)
Supplemental Note
Metadata Information
http://www.alexandria.ucsb.edu/gazetteer
1. each feature is self-contained making model very flexible
2. comprehensive description but with small set of core elements
3. temporal aspects of names, footprints, relationships, …
4. documents source, spatial accuracy/scale of footprint
5. permits explicit relationship types!
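The idea of a self-contained feature record can be sketched as a small data class. This is a simplified illustration, not the ADL schema: only a few of the elements listed above are modelled, (R) is read here as ‘repeatable’ (an assumption), and the class, field names and feature ID are hypothetical. The variant names are those quoted earlier for Auchterderran.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class GazetteerEntry:
    """Simplified, self-contained description of one geographic feature."""
    feature_id: str
    name: str
    # Elements marked (R) in the list are modelled as lists (assumed repeatable).
    variant_names: List[str] = field(default_factory=list)
    feature_types: List[str] = field(default_factory=list)
    # Point footprints only, as (easting, northing); areas and lines are omitted.
    spatial_locations: List[Tuple[int, int]] = field(default_factory=list)
    # Explicit relationships as (relationship type, related feature id).
    related_features: List[Tuple[str, str]] = field(default_factory=list)
    description: Optional[str] = None

    def all_names(self):
        """Preferred name followed by every variant name."""
        return [self.name] + self.variant_names


entry = GazetteerEntry(
    feature_id="example-0001",  # hypothetical identifier
    name="Auchterderran",
    variant_names=["Auchterderay", "Ochtirderay", "Urchan", "Hurkyndorath"],
)
print(entry.all_names())
```

Because every name, type and relationship travels with the feature, a search on any variant spelling can resolve to the same record.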
Beyond ‘Settlement’ Place Names to Ontologies:
Geographic Feature Types
• incorporate dictionary of terms defining each feature type
• thus, support queries such as
– “What schools exist in Leeds and where are they?”
– “Show lakes in Cornwall”
• hierarchy of feature types preferred
• propose to adopt the ADL Feature Type Thesaurus
• some problems… but ADL acknowledge these
• adapting thesaurus for UK
– US & UK use words differently
hydrographic features
. aquifers
. bays
. . fjords
. channels
. deltas
. drainage basins
. estuaries
. floodplains
. streams
. . rivers
. . . bends (river)
. . . rapids
. . . waterfalls
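A hierarchy of feature types lets a query on a broad type pick up its narrower types. The sketch below parses the dotted listing above, treating each leading ". " as one level of depth, and expands a type to everything beneath it; the function names are hypothetical and the listing is only the excerpt shown on the slide.

```python
LISTING = """\
hydrographic features
. aquifers
. bays
. . fjords
. channels
. deltas
. drainage basins
. estuaries
. floodplains
. streams
. . rivers
. . . bends (river)
. . . rapids
. . . waterfalls
"""


def parse_thesaurus(listing):
    """Return {term: parent term} from a listing where each '. ' is one level."""
    parents = {}
    stack = []  # stack[d] holds the most recent term seen at depth d
    for line in listing.splitlines():
        if not line.strip():
            continue
        depth = 0
        while line.startswith(". "):
            depth += 1
            line = line[2:]
        term = line.strip()
        parents[term] = stack[depth - 1] if depth > 0 else None
        del stack[depth:]
        stack.append(term)
    return parents


def narrower_terms(parents, term):
    """All feature types below `term`, depth first, in listing order."""
    result = []
    for child, parent in parents.items():
        if parent == term:
            result.append(child)
            result.extend(narrower_terms(parents, child))
    return result


PARENTS = parse_thesaurus(LISTING)
print(narrower_terms(PARENTS, "streams"))
```

A request for streams in an area can then also return features typed as rivers, rapids or waterfalls, without the user naming each type.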