A Sense of Place: Developing a Gazetteer Service

Peter Burnhill

Director, EDINA, JISC National Data Centre

David Medyckyj-Scott

Manager, Geo-data Services, EDINA

OCLC SCURL Pre-IFLA Conference 2002 Edinburgh

Overview

• context
  – Internet & Digital Libraries & GIS
  – EDINA & the JISC Information Environment
  – why geographic referencing?
• gazetteer models & spatial co-ordinates
• progress towards digital gazetteer services
  – geoXwalk & other projects
• summary - conclusions

Internet, Digital Libraries & GIS

1. the Big Issues
  – Metadata & Interoperability
  – Naming, Identifiers & Authority Files
  – Ontologies
  – Shared services
2. Information Science
  – Mix of document & computation traditions
    Michael Buckland, ‘The Landscape Of Information Science’, JASIS Special Issue "JASIS at 50", Wiley 1999 (‘Presidential Address’)
3. Subject-matter methodology
  • Geographic Information Systems
    – deconstructing the Map: both as database & as display device
    – referencing: the cartographic trick ‘surface of sphere to flat paper/screen’

EDINA

• a JISC National Data Centre, 1995
  – hosted by Edinburgh University Data Library, 1984
• mission: to enhance productivity of research, learning and teaching in UK higher & further education
• major provider within the JISC Information Environment
  – range of bibliographic resources
  – launching sound and picture studio
  – key geo-spatial data and geo-referenced information
    • UKBORDERS (1994 - ): boundary outlines & geo-reference database
    • Digimap (2000 - ): online source of Ordnance Survey mapping
• strategic move toward interoperability & shared services role
  – adoption of appropriate standards

The JISC Information Environment is…

• variously stated as …
  – a national digital library... for UK higher and further education
  – a managed collection of quality assured resources
  – a distributed resource supporting learning and research in the UK
• definitely heterogeneous
  – ‘words, numbers, pictures, sound’: including geo-spatial data
• for use by researchers, students, teachers & support staff
• based on an underlying functional model
  – simplified to: search -> obtain -> use -> publish
    {discover/locate} {request/access} {view/copy/amend/combine} {publish}
• now to have location-based searching
  – requiring geo-referencing of information objects

So, what is geo-referencing? What are geo-data?

Geo-spatial data: “data that have some form of spatial or geographic reference that enables them to be located in two- or three-dimensional space”

Statistical Account of Scotland

NUMBER XIII.

PARISH OF CULLEN.

(COUNTY OF BANFF, SYNOD OF ABERDEEN, PRESBYTERY OF FORDYCE.) By the Rev. Mr. ROBERT GRANT.

Royalty, Extent, Climate, etc.

CULLEN, as appears from old charters, was originally called Inverculan, because it stands upon the bank of the Burn of Cullen, which, at the N. end of the town, falls into the sea: but now it is known by the name of Cullen only. Cullen is a royal burgh, formerly a constabulary, of which the Earl of Findlater was hereditary constable. The set, as it is called, of the council, consists of 19, in which number are included the Earl of Findlater, hereditary preses, 3 bailies, a treasurer, a dean-of-guild, and 13 counsellors. The parish extends from the sea southward, about 2 English miles in length.

Geo-referencing: that’s what’s special about the spatial

• subject content most often referenced by topic …
  … but much (80%?) can be referenced to specific geographic places
• broad disciplinary base for more powerful geographic searching
  – across the social, life & physical sciences as well as the humanities
  – also from libraries, archives and museums
  – now from digital libraries, service providers & data providers
• geo-referencing thus a way of viewing information content:
  – subject, people, place and time

Some Definitions

• Gazetteer: a list of geographic features together with their associated spatial location
• Digital Gazetteer
• Digital Gazetteer Service

Models of Gazetteer(1): Place Name Vocabularies

• simple list of place names has many problems, e.g. non-uniqueness
• common form is {name, location}: "index" in atlas or "geographical dictionary"
• ‘location’ field often has name of larger area that ‘contains’ the place, but even then the name may still not be unique

Barrow Street
Barrow upon Soar
Barrow upon Trent
Barrowby
Barrowden
Barrowford
Barry
Barsby
Barthlow
Barton (8)
Barton Bendish

Barton, Ches.
Barton, Devon
Barton, Glos
Barton, Lancs (2)
Barton, N. Yorks
Barton, Warks

‘The Nominal Gazetteer’

Hierarchical Thesaurus (part of the ‘Document Tradition’):

United Kingdom (nation)
  England (country)
    Devon (county)
      Barton

Comment:
• one type of simple relationship between entries is exploited
• entries ordered from very general to very specific (BT, NT)
• can efficiently determine what a given area contains
• normally structured to handle alternative names (SY)
X rigid structure, one view only, typically geo-political; entities can belong in many hierarchies and new relationships evolve
X names may still not be unique
X cannot deal with spatial proximity / contiguity
X no way to relate to other geographies, e.g. postcodes
X lack of simple hierarchies in UK (and other ‘old’) geographies …

Boundaries in Fife, Scotland

Pause, to ponder the puzzle of place …

1. places can be defined in space (as an ‘area’, not a single ‘point’)
  – a named feature, e.g. Lake Geneva
  – a space taken for human settlement, e.g. Edinburgh
  and those areas change over time, can be fuzzy, or even poetic
2. names of places are not unique, nor persistent, and have considerable cultural ‘baggage’
  – a given place can have more than one proper name
    • different languages
    • alternative contemporary and historic names, even within a given language
  Auchterderran, Fife, Scotland has 21 alternative names or name spellings, e.g. Auchterderay, Ochtirderay, Urchan, Hurkyndorath
• geo-referencing means more, requires more, than a controlled vocabulary of (place) names
• geography is global, but naming is local
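The point about variant names can be made concrete with a small sketch. It assumes a hypothetical `Place` record and a made-up identifier (`place-1`); only the names themselves come from the slide. Every variant spelling is indexed against one place identifier, so a search on any spelling reaches the same place.

```python
from dataclasses import dataclass, field


@dataclass
class Place:
    """Hypothetical record: one place, many names."""
    place_id: str
    preferred_name: str
    variant_names: list[str] = field(default_factory=list)


def build_name_index(places):
    """Map every known spelling (case-insensitively) to the ids of the places using it."""
    index = {}
    for place in places:
        for name in [place.preferred_name, *place.variant_names]:
            index.setdefault(name.casefold(), set()).add(place.place_id)
    return index


# The identifier is invented for illustration; the names are those quoted on the slide.
places = [
    Place(
        "place-1",
        "Auchterderran",
        ["Auchterderay", "Ochtirderay", "Urchan", "Hurkyndorath"],
    )
]
index = build_name_index(places)
print(index["urchan"])
```

The index maps a name to a set of identifiers rather than a single one, because names are not unique either: the same spelling may legitimately point at several places.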

Getting geographic (1): Being coordinated

• how should we geo-reference?
  – with a co-ordinate system that can be related to a specific position or location on the earth's surface
• geographic co-ordinates allow places to be represented by the appropriate footprint
  – settlements, lakes as areas; roads, rivers as lines; stations as points
• and offer persistence, regardless of name, political boundary or other changes, and a consistent framework for spatial queries
• geographic co-ordinates allow proximate places, those close to one another, to be identified
  – appropriate geo-referencing thus ‘enriches’ textual description
• as ever, not everyone uses the same standard spatial coding scheme
  – systems that relate to geographic (Cartesian) coordinates are the metadata of choice, providing opportunity for ‘cross-walk’

Getting geographic (2): Models of Gazetteers(2)

• Simple use of a geo-spatial reference in the location field: the National Grid

Barrow Street
Barrow upon Soar
Barrow upon Trent
Barrowby
Barrowden
Barrowford
Barry
Barsby
Barthlow
Barton (8)
Barton Bendish

Barton (540620, 255780)
Barton (344880, 354210)
Barton (410080, 225320)
Barton (351580, 437670)
Barton (335223, 409318)
Barton (423170, 508880)
Barton (290950, 67220)
Barton (410849, 251111)
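A minimal sketch of this {name, location} model, using the eight Barton grid references listed on the slide. The function and variable names are illustrative assumptions, not part of any geoXwalk interface. The name alone is ambiguous, but each (name, co-ordinate) pair is distinct.

```python
from collections import defaultdict

# (name, easting, northing) on the National Grid, as listed on the slide.
ENTRIES = [
    ("Barton", 540620, 255780),
    ("Barton", 344880, 354210),
    ("Barton", 410080, 225320),
    ("Barton", 351580, 437670),
    ("Barton", 335223, 409318),
    ("Barton", 423170, 508880),
    ("Barton", 290950, 67220),
    ("Barton", 410849, 251111),
]


def build_gazetteer(entries):
    """Group co-ordinates by name: one name may map to many locations."""
    gazetteer = defaultdict(list)
    for name, easting, northing in entries:
        gazetteer[name].append((easting, northing))
    return gazetteer


gazetteer = build_gazetteer(ENTRIES)
candidates = gazetteer["Barton"]
print(len(candidates), "places share the name Barton")
```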

Towards ‘The Active Gazetteer’

Task: Find books about ‘Liverpool docks’

Search using a nominal gazetteer might yield:

co-ordinates allow (near) co-located places to be co-identified.

Using spatial proximity in an active gazetteer, the search can be widened:

Place          County/UA
Liverpool      Liverpool
Bebbington     Wirral
Birkenhead     Wirral
Bootle         Sefton
New Brighton   Wirral
Seacombe       Wirral
Seaforth       Wirral
Waterloo       Sefton

… that means more & better hits!
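The widening step is, at its simplest, a distance test on grid co-ordinates. The slides give no co-ordinates for the Liverpool-area places, so this sketch reuses the Barton grid references from the earlier slide. The function name and the 30 km radius are assumptions chosen for illustration.

```python
import math

# (name, easting, northing) in metres on the National Grid, taken from the Barton slide.
BARTONS = [
    ("Barton", 540620, 255780),
    ("Barton", 344880, 354210),
    ("Barton", 410080, 225320),
    ("Barton", 351580, 437670),
    ("Barton", 335223, 409318),
    ("Barton", 423170, 508880),
    ("Barton", 290950, 67220),
    ("Barton", 410849, 251111),
]


def places_near(centre, entries, radius_m):
    """Return entries within radius_m of centre, nearest first (planar distance)."""
    centre_e, centre_n = centre
    hits = []
    for name, easting, northing in entries:
        distance = math.hypot(easting - centre_e, northing - centre_n)
        if distance <= radius_m:
            hits.append((distance, name, easting, northing))
    hits.sort()
    return [(name, e, n) for _, name, e, n in hits]


# Which Bartons lie within 30 km of the Barton at (410080, 225320)?
nearby = places_near((410080, 225320), BARTONS, 30_000)
print(nearby)
```

A planar distance is reasonable here because National Grid eastings and northings are already projected metres. A service holding area footprints rather than points would need a proper geometry test instead.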

Definitions

• Gazetteer: a list of geographic features together with their associated spatial location
• Digital Gazetteer: an electronic list of geographic features together with their associated spatial location
  – an authority database of places (and features?)
  – an ‘Active Gazetteer’
• Digital Gazetteer Service: a network-addressable middleware server supporting geographic referencing and searching
  – a shared ‘terminology’ service

The geoXwalk project

• funded under JISC DNER Development Programme
  – builds on scoping study
  – aims to develop a demonstrator gazetteer service suitable for extension to full service
• time-frame: 1 June 2002 - 31 May 2003
• project partners: EDINA and History Data Service
• similar to the ADL approach (Linda Hill et al)
• 4 ‘M’s:
  – Metadata model
  – Multi-source
  – Multi-scale
  – Multi-problem!
• focus on ‘near contemporary’ geography, with links into history

JISC Information Environment

[Diagram: JISC Information Environment architecture. Layers: Provision layer, Fusion layer, Presentation layer. Components: Content providers, Broker/Aggregator, Portals, End-user. Shared services: Authentication, Authorisation, Collection Description, Service Description, Resolver, Institution Profile, geoXwalk.]

One of several Digital Gazetteer initiatives

extant digital gazetteer services

• USGS Geographic Names Information System
• Canadian Geographical Names Data Base
• GEOnet Names Server (NIMA)
• Getty Thesaurus of Geographic Names
• Columbia Gazetteer of the World Online

projects from which digital gazetteer services might spring

• Open GIS Consortium Geospatial Fusion Services Testbed: Geocoder, Gazetteer and Geoparser services
• Alexandria Digital Library Gazetteer Development *
• Electronic Cultural Atlas Initiative - ECAI *
• geoXwalk, JISC-funded collaborative project *

* Presentation to Workshop on ‘Digital Gazetteers’ at ACM/IEEE-CS Joint Conference on Digital Libraries 2002, Portland, Oregon, USA, 14-18 July 2002

Review of ADL Gazetteer Content Standard

Geographic Feature ID
Geographic Name
Variant Geographic Name (R)
Type of Geographic Feature (R)
Other Classification Terms (R)
Geographic Feature Code (R)
Spatial Location (R)
Street Address
Related Feature (R)
Description
Geographic Feature Data (R)
Link to Related Source of Information (R)
Supplemental Note
Metadata Information

http://www.alexandria.ucsb.edu/gazetteer

1. each feature is self-contained, making model very flexible
2. comprehensive description but with small set of core elements
3. temporal aspects of names, footprints, relationships, …
4. documents source, spatial accuracy/scale of footprint
5. permits explicit relationship types!
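As a rough illustration of the ‘self-contained feature’ idea, the element list can be sketched as one record type. This is not the ADL schema itself: the field names are invented from the element names above, the types are guesses, and it assumes that (R) marks a repeatable element (modelled here as a list).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GazetteerEntry:
    """Illustrative record only; field names and types are assumptions."""
    feature_id: str
    name: str
    variant_names: list[str] = field(default_factory=list)               # (R)
    feature_types: list[str] = field(default_factory=list)               # (R)
    other_classification_terms: list[str] = field(default_factory=list)  # (R)
    feature_codes: list[str] = field(default_factory=list)               # (R)
    spatial_locations: list[tuple[int, int]] = field(default_factory=list)  # (R)
    street_address: Optional[str] = None
    related_features: list[str] = field(default_factory=list)            # (R)
    description: Optional[str] = None
    feature_data: list[str] = field(default_factory=list)                # (R)
    source_links: list[str] = field(default_factory=list)                # (R)
    supplemental_note: Optional[str] = None
    metadata_information: Optional[str] = None


# Hypothetical identifier; the grid reference is one of the Bartons listed earlier.
entry = GazetteerEntry(
    feature_id="example-1",
    name="Barton",
    spatial_locations=[(540620, 255780)],
)
print(entry.name, entry.spatial_locations)
```

Only the identifier and name are required in this sketch, which mirrors the slide's remark about a comprehensive description with a small set of core elements.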

Beyond ‘Settlement’ Place Names to Ontologies: Geographic Feature Types

• incorporate dictionary of terms defining each feature type
• thus, support queries such as
  – “What schools exist in Leeds and where are they?”
  – “Show lakes in Cornwall”
• hierarchy of feature types preferred
• propose to adopt the ADL Feature Type Thesaurus
• some problems… but ADL acknowledge these
• adapting thesaurus for UK
  – US & UK use words differently

hydrographic features
. aquifers
. bays
. . fjords
. channels
. deltas
. drainage basins
. estuaries
. floodplains
. streams
. . rivers
. . . bends (river)
. . . rapids
. . . waterfalls
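A hierarchy of feature types lets a query on a broad type pick up everything beneath it. The sketch below parses the dot-indented excerpt shown above and expands a term to all of its narrower terms. The function names are illustrative, and the parser assumes one leading ‘.’ token per level of depth.

```python
OUTLINE = """\
hydrographic features
. aquifers
. bays
. . fjords
. channels
. deltas
. drainage basins
. estuaries
. floodplains
. streams
. . rivers
. . . bends (river)
. . . rapids
. . . waterfalls
"""


def parse_outline(text):
    """Return {term: [direct narrower terms]} from a dot-indented outline."""
    children = {}
    stack = []  # stack[d] holds the most recent term seen at depth d
    for line in text.splitlines():
        tokens = line.split()
        depth = 0
        while depth < len(tokens) and tokens[depth] == ".":
            depth += 1
        term = " ".join(tokens[depth:])
        if not term:
            continue
        children.setdefault(term, [])
        del stack[depth:]  # drop terms at this depth or deeper
        if stack:
            children[stack[-1]].append(term)
        stack.append(term)
    return children


def expand(term, children):
    """Return the term followed by all of its narrower terms, depth first."""
    result = [term]
    for child in children.get(term, []):
        result.extend(expand(child, children))
    return result


thesaurus = parse_outline(OUTLINE)
print(expand("streams", thesaurus))
```

With this expansion, a search for features of type ‘streams’ would also match entries typed as rivers, rapids or waterfalls.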

geoXwalk use cases

[Diagram: the geoXwalk Server and information servers, with three uses labelled: geo-parsing & indexing, searching, reference use.]

geoXwalk use case 1: reference use

Answering the ‘where is?’ type of question

Where is Ormskirk?

What is the county town of Shropshire?

What is at grid ref. NT 258 728?

List me all places ending with ‘chester’

What parishes fall within the Lake District National Park?

On what river is Liverpool situated?

Which Roman roads pass through Leicestershire?

By what alternative names has York been known?
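To answer a question such as ‘What is at grid ref. NT 258 728?’, the lettered grid reference first has to become numeric co-ordinates that can be matched against gazetteer footprints. The sketch below uses the standard Ordnance Survey National Grid lettering scheme. The function name is an assumption, and the code does not check that the square actually lies within Great Britain.

```python
def grid_ref_to_easting_northing(grid_ref):
    """Convert an OS grid reference such as 'NT 258 728' to (easting, northing)
    in metres, giving the south-west corner of the referenced square."""
    ref = grid_ref.replace(" ", "").upper()
    letters, digits = ref[:2], ref[2:]
    if len(letters) != 2 or not letters.isalpha() or "I" in letters:
        raise ValueError(f"bad grid letters in {grid_ref!r}")
    if len(digits) % 2 or len(digits) > 10 or (digits and not digits.isdigit()):
        raise ValueError(f"bad grid digits in {grid_ref!r}")

    # Letter positions A=0 .. Z=25, closing the gap left by the unused letter I.
    l1 = ord(letters[0]) - ord("A")
    l2 = ord(letters[1]) - ord("A")
    if l1 > 7:
        l1 -= 1
    if l2 > 7:
        l2 -= 1

    # Indices of the 100 km square, counted from the grid's false origin.
    e100km = ((l1 - 2) % 5) * 5 + (l2 % 5)
    n100km = (19 - (l1 // 5) * 5) - (l2 // 5)

    half = len(digits) // 2
    scale = 10 ** (5 - half)  # 3 digits per axis means 100 m precision
    east_offset = int(digits[:half]) * scale if half else 0
    north_offset = int(digits[half:]) * scale if half else 0
    return e100km * 100_000 + east_offset, n100km * 100_000 + north_offset


print(grid_ref_to_easting_northing("NT 258 728"))
```

A reference service could then run a footprint or proximity lookup at the resulting co-ordinates.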

geoXwalk use case 2: simple searching

Find me books about the ‘Liverpool docks’

Search terms: subject = “docks”, place = “liverpool”

Using spatial proximity place search terms become

Liverpool, Bebbington, Birkenhead, Bootle, New Brighton, Seacombe, Seaforth, Waterloo

geoXwalk use case 3: simple cross searching

‘Find resources for this postcode’
(NB postcode often used to geo-reference survey data files)

Post code: L34 0HS?
Coordinate footprint: 340900,392300 - 347217,397660

[Diagram: Portal service, geoXwalk Server, and Content Providers A, B and C. Labels: Place names (Knowsley), Parish names (BX003).]
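The cross-walk rests on comparing footprints rather than names. A minimal sketch, assuming each resource carries a bounding box in National Grid metres: the postcode footprint is the one quoted on the slide, while the resource identifiers and their boxes are invented for illustration.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned footprint in National Grid metres."""
    min_e: int
    min_n: int
    max_e: int
    max_n: int


def intersects(a, b):
    """True if two bounding boxes overlap or touch."""
    return (
        a.min_e <= b.max_e and b.min_e <= a.max_e
        and a.min_n <= b.max_n and b.min_n <= a.max_n
    )


# Footprint quoted on the slide for the example postcode.
POSTCODE_FOOTPRINT = BBox(340900, 392300, 347217, 397660)

# Hypothetical resources held by content providers, each with its own footprint.
RESOURCES = {
    "resource-1": BBox(345000, 395000, 350000, 400000),  # overlaps the postcode
    "resource-2": BBox(500000, 100000, 510000, 110000),  # far away
}

matches = [
    resource_id
    for resource_id, footprint in RESOURCES.items()
    if intersects(POSTCODE_FOOTPRINT, footprint)
]
print(matches)
```

A bounding-box test is only a first filter. Irregular footprints such as parishes would need a true polygon intersection to avoid false matches.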

geoXwalk use case 4: (semi-)automatic indexing

(screenshot of the geo-parser to be inserted here)
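A deliberately naive sketch of what a geo-parser does: scan free text for names known to the gazetteer and report where they occur. It is not the project's parser. The function name is an assumption, the name list is taken from place names that appear elsewhere in these slides, and longer names are tried first so that ‘Barton Bendish’ is not reported as ‘Barton’.

```python
import re


def geoparse(text, names):
    """Return (gazetteer name, character offset) for each known name found in text."""
    canonical = {name.casefold(): name for name in names}
    # Try longer names first so multi-word names win over their prefixes.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [
        (canonical[match.group(0).casefold()], match.start())
        for match in pattern.finditer(text)
    ]


NAMES = ["Cullen", "Banff", "Aberdeen", "Fordyce", "Barton", "Barton Bendish"]

heading = "PARISH OF CULLEN. (COUNTY OF BANFF, SYNOD OF ABERDEEN, PRESBYTERY OF FORDYCE.)"
found = geoparse(heading, NAMES)
print(found)
```

Matching on names alone cannot say which Barton is meant, which is why the result would normally go to a human indexer for confirmation, hence ‘semi-automatic’.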

Demo

Use summary for geoXwalk

1. as reference source for researchers, libraries and museums
2. to assist metadata creators
  – converting different geographic identifiers to standard coding scheme (e.g. BL-led, NOF-funded ‘A Sense of Place’ project)
  – geo-parser for semi-automatic indexing (e.g. Statistical Account)
  – facilities to resolve variant names etc.
3. to provide information services with means to support full range of spatial searching (query constraints)
  – no need to hold all data (at service) to resolve spatial query
  – can use implicit spatial relationships to ‘cross-walk’ between geographies
  – machine-to-machine (m2m) interaction to ‘shared service’

(Some of the) Challenges

1. merging data from different sources at different scales: which is correct?
  – positional accuracy and confidence of answers
  – de-duplication of place(s)
    • places have multiple names, types, and footprints
    • need to be able to identify duplicate entries for the same place
2. which place name for given occasion?
  – many variant ‘proper’ names, what is preferred?
    • authorised naming bodies - none in the UK
    • preferred name varies with location and use and culture
  – language and character code set issues
  – standard codes for postal addresses and other geographies
3. IPR in metadata; terms and conditions of use
4. service performance issues
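One simple heuristic for the de-duplication challenge, offered as a sketch rather than the project's method: treat two entries as likely duplicates when their names match (ignoring case) and their co-ordinates fall within a tolerance. The record layout, the identifiers, the second-source co-ordinates and the 500 m tolerance are all assumptions; only the source-A grid references come from the slides.

```python
import math
from itertools import combinations


def likely_duplicates(records, tolerance_m):
    """Return id pairs whose names match and whose points lie within tolerance_m."""
    pairs = []
    for a, b in combinations(records, 2):
        same_name = a["name"].casefold() == b["name"].casefold()
        distance = math.hypot(
            a["easting"] - b["easting"], a["northing"] - b["northing"]
        )
        if same_name and distance <= tolerance_m:
            pairs.append((a["id"], b["id"]))
    return pairs


RECORDS = [
    # Source A: two of the Bartons listed on the earlier slide.
    {"id": "a1", "name": "Barton", "easting": 540620, "northing": 255780},
    {"id": "a2", "name": "Barton", "easting": 344880, "northing": 354210},
    # Source B: hypothetical entry captured at a coarser scale.
    {"id": "b1", "name": "BARTON", "easting": 540600, "northing": 255800},
]

duplicates = likely_duplicates(RECORDS, tolerance_m=500)
print(duplicates)
```

The tolerance would have to reflect the scale and positional accuracy of each source, and a fuller test would also compare feature types and variant names before merging.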

Summary Conclusions - From Nominal to Active Gazetteers

1. geographic referencing is needed in the digital library
  • for indexing information objects & for finding out what is where
2. names as words are not enough
  • places need their co-ordinate numbers
3. digital gazetteer services as network of shared services in information infrastructure
  • a few initiatives internationally, now beginning to collaborate
    – common data model, protocols
    – sharing of data and interoperability
  • licensing and copyright are serious issues
    – particularly if want wider, global access
  • a variety of interesting technical challenges to tackle
4. geoXwalk demonstrator due in 2003
5. need to agree to ‘act globally’ but to ‘think local’ …
  • local to object, to location, to geographic vocabulary of user

Contact details

• Authors contactable at:

[email protected]

and [email protected]

• For EDINA services contact: http://edina.ac.uk

EDINA, Data Library, University of Edinburgh [email protected] or telephone +44 (0)131 650 3302

• For information on geoXwalk project:
  Dr David Medyckyj-Scott, Project Director
  Cressida Chappel, Head of History Data Service

([email protected])