‘aggregation as a tactic’ - to support discovery
Peter Burnhill & Stuart Macdonald
EDINA national data centre
University of Edinburgh
CERN workshop on Innovations in Scholarly Communication (OAI7)
University of Geneva, 23 June 2011
Context
RDTF Vision:
The joint JISC / RLUK Resource Discovery Task Force (RDTF) Vision:
“UK researchers and students will have easy, flexible, and ongoing access to
content and services through a collaborative, aggregated and integrated resource
discovery and delivery framework which is comprehensive, open and sustainable”
Making content more discoverable both by people and machine via a
mixed economy of technological solutions.
The Discovery Initiative aims to:
• Engage stakeholders across libraries, archives and museums
• Build critical mass of open content to inspire others to participate
• Encourage development of ‘purposeful aggregations and compelling
  applications’ - mashing at the macro-level
• Exemplify what can be done across domains to free data and explore how to
  make that data work harder
No one-size-fits-all solution!
‘aggregation as a tactic’ - a phrase coined to end
an impasse during a meeting to discuss technical aspects of the RDTF Vision
statement to identify stakeholder groups
Key concept in RDTF Vision is aggregation, directly or represented through metadata
– to unlock the online & digital riches held in our organisations
“Regard aggregation as intervention to exploit the telematic opportunity for things
[that] are ‘remote, digital & published’” - a phrase derived from an IASSIST conference
in 1990 exploring what it would mean, with the Internet, if we regarded all [content] as
‘remote and published’.
The Web in the mid-1990s simplified and thus improved
Unfortunately, even now, much that is online and on the Web is badly or
inadequately published …
We have to improve, re-interpreting what it means to be ‘well-published’
The term aggregation is used a lot in computer science for:
• “objects … assembled or configured together to create a more complex
  object” (UML, IBM)
• “aggregating resources based on … properties. … they are owl:sameAs and
  their other properties can be intermixed.”
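
The second sense above, pooling the properties of resources declared owl:sameAs, can be sketched in a few lines. This is a minimal illustration and not something taken from the talk: the identifiers (ex:a, ex:b, ex:c), the property names and the values are all hypothetical, and a union-find is only one way to group equivalent identifiers.

```python
from collections import defaultdict


def merge_same_as(triples):
    """Group subjects linked by owl:sameAs and pool their other properties."""
    parent = {}

    def find(x):
        # Follow parent links to the representative of x's group.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller id as the representative.
            low, high = sorted((ra, rb))
            parent[high] = low

    # First pass: record which identifiers are declared equivalent.
    for s, p, o in triples:
        if p == "owl:sameAs":
            union(s, o)

    # Second pass: attach every other property to the group representative.
    merged = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if p != "owl:sameAs":
            merged[find(s)][p].add(o)
    return {k: {p: sorted(v) for p, v in props.items()}
            for k, props in merged.items()}


# Hypothetical statements from two sources describing the same thing.
TRIPLES = [
    ("ex:a", "dc:title", "Example title"),
    ("ex:b", "dc:creator", "Example Archive"),
    ("ex:a", "owl:sameAs", "ex:b"),
    ("ex:c", "dc:title", "Unrelated item"),
]

result = merge_same_as(TRIPLES)
print(result)
```

After merging, the title from one source and the creator from the other sit under a single identifier, which is the "intermixing" the quotation describes.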
For purposes of RDTF aggregation means:
• an assembly of data sources
  – more than a collection of objects (image banks, data services, catalogues,
  activity data) – related or otherwise
• for machine-as-user – independent of presentation layer
However, aggregation is neither a goal nor an end in itself; it is an
intervention to be used for a twofold strategic purpose:
• ‘improvement’ - merge & match, customisation and consumption, multiple
  output formats, reduce duplication of effort
• ‘discoverability’ – via ‘promiscuous’ or ‘well-dressed’ metadata through e.g.
  Google or tailored services
Language & Perspectives
Digital Library has mixed parentage - a ‘re-mix’ of the document
tradition & the computation tradition
• “approaches based on a concern with documents, with signifying
  records: archives, bibliography, documentation, librarianship, records
  management, and the like …”
  [Content Provider speak]
• “approaches based on uses of formal techniques, whether mechanical
  (such as punch cards and data-processing equipment) or
  mathematical/computational (as in algorithmic procedures).”
  [Developer speak]
Prof. Michael Buckland,
Presidential Address, American Society for Information Science,
JASIS’s 50th (1998)
http://people.ischool.berkeley.edu/~buckland/asis62.html
Perspectives … as provider
• EDINA - develops and delivers JISC-sponsored national online services
  – adding value to data and content
  • Digimap Collections (OS mapping; SeaZone; BGS)
  • NewsfilmOnline (various; digitised with JISC £)
  • UK Access Management Federation (institutions; authentication)
• Data Library – move from support to middle folk
  • Research data support for Edinburgh researchers
  • Research data management guidelines, training, OER materials
  • Edinburgh DataShare – open data repository
  • RADAR – Researching A Data Asset Registry
• Maybe as ‘middle folk’ - cf. those who deal in middleware
  • sometimes having the role of creator and supplier of some service
  • sometimes being the user of what others supply
  • ‘inter-operator’
Perspective … as aggregator:
developing and delivering JISC-sponsored aggregation services
• JISC Media Hub - links to collections & hosted content (c. 1m resources)
  CultureGrid; First World War Poetry; Films of Scotland; Getty images (all
  content searchable and viewable within JISC Media Hub)
• GoGeo! - metadata registry for spatially-referenced data
  Geodoc Metadata creation tool, ShareGeo Open
• SUNCAT – serials union catalogue: 80 libraries
  metadata/links to full text, download MARC records (& XML & SUTRS - Simple
  Unstructured Text Record Syntax - data exchange format widely used in
  Z39.50)
• PEPRS - e-journal preservation registry jointly led by EDINA with the
  ISSN International Centre
  metadata registry of available back copy e-journals - aggregated from
  preservation agencies (incl. British Library, UK LOCKSS Alliance, CLOCKSS)
Some RDTF-related projects @ EDINA
• GOgeo Linked Data (GOLD) – triplify INSPIRE compliant metadata
  to – improve discoverability of metadata records via search engines
• SUNCAT: Exploring Open [bibliographic] Metadata (working with
  OKF to open up data sent by contributing libraries – convert to RDF)
• Sharing OpenURL Activity Data - monthly usage data: date &
  time; anonymised IP address/inst. ID; title; author; ISSN, DOI
  Uses – article/journal recommendations, publishers reviewing
  what content is of interest to specific communities, innovative
  services to meet users’ needs
• CHALICE – Use data mining to extract placenames from the
  English Place Name Survey to create a UK historic gazetteer
  published as Linked Data & link it to the Geonames ontology on the
  semantic web.
• AddressingHistory – Geo-parsing of Scottish Post Office
  Directories, API onto digitised content, output in XML, CSV, JSON
• 3 further case studies on other EDINA services illustrating how
  other collections can benefit from the same techniques.
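
The OpenURL activity data item above (date and time, anonymised IP address or institution ID, ISSN, DOI) lends itself to simple "requested together" journal recommendations. The sketch below is a minimal illustration under stated assumptions: the talk does not say how addresses are anonymised, so a salted hash is assumed here, and the log layout, salt, function names and sample rows (documentation-range IP addresses, placeholder ISSNs) are all invented.

```python
import hashlib
from collections import Counter, defaultdict
from itertools import combinations

SALT = b"example-salt"  # hypothetical; a real service would keep this secret


def anonymise(ip):
    """Replace an IP address with a short salted SHA-256 digest."""
    return hashlib.sha256(SALT + ip.encode("utf-8")).hexdigest()[:12]


def co_requests(log):
    """Count how often two ISSNs were requested by the same anonymised user."""
    by_user = defaultdict(set)
    for ip, issn in log:
        by_user[anonymise(ip)].add(issn)
    pairs = Counter()
    for issns in by_user.values():
        for a, b in combinations(sorted(issns), 2):
            pairs[(a, b)] += 1
    return pairs


def recommend(pairs, issn):
    """Return ISSNs co-requested with `issn`, most frequent first."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == issn:
            scores[b] += n
        elif b == issn:
            scores[a] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked]


# Invented sample log: (IP address, ISSN) per request.
LOG = [
    ("192.0.2.1", "1111-1111"),
    ("192.0.2.1", "2222-2222"),
    ("192.0.2.2", "1111-1111"),
    ("192.0.2.2", "2222-2222"),
    ("192.0.2.2", "3333-3333"),
    ("192.0.2.3", "3333-3333"),
]

pairs = co_requests(LOG)
print(recommend(pairs, "1111-1111"))
```

Only the digest is ever used as a grouping key, so the recommendation step never sees a raw address.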
The end is the start of a new beginning …
• In earlier ‘web time’ we had the MODELS ‘user-verbs’:
  Discover -> Locate -> Request -> Access (Deliver)
  Dempsey, Russell & Murray (1999)
  http://www.ukoln.ac.uk/dlis/models/publications/utopia/
  where Access was the end game for us ‘middle folk’ even if the
  beginning & part of a deeper process for researchers, students …
• Now there is a call for more than bilateral & negotiated interoperability,
  where Access is the beginning for developers and for other services
• RDF/Linked Data enables information to be shared in a more Web-friendly
  way
• RDF/Linked Data enables structure and content of those data sources to
  be explicit - vocabularies, ontologies, relationships
Exposing the complexity and relationships in the underlying data,
hanging the insides on the outside!
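
To make ‘hanging the insides on the outside’ concrete, the sketch below turns one flat catalogue record into explicit subject-predicate-object statements and writes them as N-Triples. The record, the example.org URIs and the choice of Dublin Core terms are illustrative assumptions only; a production service would more likely use an RDF library than hand-built strings.

```python
DCTERMS = "http://purl.org/dc/terms/"


def escape(text):
    """Escape characters that N-Triples string literals do not allow raw."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def record_to_ntriples(subject_uri, record):
    """Emit one N-Triples line per field of a flat record."""
    lines = []
    for field, value in record.items():
        predicate = f"<{DCTERMS}{field}>"
        if value.startswith(("http://", "https://")):
            obj = f"<{value}>"            # link to another resource
        else:
            obj = f'"{escape(value)}"'    # plain string literal
        lines.append(f"<{subject_uri}> {predicate} {obj} .")
    return "\n".join(lines)


# Hypothetical record; field names are Dublin Core term names.
RECORD = {
    "title": 'An "example" serial',
    "publisher": "Example Press",
    "isPartOf": "http://example.org/collection/1",
}

nt = record_to_ntriples("http://example.org/serial/42", RECORD)
print(nt)
```

The relationship to the parent collection, implicit in a database join or a screen layout, becomes a statement any machine can follow.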
The treasures are on show inside, but …
[Image: Centre Pompidou]
… and so to summarise …
Early web approaches focused on making content accessible for humans
• hiding the complexity and relationships in the underlying data
• paying attention to the user interface: HCI & GUI; Usability and Accessibility
However, to ensure content gets noticed it must be made easier for machines to
understand by:
• exposing the complexity and relationships in the underlying data
• having in mind the machine-as-user: API as well as HCI
Aggregation should be seen as intervention, with strategic purpose:
1. to engage in value-added improvement of content
2. to enhance the discoverability of that which is ‘aggregated’
• to be a focus of attention (thro’ promiscuous metadata!)
If it is done with RDF, then that’s good; don’t make a fuss if not
• Publish RDBMS schemas, catalogue records, codebooks, and
ancillary or related content in multiple, machine-readable formats
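
Publishing the same records in several machine-readable formats needs little more than one serialiser per format. The following is a minimal sketch using only the Python standard library; the record fields, the placeholder values and the `publish` helper are hypothetical and do not describe any EDINA service.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical catalogue records with placeholder values.
RECORDS = [
    {"id": "1", "title": "Example serial", "issn": "0000-0000"},
    {"id": "2", "title": "Another example", "issn": ""},
]


def to_json(records):
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv(records):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_xml(records):
    root = ET.Element("records")
    for rec in records:
        node = ET.SubElement(root, "record", id=rec["id"])
        for key, value in rec.items():
            if key != "id":
                ET.SubElement(node, key).text = value
    return ET.tostring(root, encoding="unicode")


FORMATS = {"json": to_json, "csv": to_csv, "xml": to_xml}


def publish(records, fmt):
    """Serialise the same records in the requested format."""
    return FORMATS[fmt](records)


for fmt in FORMATS:
    print(publish(RECORDS, fmt))
```

Keeping the serialisers behind one lookup table means a further format can be added without touching the data itself.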
The Many Minds principle
“the coolest thing to do with your data will be thought of by someone else”
Using data as the building platform
Jo Walsh & Rufus Pollock (2007-05-17). Open Data and Componentization. XTech 2007 (slide 14)
“Benefits of freeing data are many, arguably being the most relevant one
the ‘Many Minds principle’: there’ll always be someone that will find out
a way to reuse data that you wouldn’t have even figured.”
José Manuel Alonso, Notes from the 5th Internet, Law and Politics Conference: The Pros and Cons of Social Networking Sites, organized by the Open
University of Catalonia, School of Law and Political Science, and held in Barcelona, Spain, on July 6th and 7th, 2009.
THANK YOU
[email protected]
[email protected]
http://edina.ac.uk/
Repository Fringe 2011 –
call for participants:
http://www.repositoryfringe.org/
CC BY-NC-ND 2.0 - image by enggul courtesy of Flickr –
http://www.flickr.com/photos/enggul/2361808668/