To Boldly Go
PC-Axis Reference Group, Copenhagen, 2014
Central Statistics Office, Cork, Ireland
Kevin Healy, [email protected] (00 353 21) 453 5719
Eoin MacCuirc, [email protected] (00 353 21) 453 5504
Linked Open Data
The Tower of Babel
“If as one people
speaking the same
language they have
begun to do this,
then nothing they
plan to do will be
impossible for
them. Come, let us
go down and
confuse their
language so they
will not understand
each other.”
Tim Berners-Lee – Founder of the Web
“In an extreme view, the world
can be seen as only connections,
nothing else. We think of a
dictionary as the repository of
meaning, but it defines words
only in terms of other words. I
liked the idea that a piece of
information is really defined only
by what it's related to, and how
it's related. There really is little
else to meaning. The structure is
everything. There are billions of
neurons in our brains, but what
are neurons? Just cells. The brain
has no knowledge until
connections are made between
neurons. All that we know, all that
we are, comes from the way our
neurons are connected.”
How open is the data? - Linked Open
Data star scheme
Tim Berners-Lee suggested a 5-star deployment scheme for Linked Open
Data and Ed Summers provided a nice rendering of it. In the following,
examples are given for each level. The example data used throughout is
'the temperature forecast for Galway, Ireland for the next 3 days':
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of image scan of a table)
★★★ use non-proprietary formats (e.g., CSV instead of Excel)
★★★★ use URIs to identify things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
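The scheme is cumulative: each star assumes the ones before it. A minimal sketch of that rule follows. The Dataset class and its field names are hypothetical, chosen only for illustration, and are not part of the star scheme itself.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustrative field names)."""
    on_web_open_license: bool = False     # 1 star
    structured: bool = False              # 2 stars
    non_proprietary_format: bool = False  # 3 stars
    uses_uris: bool = False               # 4 stars
    linked_to_other_data: bool = False    # 5 stars


def star_rating(ds: Dataset) -> int:
    """Count the criteria met in order, stopping at the first one that fails."""
    criteria = [
        ds.on_web_open_license,
        ds.structured,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A forecast published as CSV under an open licence, with no URIs or links yet.
csv_forecast = Dataset(on_web_open_license=True, structured=True,
                       non_proprietary_format=True)
print(star_rating(csv_forecast))
```

Because the loop stops at the first unmet criterion, a dataset that uses URIs but has no open licence still scores zero in this sketch.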
Linked Open Data cloud
Domains shown: Media, User-generated, Government, Publications, Cross-domain, Geo, Life sciences
http://lod-cloud.net/
Linked Open Data – The Semantic Web
Copenhagen – 99,100,000 hits
looking for a needle in a haystack
URI – Uniform Resource Identifier
give the thing a name and an address
The following picture shows the desired relationships between a resource
and its representing documents:
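One common way to realise this relationship is a 303 redirect from the URI of the thing to a document about it, chosen by the client's Accept header. The sketch below assumes a hypothetical URI layout on example.org (/id/ for the thing, /data/ for RDF, /page/ for HTML). It is not the layout used by any site mentioned here, and the Accept handling is deliberately simplistic.

```python
from typing import Tuple

BASE = "http://example.org"  # hypothetical host, for illustration only


def representation_for(resource_uri: str, accept: str) -> Tuple[int, str]:
    """Return (HTTP status, Location) for a request on a resource URI.

    The resource itself (a place, a person) cannot be sent over HTTP,
    so the server answers 303 See Other and points to a document.
    """
    prefix = BASE + "/id/"
    if not resource_uri.startswith(prefix):
        raise ValueError("not a resource URI in this hypothetical scheme")
    name = resource_uri[len(prefix):]
    # Naive check: ignores q-values and wildcards in the Accept header.
    if "text/turtle" in accept or "application/rdf+xml" in accept:
        return 303, f"{BASE}/data/{name}"
    return 303, f"{BASE}/page/{name}"


print(representation_for(BASE + "/id/galway", "text/turtle"))
print(representation_for(BASE + "/id/galway", "text/html"))
```

The point of the design is that the name of the thing stays stable while the documents describing it can vary by format.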
Tim’s cool URIs
Cool URIs don't change
What makes a cool URI?
A cool URI is one which does not change.
What sorts of URI change?
URIs don't change: people change them.
It is the duty of a Webmaster to allocate URIs
which you will be able to stand by in 2 years, in 20
years, in 200 years. This needs thought, and
organization, and commitment.
The Web of Things – The Internet of Things
The Internet of Things is coming, but it needs a semantic backbone to
flourish. With some 25 billion devices expected to be connected to the
Internet by 2015 and 50 billion by 2020, providing interoperability among
the things on the IoT “is one of the most fundamental requirements to
support object addressing, tracking, and discovery as well as information
representation, storage, and exchange.” So write the authors of Semantics
for the Internet of Things: Early Progress and Back to the Future, Payam
Barnaghi and Wei Wang, Centre for Communication Systems Research,
University of Surrey, Guildford, UK and Cory Henson, Kno.e.sis – Ohio
Center of Excellence in Knowledge-enabled Computing.
“The suite of technologies developed in the Semantic Web … such as
ontologies, semantic annotation, Linked Data and semantic Web services
… can be used as principal solutions for the purpose of realizing the IoT,”
they state. “Defining an ontology and using semantic descriptions for data
will make it interoperable for users and stakeholders that share and use
the same ontology.”
Where is the CSO with all this?
• In partnership with DERI/NUIG/INSIGHT
• One of the first NSIs in the world to upload census data as linked open data – data.cso.ie – Census 2011
• One of the organisations involved in the EU Open Cube pilot projects
• Launched apps4gaps competition
data.cso.ie
Census – Linked Open Data
• 12 million RDF triples from Census
• Geographical entities (counties, cities, etc.)
• Codelists
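For readers new to RDF, a triple is a (subject, predicate, object) statement. The sketch below shows how one statistical observation might be written as triples using terms from the W3C RDF Data Cube and SDMX vocabularies. The example.org URIs, the identifiers and the value 1000 are placeholders invented for illustration. They are not real census figures and not the URIs used on data.cso.ie.

```python
QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
EX = "http://example.org/census/"  # hypothetical namespace


def uri(value: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def observation_triples(obs_id: str, area_id: str, value: int):
    """Describe one observation: its type, its area and its value."""
    subject = uri(EX + "observation/" + obs_id)
    return [
        (subject, uri(RDF_TYPE), uri(QB + "Observation")),
        (subject, uri(SDMX_DIM + "refArea"), uri(EX + "area/" + area_id)),
        (subject, uri(SDMX_MEASURE + "obsValue"),
         f'"{value}"^^{uri(XSD_INTEGER)}'),
    ]


def to_ntriples(triples) -> str:
    """Serialise triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Placeholder identifiers and value, not real data.
print(to_ntriples(observation_triples("pop-cork", "cork", 1000)))
```

Each observation yields a handful of triples like these, which is how a single census can run to millions of triples.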
CSO/NUIG collaboration summary
position
• Most technical work done by students/interns
at NUIG
• CSO supplied data, use cases, and expertise
• Lots of manual work and ad-hoc solutions
• Results not fully “owned” by CSO
• Skills needed to maintain/extend are mostly in
NUIG
18-19 November 2013
OpenCube kick-off meeting
Open Cube Project Pilots
OpenCube Pilots
Pilot: DCLG
– Focus: Publish
– Tool/platform: Swirrl’s PublishMyData
– Data sets: 50-100 open datasets regarding finance, planning performance, land use, housing and homelessness
– Type of users: Public servants (members of the DCLG statistical data management team) as well as statisticians/researchers
– Number of users: 3-4 members of the data management team and 5 test users (statisticians, research analysts)
– Evaluation cycle: 2 evaluation cycles: M9-M12 and M18-M21
Pilot: Flemish Gov
– Focus: Publish/Reuse
– Tool/platform: FluidOps’ IWB
– Data sets: 1100 open datasets VRIND
– Type of users: A varied audience ranging from public servants to data scientists
– Number of users: 5-10
– Evaluation cycle: 2 evaluation cycles: M9-M12 and M18-M21
Pilot: Central Statistics Office
– Focus: Publish/Reuse
– Tool/platform: OpenCube toolkit
– Data sets: 2011 Census dataset & StatBank dataset
– Type of users: Public servants
– Number of users: 25 employees
– Evaluation cycle: 2 evaluation cycles: M9-M12 and M18-M21
Open Cube business case for the CSO
• Publishing statistics from StatBank as linked
data
• Publishing statistics from StatBank as SDMX-ML
• Facilitate the creation of general reports
aimed at the general public
• Assist with answering queries from the public
• Help third parties to tell stories with CSO data
CSO goals (independent from
OpenCube)
• Own the data.cso.ie process and technology
– Enable in-house maintenance, changes, etc.
• Publish StatBank* data as Linked Open Data
– Ongoing publication process
– Adhering to release schedule is critical
– Publish data that are regularly updated (monthly, quarterly, annual) as linked open data (the Census 2011 data are static)
*StatBank is the CSO's published time series database (PC-Axis)
• Deploy tools that enable analytics and exploitation of
linked data
– Both internally and externally
The Role of the CSO in the Future of
Linked Data in Ireland
As the technology trends that drive adoption of Linked Data continue further, and the
importance of Open Data increases, the CSO is well-positioned to play a leading role as a “hub”
in the Irish data Web.
Some key steps include:
1. Proactively encourage the adoption of standard classifications and metadata
for Open Data that are published by different public bodies within Ireland. The CSO is
already documenting classifications on its StatCentral (Portal) website, and has more experience
in disseminating data on the Web than perhaps any other organisation in the public
sector. Ideally, the classifications themselves would be published as Linked Data.
2. Going beyond pure classifications, encourage the use of standard identifiers (URIs) for
geographical areas.
3. Support Linked Data as a new dissemination format for the CSO StatBank. Key
economic and demographic statistics are necessary in all sorts of data analysis tasks,
and ideally they should be published as Linked Data directly by the source (CSO).
Application Programming Interface (API)
StatBank API
StatBank API – by theme
StatBank API – Download
http://www.cso.ie/StatbankServices/StatbankServices.svc/jsonservice/responseinstance/AAA01
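A minimal sketch of calling the download URL above from a script. It assumes that the last path segment (AAA01) is a StatBank table code and that the service answers with UTF-8 encoded JSON. Neither assumption is confirmed here, and the helper names table_url and fetch_table are illustrative.

```python
import json
from urllib.request import urlopen

# Base of the download URL shown on the slide.
BASE_URL = ("http://www.cso.ie/StatbankServices/StatbankServices.svc/"
            "jsonservice/responseinstance/")


def table_url(table_code: str) -> str:
    """Build the download URL for one table (assumed to be keyed by code)."""
    return BASE_URL + table_code


def fetch_table(table_code: str, timeout: float = 30.0):
    """Download a table and parse the body as JSON (assumed response format)."""
    with urlopen(table_url(table_code), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(table_url("AAA01"))
```

Calling fetch_table("AAA01") needs network access and depends on the service still being available at this address.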
Key Indicators, quick tables and multi-quick tables
Key Economic Indicators
http://www.cso.ie/indicators/Maintable.aspx
Quicktables
http://www.cso.ie/Quicktables/GetQuickTables.aspx?FileName=CNA13.asp&TableName=Population+1901+-+2011&StatisticalProduct=DB_CN
Multi-quicktables
http://www.cso.ie/multiquicktables/quickTables.aspx?id=qnq34
Public Sector Statistics Network (PSSN)
PSSN – Organisations hosted
OGP as a driver
http://www.ogpireland.ie/
data.gov.ie – Irish OGP portal
http://data.gov.ie/dataset
Context and Impact Indicators
CSO - Context and Impact
Context and Impact Indicators (values for 2011 / 2012 / 2013)
Printed output
• No. of releases and publications: 238 / 306 / 304
Online output – CSO website
• Visits: 2,387,000 / 2,303,441 / 2,718,287
• Page views: 10,070,000 / 13,997,031 / 17,034,035
• Downloaded files: 1,539,000 / 1,733,833 / 1,856,176
• StatBank table accesses: 400,400 / 1,042,750 / 1,282,674
Online output – StatCentral site
• Visits: 131,400 / 158,117 / 179,527
• Page views: 300,200 / 418,564 / 451,788
Publication of statistics on social media
• Followers (at year-end): 3,030 / 5,644 / 8,548
Burden Reduction
• Annual reduction in statistical burden on business: -28% / -4.7% / n/a
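The trend in these indicators is easier to see as year-on-year percentage change. The sketch below uses the first set of page-view figures listed above, assumed here to belong to the CSO website. The function name percent_change is illustrative.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0


# Page views for 2011, 2012 and 2013 (assumed to be the CSO website figures).
page_views = {2011: 10_070_000, 2012: 13_997_031, 2013: 17_034_035}

years = sorted(page_views)
for prev, curr in zip(years, years[1:]):
    change = percent_change(page_views[prev], page_views[curr])
    print(f"{prev} -> {curr}: {change:+.1f}%")
```

The same function can be applied to any other row, for example visits or StatBank table accesses.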
Questions?