Opening data - The National Archives

Linked Government Data

John Sheridan

2 March 2011

“With linked data, when you have some of it, you can find other, related, data.”

Tim Berners-Lee, “Linked Data Design Issues”

http://www.w3.org/DesignIssues/LinkedData.html

Henry Maudslay (1771–1831)

“He also developed the first industrially practical screw-cutting lathe in 1800, allowing standardisation of screw thread sizes for the first time. This allowed the concept of interchangeability (an idea that was already taking hold) to be practically applied to nuts and bolts. Before this, all nuts and bolts had to be made as matching pairs only. This meant that when machines were disassembled, careful account had to be kept of the matching nuts and bolts ready for when reassembly took place.”

http://en.wikipedia.org/wiki/Henry_Maudslay

Five stars

*      make your stuff available on the Web (whatever format) under an open licence
**     make it available as structured data (e.g., Excel instead of image scan of a table)
***    use non-proprietary formats (e.g., CSV instead of Excel)
****   use URIs to identify things, so that people can point at your stuff
*****  link your data to other data to provide context

Three projects

• data.gov.uk
  o Supporting the transparency agenda with Linked Data
• legislation.gov.uk
  o First step towards a Linked Data Statute Book
• nationalarchives.gov.uk
  o Semantic Knowledge Base for the Web Archive

“The Government believes that we need to throw open the doors of public bodies, to enable the public to hold politicians and public bodies to account.”

The Coalition Agreement.

“We will ensure that all data published by public bodies is published in an open and standardised format, so that it can be used easily and with minimal cost by third parties.”

The Coalition Agreement.

We are:

• developing standards for responsible publishing of key types of data (financial data, organisation data, aggregate statistics, location data)
• developing guidance, practices and tools that make it easy to publish data in Linked Data form, at low cost
• making it easy for people to consume data in a programmatic way (the Linked Data API as well as native Linked Data techniques such as the provision of SPARQL endpoints)
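As a rough illustration of the SPARQL endpoint route, the sketch below builds a query-via-GET request URL with the Python standard library. The endpoint address and the query text are assumptions made for illustration, not details from the talk, and no network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint, for illustration only; substitute a real one.
ENDPOINT = "http://example.org/sparql"

# A generic query: list ten subject-property-object statements.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def sparql_get_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query text in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded)
```

A client would send this URL with an Accept header naming the result format it wants; that step is left out so the sketch stays offline.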

STANDARDS

[Organisation chart: Director General; Director (Operations); Director (Strategy); Deputy Director (A); Deputy Director (A)]

Year    A       B       C       D       E
2008    1,345   2,112   2,345   6,342   7,435
2009    1,456   3,543   2,987   6,256   7,432
2010    2,301   2,111   2,455   6,123   8,102

Transaction   Date         Supplier            Amount
A-1263        09/09/2010   Spottiswoode & Co   £2,345
A-1264        09/09/2010   JSB & Sons          £2,111
A-1265        09/09/2010   BLG Ltd             £2,455
A-1266        09/09/2010   Spottiswoode & Co   £6,123
A-1267        09/09/2010   BLG Ltd             £8,102

Standards

• Re-use where we can, create where we must
• Small, high level, light weight vocabularies
  o Examples include datacube, organization, provenance
• Create local specialisations
  o Examples include payments, central-government
• Post hoc linking

DATA

http://reference.data.gov.uk/id/day/2011-01-13
http://reference.data.gov.uk/id/department/CO
http://transport.data.gov.uk/id/station/WAT
http://education.data.gov.uk/id/school/341451
http://location.data.gov.uk/id/3245677362123
http://www.legislation.gov.uk/id/ukpga/2009/12/section/2
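To show how predictable identifiers like these can be minted and read back, here is a minimal sketch for the day URI. The prefix is copied from the example above; treating the final path segment as an ISO 8601 date is an inference from that single example, and the function names are hypothetical.

```python
from datetime import date

# Prefix copied from the example URI; the ISO date suffix is an inference.
DAY_PREFIX = "http://reference.data.gov.uk/id/day/"


def day_uri(d: date) -> str:
    """Mint the identifier URI for a calendar day."""
    return DAY_PREFIX + d.isoformat()


def parse_day_uri(uri: str) -> date:
    """Recover the date from a day URI; raises ValueError if it does not fit."""
    if not uri.startswith(DAY_PREFIX):
        raise ValueError(f"not a day URI: {uri}")
    return date.fromisoformat(uri[len(DAY_PREFIX):])


print(day_uri(date(2011, 1, 13)))
print(parse_day_uri("http://reference.data.gov.uk/id/day/2011-01-13"))
```

Because the pattern is regular, two publishers who never coordinate can still arrive at the same URI for the same day, which is what makes post hoc linking workable.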

PRODUCTION

Gridworks (Google Refine)

Gridworks: map and export Linked Data

PUBLICATION

Linked Data API

• Open Standard
• Generic approach for creating APIs from Linked Data
• Sits on top of a Linked Data store
• Several implementations, most mature is Puelia
• Examples for education and transport
• Also, organisations, payments information
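A request to such an API is an ordinary URL, which the sketch below assembles. The list endpoint path is a hypothetical example, and the use of a format extension together with `_page` and `_pageSize` parameters is an assumption based on the Linked Data API specification rather than anything shown in the talk.

```python
from urllib.parse import urlencode

# Hypothetical list endpoint, assumed for illustration.
BASE = "http://education.data.gov.uk/doc/school"


def list_url(base: str, fmt: str = "json", page: int = 0, page_size: int = 10) -> str:
    """Build a paged list request; fmt is appended as a file-style extension."""
    params = urlencode({"_page": page, "_pageSize": page_size})
    return f"{base}.{fmt}?{params}"


print(list_url(BASE))
print(list_url(BASE, fmt="rdf", page=2, page_size=50))
```

The point of the design is that a web developer can page through results with plain HTTP GET requests, without writing SPARQL against the underlying store.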

And wouldn’t it be cool if we had…

UNAMBIGUOUS DEFINITIONS

Legislation as data

• Three considerations for legislation as data
  o Typographic layout
  o Versioning / changes over time
  o Semantics
• Semantic representation using RDF and Linked Data
  o URIs for things
  o RDF data model
  o subject - property - object
• Requires granular URIs to name things
  o Identifier
  o Document
  o Representation

“A” changes “B” when “C” says so

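[Diagram, read as subject - property - object]

Academies Act 2010 Section 19 (2) - confers power - Secretary of State
Secretary of State - makes - SI 2010/1937 Schedule 3
SI 2010/1937 Schedule 3 - commences - Academies Act 2010 Section 12 (4)
Academies Act 2010 Section 12 (4) - inserts text into - Charities Act 1993 Schedule 2 (ca)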
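The relationships in this example fit the subject - property - object model directly. The sketch below holds them as plain tuples; the direction of each relationship is one reading of the diagram, and the labels are plain strings rather than the URIs a real RDF graph would use.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    prop: str
    obj: str


# One reading of the diagram; edge directions are an assumption.
TRIPLES = [
    Triple("Academies Act 2010 Section 19 (2)", "confers power", "Secretary of State"),
    Triple("Secretary of State", "makes", "SI 2010/1937 Schedule 3"),
    Triple("SI 2010/1937 Schedule 3", "commences", "Academies Act 2010 Section 12 (4)"),
    Triple("Academies Act 2010 Section 12 (4)", "inserts text into",
           "Charities Act 1993 Schedule 2 (ca)"),
]


def objects(subject: str, prop: str) -> List[str]:
    """Return every object linked to the subject by the given property."""
    return [t.obj for t in TRIPLES if t.subject == subject and t.prop == prop]


def subjects(prop: str, obj: str) -> List[str]:
    """Return every subject linked to the object by the given property."""
    return [t.subject for t in TRIPLES if t.prop == prop and t.obj == obj]


# "A" changes "B" when "C" says so:
a = "Academies Act 2010 Section 12 (4)"
print("A:", a)
print("B:", objects(a, "inserts text into"))
print("C:", subjects("commences", a))
```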

Legislation URIs

• Identifier
  o http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
  o eg http://www.legislation.gov.uk/id/ukpga/2010/32/section/12/4
• Document
  o http://www.legislation.gov.uk/{type}/{year}/{number}/section/{number}
  o eg http://www.legislation.gov.uk/ukpga/2010/32/section/12#section-12-4
• Representations
  o /data.xml
  o /data.xht
  o /data.pdf
  o /data.rdf
  o and for any list, /data.feed
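The three levels can be sketched as small string-building functions that follow the templates above. The function names are hypothetical, and appending the representation suffix straight onto the document URI is an assumption about how the `/data.*` suffixes attach.

```python
BASE = "http://www.legislation.gov.uk"

# Representation formats listed on the slide (the list-only feed is omitted).
FORMATS = ("xml", "xht", "pdf", "rdf")


def identifier_uri(type_: str, year: int, number: int, section: str) -> str:
    """URI naming the section itself (the abstract thing)."""
    return f"{BASE}/id/{type_}/{year}/{number}/section/{section}"


def document_uri(type_: str, year: int, number: int, section: str) -> str:
    """URI of the document that describes the section."""
    return f"{BASE}/{type_}/{year}/{number}/section/{section}"


def representation_uri(doc_uri: str, fmt: str) -> str:
    """URI of one concrete format of a document."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    return f"{doc_uri}/data.{fmt}"


print(identifier_uri("ukpga", 2010, 32, "12/4"))
doc = document_uri("ukpga", 2010, 32, "12")
print(doc)
print(representation_uri(doc, "rdf"))
```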

Legislation URIs, time and extents

• Identifier
  o http://www.legislation.gov.uk/id/{type}/{year}/{number}/section/{number}
    • eg http://www.legislation.gov.uk/id/ukpga/2010/32/section/12/4
• Document versions
  o In force
    • eg http://www.legislation.gov.uk/ukpga/2010/32/section/12
  o Prospective
    • eg http://www.legislation.gov.uk/ukpga/2010/32/section/12/prospective
  o Point in time
    • eg http://www.legislation.gov.uk/ukpga/2010/32/section/12/2010-12-01
  o Extents
    • eg http://www.legislation.gov.uk/ukpga/2010/32/section/12/england
    • eg http://www.legislation.gov.uk/ukpga/2010/32/section/12/scotland
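Each version is the document URI plus one suffix, which the following sketch reproduces for the examples given. The function names are hypothetical, and the examples do not show whether a date and an extent can be combined in one URI, so each helper adds exactly one qualifier.

```python
from datetime import date

# Document URI taken from the examples above.
DOC = "http://www.legislation.gov.uk/ukpga/2010/32/section/12"


def in_force(doc_uri: str) -> str:
    """The unqualified document URI gives the version in force."""
    return doc_uri


def prospective(doc_uri: str) -> str:
    """The version including changes not yet in force."""
    return f"{doc_uri}/prospective"


def point_in_time(doc_uri: str, on: date) -> str:
    """The version as it stood on a given day."""
    return f"{doc_uri}/{on.isoformat()}"


def for_extent(doc_uri: str, extent: str) -> str:
    """The version applying to one jurisdiction, e.g. 'england'."""
    return f"{doc_uri}/{extent.lower()}"


print(in_force(DOC))
print(prospective(DOC))
print(point_in_time(DOC, date(2010, 12, 1)))
print(for_extent(DOC, "Scotland"))
```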

Web Archive - Semantic Knowledge Base

• The National Archives operates the UK Government Website Archive
• Second most used web archive in the world
• Links to withdrawn documents are maintained – preserving wide variety of information, from datasets to documents and press releases
• Web archives are notoriously difficult to search using standard search technology – size, number of duplicates
• Procured SKB, competitive process
• Solution being delivered by a consortium (technologies from Ontotext, University of Sheffield)