Pliny: what is it for - Bliss bibliographic classification

Annual Bliss Classification Association Lecture
19 April 2013
Facets and structured research data in the Digital Humanities
John Bradley and Michele Pasin
1
Facetted modelling in the dynamic world of structured humanities scholarship
John Bradley
2
Structured Data DDH projects

Bpi1700: (British Printed Images before 1700) http://www.bpi1700.org.uk
CVMA: (Corpus Vitrearum Medii Aevi: Medieval stained glass) http://www.cvma.ac.uk
PBE: Prosopography of the Byzantine Empire (published on CD)
PBW: Prosopography of the Byzantine World http://www.pbw.kcl.ac.uk
PASE: (Prosopography of Anglo-Saxon England) http://www.pase.ac.uk
CCEd: Clergy of the Church of England: http://www.theclergydatabase.org.uk
EMLoT: Early Modern London Theatres: http://www.emlot.kcl.ac.uk
DIAMM: Digital Image Archive of Medieval Music: http://www.diamm.ac.uk/index.html
OCVE: Online Chopin Variorum Edition: http://www.ocve.org.uk/index.html
PoMS: Paradox of Medieval Scotland: http://www.poms.ac.uk
Art of Making: http://www.artofmaking.ac.uk
Records of Early English Drama
The Making of Charlemagne’s Europe
The Breaking of Britain
3
Text in, Text out
[Diagram labels: Primary Sources, Secondary Source, Historical Research, Article, Book / Article]
“the historical work as what it most manifestly is:
a verbal structure in the form of a narrative
prose discourse” Hayden White (1973), quoted in
Jörn Rüsen (1987). “Historical Narration: Foundation,
Types, Reason” in History and Theory Vol 26 No 4
4
Historians and narrative text

“Multiplicity is inherent in the word-narratives used to
communicate history. Words are complex forms of
information; they have 'halos of meaning', making them
wonderfully evocative but imprecise and slippery. [...]
Historians embrace this range of meanings. We prefer
the medium of words and narratives because it permits
us to represent the past as multidimensional, complex,
and nonlinear, even though structurally our prose and
our logic are sequential.”

David J. Bodenhamer (2008), "History and GIS: Implications for the
Discipline", in Anne Kelly Knowles (ed.) (2008). Placing History:
How Maps, Spatial Data, and the GIS Are Changing Historical
Scholarship. Redlands, CA: ESRI Press. p. 224
5
Structured Data: Appropriate to the
Humanities?



“humanistic inquiry reveals itself as an activity
fundamentally dependent upon the location of
pattern.”
“Of all the technologies in use among computing
humanists, databases are perhaps the best suited to
facilitating and exploiting [pattern].”
“To build a database one must be willing to move
from the forest to the trees and back again; to use a
database is to reap the benefits of the enhanced
vision which the system affords.”

Stephen Ramsay (2004). “Databases” in A Companion to Digital Humanities.
6
Appropriate to the Humanities?



the underlying ontology [that a database represents] has
considerable intellectual value.
A well-designed database that contains information
about people, buildings, and events in New York City
contains not static information, but an entire set of
ontological relations capable of generating statements
about a domain.
A truly relational database, in other words, contains not
merely "Central Park", "Frederick Law Olmstead", and
"1857", but a far more suggestive string of logical
relationships (e.g., "Frederick Law Olmstead submitted
his design for Central Park in New York during 1857").

(from Stephen Ramsay, “Databases” in A Companion to Digital Humanities)
7
Three example projects
- BPI1700: British Printed Images before 1700
- EMLoT: Early Modern London Theatres
- PASE: Prosopography of Anglo-Saxon England

9
bpi1700

“This website, funded by the Arts and
Humanities Research Council, makes
available a database of thousands of
prints and book illustrations from early
modern Britain in fully-searchable form.”
(from bpi1700 webpage)
- Online version gives free access to all these images
10
Purpose behind bpi1700

11
“Printed images are striking and revealing, potentially
serving a wide range of illustrative and interpretative
uses. They range from high art to crude satire, and
significant conclusions can be drawn from their
circulation and consumption about the culture of their
period. Yet they are surprisingly little used by
researchers, partly because they are currently difficult to
access. This project seeks to rectify this by making a
comprehensive collection of early modern British prints
available online, and by promoting research on their
relationship to their milieu.” (proposal to the AHRC)
BPI1700 online
12
13
Bpi1700 DB structure overview
[Diagram of the bpi1700 database structure. Boxes: Person, Work, Producer, Subject, State, Producer Type, Technique, Impression, Other data. Link labels: Appears in, Created by, Represented in, Created using, Has these, Role in production, Evidenced by. Annotation: FRBR?]
14
The “real” DB structure
Impression data: from Merlin and the V&A
Bpi1700 added value: Works and subject index
15
Early Modern London Theatres

Transmission and understanding: “Most of what we know about the early
London theatres, which developed before, during and shortly after the life of
Shakespeare, has been passed down to us through a complex process of
filtration. Documents written at the time have been selected, copied,
adapted, and interpreted over subsequent centuries, and that process has
shaped our understanding. In turn, what we do with this received
information will determine how future generations view the early theatres.”

"EMLoT lets you see what direct use has been made, over the last four
centuries, of pre-1642 documents related to professional performance in
purpose-built theatres and other permanent structures in the London area.
[...] It tells you who used them, and when, and where you can find evidence
of that use. It also gives you some access to what was used, because it
includes a brief account (or ‘abstract’) of the transcription’s contents,
together with a reference to the location of the original document." (from the
EMLoT website)
16
Elements of the EMLoT structure
Entities: 2ndary Source, Primary Source, Document, Source, Transcription, Record, Event
Auspices: Privy Council, Office of the Revels, Court of Requests, Lord Chamberlain’s Office, [...]
Event Type: playhouse context, court case, playhouse business, payment, player context, [...]
Troupe: Admiral’s Men, Queen’s Men, Worcester’s Men, King’s Men, Oxford’s Boys, [...]
Venue: Globe Theatre, Fortune, Blackfriars, Bel Savage, Boar’s Head, [...]
17
Structured Prosopographical projects






PBE: Prosopography of the Byzantine Empire (published on CD)
PBW: Prosopography of the Byzantine World http://www.pbw.kcl.ac.uk
PASE: (Prosopography of Anglo-Saxon England) http://www.pase.ac.uk
CCEd: Clergy of the Church of England: http://www.theclergydatabase.org.uk
Breaking of Britain
  - PoMS: Paradox of Medieval Scotland: http://www.poms.ac.uk
  - PoNE: People of Northern England Database
The Making of Charlemagne’s Europe
18
A “Source Assertion”
An assertion made by the project team that a source "S" at
reference "R" states something ("F") about a person or
persons ("P").
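As a rough illustration of the S/R/F/P pattern, a source assertion could be held in a small record type. This is only a sketch: the class name, field names, and the example values are invented for illustration and are not taken from any of the project databases.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SourceAssertion:
    """One assertion: source S at reference R states F about persons P."""
    source: str                # "S": the source making the statement
    reference: str             # "R": where in the source it appears
    statement: str             # "F": what the source says
    persons: Tuple[str, ...]   # "P": the person or persons concerned

    def describe(self) -> str:
        people = ", ".join(self.persons)
        return f"{self.source} at {self.reference} states '{self.statement}' about {people}"


# Invented example values, for illustration only.
example = SourceAssertion(
    source="Chronicle A",
    reference="fol. 12r",
    statement="married",
    persons=("Person 1", "Person 2"),
)
print(example.describe())
```

The record captures what the project team says a source states, not a claim about what actually happened, which is why the source and reference travel with every statement.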
19
Core structure for DDH’s
Prosopographical databases
[Diagram. Boxes: Assertion Type, Person, Authority Lists, Assertion, Source, Location, Possession. Link labels: Typed by, Connected to, Instance of, Appears in.]
20
PASE’s “real” structure
Person
Factoid
Types
Assertion (Factoid)
Authority Lists
Possession
Place
Sources
21
Marriages in PASE
22
Facetted Thinking and structure

Facetted Classification: An approach to organise a body of materials using facetted principles.

Facetted Browsing: The exploiting of facets to facilitate the exploration of a body of materials.
23
Facetted Browsing Principles

"Remember the purpose of the
classification and the users. Who will use
it? Why? Will they search it, browse it, or
both? How well do they know the subject?
Always remember it is meant for them to
use."

Denton 2003: “How to Make a Faceted Classification and Put It On the Web”, referencing Kwasnik,
Barbara H. 1999. The role of classification in knowledge representation and discovery. Library
Trends 48 (1): 22-47.
24
Faceted classification: advantages

"Kwasnik (1999, 40-42) lists several things in favour of
faceted classifications: they do not require complete
knowledge of the entities or their relationships; they are
hospitable (can accommodate new entities easily); they
are flexible; they are expressive; they can be ad hoc and
free-form; and they allow many different perspectives on
and approaches to the things classified."

Denton 2003.
25
Searching the Clergy of the
Church of England database
26
WWW Facetted browsing principles

1. The user should not be able to form a query that is known to have no results.
2. Users must always know where they are in the classification.
3. Users must always be able to refine their query or adjust their navigation to see what is nearby in the classification.
4. The URL is the notation of the classification.

Denton 2003
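Two of these principles lend themselves to a short sketch: offering only refinements that still have results, and expressing the current selection as a URL. The records, function names, and the `/browse` path below are assumptions made for illustration; they do not describe how any of the project sites are implemented.

```python
from collections import Counter
from urllib.parse import urlencode

# Invented records for illustration; not real project data.
RECORDS = [
    {"venue": "Globe Theatre", "troupe": "King's Men"},
    {"venue": "Fortune", "troupe": "Admiral's Men"},
    {"venue": "Blackfriars", "troupe": "King's Men"},
]


def matching(records, selection):
    """Return the records that satisfy every selected facet value."""
    return [r for r in records if all(r.get(f) == v for f, v in selection.items())]


def available_values(records, selection, facet):
    """Count the values of `facet` among the current matches.

    Only values with at least one match appear, so the interface never
    offers a refinement that is known to be empty.
    """
    return Counter(r[facet] for r in matching(records, selection) if facet in r)


def selection_url(base, selection):
    """Encode the current position in the classification as a URL."""
    return base + "?" + urlencode(sorted(selection.items()))


current = {"troupe": "King's Men"}
print(available_values(RECORDS, current, "venue"))
print(selection_url("/browse", current))
```

Because the counts are computed from the current matches, a venue with no records for the selected troupe simply never appears as an option.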
27
Facetted Browsing in bpi1700
Facets
28
Selected Works
Facets in PASE
29
Facets in Early Modern London
Theatres (EMLoT)
30
Facets in Early Modern London
Theatres (EMLoT)
31
Facets in Early Modern London
Theatres (EMLoT)
32
Faceted classification: problems

"She [Kwasnik 1999] lists three major
problems: the difficulty of choosing the
right facets; the lack of the ability to
express the relationships between them;
and the difficulty of visualizing it all.”

Denton 2003
33
Metadata

Definition: “Metadata is sometimes defined literally as 'data about data,' but the term is normally understood to mean structured data about resources that can be used to help support a wide range of operations.”

(UKOLN (2001): “Metadata in a nutshell”, http://www.ukoln.ac.uk/metadata/publications/nutshell/)
Metadata and Dublin Core

“Perhaps the most well-known metadata initiative is
the Dublin Core.” (UKOLN 2001)
34
DC: The fifteen elements
Creator
Title
Subject
Contributor
Date
Description
Publisher
Type
Format
Coverage
Rights
Relation
Source
Language
Identifier
From Weibel, Stuart (2007), Dublin Core Metadata
Tutorial. OCLC Research
DC (Metadata) base syntax
Resource (implied subject) has (implied verb) property (one of 15 properties: DC:Creator, DC:Title, DC:Subject, DC:Date...) X (property value: an appropriate literal)
qualifiers (adjectives)
From Weibel, Stuart (2007), Dublin Core Metadata
Tutorial. OCLC Research
36
Resource has Subject "Languages -- Grammar"
Resource has Date "2000-06-13"
From Weibel, Stuart (2007), Dublin Core Metadata
Tutorial. OCLC Research
37
http://www.tutorialsonline.info/Common/DublinCore.html
DC.Creator: Alan Kelsey
DC.Subject: Dublin Core Meta Tags
DC.Format: text/html
DC.Publisher: Alan Kelsey, Ltd.
DC.Date: 2007-01-06
DC.Coverage: Hennepin Technical College
DC.Language: EN
DC.Rights: Copyright 2011, ...
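Element/value pairs like these are commonly embedded in a web page as HTML meta tags. The following is a minimal sketch of that rendering, assuming the `DC.` name prefix shown on this slide; the helper name `dc_meta_tags` is hypothetical, and the element set is the fifteen elements listed earlier in these slides.

```python
from html import escape

# The fifteen Dublin Core elements listed in the slides.
DC_ELEMENTS = {
    "Creator", "Title", "Subject", "Contributor", "Date",
    "Description", "Publisher", "Type", "Format", "Coverage",
    "Rights", "Relation", "Source", "Language", "Identifier",
}


def dc_meta_tags(values):
    """Render {element: literal} pairs as HTML meta tags (hypothetical helper)."""
    tags = []
    for element, value in values.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        # Escape the literal so quotes or angle brackets cannot break the tag.
        tags.append(f'<meta name="DC.{element}" content="{escape(value, quote=True)}">')
    return tags


for tag in dc_meta_tags({"Creator": "Alan Kelsey", "Format": "text/html", "Date": "2007-01-06"}):
    print(tag)
```

Each tag is one "resource has property X" statement: the page is the implied resource, the tag name is the property, and the content attribute is the literal value.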
38
Metadata: a “world view” of
structure
[Diagram labels: Resources, Metadata]

Metadata: two kinds of data:
- Resource: The object being classified
- Metadata: The classification data

Classification data could be used as facets
Does this rather “flat” model suit our purposes?
39
BLISS: BC2 standard facets













thing/entity
kind
part
property
material
process
operation
patient
product
by-product
agent
space
time
"These fundamental thirteen categories have
been found to be sufficient for the analysis of
vocabulary in almost all areas of knowledge.
It is however quite likely that other general
categories exist; it is certainly the case that
there are some domain specific categories,
such as those of form and genre in the field of
literature" (pp 79-80).
Vanda Broughton (2001): Faceted classification as a basis for
knowledge organization in a digital environment; the Bliss
Bibliographic Classification as a model for vocabulary management
and the creation of multidimensional knowledge structures, New
Review of Hypermedia and Multimedia, 7:1, 67-102
“BC2 makes an excellent starting point for
thinking of how to make a faceted
classification. Its facets can be renamed and
adapted to suit your particular
circumstances.” (Denton 2008)
40
Modelling: elements of the EMLoT structure
Entities: 2ndary Source, Primary Source, Document, Source, Transcription, Record, Event
Auspices: Privy Council, Office of the Revels, Court of Requests, Lord Chamberlain’s Office, [...]
Event Type: playhouse context, court case, playhouse business, payment, player context, [...]
Troupe: Admiral’s Men, Queen’s Men, Worcester’s Men, King’s Men, Oxford’s Boys, [...]
Venue: Globe Theatre, Fortune, Blackfriars, Bel Savage, Boar’s Head, [...]
41
Modelling

"In terms of humanities computing, modelling is
an iterative process of constructing and
developing something like a computational
'knowledge representation' as this is defined in
computer science. In fact we might say that a
model is a manipulable knowledge
representation.”

Willard McCarty 2002. “Humanities Computing: Essential
Problems, Experimental Practice” in Literary and
Linguistic Computing Vol 17 No 1. pp.103-125
42
Analytical Modelling: the utility
of failure

"the digital model illumines analytically by
isolating what would not compute. In other
words, the failures of analytic modelling
are where its success is to be found.”

Willard McCarty (2008). “What’s going on?” in Literary
and Linguistic Computing, Vol 23 No 3. p. 256
43
Structure as a scholarly outcome,
and its public presentation

The tension between that and the need for
a public face to the project.
- Classification: “user focus”: focus on universal structure
- Modelling Scholarly structure: “scholar focus”: focus on individual scholarly exploration and assertion
44
Where are the facets in a DB
structure?
45
Spiteri 1998: CRG Principles:
Fundamental Categories

g) Fundamental Categories: "there exist
no categories that are fundamental to all
subjects, and ... categories should be
derived based upon the nature of the
subject being classified" (pp 18-19)

Spiteri, Louise. (1998). “A Simplified Model for Facet Analysis”. Now online at
http://iainstitute.org/en/learn/research/a_simplified_model_for_facet_analysis.php
46
Spiteri 1998: CRG Principles:
Relevance

b) Relevance: "when choosing facets by
which to divide entities, it is important to
make sure that the facets reflect the
purpose, subject, and scope of the
classification system" (1998, 6).
47
Spiteri 1998: CRG (Classification
Research Group) Principles:
Differentiation

a) Differentiation: "when dividing an entity
into its component parts, it is important to
use characteristics of division (i.e., facets)
that will distinguish clearly among these
component parts" (Spiteri 1998, 5). For
example, dividing humans by sex.
48
Structured data requires clear
categories: authority lists

Authority lists provide a classification mechanism
CaseType (PoNE)

id   type
29   agreement
64   appeal of breach of peace
63   appeal of homicide
28   assize of last presentation (darrein presentm
1    assize of mort d'ancestor
45   assize of novel disseisin
14   deforcement
52   grand assize
37   last presentation (darrein presentment)
38   mort d'ancestor
2    novel disseisin
59   plea de namio vetito
47   plea in ecclesiastical court
25   plea of acquittal
7    plea of advowson
18   plea of agreement
27   plea of an extent
42   plea of appeal
62   plea of breach of peace
8    plea of charter-warrant
56   plea of death
5    plea of debt
21   plea of detention
23   plea of disseisin
10   plea of dower
22   plea of ejection
31   plea of false judgment
58   plea of false testimony
[...]

alGenderKey   alGender      alGenderAbrv
1
2             (Other)       (Other)
3             Male          M
4             Female        F
5             Institution   Inst
6             M/F           M/F
7             Undefined     (Undefined)

alOfficeTermKey   alOfficeTerm
1
2                 (Other)
3                 King
4                 Secundarius
5                 Judge
6                 Pincerna
7                 Comes
8                 Pope
9                 Queen
10                Bishop
11                Counsellor
12                Abbess
13                Archbishop
14                Dux
15                Priest
16                Minister
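One way to picture how an authority list constrains data entry: records store a key, and the key must resolve to a term in the controlled list. The sketch below uses a few key/term pairs from the tables on this slide; the `resolve` helper and the person record are hypothetical and are not taken from the project databases.

```python
# Authority lists as key -> term mappings; a few values copied from the slide.
GENDER = {3: "Male", 4: "Female", 5: "Institution"}
OFFICE_TERM = {3: "King", 9: "Queen", 10: "Bishop", 15: "Priest"}


def resolve(authority, key):
    """Return the controlled term for `key`, rejecting unknown keys."""
    if key not in authority:
        raise ValueError(f"key {key!r} is not in the authority list")
    return authority[key]


# Hypothetical person record that stores keys rather than free text.
person = {"name": "Example Person", "gender_key": 4, "office_key": 9}
print(resolve(GENDER, person["gender_key"]), resolve(OFFICE_TERM, person["office_key"]))
```

Because every record points at the same list, the terms double as ready-made facet values for browsing.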
49
Spiteri 1998: CRG Principles:
Ascertainability

c) Ascertainability: "it is important to
choose facets that are definite and can be
ascertained" (1998, 6).
50
Location Data in CCE:
kinds of locations
51
Spiteri 1998: CRG Principles:
Homogeneity & Mutual Exclusivity
e) Homogeneity: "facets must be homogeneous" (1998, 18).
f) Mutual Exclusivity: facets must be "mutually exclusive," "each facet must represent only one characteristic of division" (1998, 18).

- “i.e., that the contents of any two facets cannot overlap, and that each facet must represent only one characteristic of division.”
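The non-overlap requirement can be checked mechanically once each facet's values are listed. This is a small sketch with invented facet names and values, intended only to show the shape of such a check.

```python
from itertools import combinations


def overlapping_facets(facets):
    """Return (facet_a, facet_b, shared_values) for every pair that overlaps."""
    clashes = []
    for (name_a, values_a), (name_b, values_b) in combinations(facets.items(), 2):
        shared = set(values_a) & set(values_b)
        if shared:
            clashes.append((name_a, name_b, shared))
    return clashes


# Invented facets: "Bishop" appears under two of them, so they are not exclusive.
facets = {
    "office": {"King", "Bishop"},
    "status": {"Bishop", "Monk"},
    "gender": {"Male", "Female"},
}
for name_a, name_b, shared in overlapping_facets(facets):
    print(name_a, name_b, sorted(shared))
```

A clash reported by a check like this is a prompt for a modelling decision, for example whether a term belongs to one facet only or whether the two facets should be merged.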
52
PASE: Office/Status/Occupation
53
Spiteri 1998: CRG Principles:
Permanence

d) Permanence: facets should "represent
permanent qualities of the item being
divided" (1998, 18).
54
PASE’s event types: an
evolving understanding
55
Revised event types (PASE II)

Acts of crime, law-breaking/violence


Hostility, Burh-abandonment, Lust, Disobedience, Burning, Insulting ...
Legal/governmental/administrative acts and legitimate use of violence

Legal/governmental/administrative acts


Legitimate use of violence


Life Events


Retirement, Journey, Naming, Betrothal, Marriage, Birth ...
Social/economic acts and relations

Visit, Promise, Begging, Ship-building, Slave-selling, Godparenting ...
Power-taking and power-leaving

Political Acts


Conquest, Agreement, Throne-sitting, Message-sending ...
Taking/leaving power


Imprisonment, Execution, Campaigning, War, Outlawing ...
Life-events/social and economic acts and relations


Challenge, Archiepiscopal see: restoration, Property-given/selling ...
Appointment of abbot, royal insignia-entrusting, Coronation, Deposition of bishop, ...
Religious/ecclesiastical acts

Acts of Christian piety


Commemoration of the dead, Martyrdom, Church going, Easter-observance, Confession ...
Acts of ecclesiastical authority

Baptism, Confraternity, Tonsuring, Liturgical celebration, Ecclesiastical reform, Mission sending ...
56
Conclusions


Facetted thinking in our structured projects arises out of an exploratory and somewhat dynamic modelling activity rather than a classification activity.

It provides a way for the public to have better access to a data structure that arises from the project team’s emerging and shifting understanding of, and interests in, their data.
- It has to fit with a model of data that has a mix of different entity types and no specific entity centre.
- It has to fit with a model that is subject to change and evolution.

Although facetted representation of our models is not a perfect fit with their nature, it has allowed for a browsing view of the data that enables the public to engage much better with the complexities of these projects’ materials.
57
58
Why facets here?
- Complex structure?
- Sparse data
- Public interface
- CCE query example

60