Hypertext 2010 - Analysis of graphs for Digital Preservation Suitability presentation (be careful, there are mouse click and timed animations)


Charles L. Cartledge
Michael L. Nelson
Old Dominion University
Department of Computer Science
Norfolk, VA 23529 USA


Why the problem is of interest
Picking apart the title
◦ Preservation
◦ Graph
◦ Suitability
A game
Results
Conclusion


In 2007, Bob received a photograph from the analog age
Bob wants to preserve the photograph for the digital age


Data
◦ Scanned image of the photograph
Metadata
◦ Name: dc.name = “Josie McClure”
◦ Date: dc.date = “28 Feb 1907”
◦ Image type: dc.type = “image/tiff”
◦ etc.: …
Other data: TBD
Can web objects (WOs) be constructed to act in an autonomous manner to create a network of WOs that live on the web architecture and can be expected to outlive the people and institutions that created them?
Title: Analysis of Graphs for Digital Preservation Suitability



Repurpose one thing to do something else
To revisit how something works and utilize it in a new and novel way
“To bravely go where no one …”
Title: Analysis of Graphs for Digital Preservation Suitability




Random – global construction
Power Law – global construction
Small World – global construction
Unsupervised Small World (USW) – local construction
“The number of systems of terminology presently used in graph theory is equal, to a close approximation, to the number of graph theorists.”
Enumerative Combinatorics, 1986


Robustness – a complex network is robust if it keeps its basic functionality even under failure of some of its components
Resilience – how a network responds to repeated component failure
Brandes, “Network Analysis, Methodological Foundations”, 2005


There are lots of ways to quantify the characteristics of a graph
This equation captures our intuition of damage to a graph based on its structure


Centrality “denotes an order of importance on the vertices or edges of a graph by assigning real values to them.”
A centrality index “is only depending on the structure of the graph.”
Brandes, “Network Analysis, Methodological Foundations”, 2005



The number of shortest paths between all nodes that go through an edge
Highest = 57 (more than one)
Lowest = 4
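The edge measure described above can be sketched in a few lines. This sketch follows the slide's wording literally and counts shortest paths per edge, which is an assumption; the usual edge betweenness instead weights each path by the fraction of shortest paths between its endpoints. The four-vertex path graph and the function names are illustrative only and are not the graph from the slide.

```python
from collections import deque
from itertools import combinations


def shortest_paths(adj, source, target):
    """Return every shortest path from source to target as a list of vertices."""
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                preds[w] = [v]
                queue.append(w)
            elif dist[w] == dist[v] + 1:
                preds[w].append(v)  # another equally short way to reach w
    if target not in dist:
        return []
    # Walk the predecessor lists back from the target, one hop per round.
    paths = [[target]]
    for _ in range(dist[target]):
        paths = [[p] + path for path in paths for p in preds[path[0]]]
    return paths


def edge_path_counts(adj):
    """Count, for each undirected edge, the shortest paths that use it."""
    counts = {frozenset((u, v)): 0 for u in adj for v in adj[u]}
    for s, t in combinations(sorted(adj), 2):
        for path in shortest_paths(adj, s, t):
            for u, v in zip(path, path[1:]):
                counts[frozenset((u, v))] += 1
    return counts


# Illustrative graph: a simple path a - b - c - d.
path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
counts = edge_path_counts(path_graph)
```

In this toy graph the middle edge b-c carries the most shortest paths, so an attacker targeting the highest-valued edge would cut it first.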



The number of shortest paths that go through a vertex
Highest = 69
Lowest = 0 (more than one)
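A minimal sketch of the vertex measure, using Brandes' accumulation algorithm for unweighted, undirected graphs. Note the assumption: this is the standard fractional betweenness, which matches a plain count of shortest paths only when the shortest path between each pair is unique. The star and cycle graphs are illustrative and are not taken from the slides.

```python
from collections import deque


def vertex_betweenness(adj):
    """Betweenness of every vertex in an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # vertices in non-decreasing distance from s
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest vertices back toward s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was visited once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# Illustrative graphs: a star (hub "c") and a four-vertex cycle.
star = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}
cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
star_scores = vertex_betweenness(star)
cycle_scores = vertex_betweenness(cycle)
```

The star shows why several vertices can share the lowest value of 0: no shortest path passes through a leaf.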



The number of edges incident to a vertex
Highest = 4 (more than one)
Lowest = 1 (more than one)
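Degree is the simplest of the three measures to compute. The sketch below also collects ties, since the slide notes that more than one vertex can share the highest or lowest value. The adjacency-dictionary representation and the helper names are assumptions made for illustration.

```python
def degree_centrality(adj):
    """Map each vertex to the number of distinct edges incident to it."""
    return {v: len(set(neighbours)) for v, neighbours in adj.items()}


def extremes(scores):
    """Return (vertices with the highest score, vertices with the lowest score)."""
    high = max(scores.values())
    low = min(scores.values())
    highest = sorted(v for v, s in scores.items() if s == high)
    lowest = sorted(v for v, s in scores.items() if s == low)
    return highest, lowest


# Illustrative graph: a simple path a - b - c - d.
path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
degrees = degree_centrality(path_graph)
highest, lowest = extremes(degrees)
```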


An attack profile uses a centrality measurement to decide which graph component to eliminate
Mallory will use an attack profile during the game

Attack profile   # of unique graphs   Max. depth   Min. depth   Mean depth   St. dev. depth
D-V-L            428,580              20           4            15.57        3.65
D-V-H            8                    2            1            1.87         0.35
B-E-L            7                    6            6            6            0.00
B-E-H            2                    2            2            2            0.00
B-V-L            53,155               20           15           19.56        0.82
B-V-H            1                    2            2            2            n/a
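One way to read the profile codes is as three fields: measure, component, and direction. The sketch below assumes that D and B stand for degree and betweenness, V and E for vertex and edge, and H and L for highest and lowest; the slides do not spell this out, so treat the decoding as a guess. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackProfile:
    measure: str      # assumed: "D" = degree, "B" = betweenness
    component: str    # assumed: "V" = vertex, "E" = edge
    pick_high: bool   # assumed: True for "H" (highest), False for "L" (lowest)


def parse_profile(code):
    """Split a code such as 'B-V-H' into its three assumed fields."""
    measure, component, end = code.split("-")
    if measure not in ("D", "B") or component not in ("V", "E") or end not in ("H", "L"):
        raise ValueError(f"unrecognised attack profile: {code}")
    return AttackProfile(measure, component, end == "H")


def choose_target(scores, profile):
    """Pick the component with the extreme centrality score.

    `scores` maps each candidate component to its centrality value.
    Ties are broken by sorted order, which is an arbitrary choice made here.
    """
    values = scores.values()
    wanted = max(values) if profile.pick_high else min(values)
    return sorted(c for c, s in scores.items() if s == wanted)[0]


example_scores = {"a": 1, "b": 3, "c": 3}
high_target = choose_target(example_scores, parse_profile("D-V-H"))
low_target = choose_target(example_scores, parse_profile("D-V-L"))
```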

As the path length grows, graph knowledge grows from Local to Global
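The growth from local to global knowledge can be illustrated with a breadth-first search that stops after a fixed number of edges. This is a sketch under the assumption that "path length" means the maximum number of edges followed from a starting vertex; the function name and sample graph are illustrative.

```python
from collections import deque


def known_vertices(adj, start, path_length):
    """Return the vertices reachable from start within path_length edges."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if depth[v] == path_length:
            continue  # do not look beyond the allowed path length
        for w in adj[v]:
            if w not in depth:
                depth[w] = depth[v] + 1
                queue.append(w)
    return set(depth)


# Illustrative graph: a simple path a - b - c - d - e.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
local_view = known_vertices(chain, "a", 2)
global_view = known_vertices(chain, "a", 4)
```

With a path length of 2 only part of the chain is visible from "a"; at a path length of 4 the view covers the whole graph.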



Mallory’s goal - destroy the graph, or give up
Bob’s graph’s goal - survive
Rules of the game
◦ Alternating turns
◦ Mallory has to maintain the same attack profile throughout
◦ Mallory has local knowledge only
◦ Mallory can remove/destroy at most a fixed number of edges or vertices per turn
◦ Bob’s graph can only attempt to recreate a fixed percentage of the graph per turn
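The rules above can be sketched as a simple turn loop. This is a simplified illustration, not the simulation behind the reported results. It makes several assumptions: Mallory always removes the highest-degree vertex and has global knowledge (the local-knowledge rule is ignored), Bob's graph restores a fixed number of lost vertices rather than a percentage, and the game ends when the graph is disconnected or a turn limit is reached. All names are hypothetical.

```python
def is_connected(adj):
    """True if every remaining vertex is reachable from any other."""
    if not adj:
        return False  # an empty graph counts as destroyed
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


def remove_vertex(adj, v):
    """Delete v and every edge incident to it (self-loops are not supported)."""
    for w in adj.pop(v):
        adj[w].discard(v)


def play(adj, shots, repairs, max_turns):
    """Return (turns played, whether the graph survived)."""
    original = {v: set(n) for v, n in adj.items()}
    graph = {v: set(n) for v, n in adj.items()}
    lost = []
    for turn in range(1, max_turns + 1):
        # Mallory's turn: remove the highest-degree vertex, ties by name.
        for _ in range(shots):
            if not graph:
                break
            target = max(sorted(graph), key=lambda v: len(graph[v]))
            remove_vertex(graph, target)
            lost.append(target)
        if not is_connected(graph):
            return turn, False
        # Bob's graph's turn: restore the most recently lost vertices,
        # reconnecting each one to whichever original neighbours survive.
        for _ in range(min(repairs, len(lost))):
            v = lost.pop()
            graph[v] = {w for w in original[v] if w in graph}
            for w in graph[v]:
                graph[w].add(v)
    return max_turns, True


star = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
complete = {v: {w for w in "abcd" if w != v} for v in "abcd"}
star_result = play(star, shots=1, repairs=1, max_turns=3)
complete_result = play(complete, shots=1, repairs=1, max_turns=3)
```

The star falls on the first turn because its hub is a single point of failure, while the fully connected graph absorbs one removal per turn and repairs it.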

Sample graph
◦ 20 vertices
◦ 24 edges
◦ Random degree distribution

Attack parameters
◦ Attack profile: B-V-H
◦ Mallory has 2 shots per turn
◦ Path length: 2 edges


Graph has 1,000 nodes
Attack parameters
◦ Attack profile: B-V-H
◦ Attacker has 100 shots per turn
◦ Path length: 10 edges

Resilience parameters
◦ Graph repair: 4% of nodes selected for potential reconstruction
◦ Same repair parameters as creation

Game ends after 10 turns or when the graph is disconnected

Results
◦ Power law graph – 1 vertex
◦ Random graph – 100 vertices
◦ Small world graph – 140 vertices
◦ USW – 170 vertices




Title: Analysis of Graphs for Digital Preservation Suitability
WO contains digital data to be preserved
WO contains links to copies of itself and to other WOs
When WO is accessed, it checks the availability of its own copies and connections to “neighboring” WOs
If copies are lost, then initiate reconstruction processes
[Diagram labels: Self, Others, Accessed, Reconstruct]
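The behavior described above can be sketched as a small class. Everything here is hypothetical: the `WebObject` class, the `store` dictionary standing in for web hosts, the location naming scheme, and the target number of copies are illustrative assumptions and do not come from the presentation.

```python
class WebObject:
    """A toy web object that keeps copies of itself and links to neighbours."""

    def __init__(self, name, data, wanted_copies):
        self.name = name
        self.data = data                  # the digital data to be preserved
        self.wanted_copies = wanted_copies
        self.copies = []                  # locations of copies of this WO
        self.neighbours = []              # names of other WOs it links to
        self._serial = 0

    def make_copy(self, store):
        """Write one more copy into the (hypothetical) storage dictionary."""
        self._serial += 1
        location = f"{self.name}/copy-{self._serial}"
        store[location] = self.data
        self.copies.append(location)

    def on_access(self, store, reachable):
        """Check copies and neighbour links, then rebuild any lost copies.

        Returns (number of copies rebuilt, neighbours that were unreachable).
        """
        self.copies = [loc for loc in self.copies if loc in store]
        missing = [n for n in self.neighbours if n not in reachable]
        rebuilt = 0
        while len(self.copies) < self.wanted_copies:
            self.make_copy(store)
            rebuilt += 1
        return rebuilt, missing


store = {}
photo = WebObject("photo", b"tiff-bytes", wanted_copies=2)
photo.neighbours = ["letter", "diary"]
first = photo.on_access(store, reachable={"letter", "diary"})
del store["photo/copy-1"]                 # simulate the loss of one copy
second = photo.on_access(store, reachable={"letter"})
```

The second access notices the lost copy and the unreachable neighbor, and rebuilds only what is missing.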


A USW graph is more robust than small-world, random or power law graphs
USW has been shown to have better preservation potential than the other tested graphs
This work was funded in part by the National Science Foundation.