
CERIA Laboratory
LH*RSP2P: A Scalable Distributed Data Structure for P2P Environment
W. LITWIN, Paris Dauphine University, [email protected]
T. SCHWARZ, Santa Clara University (USA), [email protected]
H. YAKOUBEN, Paris Dauphine University, [email protected]
Plan
• Objective
• Overview: SDDS & P2P
• LH*RSP2P
  • Architecture
  • Addressing
  • Properties
  • Churn Management
• Conclusion
Objective
• Design a new SDDS for a structured P2P environment
• Very large scalable files
• High availability and treatment of churn
• At most one forwarding message for key search, insert, or scan (fastest known performance)
[Figure labels: LH*RS, P2P]
SDDS (1993)
• A file of records identified by keys
  • SDDS client nodes face the applications and send queries to SDDS server nodes
  • No centralized addressing
  • Servers contain application or parity data
    • In buckets
• Overflowing servers split onto new servers
  • Servers do not notify clients about splits
SDDS (1993)
• Clients use images of the file state for addressing
  • Key based
  • Range queries
  • Scans
  • …
• Images get adjusted towards the file state during queries by Image Adjustment Messages (IAMs)
  • Triggered by incorrect addressing by the client
• IAMs reflect the file evolution by splits or, rarely, merges
• IAMs also reflect the location changes caused by failures and recovery
SDDS Typology
[Figure: classification tree of data structures]
• Data Structures
  • Classics
  • SDDS (1993)
    • Hash
      • 1-dimensional: LH*, DDH, EH*, CHORD...
      • d-dimensional: IH*…
    • Tree (1-d Tree, m-d Tree): RP*, k-RP*, SD-Rtree, DRT*, BATON, VBI-Tree
• High Availability: LH*m, LH*g
• k-Availability: LH*rs
• Security: LH*sa, LH*s, Alg. Sign…
• Structured P2P Schemes
SDDS Expansion
• Growth through splits under inserts
[Figure labels: Peer, New Peer, Clients]
SDDS Client Image Evolution
Clients
SDDS 2007 Prototype
• Available at the CERIA site
  • Announced at DbWorld
• Managing LH*RS and RP* files
  • In distributed RAM
  • Under Windows
  • Over 1 Gbit/s Ethernet
• Various functions
• Response time reaching 30 microseconds
  • Up to 300 times faster than disk files
P2P (1995 ?)
• Autonomous nodes store and search data
• By flooding in early systems
  • Freenet, Napster, Gnutella…
• Structured P2P schemes reduce the flooding
  • Using decentralized data structures
  • Distributed Hash Table (DHT) especially
    • Few folks know the concept is due to R. Devine
      • FODO 93
  • Chord, P-tree, VBI, Baton…
• Structured P2P schemes are specific SDDS schemes
LH*RSP2P Addressing
• Global Addressing Rule
  a ← h_i(C) ;
  if a < n then a ← h_{i+1}(C) ;
  /* a is the address of the peer destination of the key C */
  /* (i, n) is the state of an SDDS file; it is known only to the file coordinator node */
  h_i(C) = C mod 2^i
• Client Address Calculus
  a’ ← h_{i’}(C) ;
  if a’ < n’ then a’ ← h_{i’+1}(C) ;
  /* a’ is the address of the peer destination of the key C, computed from the client image (i’, n’) */
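The two address rules on this slide are the same calculation applied to different state: the true file state (i, n) or a client image (i’, n’). A minimal Python sketch, assuming integer keys; the function names `h` and `lh_address` are illustrative and not taken from the slides:

```python
def h(level: int, key: int) -> int:
    """Linear-hashing function h_level(C) = C mod 2^level."""
    return key % (2 ** level)


def lh_address(key: int, i: int, n: int) -> int:
    """Address rule, usable with the file state (i, n) or a client image (i', n')."""
    a = h(i, key)
    if a < n:  # the rule treats bucket a as already split at this level
        a = h(i + 1, key)
    return a


# With the true file state (i, n) = (3, 2) the key 9 belongs to bucket 9.
print(lh_address(9, 3, 2))  # 9
# A client holding the outdated image (i', n') = (2, 1) computes bucket 1 instead.
print(lh_address(9, 2, 1))  # 1
```

The second call shows why an outdated image causes an addressing error that the receiving server has to resolve.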
LH*RSP2P File Expansion
• File starts with i = 0 and n = 0 and a single data bucket 0
• Every bucket m keeps the bucket level j of the hash function h_j last used to split; j = 0 initially
• Overflowing bucket m alerts the coordinator
• Coordinator notifies bucket n to split
• Bucket n applies h_{i+1}
• About half of the keys migrate to the new bucket n + 2^i
• Bucket n and the new one set j = j + 1
• Coordinator performs
  • n = n + 1
  • if n = 2^i then i = i + 1 and n = 0
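The expansion steps can be tried out in a single process. The sketch below is a toy model under stated assumptions: buckets are Python lists, the overflow threshold `capacity` is a hypothetical parameter, and the class name `LHFile` is illustrative. Messaging between peers and the coordinator is not modelled.

```python
class LHFile:
    """Toy, single-process model of the expansion steps listed above."""

    def __init__(self, capacity: int = 4):
        self.i, self.n = 0, 0          # coordinator state (i, n)
        self.capacity = capacity       # hypothetical overflow threshold
        self.buckets = [[]]            # the file starts with bucket 0 only
        self.levels = [0]              # bucket level j of every bucket

    def address(self, key: int) -> int:
        a = key % 2 ** self.i
        if a < self.n:
            a = key % 2 ** (self.i + 1)
        return a

    def insert(self, key: int) -> None:
        m = self.address(key)
        self.buckets[m].append(key)
        if len(self.buckets[m]) > self.capacity:  # bucket m alerts the coordinator
            self.split()

    def split(self) -> None:
        n, i = self.n, self.i          # the coordinator picks bucket n, not bucket m
        old = self.buckets[n]
        # Bucket n applies h_{i+1}; keys that no longer map to n move to bucket n + 2^i.
        self.buckets.append([k for k in old if k % 2 ** (i + 1) != n])
        self.buckets[n] = [k for k in old if k % 2 ** (i + 1) == n]
        self.levels[n] += 1
        self.levels.append(self.levels[n])
        self.n += 1
        if self.n == 2 ** self.i:
            self.i, self.n = self.i + 1, 0


f = LHFile(capacity=4)
for key in range(200):
    f.insert(key)
print(f.i, f.n, len(f.buckets))
```

Appending the new bucket at the end of the list works because the file always holds 2^i + n buckets, so the next free index is exactly n + 2^i.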
LH*RSP2P
• Architecture based on LH*RS
[Figure: LH*RSP2P peer architecture. An LH*RSP2P peer has a server part (LH*RS DB or LH*RS PB, bucket level j, pupils) and a client part (LH*RS client with image i’, n’). A candidate peer has a client part and spare storage.]
Peer & Pupil Image Adjustment During Peer Split
i’ = j - 1 ;  /* j is the bucket level after the split, so j - 1 is the value of j before the split */
n’ = a + 1 ;  /* a is the splitting bucket */
if n’ = 2^{i’} then i’ = i’ + 1 ; n’ = 0 ;
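A small Python sketch of this adjustment, assuming j is the bucket level of the splitting peer after its split; the function name `image_after_split` is illustrative:

```python
def image_after_split(j_after: int, a: int) -> tuple[int, int]:
    """Image (i', n') set by splitting peer a, whose bucket level is now j_after."""
    i_img = j_after - 1
    n_img = a + 1
    if n_img == 2 ** i_img:        # the split pointer wrapped around
        i_img, n_img = i_img + 1, 0
    return i_img, n_img


# Peer 1 splits when the file state is (i, n) = (1, 1); its level goes from 1 to 2.
print(image_after_split(2, 1))  # (2, 0)
# Peer 0 splits when the file state is (1, 0); its level goes from 1 to 2.
print(image_after_split(2, 0))  # (1, 1)
```

In both cases the resulting image equals the coordinator state right after the split, which is what makes the splitting peer's image exact at that moment.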
Example
i’ = j = 1 ;
n’ = m + 1 = 1 + 1 ;
if n’ = 2^1 then n’ = 0 ; i’ = i’ + 1
and (i’, n’) = (2, 0)
[Figure: peers P0, P1, P2 before splitting, with coordinator peer (CP) state i = 1, n = 1; peers P0, P1, P2, P3 after splitting, with CP state i = 2, n = 0. Each peer carries its bucket level j and its image (i’, n’).]
Server Address Calculus
a’ ← h_j(C) ;
if a’ = a then exit ;  /* bucket a is the correct one */
else send C to bucket a’ ; exit ;  /* forwarding to bucket a’ */
• Simpler and faster than for LH*
  • As only one forwarding is possible
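The server-side check amounts to one hash and one comparison. A minimal sketch, assuming integer keys; the function name `server_check` and the returned tuples are illustrative:

```python
def server_check(a: int, j: int, key: int) -> tuple[str, int]:
    """Run at bucket a with bucket level j when the key arrives."""
    a_prime = key % 2 ** j            # a' <- h_j(C)
    if a_prime == a:
        return ("accept", a)          # bucket a is the correct one
    return ("forward", a_prime)       # send the key on to bucket a'


# Bucket 1 with level j = 4 receives key 9 and forwards it to bucket 9.
print(server_check(1, 4, 9))  # ('forward', 9)
# Bucket 9 with level j = 4 keeps it.
print(server_check(9, 4, 9))  # ('accept', 9)
```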
Peer Image Adjustment by IAM
• IAM comes from the correct bucket
• Bucket a is the forwarding one
• Bucket level j is that of the correct bucket
  • Of the forwarding one as well
i’ ← j - 1, n’ ← a + 1 ;
if n’ ≥ 2^{i’} then n’ ← 0 ; i’ ← i’ + 1 ;
• Same algorithm as for the adjustment of the local client and of pupils after a split
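The whole round trip (outdated image, one forwarding, IAM) can be traced with the numbers that appear in the IAM example figure: key 9, a = 1, j = 4, file state (i, n) = (3, 2). The sketch below assumes integer keys and keeps the bucket levels in a plain dictionary; the function names are illustrative.

```python
def client_address(key: int, i_img: int, n_img: int) -> int:
    a = key % 2 ** i_img
    if a < n_img:
        a = key % 2 ** (i_img + 1)
    return a


def adjust_image_by_iam(j: int, a: int) -> tuple[int, int]:
    """j: bucket level carried by the IAM, a: address of the forwarding bucket."""
    i_img, n_img = j - 1, a + 1
    if n_img >= 2 ** i_img:
        i_img, n_img = i_img + 1, 0
    return i_img, n_img


# File state (i, n) = (3, 2): buckets 0..9; buckets 0, 1, 8, 9 have level 4, the rest 3.
levels = {b: 4 if (b < 2 or b >= 8) else 3 for b in range(10)}

image = (2, 1)                        # outdated image of the sending peer
key = 9
first = client_address(key, *image)   # bucket addressed by the sender
correct = key % 2 ** levels[first]    # server address calculus at that bucket
if correct != first:                  # one forwarding, then the IAM
    image = adjust_image_by_iam(levels[correct], first)
print(first, correct, image)          # 1 9 (3, 2)
```

After the IAM the sender's image equals the true file state, so the same key would now be addressed to bucket 9 directly.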
Peer Image Adjustment by IAM
[Figure, two animation steps: key 9 is checked and forwarded using A2; an IAM with a = 1, j = 4 changes one peer image from (i’, n’) = (2, 1) to (3, 2). Peers shown: P0, P1, P4, P9, each with its bucket level j and image (i’, n’). Coordinator peer (PC) state: i = 3, n = 2.]
LH*RSP2P: Tutor, Example of the File Expansion
• Update pupil peers
• Assign a tutor for a candidate peer: LH-hash of the client IP address
[Figure: peers P0, P2, P5, P6 with bucket levels j and images (i’, n’), a candidate peer and a pupil peer. Coordinator peer (PC) state goes from i = 2, n = 2 to i = 2, n = 3.]
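The slide only states that the tutor is found by an LH-hash of the candidate's IP address. The sketch below shows one way this could look, under two loud assumptions that are not in the slides: the IP address is turned into an integer key with Python's `ipaddress` module, and the hash uses the file state (i, n) = (2, 3) from the figure. The function name `tutor_for` and the example address are illustrative.

```python
import ipaddress


def tutor_for(candidate_ip: str, i: int, n: int) -> int:
    """Address of the peer that would tutor the candidate (assumed key = IP as integer)."""
    key = int(ipaddress.ip_address(candidate_ip))
    a = key % 2 ** i
    if a < n:
        a = key % 2 ** (i + 1)
    return a


# 192.0.2.17 is a documentation address, used here only as sample input.
print(tutor_for("192.0.2.17", 2, 3))  # 1
```

Reusing the key address rule means the tutor is always an existing peer, since the result lies in 0 .. 2^i + n - 1.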
Properties of LH*RSP2P:
1. The maximal number of forwarding messages for the key search is one.
2. The maximal number of rounds for the scan search can be two.
3. The worst case addressing performance of LH*RSP2P as defined by Property 1 is the fastest possible for any SDDS or a practical structured P2P addressing scheme.
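Property 1 can be checked empirically on a small model. The sketch below is an illustration under stated assumptions, not a proof: it only covers peers that have split at least once and still hold the image set at their own last split (no IAM since), it uses integer keys, and it applies the simple server address calculus. How pupils and freshly created peers keep their images current is outside this sketch. The function names are illustrative.

```python
def address(key: int, i: int, n: int) -> int:
    a = key % 2 ** i
    return key % 2 ** (i + 1) if a < n else a


def max_forwardings(splits: int, keys: range) -> int:
    """Worst number of forwardings seen while the file grows by `splits` splits."""
    i = n = 0
    levels = [0]      # true bucket level j of every peer
    images = {}       # peer -> image (i', n') recorded at its own last split
    worst = 0
    for _ in range(splits):
        a = n                         # the coordinator splits bucket n
        levels[a] += 1
        levels.append(levels[a])      # new bucket n + 2^i gets the same level
        n += 1
        if n == 2 ** i:
            i, n = i + 1, 0
        images[a] = (i, n)            # image of peer a = file state after its split
        for i_img, n_img in images.values():
            for key in keys:
                target = address(key, i_img, n_img)
                hops = 0
                while key % 2 ** levels[target] != target:
                    target = key % 2 ** levels[target]   # server forwards the key
                    hops += 1
                worst = max(worst, hops)
    return worst


print(max_forwardings(40, range(256)))  # 1
```

Every image is re-checked after every split, so the worst case covers all intermediate file states between two splits of the same peer.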
Proof Property 1
• Case 1: i’ = i and n’ < n
  • Peer a addresses peer a’, using its image (i’, n’) from the last split
  • No IAM came since
  • Figure, first step: no forwarding
  • Figure, second step: forwarding possible for any address a’ between a and n
  [Figure: address axis with marks 0, a, n, a’, 2^{i’}, n + 2^{i’}, a + 2^{i’}; bucket levels j = i’ + 1 and j = i’]
• Case 2: i = i’ + 1 and n < n’
  • Peer a addresses peer a’, using its image (i’, n’) from the last split
  • No IAM came since
  • Forwarding possible for any address a’ beyond [n, a]
  [Figure: address axis with marks 0, n, a, 2^{i’}, a’, 2^{i’+1}, n + 2^{i’+1}; bucket levels j = i’ + 2, j = i’ + 1 and j = i’]
Proof Property 2
• Peer a sends the scan to all buckets in its image
  • Including its image (i’, n’)
• Receiving peer a’ can have bucket level j as in the image
  • j (a’) = j’ (a’)
  • No forwarding of the scan
• Or, bucket a’ split
  • Once and only once
  • j (a’) = j’ (a’) + 1
  • See the figs for the key address calculus
• Peer a’ forwards the scan to its (only) child
  • No child can have a child
  • Peer a would first need to split again as well
• Every peer thus gets the scan, and only once
• There are at worst two rounds
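The two-round argument can be traced in code. The sketch below assumes, as the proof does, that any bucket in the sender's image has split at most once since the image was taken; the function name `scan` and the example state are illustrative.

```python
def scan(i_img: int, n_img: int, levels: list[int]) -> tuple[list[int], int]:
    """Return (number of scan copies received per bucket, number of rounds)."""
    received = [0] * len(levels)
    forwarded = False
    # Round 1: the sender addresses every bucket of its image (i', n').
    for b in range(2 ** i_img + n_img):
        # Level j' the image assumes for bucket b.
        j_img = i_img + 1 if (b < n_img or b >= 2 ** i_img) else i_img
        received[b] += 1
        if levels[b] == j_img + 1:            # bucket b split once since the image
            child = b + 2 ** (levels[b] - 1)  # its only child
            received[child] += 1              # round 2: forwarded scan
            forwarded = True
    return received, 2 if forwarded else 1


# True file state (i, n) = (3, 2): buckets 0..9.
levels = [4, 4, 3, 3, 3, 3, 3, 3, 4, 4]
# Peer 0 still holds the image (3, 1) from its own last split.
print(scan(3, 1, levels))
# With an exact image no forwarding is needed.
print(scan(3, 2, levels))
```

In the first call bucket 1 has split once since the image was taken, so it forwards the scan to its child, bucket 9; every bucket still receives exactly one copy.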
Proof Property 3
• The only faster worst case performance is zero forwarding messages
• Every split would then have to be notified to every peer
• This would be against the scalability goal of every SDDS & structured P2P scheme
LH*RSP2P Churn Management
• A bucket reliability group with k parity buckets protects against up to k bucket failures per group
[Figure: a reliability group. Data peers hold data records and tutoring records at ranks 0 to 5; parity peers hold the parity records.]
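To illustrate the idea of a parity record protecting the data records of the same rank, the sketch below swaps in plain XOR parity, which covers only the case k = 1. It is not the erasure code used by LH*RS, which handles k parity buckets. Records are assumed to be byte strings of equal length; the function name `xor_parity` and the sample records are illustrative.

```python
from functools import reduce


def xor_parity(records: list[bytes]) -> bytes:
    """Byte-wise XOR of equally ranked records (shorter ones are zero-padded)."""
    size = max(len(r) for r in records)
    padded = [r.ljust(size, b"\0") for r in records]
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*padded))


# Three data records of the same rank, one per data peer of the group.
group = [b"rec-A", b"rec-B", b"rec-C"]
parity = xor_parity(group)

# The peer holding group[1] disappears: rebuild its record from the survivors.
recovered = xor_parity([parity, group[0], group[2]])
print(recovered)  # b'rec-B'
```

With a single XOR parity record, exactly one lost record per rank can be rebuilt; tolerating k simultaneous failures needs k independent parity records.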
LH*RSP2P Churn Management
• Peer leaves with notice
[Figure labels: Notification, "Say that’s OK", Coordinator Peer, Candidate Peer; peers P0 … Pl … Pm, each with bucket level j and image (i’, n’).]
LH*RSP2P Churn Management
• Peer leaves without notice or fails
[Figure, two steps. Step 1 labels: Query, Forward, Coordinator Peer, LH*RS Bucket Recovery, Parity Peer. Step 2 labels: LH*RS Bucket Recovery, Coordinator Peer, Parity Peer, Answer. Peers Pl-1, Pl, Pm carry bucket levels j and images (i’, n’).]
LH*RSP2P Churn Management
• Sure Search: protects against outdated server read (transient communication or peer failure)
[Figure labels: Query, Answer, Coordinator Peer, Parity Peer; peers Pl-1, Pl, Pm, each with bucket level j and image (i’, n’).]
Conclusion
• LH*RSP2P requires at most one forwarding message when an addressing error occurs
• It is the fastest known SDDS and P2P key-based addressing algorithm
• It protects efficiently against churn
• It allows managing very large scalable files
• It should have numerous applications
Current & Future Work
• Implementation of the peer node architecture and the tutoring functions
  • Using the existing LH*RS prototype
  • Created in 2004 by Rim Moussa & shown at VLDB
• Performance analysis
• Variants
END
Thank you for Your Attention
Work partly funded by the IST eGov-Bus project
References
[1] Crainiceanu A., Linga P., Gehrke J., Shanmugasundaram J. Querying Peer-to-Peer Networks Using P-Trees. Proc. of the Seventh International Workshop on the Web and Databases (WebDB 2004), June 2004.
[2] Bolosky W. J., Douceur J. R., Howell J. The Farsite Project: A Retrospective. Operating System Review, April 2007, p. 17-26.
[3] Devine R. Design and Implementation of DDH: A Distributed Dynamic Hashing Algorithm. Proc. of the 4th Intl. Foundation of Data Organisation and Algorithms (FODO), 1993.
[4] Litwin W., Neimat M-A., Schneider D. LH*: Linear Hashing for Distributed Files. ACM-SIGMOD Int. Conf. on Management of Data, 1993.
[5] Litwin W., Neimat M-A., Schneider D. LH*: A Scalable Distributed Data Structure. ACM-TODS, Dec. 1996.
[6] Litwin W., Neimat M-A. High Availability LH* Schemes with Mirroring. Intl. Conf. on Cooperating Systems, IEEE Press, 1996.
[7] Litwin W., Moussa R., Schwarz T. LH*rs: A Highly Available Distributed Data Storage. Proc. of the 30th VLDB Conference, 2004.
[8] Litwin W., Moussa R., Schwarz T. LH*rs: A Highly Available Scalable Distributed Data Structure. ACM-TODS, Sept. 2005.
[9] Gribble S. D., Brewer E. A., Hellerstein J. M., Culler D. Scalable, Distributed Data Structures for Internet Service Construction. Proc. of the Fourth Symposium on Operating Systems Design and Implementation (OSDI 2000).
[10] Stoica, Morris, Karger, Kaashoek, Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. SIGCOMM’01, August 27-31, 2001.