Double Middleware-based Mobile Data Service
Jinsuo Zhang
CISE Department
University of Florida
April 2002
Outline
• Introduction
• Double Middleware-based Architecture
• Automatic Data Hoarding
• Data Management & Consistency Control
• Heterogeneity Support
• Mobile Network Adaptation
• Summary & Future Work
History of Data Management
[Diagram: Monitor, File System, Network File System, Mobile File Service (?). Labels: Hierarchy, Location Transparency, Name Transparency.]
Current Computation Model
[Diagram; label: Internet]
Challenges
• Data Availability
– Automatic Hoarding
– What, When, Where, Who, Why, How
• Data Consistency
• Heterogeneous
– Communication
– Data Content
• Mobile Network Adaptation
Motivation
• Data Access From Anywhere at Any Time
– Data access in disconnected, weakly connected, and strongly connected modes
• Device Independence
– Laptop, PDA, …
• Heterogeneity Support
– Communication
– Data content
Contribution
• A New Architecture & Implementation
• Filtering Mechanism & a New Hybrid Priority-based Algorithm
• XML-based Protocol
• Asynchronous Consistency Model
• Network Log Optimization
• Incremental-based Weak Connection Adaptation
• Simulation-based & Live Experiment Validation
Trace Information
Trace   | Trace Collecting Period | Footprint (Number of Files)
Trace 1 | 63 days                 | 8k
Trace 2 | 62 days                 | 8k
Trace 3 | 232 days                | 37k
Trace 4 | 132 days                | 100k
Trace 5 | 61 days                 | 4k
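Double Middleware-based Architecture
[Architecture diagram: mobile hosts (Linux running AbiWord and vi; Windows running MS Word and Visual Studio) each run M-MEM above the operating system and connect over the Internet to F-MEM, which fronts the MetaData Server and the Data Server.]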
Role of Data
• Mobile Data: Managed by both Host FS and MFS
• Normal Data: Managed only by Host FS
• Publish/Import: Normal Data becomes Mobile Data
• Retire: Mobile Data becomes Normal Data
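The two roles and the Publish/Import/Retire operations can be read as a small state machine. The sketch below is only an illustration under that reading: the names `Role`, `TRANSITIONS`, and `next_role` are hypothetical, and treating Import as a transition from normal to mobile on the importing host is an assumption, not something the slide states.

```python
from enum import Enum


class Role(Enum):
    NORMAL = "normal"   # managed only by the host file system
    MOBILE = "mobile"   # managed by both the host FS and MFS


# Transitions as read off the slide (assumed interpretation):
# Publish/Import make data mobile, Retire returns it to normal.
TRANSITIONS = {
    (Role.NORMAL, "publish"): Role.MOBILE,
    (Role.NORMAL, "import"): Role.MOBILE,
    (Role.MOBILE, "retire"): Role.NORMAL,
}


def next_role(role: Role, action: str) -> Role:
    """Return the role after `action`, or raise if the action is not valid."""
    try:
        return TRANSITIONS[(role, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a {role.value} file") from None
```

Keeping the transitions in a table makes the invalid cases explicit, for example retiring a file that was never published.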
Data Propagation
[Diagram: mobile hosts MH 1, MH 2, MH 3, MH 4, … Publish to and Import from MDSS, which handles Synchronization.]
Mobile Client Model
[Layer diagram. User Space: Application, MFS Utility, and M-MEM (Analyzer, Cache Manager, Data Synchronizer, …). Kernel Space: Logic Layer of File System, MFS Extension, Physical Layer of File System.]
M-MEM Responsibility
• Observe File Access Pattern
• Decide Active Files On-the-Fly
• Publish/Import Files
• Cache Management
• Data Synchronization
• Heterogeneity Support
• Mobile Network Adaptation
M-MEM Workflow
[Workflow diagram. Components: Filtering, Analyzer, Optimizer, Cache Manager, Synchronizer, Scheduler, Event Queue, MD Agent, Adaptor, Network Adaptor. Labeled flows: File Event, Non-Write, Write Event, Piggybacked Message. Transport: HTTP/SMTP/POP3.]
F-MEM Architecture
[Architecture diagram. F-MEM components: Authentication, Messenger, XML Protocol Parser, Input Decoder, Input Event Queue, Request Interpreter, Output Event Queue, Output Packager. Back ends: Data Server, Meta-Data Server. Labeled flows: Request, Reply.]
System Summary
Dimension                     | Our solution
Mobile Data Spectrum          | Any File in Supported OS
Data Selection                | Automatic
Application Transparency      | Transparent
Conservative/Optimistic       | Optimistic
Client/Server or Peer-to-Peer | Hybrid
Immediate/Delayed Propagation | Delayed
Push/Pull Model               | Hybrid Push/Pull
Replication Granularity       | Per-File
Replacement Policy            | Hybrid Priority
Updated Data Shipping         | Incremental Update
File Selection Workflow
[Workflow diagram: Hooks in OS Kernel -> (File access event) -> Filter -> (Filtered event) -> Analyzer -> Hoarding List -> User's Profile in Server -> (Hoard Instruction) -> Mobile Computer]
Hooks on Operating System
• Linux
– Modified Linux Kernel
– Introduce a Pseudo char-Driver as Bridge between Kernel & Analyzer
• Windows (95, 98, ME, NT & 2000)
– Introduce a Filter Device Driver to Intercept File Access Event (DDK)
– Forward Event to User-space Analyzer
• CE.Net by Ajay
• Real World Problem
– Translation
– Kernel-Application Communication
Location Distribution
[Figure: share of System/Software Package Files vs. Other Files (0% to 100%) for Trace 1 to Trace 5 and Average.]
Program Access Distribution
[Figure: Percentage of File Access (%) vs. Number of Program (ordered by number of accesses, 1 to 10) for Trace 1 to Trace 5 and Average.]
Filtering Mechanism
• Filter Types:
– Path Based Filter
• /tmp, /etc, /usr/bin, /usr/lib, /dev, $HOME/.pine, …
• /WinNT, /Program Files
– Program Based Filter
• find, daemon, service task, virus tool, …
– Extension Name Based Filter
• *.bak, *.tmp, *.old, *.swp, …
– File Type Based Filter
• Pipe, device
– Time Based Filter
• Backup, virus scan
– Derive Based Filter
• *.c -> *.o, *.tex -> *.dvi, *.ps
– Meta-info Based Filter
• Size, Date, Permission, Ownership, …
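A minimal sketch of how several of these filter types might be chained, assuming each file access event carries a path, the accessing program, and a flag for regular files. The `FileEvent` type, the rule constants, and `is_filtered_out` are hypothetical names; the rule values are the examples from the slide. Time-based and meta-info filters are left out, and the derive-based check is simplified to an extension test.

```python
import posixpath
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class FileEvent:
    path: str                # absolute path of the accessed file
    program: str             # name of the accessing program
    is_regular: bool = True  # False for pipes, devices, ...


# Example rules taken from the slide; a real system would load its own.
EXCLUDED_DIRS = ("/tmp", "/etc", "/usr/bin", "/usr/lib", "/dev")
EXCLUDED_PROGRAMS = {"find"}
EXCLUDED_PATTERNS = ("*.bak", "*.tmp", "*.old", "*.swp")
DERIVED_EXTENSIONS = {".o", ".dvi", ".ps"}  # regenerable from *.c / *.tex


def is_filtered_out(event: FileEvent) -> bool:
    """Return True if the event should be dropped before the analyzer sees it."""
    # Path-based filter: the file lives under an excluded directory.
    if any(event.path == d or event.path.startswith(d + "/")
           for d in EXCLUDED_DIRS):
        return True
    # Program-based filter: the access came from an excluded program.
    if event.program in EXCLUDED_PROGRAMS:
        return True
    name = posixpath.basename(event.path)
    # Extension-name-based filter.
    if any(fnmatchcase(name, pattern) for pattern in EXCLUDED_PATTERNS):
        return True
    # File-type-based filter: pipes and devices are never hoarded.
    if not event.is_regular:
        return True
    # Derive-based filter (simplified): skip files that can be rebuilt.
    return posixpath.splitext(name)[1] in DERIVED_EXTENSIONS
```

Ordering the cheap string checks first keeps the per-event cost low, which matters because every file access passes through the filter.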
Simulation Methodology
[Diagram: a Trace Interpreter drives the M-MEM Utilities in User Space; the Kernel Space path (Operating System, MFS Extension) is marked as disabled.]
Effectiveness of Filters
[Figure: share of Filtered Out vs. Non-Filtered (0% to 100%) for Trace 1 to Trace 5 and Average. Left bar: Unique Files. Right bar: Access.]
Daily Working Set for Trace 1
[Figure: Number of Unique Files per Day, with filter vs. without filter.]
Daily Working Set for Trace 2
[Figure: Number of Unique Files per Day, with filter vs. without filter.]
Daily Working Set for Trace 3
[Figure: Number of Unique Files per Day, with filter vs. without filter.]
Daily Working Set for Trace 4
[Figure: Number of Unique Files per Day, with filter vs. without filter.]
Daily Working Set for Trace 5
[Figure: Number of Unique Files per Day, with filter vs. without filter.]
Observations (1 of 2)
[Figure: Percentage of Unique Files (0% to 100%) in each interval class (<= 1 min, <= 10 min, <= 1 hour, <= 1 day, > 1 day) for Trace 1 to Trace 5 and Average. Left bar: Non-filtered. Right bar: Filtered.]
Observations (2 of 2)
[Figure: Percentage of Unique Files (0% to 100%) by access frequency (Freq=1, Freq=2, Freq=3, Freq=4, Freq>=5) for Trace 1 to Trace 5 and Average. Left bar: Non-filtered. Right bar: Filtered.]
Hybrid Priority-based Algorithm
$F(t, f, a) = \alpha_1 F_1(t) + \alpha_2 F_2(f) + \alpha_3 F_3(a)$
• $F_1(t) = H_0 - \lambda\,(t - t_{\text{last}})$, where $t$ is the current time, $\lambda$ the aging parameter, and $t_{\text{last}}$ the last access time
• $F_2(f) = FA \cdot f$, where $f$ is the access frequency
• $F_3(a) = \min(H_0,\ AA \cdot a)$, where $a$ is the active period
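The three terms can be computed directly from a file's access history. The sketch below is one way to write that down. The `Params` container, the function name, and every default value are hypothetical; the slides do not give concrete values for H0, the aging parameter, FA, AA, or the three coefficients (called `alpha1` to `alpha3` here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    h0: float = 100.0      # H0: base value and cap (hypothetical value)
    aging: float = 0.01    # aging parameter, priority lost per idle second
    fa: float = 1.0        # FA: weight per access
    aa: float = 0.5        # AA: weight per unit of active period
    alpha1: float = 1.0    # coefficient of the recency term
    alpha2: float = 1.0    # coefficient of the frequency term
    alpha3: float = 1.0    # coefficient of the active-period term


def hybrid_priority(now, last_access, freq, active_period, p=Params()):
    """Combine recency, frequency, and active period into one priority."""
    f1 = p.h0 - p.aging * (now - last_access)  # decays while the file is idle
    f2 = p.fa * freq                           # grows with every access
    f3 = min(p.h0, p.aa * active_period)       # capped at H0
    return p.alpha1 * f1 + p.alpha2 * f2 + p.alpha3 * f3
```

With the placeholder defaults, a file last touched 100 seconds ago, accessed 3 times, with an active period of 10 gets 99 + 3 + 5 = 107.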
Algorithm Analysis
• Property 1: If $F(t, f_1, a_1) > F(t, f_2, a_2)$, then for any $t' > t$, $F(t', f_1, a_1) > F(t', f_2, a_2)$.
Property 1 means that if two files are ordered by the hybrid priority and no file is touched after the ordering, the order is preserved.
• Property 2: The file list ordered by the hybrid priority need not be updated between two consecutive file accesses.
• Property 3: If one file in the file list is accessed, the order of the files after it in the list still holds.
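Property 1 can be checked from the definition of the priority. The derivation below is a sketch under stated assumptions: $F_1$ is not clamped, the frequency and active period of an untouched file do not change, and the notation $\alpha_1, \alpha_2, \alpha_3$ for the coefficients, $\lambda$ for the aging parameter, and $t_1, t_2$ for the two files' last access times is introduced here for illustration.

```latex
\begin{aligned}
F(t, f_1, a_1) - F(t, f_2, a_2)
  &= \alpha_1 \bigl[ (H_0 - \lambda (t - t_1)) - (H_0 - \lambda (t - t_2)) \bigr]
     + \alpha_2 \, FA \, (f_1 - f_2)
     + \alpha_3 \bigl( F_3(a_1) - F_3(a_2) \bigr) \\
  &= \alpha_1 \lambda (t_1 - t_2)
     + \alpha_2 \, FA \, (f_1 - f_2)
     + \alpha_3 \bigl( F_3(a_1) - F_3(a_2) \bigr)
\end{aligned}
```

The current time $t$ cancels, so the difference between two untouched files is the same at any later time $t'$ and its sign cannot flip. Under the same assumptions this is also why the list needs no re-sorting between accesses (Property 2).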
Hybrid Priority-based Algorithm
[Diagram: file list ordered by hybrid priority (HP), positions 1 to 10, from the Header (bigger HP) toward the tail (smaller HP). Marked positions: "Tail (existing file) (example)" and "Tail (new file)".]
1. Records history information, reset when newly inserted.
2. Dynamic just-in-time re-computing
3. Binary insertion, complexity O(log(n))
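A minimal sketch of a priority-ordered file list with binary insertion, using Python's `bisect` module. The `PriorityList` class and its methods are hypothetical names, not the authors' data structure. The sketch assumes all priorities are evaluated at a common reference time so that stored values stay comparable. The binary search takes O(log n) comparisons; a Python list still shifts elements on insertion, so a linked or tree structure would be needed for O(log n) overall.

```python
import bisect


class PriorityList:
    """Files kept in descending order of hybrid priority (head = biggest)."""

    def __init__(self):
        self._keys = []    # negated priorities, ascending, parallel to _files
        self._files = []

    def insert(self, name, priority):
        """Place a file at its slot; the slot is found by binary search."""
        i = bisect.bisect_right(self._keys, -priority)
        self._keys.insert(i, -priority)
        self._files.insert(i, name)
        return i

    def remove(self, name):
        i = self._files.index(name)
        del self._keys[i]
        del self._files[i]

    def touch(self, name, new_priority):
        """Re-position one file after an access; other files keep their order."""
        self.remove(name)
        return self.insert(name, new_priority)

    def files(self):
        return list(self._files)
```

Only the accessed file moves on `touch`, which mirrors the just-in-time recomputation on the slide: the relative order of untouched files is left alone.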
Hit Ratio for Trace 1
[Figure: Hit Ratio (%) vs. Cache Size (number of files) for OPT, HP, and LRU.]
Hit Ratio for Trace 3
[Figure: Hit Ratio (%) vs. Cache Size (number of files) for OPT, HP, and LRU.]
Hit Ratio for Trace 4
[Figure: Hit Ratio (%) vs. Cache Size (number of files) for OPT, HP, and LRU.]
Data Management
• Distributed Management
– Naming: URI
– Version Control
– Timestamp
• Data Spectrum
• Role Change
• Automatic Selection, Publish, Importing, Consistency Maintenance
Data Representation in MDSS
[Diagram: the Meta-Data Server (Apache Xindice) stores, for each Mobile User, a Mobile Profile listing URIs (URI1, URI2, …) and Replica Descriptors (URI, Version, …); the Data Server stores the data itself (Data 1, Data 2, …) keyed by URI.]
Consistency Model
[Diagram: numbered steps exchanged between mobile hosts and MDSS.]
1. Update Detection
2. Update Propagation
3. Buffer Notification
4. Piggybacked Notification of Data Staleness
5. Refresh Request
6. Delta Data
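The numbered steps suggest a flow in which the server buffers staleness notices and attaches them to whatever reply it sends next to each host. The toy model below illustrates that reading only. The `Mdss` class, its method names, and the dictionary reply format are all hypothetical, and the refresh reply reports a version range where a real system would carry the delta data.

```python
from collections import defaultdict


class Mdss:
    """Toy stand-in for the server side; not the actual MDSS implementation."""

    def __init__(self):
        self.versions = {}                 # uri -> latest version number
        self.replicas = defaultdict(set)   # uri -> hosts holding a replica
        self.pending = defaultdict(set)    # host -> uris that are stale there

    def register(self, host, uri):
        self.replicas[uri].add(host)
        self.versions.setdefault(uri, 1)

    def propagate_update(self, host, uri):
        """Step 2: a host ships an update. Step 3: buffer notifications."""
        self.versions[uri] += 1
        for other in self.replicas[uri] - {host}:
            self.pending[other].add(uri)
        return self.versions[uri]

    def reply(self, host, payload):
        """Step 4: piggyback buffered staleness notices on any reply."""
        stale = sorted(self.pending.pop(host, set()))
        return {"payload": payload, "stale": stale}

    def refresh(self, host, uri, have_version):
        """Steps 5 and 6: answer a refresh request (delta omitted here)."""
        return {"uri": uri, "from": have_version, "to": self.versions[uri]}
```

Because notices ride on existing traffic, a disconnected host costs the server nothing until it next sends a message.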
Comm. Between M-MEM and F-MEM
[Diagram: on each side, the XML-based Protocol layer passes XML Commands (Optimized XML Commands) to a Mobile Message Adaptor and a Message Transport Agent; the two sides exchange messages over HTTP/SMTP/POP3 across the Network.]
• Message-based, Asynchronous, Durable, Reliable
• XML-based
• File granularity, incremental based
XML Communication Protocol
• File Management
– Publish
– Import
– Retire
– Delete
– Rename
• User Management
– PublicKeyReq
– Login
– Logout
– Profile
• Consistency Maintenance
– Update
– Refresh
A Publish Example
<?xml version="1.0"?>
<MobileDataMessage>
  <OPERATOR>
    PUBLISH
  </OPERATOR>
  <FileInfo>
    <URI> mfs://mymachine/mypath/myname.dat </URI>
    <MODIFYTIME> 12:34:56 01/02/2000 </MODIFYTIME>
    <SIZE> 12345 </SIZE>
    <ACCESS>
      <OWNER> somebody </OWNER>
      <GROUP> somegroup </GROUP>
      <ACCESSATTRIBUTE>rwxrwxrwx</ACCESSATTRIBUTE>
    </ACCESS>
    <VERSION> 1 </VERSION>
  </FileInfo>
  <AGENTINFO>
    <ID>M-MEM 1.0</ID>
    <HOSTOS> Linux </HOSTOS>
  </AGENTINFO>
</MobileDataMessage>
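A message of this shape can be built and read back with Python's standard `xml.etree.ElementTree`. This is a sketch rather than the M-MEM code (which the summary says is C/C++ with Libxml). The function names are hypothetical, and it assumes the `ACCESS` and `FileInfo` elements are closed with `</ACCESS>` and `</FileInfo>` so that the message is well-formed.

```python
import xml.etree.ElementTree as ET


def build_publish(uri, modify_time, size, owner, group, mode, version,
                  agent_id, host_os):
    """Build a PUBLISH message using the element names from the example."""
    root = ET.Element("MobileDataMessage")
    ET.SubElement(root, "OPERATOR").text = "PUBLISH"
    info = ET.SubElement(root, "FileInfo")
    ET.SubElement(info, "URI").text = uri
    ET.SubElement(info, "MODIFYTIME").text = modify_time
    ET.SubElement(info, "SIZE").text = str(size)
    access = ET.SubElement(info, "ACCESS")
    ET.SubElement(access, "OWNER").text = owner
    ET.SubElement(access, "GROUP").text = group
    ET.SubElement(access, "ACCESSATTRIBUTE").text = mode
    ET.SubElement(info, "VERSION").text = str(version)
    agent = ET.SubElement(root, "AGENTINFO")
    ET.SubElement(agent, "ID").text = agent_id
    ET.SubElement(agent, "HOSTOS").text = host_os
    return ET.tostring(root, encoding="unicode")


def parse_message(xml_text):
    """Extract the fields a receiver would need to dispatch the request."""
    root = ET.fromstring(xml_text)
    return {
        "operator": root.findtext("OPERATOR").strip(),
        "uri": root.findtext("FileInfo/URI").strip(),
        "size": int(root.findtext("FileInfo/SIZE")),
        "version": int(root.findtext("FileInfo/VERSION")),
        "host_os": root.findtext("AGENTINFO/HOSTOS").strip(),
    }
```

The parser strips surrounding whitespace because the example pads element text with spaces and line breaks.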
Mobile Network Adaptation
• Hoarding
• Filtering Mechanism
• Working Set Locality
• Log Optimization
• Incremental Update/Hoarding
Locality for Trace 3
[Figure: Relative Daily Working Set Overlap vs. Previous Day (%), 0 to 100, plotted by Day (0 to 220).]
Locality for Trace 4
[Figure: Relative Daily Working Set Overlap vs. Previous Day (%), 0 to 100, plotted by Day (0 to 130).]
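The slides do not define the overlap metric. One plausible reading, used in the sketch below, is the share of a day's working set that was also in the previous day's working set. The function name and that definition are assumptions.

```python
def daily_overlap(working_sets):
    """Percent of each day's files that were also used the day before.

    working_sets: list of sets of file names, one per day, in day order.
    Returns one value per day starting from the second day.
    """
    result = []
    for prev, today in zip(working_sets, working_sets[1:]):
        if not today:
            result.append(0.0)   # no activity today: nothing to overlap
            continue
        result.append(100.0 * len(today & prev) / len(today))
    return result
```

A high value means most of today's files are already hoarded from yesterday, so little new data has to cross the network.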
Trembling Phenomenon
File Access Sequence: ABXYABUVAB
• Publish window size = 4: Publish A, Publish B, Publish X, Publish Y, Retire X, Publish U, Retire Y, Publish V
• Publish window size = 2: Publish A, Publish B, Retire A, Publish X, Retire B, Publish Y, Retire X, Publish A, Retire Y, Publish B, Retire A, Publish U, Retire B, Publish V, Retire U, Publish A, Retire V, Publish B
• (no window size given): Publish A, Publish B
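The two request sequences for window sizes 4 and 2 can be reproduced by replaying the access sequence through a fixed-size publish window. The sketch below assumes least-recently-used replacement, which happens to match the slide's sequences; the system itself ranks files by hybrid priority, so this is an illustration only, and `simulate` is a hypothetical name.

```python
from collections import OrderedDict


def simulate(accesses, window_size):
    """Replay accesses through an LRU publish window; return the request log."""
    window = OrderedDict()   # published files, least recently used first
    log = []
    for name in accesses:
        if name in window:
            window.move_to_end(name)   # hit: only refresh recency
            continue
        if len(window) >= window_size:
            victim, _ = window.popitem(last=False)
            log.append(("Retire", victim))
        window[name] = True
        log.append(("Publish", name))
    return log


for size in (4, 2):
    log = simulate("ABXYABUVAB", size)
    publishes = sum(1 for op, _ in log if op == "Publish")
    print(size, publishes, " ".join(f"{op} {name}" for op, name in log))
```

With a window of 4 the hot files A and B stay published and only 6 publish requests are issued. With a window of 2 they are retired and republished repeatedly, giving 10 publish requests for the same 10 accesses.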
Trembling for Trace 3
[Figure: Number of Publish Requests (0 to 7000) vs. Publish Window Size (number of files, 20 to 700).]
Trembling for Trace 4
[Figure: Number of Publish Requests (5000 to 25000) vs. Publish Window Size (number of files, 20 to 700).]
Inter Access Interval
[Figure: Cumulative Percentage of Inter-reference Interval (%) vs. File Access Interval (Seconds, 0 to 180) for Trace 1 to Trace 5.]
Log Optimization for Trace 3
[Figure: Number of Publish Requests vs. Publish Window Size (number of files, 20 to 100), No Delay vs. Delay 3 min.]
Log Optimization for Trace 4
[Figure: Number of Publish Requests vs. Publish Window Size (number of files, 20 to 100), No Delay vs. Delay 3 min.]
Incremental Update/Hoarding
Design Method
[Diagram: Mobile Host (Client FS, M-MEM, Versioned File Manager) connected over the Network to the Server End (F-MEM, Versioned File Manager).]
Versioned File Manager:
1. Version control
2. Versioned File Archive
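The idea behind incremental update is to ship only what changed between two versions instead of the whole file (value shipping). The slides do not say which differencing method the Versioned File Manager uses, so the sketch below substitutes Python's `difflib.SequenceMatcher` on text lines as a stand-in; `make_delta` and `apply_delta` are hypothetical names.

```python
import difflib


def make_delta(old_lines, new_lines):
    """Describe new_lines as edits against old_lines; unchanged runs are skipped."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    return [(i1, i2, new_lines[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_delta(old_lines, delta):
    """Rebuild the new version from the old version plus the delta."""
    out, pos = [], 0
    for i1, i2, replacement in delta:
        out.extend(old_lines[pos:i1])   # copy the unchanged run
        out.extend(replacement)         # then the changed lines
        pos = i2                        # skip the replaced old lines
    out.extend(old_lines[pos:])
    return out
```

The receiver needs the matching old version to apply a delta, which is why both ends keep versioned copies of each file.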
Experiment Environment
[Testbed diagram: F-MEM + NistNET and M-MEM + NistNET hosts connected through a D-Link Router.]
Workload Introduction
          | RedHat 7.1 Distribution                    | RedHat 7.2 Distribution
Workload  | Version | Size (Bytes) | Number of files | Version | Size (Bytes) | Number of files
Apache    | 1.3.19  | 8.9M         | 752             | 1.3.20  | 9.3M         | 773
Bash      | 2.04    | 8.2M         | 731             | 2.05    | 8.5M         | 761
Glade     | 0.5.9   | 8.9M         | 549             | 0.6.2   | 10.8M        | 598
Groff     | 1.16.1  | 7.7M         | 668             | 1.17.2  | 7.9M         | 654
GNU Spell | 32.6    | 6.3M         | 446             | 33.7    | 7.4M         | 507
Network Traffic
[Figure: Percentage of Network Traffic Relative to Value Shipping (0 to 35) for Apache, Bash, Glade, Groff, Spell, and Avg.]
ReIntegration Time
[Figures: ReIntegration Time (Seconds) vs. Bandwidth (4KB to 10KB, 1KB = 1024 Bytes), Value Shipping vs. Incremental, one panel each for Apache, Bash, Glade, Groff, and GNU Spell.]
Summary
• A New Architecture & Implementation
– Linux & Windows
– Xindice Native XML DB
– Libxml XML Parser
– C/C++/STL, ~20K Lines
• Filtering Mechanism & a New Hybrid Priority-based Algorithm
• XML-based Protocol
• Asynchronous Consistency Model
• Network Log Optimization
• Incremental-based Weak Connection Adaptation
• Simulation-based & Live Experiment Validation
Questions?