Double Middleware-based Mobile Data Service
Jinsuo Zhang
CISE Department, University of Florida
April 2002

Outline
• Introduction
• Double Middleware-based Architecture
• Automatic Data Hoarding
• Data Management & Consistency Control
• Heterogeneity Support
• Mobile Network Adaptation
• Summary & Future Work

History of Data Management
[Diagram labels: Monitor, File System, Network File System, Mobile File Service (?); Hierarchy, Location Transparency, Name Transparency]

Current Computation Model
[Diagram; surviving label: Internet]

Challenges
• Data Availability
  – Automatic Hoarding: What, When, Where, Who, Why, How
• Data Consistency
• Heterogeneous
  – Communication
  – Data Content
• Mobile Network Adaptation

Motivation
• Data Access From Anywhere at Any Time
  – Data access in disconnection, weak connection and strong connection mode
• Device Independence
  – laptop, PDA, …
• Heterogeneity Support
  – Communication
  – Data content

Contribution
• A New Architecture & Implementation
• Filtering Mechanism & a New Hybrid Priority-based Algorithm
• XML-based Protocol
• Asynchronous Consistency Model
• Network Log Optimization
• Incremental-based Weak Connection Adaptation
• Simulation-based & Live Experiment Validation

Trace Information
Trace | Collecting Period | Footprint (Number of Files)
Trace 1 | 63 days | 8k
Trace 2 | 62 days | 8k
Trace 3 | 232 days | 37k
Trace 4 | 132 days | 100k
Trace 5 | 61 days | 4k

Double Middleware-based Architecture
[Diagram labels: AbiWord, vi, M-MEM, Operating System (Linux); MS Word, Visual Studio, M-MEM, Operating System (Windows); Internet; F-MEM, Meta-Data Server, Data Server]

Role of Data
• Normal Data: managed only by Host FS
• Mobile Data: managed by both Host FS and MFS
• Transitions: Publish/Import (normal to mobile), Retire (mobile to normal)

Data Propagation
[Diagram: mobile hosts MH 1, MH 2, MH 3, MH 4, … and the MDSS; labels: Publish, Import, Synchronization]

Mobile Client Model
[Diagram. User space: MFS Utility, Application, and M-MEM (Analyzer, Cache Manager, Data Synchronizer, …). Kernel space: Logic Layer of File System, MFS Extension, Physical Layer of File System]

M-MEM Responsibility
• Observe File Access Pattern
• Decide Active Files On-the-Fly
• Publish/Import files
• Cache Management
• Data Synchronization
• Heterogeneity Support
• Mobile Network Adaptation

M-MEM Workflow
[Diagram labels: File Event, Filtering, Non-Write, Write, Analyzer, Event Optimizer, Cache Manager, Adaptor, MD Agent, Network Adaptor, Synchronizer, Scheduler, Event Queue, Piggybacked Message, HTTP/SMTP/POP3]

F-MEM Architecture
[Diagram labels: Data Server, Meta-Data Server, Output Packager, Request Interpreter, Output Event Queue, Input Event Queue, Reply, Request, Authentication, Messenger, Input Decoder, XML Protocol Parser]

System Summary
Dimension | Our solution
Mobile Data Spectrum | Any File in Supported OS
Data Selection | Automatic
Application Transparency | Transparent
Conservative/Optimistic | Optimistic
Client/Server or Peer-to-Peer | Hybrid
Immediate/Delayed Propagation | Delayed
Push/Pull Model | Hybrid Push/Pull
Replication Granularity | Per-File
Replacement Policy | Hybrid Priority
Updated Data Shipping | Incremental Update

File Selection Workflow
[Diagram, on the mobile computer: Hooks in OS Kernel, file access event, Filter, filtered event, Analyzer, Hoarding List; also User's Profile in Server, Hoard Instruction]

Hooks on Operating System
• Linux
  – Modified Linux Kernel
  – Introduce a Pseudo char-Driver as Bridge between Kernel & Analyzer.
• Windows (95, 98, ME, NT & 2000)
  – Introduce a Filter Device Driver to Intercept File Access Event (DDK)
  – Forward Event to User-space Analyzer
• CE.Net by Ajay
• Real World Problem
  – Translation
  – Kernel-Application Communication

Location Distribution
[Figure: share of System/Software Package Files vs. Other Files (0% to 100%) for Trace 1 to Trace 5 and Average]

Program Access Distribution
[Figure: percentage of file access (%) vs. number of programs (1 to 10, ordered by number of accesses) for Trace 1 to Trace 5 and Average]

Filtering Mechanism
• Filter Types:
  – Path Based Filter: /tmp, /etc, /usr/bin, /usr/lib, /dev, $HOME/.pine, …; /WinNT, /Program files
  – Program Based Filter: find, daemon, service task, virus tool, …
  – Extension Name Based Filter: *.bak, *.tmp, *.old, *.swp, …
  – File Type Based Filter: pipe, device
  – Time Based Filter: backup, virus scan
  – Derive Based Filter: *.c -> *.o, *.tex -> *.dvi, *.ps
  – Meta-info Based Filter: size, date, permission, ownership, …

Simulation Methodology
[Diagram labels: Trace Interpreter, M-MEM, Utilities (user space); Operating System, MFS Extension (kernel space); X = disabled]

Effectiveness of Filters
[Figure: Non-Filtered vs. Filtered Out share (0% to 100%) for Trace 1 to Trace 5 and Average]
*Left bar: Unique Files.
Right bar: Access

Daily Working Set for Trace 1
[Figure: number of unique files (0 to 400) vs. day (0 to 70); series: w/o filter, w/ filter]

Daily Working Set for Trace 2
[Figure: number of unique files (0 to 350) vs. day (0 to 70); series: w/o filter, w/ filter]

Daily Working Set for Trace 3
[Figure: number of unique files (0 to 1400) vs. day (0 to 250); series: w/o filter, w/ filter]

Daily Working Set for Trace 4
[Figure: axis range 0 to 4000 vs. 0 to 140; series: w/o filter, w/ filter]

Daily Working Set for Trace 5
[Figure: number of unique files (0 to 400) vs. day (0 to 70); series: w/o filter, w/ filter]

Observations (1 of 2)
[Figure: percentage of unique files in the classes <= 1 min, <= 10 min, <= 1 hour, <= 1 day, > 1 day, for Trace 1 to Trace 5 and Average. Left bar: non-filtered, right bar: filtered]

Observations (2 of 2)
[Figure: percentage of unique files with Freq=1, Freq=2, Freq=3, Freq=4, Freq>=5, for Trace 1 to Trace 5 and Average. Left bar: non-filtered, right bar: filtered]

Hybrid Priority-based Algorithm
$F(t, f, a) = w_1 F_1(t) + w_2 F_2(f) + w_3 F_3(a)$
• $F_1(t) = H_0 - \text{aging} \cdot (t - t_{\text{last}})$, where $t$ is the current time, aging is the aging parameter, and $t_{\text{last}}$ is the last access time
• $F_2(f) = \mathrm{FA} \cdot f$, where $f$ is the access frequency
• $F_3(a) = \min(\mathrm{AA} \cdot a,\ H_0)$, where $a$ is the active period

Algorithm Analysis
• Property 1: If $F(t, f_1, a_1) > F(t, f_2, a_2)$, then for any $t' > t$, $F(t', f_1, a_1) > F(t', f_2, a_2)$. In other words, if two files are ordered by the hybrid priority and no file has been touched since the ordering, the order is preserved.
• Property 2: The file list ordered by the hybrid priority need not be updated between any two consecutive file accesses.
• Property 3: If one file in the file list is accessed, the order of the files after it in the list still holds.

Hybrid Priority-based Algorithm
[Diagram: file list ordered from bigger HP (header) to smaller HP (tail), positions 1 to 10; labels: Tail (existing file) (example), Tail (new file)]
1. Records history information, reset when newly inserted.
2. Dynamic just-in-time re-computing
3. Binary insertion, complexity O(log(n))

Hit Ratio for Trace 1
[Figure: hit ratio (%) vs. cache size (number of files, 20 to 60); series: OPT, HP, LRU]

Hit Ratio for Trace 3
[Figure: hit ratio (%) vs. cache size (number of files, 20 to 60); series: OPT, HP, LRU]

Hit Ratio for Trace 4
[Figure: hit ratio (%) vs. cache size (number of files, 20 to 60); series: OPT, HP, LRU]

Data Management
• Distributed Management
  – Naming: URI
  – Version Control
  – Timestamp
• Data Spectrum
• Role Change
• Automatic Selection, Publish, Importing, Consistency Maintenance

Data Representation in MDSS
[Diagram labels: Mobile User, Mobile Profile, Replica Descriptor (URI, Version), Data URI (URI1, URI2, …), Data 1, Data 2, …; Meta-Data Server (Apache Xindice); Data Server]

Consistency Model
[Diagram: numbered message flow involving the MDSS]
1. Update Detection
2. Update Propagation
3. Buffer Notification
4. Piggybacked Notification of Data Staleness
5. Refresh Request
6. Delta Data

Comm.
Between M-MEM and F-MEM
[Diagram labels: XML Command, XML-based Protocol, Optimized XML Command, Mobile Message Adaptor, Message Transport Agent (on both sides), HTTP/SMTP/POP3 Network]
• Message-based, Asynchronous, Durable, Reliable
• XML-based
• File granularity, incremental based

XML Communication Protocol
• File Management
  – Publish
  – Import
  – Retire
  – Delete
  – Rename
• User Management
  – PublicKeyReq
  – Login
  – Logout
  – Profile
• Consistency Maintenance
  – Update
  – Refresh

A Publish Example
<?xml version="1.0"?>
<MobileDataMessage>
  <OPERATOR> PUBLISH </OPERATOR>
  <FileInfo>
    <URI> mfs://mymachine/mypath/myname.dat </URI>
    <MODIFYTIME> 12:34:56 01/02/2000 </MODIFYTIME>
    <SIZE> 12345 </SIZE>
    <ACCESS>
      <OWNER> somebody </OWNER>
      <GROUP> somegroup </GROUP>
      <ACCESSATTRIBUTE>rwxrwxrwx</ACCESSATTRIBUTE>
    </ACCESS>
    <VERSION> 1 </VERSION>
  </FileInfo>
  <AGENTINFO>
    <ID>M-MEM 1.0</ID>
    <HOSTOS> Linux </HOSTOS>
  </AGENTINFO>
</MobileDataMessage>

Mobile Network Adaptation
• Hoarding
• Filtering Mechanism
• Working Set Locality
• Log Optimization
• Incremental Update/Hoarding

Locality for Trace 3
[Figure: relative daily working set overlap vs. previous day (%) by day (0 to 220)]

Locality for Trace 4
[Figure: relative daily working set overlap vs. previous day (%) by day (0 to 130)]

Trembling Phenomenon
File Access Sequence: ABXYABUVAB
• Publish window size = 4: Publish A, Publish B, Publish X, Publish Y, Retire X, Publish U, Retire Y, Publish V
• Publish window size = 2: Publish A, Publish B, Retire A, Publish X, Retire B, Publish Y, Retire X, Publish A, Retire Y, Publish B, Retire A, Publish U, Retire B, Publish V, Retire U, Publish A, Retire V, Publish B
Publish A, Publish B

Trembling for Trace 3
[Figure: number of publish requests (0 to 7000) vs. publish window size (20 to 700 files)]

Trembling for Trace 4
[Figure: number of publish requests (5000 to 25000) vs. publish window size (20 to 700 files)]

Inter Access Interval
[Figure: accumulative percentage of inter-reference interval (%) vs. file access interval (0 to 180 seconds) for Trace 1 to Trace 5]

Log Optimization for Trace 3
[Figure: number of publish requests (0 to 7000) vs. publish window size (20 to 100 files); series: No Delay, Delay 3 min]

Log Optimization for Trace 4
[Figure: number of publish requests (0 to 25000) vs. publish window size (20 to 100); series: No Delay, Delay 3 min]

Incremental Update/Hoarding Design Method
[Diagram. Mobile host: Client FS, M-MEM, Versioned File Manager. Network. Server end: F-MEM, Versioned File Manager]
1. Version control
2. Versioned File Archive

Experiment Environment
[Diagram labels: F-MEM + NistNET, D-Link Router, M-MEM + NistNET]

Workload Introduction
Workload | RedHat 7.1 Version | Size (Bytes) | Number of Files | RedHat 7.2 Version | Size (Bytes) | Number of Files
Apache | 1.3.19 | 8.9M | 752 | 1.3.20 | 9.3M | 773
Bash | 2.04 | 8.2M | 731 | 2.05 | 8.5M | 761
Glade | 0.5.9 | 8.9M | 549 | 0.6.2 | 10.8M | 598
Groff | 1.16.1 | 7.7M | 668 | 1.17.2 | 7.9M | 654
GNU Spell | 32.6 | 6.3M | 446 | 33.7 | 7.4M | 507

Network Traffic
[Figure: percentage of network traffic relative to value shipping (0 to 35) for Apache, Bash, Glade, Groff, Spell, Avg]

Apache
[Figure: reintegration time (300 to 1500 seconds) vs. bandwidth (4KB to 10KB, 1KB = 1024 bytes); series: Value Shipping, Incremental]

Bash
[Figure: reintegration time (300 to 1500 seconds) vs. bandwidth (4KB to 10KB, 1KB = 1024 bytes); series: Value Shipping, Incremental]

Glade
[Figure: reintegration time (600 to 2400 seconds) vs. bandwidth (4KB to 10KB, 1KB = 1024 bytes); series: Value Shipping, Incremental]

Groff
[Figure: reintegration time (300 to 1500 seconds) vs. bandwidth (4KB to 10KB, 1KB = 1024 bytes); series: Value Shipping, Incremental]

GNU Spell
[Figure: reintegration time (600 to 2100 seconds) vs. bandwidth (4KB to 10KB, 1KB = 1024 bytes); series: Value Shipping, Incremental]

Summary
• A New Architecture & Implementation
  – Linux & Windows
  – Xindice Native XML DB
  – Libxml XML Parser
  – C/C++/STL, ~20K Lines
• Filtering Mechanism & a New Hybrid Priority-based Algorithm
• XML-based Protocol
• Asynchronous Consistency Model
• Network Log Optimization
• Incremental-based Weak Connection Adaptation
• Simulation-based & Live Experiment Validation

Questions?
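
Backup note on the Hybrid Priority-based Algorithm slides: the priority function can be sketched in a few lines of Python. This is only an illustration, not the M-MEM implementation (which the Summary describes as C/C++). The names FileStats, hybrid_priority and rank are hypothetical, and the values chosen for H0, the aging parameter, FA, AA and the three weights are arbitrary assumptions; the slides give no concrete values.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    last_access: float    # time of the most recent access
    freq: int             # number of recorded accesses
    active_period: float  # how long the file has been active


def hybrid_priority(now, s, h0=1000.0, aging=0.01, fa=5.0, aa=0.02,
                    weights=(1.0, 1.0, 1.0)):
    """Return F(t, f, a) for one file; every default value is an assumption."""
    f1 = h0 - aging * (now - s.last_access)  # recency term, decays linearly
    f2 = fa * s.freq                         # frequency term
    f3 = min(aa * s.active_period, h0)       # active-period term, capped at H0
    w1, w2, w3 = weights
    return w1 * f1 + w2 * f2 + w3 * f3


def rank(files, now):
    """Order file names from bigger to smaller hybrid priority."""
    return sorted(files, key=lambda name: hybrid_priority(now, files[name]),
                  reverse=True)


# Two hypothetical files; neither is touched between the two rankings.
files = {
    "a.tex": FileStats(last_access=90.0, freq=3, active_period=50.0),
    "b.c": FileStats(last_access=95.0, freq=1, active_period=10.0),
}
print(rank(files, now=100.0))
print(rank(files, now=10_000.0))
```

Both calls print the same order, which is what Property 1 predicts: the recency term ages at the same linear rate for every file, so the gap between two untouched files stays constant and the list only needs re-sorting when a file is accessed.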