EMC Data De-Duplication & Virtualization Backup High Level


EMC Next Generation Backup &
Data De-Duplication
High Level Overview and Strategy
Joe Staiber
EMC Corporation
Data De-Duplication Product Manager
Backup, Recovery and Archive Division
© Copyright 2009 EMC Corporation. All rights reserved. Joe Staiber
Typical Issues with Traditional Backup
 Long Backup Windows
 Backup Servers
 Affecting Production
 Server Cost / Licensing
 Tape Cost
 VM Guest Proliferation
 License Cost
 Off-site storage
 Cost to use Disk Technology
 Iron Mountain / Transport
 Client Licensing
 Tape Rotation and Changes
 VMWare Resources
 Restore times
 VCB Infrastructure
 Restore complexity
 Tape Drive Failure
 Multiple Solutions
 Tape Read/Write Errors
 Remote office backup
 Tape Drive Maintenance
 DR / Business Continuity
 Intraday Restore needs
 GROWTH / TIME
 Retention
What is Data De-Duplication? – An Analogy
 How many times does the word “THE” appear in a sentence, a chapter,
an entire book, a library?
 Data is not unlike words in print, only instead of words, data uses strings
of 1’s and 0’s.
 A book may contain 4 million words in it, but only 200,000 different
words, 3.8 million words are repeats. Some of them, hundreds or
thousands of times.
 The Amount of de-duplication possible in your
data center is in line with these numbers… It's staggering
“Would you rather copy 200,000 or 4 million words every day?”
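The word-count analogy can be made concrete with a small sketch. It counts total versus distinct words in a sample sentence; the function name `dedup_summary` and the sample text are illustrative assumptions, not anything from the presentation. With the slide's own figures, 4 million words over 200,000 distinct words works out to a 20:1 ratio.

```python
from collections import Counter


def dedup_summary(text: str) -> dict:
    """Compare total words with distinct words, as in the book analogy."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    unique = len(counts)
    return {
        "total": total,
        "unique": unique,
        "repeats": total - unique,
        # How many words are "stored" per distinct word kept.
        "ratio": total / unique if unique else 0.0,
    }


# Hypothetical sample text, only for illustration.
sample = "the cat sat on the mat and the dog sat on the rug"
summary = dedup_summary(sample)
print(summary)

# The slide's figures: 4 million words, 200,000 distinct words.
print("book ratio:", 4_000_000 / 200_000)
```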
How it Works
Simple Example of Global, Source Data De-duplication
[Diagram: three sites backing up to one De-duplication Server (stored backup data)]
 Data Center, First Instance (segments A, B, C, D): only unique data segments are backed up
 Remote Site 1, Duplicate Instance (segments A, B, C, D): data already backed up, so only unique IDs stored (20 byte pointers)
 Remote Site 2, Modified Instance (segments A, B, C, D, E): new data segment E identified and backed up
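The three-site example above can be sketched in a few lines. This is a minimal illustration, not Avamar's implementation: the class `DedupServer` and its methods are hypothetical names, and SHA-1 is an assumption chosen only because its digests happen to be 20 bytes, matching the "20 byte pointers" on the slide.

```python
import hashlib


class DedupServer:
    """Toy global de-duplication store keyed by a 20-byte segment ID."""

    def __init__(self):
        self.segments = {}  # segment ID -> segment bytes (stored once)
        self.backups = {}   # backup name -> ordered list of segment IDs

    def backup(self, name, segments):
        """Record a backup; return how many segments actually had to be sent."""
        sent = 0
        ids = []
        for seg in segments:
            seg_id = hashlib.sha1(seg).digest()  # assumption: SHA-1, 20 bytes
            if seg_id not in self.segments:
                self.segments[seg_id] = seg  # first instance: store the data
                sent += 1
            ids.append(seg_id)  # duplicates cost only a pointer
        self.backups[name] = ids
        return sent

    def restore(self, name):
        """Rebuild a backup from its list of pointers."""
        return [self.segments[i] for i in self.backups[name]]


server = DedupServer()
A, B, C, D, E = (bytes([x]) * 1024 for x in b"ABCDE")

first = server.backup("data-center", [A, B, C, D])          # first instance
duplicate = server.backup("remote-site-1", [A, B, C, D])    # duplicate instance
modified = server.backup("remote-site-2", [A, B, C, D, E])  # modified instance
print(first, duplicate, modified, len(server.segments))
```

Because the lookup table is shared by every site, the duplicate at Remote Site 1 sends no segment data at all, and Remote Site 2 sends only segment E.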
Where can De-Duplication Occur?
 IT’S NOT JUST IN BACKUP!!!!!!
 De-Dup is theoretically possible ANYWHERE
 But it comes with a price…. Processing, latency, bandwidth,
and most importantly TIME
Who does the actual processing?
 Storage Array?
 Software?
 Backup Server?
 Tape Device / VTL?
[Diagram: Backup Server and De-duplication Device]
De-Duplication Concepts: Prominent Use Cases
Where is De-Duplication being applied today?
 Backup – address significant inefficiency & cost due to redundant data
– Integrated end-to-end backup software stack
– B2D H/W Target component for incumbent backup environments
 Archive Applications and Platforms – efficient retention over time
– Low cost, “acceptable performance” secondary storage for mid-term retention, where regulatory compliance is not required
– As an efficiency feature in compliant archive (e.g. Centera)
 Primary Storage - “Capacity Optimized” ILM tier
– Block and file for tier 2 applications
– Different performance and cost characteristics
 Replication – Save bandwidth & time by moving less data
– Inherent in most storage use case solutions
– Also found in WAAS/WAFS solutions
De-Duplication in PRIMARY Storage will Change the GAME !!!!
Technologies like Flash drives and NAS subfile de-dup are HERE.
[Figure: EMC product portfolio, including Celerra (NS20, NS40, NS40G, NS80, NS80G, NSX), CLARiiON (AX150, CX3 UltraScale Series; Fiber Channel and iSCSI), Symmetrix (DMX-4 950; new DMX-4 and DMX-3 Flash Drives), EMC Disk Library (DL210, DL4x00, DL6000), EMC Centera (Gen 4 LPNode), Invista, and Connectrix]
Different Vendors De-Dup in Different Places
Let's look at the vendors who play in this equation
 What happens when the backup application does the de-dup? (such as Commvault)
– Do we need DD or Exagrid to do it again? No, we don't
 What happens when the primary SAN does it? (NetApp & EMC Celerra)
– Do we need Commvault or DD to do it again? No, we don't
And if they did, they would have to “un-dedup” (rehydrate) the data to even be able to read it!!!
[Diagram: three places where de-dup occurs]
– Primary Storage SAN/NAS Data De-dupe: EMC, HP, NetApp, IBM, etc.
– Backup Application Data De-dupe: Symantec, Commvault, etc.
– Target Device Data De-dupe: Data Domain, Exagrid, Quantum, etc.
SOFTWARE BASED DE-DUP: Commvault / PureDisk
TARGET BASED DE-DUP: Data Domain / Exagrid
BUYER BEWARE!!!!!
[Diagram: Primary Storage SAN/NAS Data De-dupe; Backup Application Data De-dupe; Target Device Data De-dupe]
 EMC IS THE ONLY COMPANY THAT MANUFACTURES PRODUCTS IN EVERY SECTION OF THE DE-DUPLICATION MARKET
 EMC is ready and capable in leveraging de-duplication across the spectrum
 What happens to vendors like Data Domain and Commvault, when the data is already de-duplicated???
 Other vendors see De-Dup as a product, not a technology…
What is Most Impactful to You TODAY?
Backup is still the best and most efficient application for De-Dup today
 It is proven and available
 It is out of the production window
 There are several ways to De-Duplicate data in a backup environment
 But first, let's define the backup challenge we all are facing…
[Diagram: TARGET DE-DUPLICATION with Backup Server and De-duplication Device; SOURCE DE-DUPLICATION with De-Dup Device]
Backup De-duplication – Media Impact
Traditional Backup vs. EMC Avamar
Total Cumulative Storage - 10TB MS Office file data
(Traditional = weekly fulls + daily incrementals; EMC Avamar = daily fulls)
[Chart: Cumulative Media Required (Cumulative TB) by week, weeks 1 to 12, with marks at 4 weeks and 8 weeks. Data labels as recovered from the chart:]
Week   Traditional Backup   Traditional Backup w/Compression (2:1)   EMC Avamar
1      16                   8                                        3.2
2      32                   16                                       3.4
3      48                   24                                       3.6
4      64                   32                                       3.8
5      80                   40                                       4.0
6      96                   48                                       4.2
7      112                  56                                       4.4
8      128                  64                                       4.7
9      144                  72                                       4.9
10     160                  80                                       5.1
11     176                  88                                       5.3
12     192                  96                                       5.5
Avamar makes backup to disk more economical
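The traditional curves in this chart follow from simple arithmetic, sketched below. The split of the 16 TB weekly growth into one 10 TB full plus six 1 TB daily incrementals is an assumption that happens to reproduce the chart labels; the slide itself only says "weekly fulls + daily incrementals". The Avamar values are copied from the chart labels rather than modelled, and all names are illustrative.

```python
FULL_TB = 10.0              # from the slide: 10TB MS Office file data
DAILY_INCREMENTAL_TB = 1.0  # assumption: roughly 10% daily change
INCREMENTALS_PER_WEEK = 6   # assumption: one full plus six incrementals


def traditional_cumulative(weeks: int, compression: float = 1.0) -> float:
    """Cumulative media (TB) after `weeks` of weekly fulls + daily incrementals."""
    weekly_tb = FULL_TB + INCREMENTALS_PER_WEEK * DAILY_INCREMENTAL_TB
    return weeks * weekly_tb / compression


# Avamar data labels as they appear on the chart (not computed here).
AVAMAR_TB = [3.2, 3.4, 3.6, 3.8, 4.0, 4.2, 4.4, 4.7, 4.9, 5.1, 5.3, 5.5]

for week in range(1, 13):
    plain = traditional_cumulative(week)
    compressed = traditional_cumulative(week, compression=2.0)
    avamar = AVAMAR_TB[week - 1]
    print(f"week {week:2d}: {plain:6.1f} TB  {compressed:5.1f} TB  {avamar:4.1f} TB")
```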
The Avalanche
Would you rather stop the avalanche here?
Or here?
The Goal is to De-Duplicate as close to the SOURCE as possible
The Power of Avamar and De-Duplication
What Avamar has delivered for customers:
 70 Hour backup down to 4 Hours
 400 servers backed up in 5 hours over T1 or less bandwidth
 Eliminated Tape
 Eliminated 40 backup servers
 Improved Restore times
 Centralized all Backup Operations
 300GB of backup stored in 10GB
 99.8% de-dup rate in Windows
 99% de-dup rate in SQL
 10x Faster Backups
 500:1 reduction in network bandwidth
 50:1 reduction in backup infrastructure
 Elimination of off-site tape storage
BC & Disaster Recovery:
Traditional backup
 Primary Data Storage: 50TB
 Daily Cumulative: 8TB
 Weekly Cumulatives: 48TB
 Weekly Full Backups: 50TB
 Total: 98TB
With Avamar (Axion): 95+% Less
 Primary Data Storage: 50TB
 Axion Daily Snapups: .5TB
 Axion Weekly Snapups: 3.5TB
 Weekly Full Backups: N/A
 Total: 3.5TB
70 hour staged full backup window reduced to 4 hours
Cost-effective replication to two sites
“Avamar has a game changing solution. Through their innovative technology,
we have been able to rethink our backup, recovery and replication
infrastructure, providing Morgan Stanley with better local and remote recovery
at a greatly reduced TCO.”
—Guy Chiarello, CTO/CIO, Morgan Stanley
Expected Results for Manufacturing Co.
Backup Exec
 Current Full Backups 5 TB
 Backup Window 28 hrs
 Media Used (1 year) = 106 TB
Avamar
 Current Full Backups 5 TB
 Backup Window 1.2 Hours
 Media Used (1 year) = 6 TB
28 hour staged full backup window reduced to 1.2 Hours
Cost savings estimated at $23,851 for 3 Years
New Functionality, Centralized, Faster Backups, Streamlined
Avamar starts at $17k and goes up from there based on capacity and retention periods
Avamar Customers (Notable)
 Verizon Wireless
 Home Depot
 The Limited
 Ann Taylor
 GE
 Cardinal Health
 Nationwide
 Pepsi
 AT&T
 VMWare
 Nexon
 Travelers
 Corporate Express
 Wellesley College
 Sterilite
 CRI Technologies
 Danvers Bank
 CISCO
 Churchill Downs
 Arizona Dept of Education
 New Albany
 PPG
 Bank of New York
 Medco
 Dell
 Kelley Drye & Warren
 City of Kirkland
 Brooks Automation
 Univ of CA
 Chrysler
 Kiewit
 Morriston Forester
 Komatsu
 Lexis Nexis
 Iowa Dept of Transportation
 Kroger
 Duoline
 Baker & McKenzie
 Citizens Bank
 Rob Roy
 Nodaway Bank
 Reckitt Benckiser
 Plymouth Rock
 Steamship Authority
 Chadwick Martin
 La Quinta
 Auto Owners
 21st Century
 Mile High Banks
 Oklahoma Turnpike
 Farallon
Eastern Regional Avamar Installed Customers (Commercial)
Montgomery County Public Schools
Howard County Public Schools
Country Meadow Associates
Arraya Solutions
IPR
Evolve IP
HydroGeneLogic
American Healthways
NetTelCos
Expedient
ADLCM
Kirklands Retail
Restaurant Services Inc
SEA Medical Center
DCH
Informed Medical
Orange Lake Resorts
Welbro Construction
FCCJ
Seminole Community College
Avocent
Debartolo Properties
First Bank
GPX
Leesburg Regional Hospital
Northside Hospital
Lithonia Lighting
Manatee County
Palm Beach County
Parker Hudson Rainer & Dobbs LLC
Miles & Stockbridge
Reynolds Smith and Hills
Sarasota County Clerk
Satilla Regional Medical
Southern Bone & Joint
Success For All
CGI
Mecklenburg County
Barlowworld
ABNB Federal Credit Union
Wunderlich
Microstrategy
Where is Avamar MOST common
 Avamar is used in nearly every industry
 Every type of infrastructure
 Across most platforms
Its biggest value comes in areas where backup time / bandwidth are limited:
 Remote Offices / Branch Offices
 Data Centers / Enterprise Backup Management
 VMWare & File Systems
 NAS
Remote Office Backup Via WAN
Without Avamar Clients
Challenges
 WAN blockage
 Poor reliability
 Decentralized
 Untrained IT staff
With Avamar Clients
Advantages
 Automated
 Encrypted
 Centralized
 Outstanding ROI
[Diagram: remote offices with Data De-dupe, connected over the WAN to a De-dupe Server in the Central Data Center]
 Target approach requires hardware at every site
Real Example from Avamar
MD Public School System (WAN)
(Column headings are inferred from the values; the slide text does not preserve them.)
Server (link)      Backup    Data protected   Data sent   De-dup %   Backup time
WR777-DATA1 (T1)   Day 1     23.5 GB          10.59 GB    54.9%      18h:51m
                   Day 2     23.5 GB          70 MB       99.7%      53m
                   Day 3     23.6 GB          118 MB      99.5%      55m
                   2-11-08   23.6 GB          70 MB       99.7%      1h:4m
                   2-13-08   23.7 GB          47 MB       99.8%      50m
WR777-APPS1 (T1)   Day 1     24.8 GB          8.28 GB     66.6%      15h:7m
                   Day 2     24.8 GB          12 MB       99.9%      7m
                   Day 3     24.8 GB          24 MB       99.9%      7m
                   2-11-08   24.8 GB          24 MB       99.9%      7m
                   2-13-08   24.8 GB          12 MB       99.9%      6m
OL502-DATA1 (T1)   Day 1     20.8 GB          6.32 GB     69.6%      21h:46m
                   Day 2     20.7 GB          62 MB       99.7%      52m
                   Day 3     20.7 GB          82 MB       99.4%      52m
                   2-11-08   20.8 GB          104 MB      99.5%      55m
                   2-13-08   20.8 GB          62 MB       99.7%      55m
OL502-APPS1 (T1)   Day 1     20.7 GB          5.23 GB     74.7%      15h:45m
                   Day 2     20.7 GB          20 MB       99.9%      7m
                   Day 3     20.7 GB          20 MB       99.9%      7m
                   2-11-08   20.7 GB          41 MB       99.8%      7m
                   2-13-08   20.7 GB          20 MB       99.9%      6m
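The percentages in this example appear to be one minus the ratio of data sent to data protected. The sketch below checks that reading against the Day 1 figures for the four servers; treating the two size columns as "data protected" and "data sent" is an assumption, and `dedup_rate` is an illustrative name.

```python
def dedup_rate(protected_gb: float, sent_gb: float) -> float:
    """Fraction of the protected data that did NOT have to be sent."""
    return 1.0 - sent_gb / protected_gb


# Day 1 figures from the slide: (data protected in GB, data sent in GB).
day1 = {
    "WR777-DATA1": (23.5, 10.59),
    "WR777-APPS1": (24.8, 8.28),
    "OL502-DATA1": (20.8, 6.32),
    "OL502-APPS1": (20.7, 5.23),
}

rates = {
    server: round(dedup_rate(protected, sent) * 100, 1)
    for server, (protected, sent) in day1.items()
}
for server, pct in rates.items():
    print(f"{server}: {pct}% of the data was already on the server")
```

The first backup of each server still finds more than half of its data already stored, which is consistent with global de-duplication across servers; later days send only tens of megabytes.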
Virtualization Creates New Backup Challenges
OLD PARADIGM: Low overall utilization and plenty of bandwidth for backup
NEW PARADIGM: High overall server utilization, but low bandwidth for backup
Backup Built for VMware Infrastructure
Avamar Efficiently Protects Virtual Machines
Traditional moves ~200% weekly
Avamar moves ~2% weekly
 Up to 95% reduction in data moved
 Up to 90% reduction in backup times
 Up to 50% reduction in disk impact
 Up to 95% reduction in NIC usage
 Up to 80% reduction in CPU usage
 Up to 50% reduction in memory usage
 All backups stored as “virtual full backups,” ready for immediate restore
 Maintain effective consolidation ratios without over-taxing CPU utilization
EMC Avamar Solutions for VMware Infrastructure
Flexible, Fast, Efficient and Reliable Backup and Recovery
AVAMAR CLIENT BACKUP SOLUTIONS
Guest
VCB
Service Console
AVAMAR SERVER BACKUP SOLUTIONS
Avamar Software
Avamar Virtual Edition
Avamar Data Store
Lightweight Agents / Reduced CPU Utilization
[Chart: Total CPU Utilization by Event (Time Elapsed), comparing Traditional Incremental + Full Backups with Avamar: Efficient Full Backups]
• Avamar reduces backup times by up to 90% weekly
• CPU utilization slightly higher during backup operation (~15%)
• Reduced time = weekly CPU utilization reduced by up to 85%
• Avamar backups set in “nice mode” or low priority: minimizes CPU contention
EMC Avamar Data Store Gen 2
SUSTAINABLE GRID (RAIN) TECHNOLOGY
 Avamar Data Store
– Multi-node configuration starts at 4 TB and scales to support up to 32 TB licensable de-duplicated capacity
– Equivalent of up to 1.1 PB of cumulative traditional disk or tape backup storage*
– Backup media requirement reduced 20–40 times
– High availability and reliability with RAIN architecture, RAID, daily integrity checks, and redundant power
 Avamar Data Store, Single Node
– Supports 1 TB and 2 TB licensable de-duplicated storage capacity configurations
– Equivalent of up to 70 TB of cumulative traditional disk or tape backup storage*
– Designed for easy deployment at remote offices
– Offers fast, local recovery without dependence on a WAN connection
*Note: Equivalent traditional backup capacity assumptions: 100 percent MS Office file data, weekly full and daily incremental backups, no compression, 10 percent daily change rate, 90-day retention
Avamar’s Major Competitive Differentiations
Who is REALLY less expensive?
Symantec PureDisk (Software Only)
– Purchase Hardware: Additional $$
– Must have NetBackup and license Agents: Additional $$
– Not Pure Source De-Dup – occurs at Media Server
– Requires HW and SW at Remote offices: Additional $$
– Does NOT significantly improve Backup times
– Requires separate backup servers: Additional $$
Data Domain (Hardware Only)
– Purchase Software: Additional $$
– Must License each Agent: Additional $$
– De-Dup Starts over with each additional box – Target Only
– Requires HW and SW at Remote offices: Additional $$
– Does NOT significantly improve Backup times
– Requires separate backup servers: Additional $$
Exagrid (Hardware Only)
– Purchase Software: Additional $$
– Must License each Agent: Additional $$
– De-Dup Starts over with each additional box – Target Only
– Requires HW and SW at Remote offices: Additional $$
– Does NOT significantly improve Backup times
– Requires separate backup servers: Additional $$
Commvault (Software Only)
– Purchase Hardware: Additional $$
– Must License each Agent: Additional $$
– Not Pure Source De-Dup – occurs at Media Server
– Requires HW and SW at Remote offices: Additional $$
– Does NOT significantly improve Backup times
– Requires separate backup servers: Additional $$
AVAMAR (HW & SW)
– All Components Included
– All Agents included
– Grid Architecture allows for true Global De-Dup – scalable by adding nodes
– No Hardware or Software required to backup remote offices
– SIGNIFICANTLY improves Backup times
– No additional servers required
Why an Integrated Solution?
 HARDWARE ONLY SOLUTIONS: (Data Domain / Exagrid)
– As software is now performing the De-Duplication, the hardware de-duplication is NO LONGER
required (This is now the case with Symantec, Commvault v8, and Avamar)
 SOFTWARE ONLY SOLUTIONS: (Symantec, Commvault, etc)
– As primary storage arrays begin to utilize Data De-Duplication technologies, the Backup Software
is not aware and its value diminishes if the data is already in a De-Duplicated state. Re-Hydration
would be required. (This is already the case with NetApp and Celerra NAS based De-Dup and
there are more to come)
 EMC is the ONLY vendor in the De-Duplication space that manufactures Primary
Storage, Backup Software and Backup Hardware.
– Regardless of where the de-duplication occurs, EMC is ready and capable to leverage and
optimize it.
 EMC|Avamar is the only vendor to utilize variable length segments when deduplicating data. It will ALWAYS store less, send less and backup faster!
“What vendor do you want to make a strategic investment in?”
Ask these vendors what their strategy is, as data is already de-duplicated before it gets to
their product…
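Why variable-length segments matter can be shown with a toy comparison. The sketch below is not Avamar's algorithm: it uses a simple content-defined cut rule (a SHA-1 hash of the last 16 bytes, an assumption chosen for brevity) and hypothetical function names, and it inserts one byte at the front of some random data to see how many segments each scheme can still match.

```python
import hashlib
import random


def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Split data into fixed-length segments."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes, window: int = 16, mask: int = 0x3F,
                    max_size: int = 256) -> list:
    """Cut a segment wherever a hash of the last `window` bytes hits a pattern."""
    chunks, start = [], 0
    for i in range(len(data)):
        length = i - start + 1
        if length < window:
            continue  # wait until a full window lies inside this segment
        digest = hashlib.sha1(data[i - window + 1:i + 1]).digest()
        at_boundary = (int.from_bytes(digest[:4], "big") & mask) == 0
        if at_boundary or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial segment
    return chunks


def shared(a: list, b: list) -> int:
    """Number of distinct segments the two versions have in common."""
    return len(set(a) & set(b))


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
modified = b"X" + original  # one byte inserted at the front

shared_fixed = shared(fixed_chunks(original), fixed_chunks(modified))
shared_variable = shared(variable_chunks(original), variable_chunks(modified))
print("segments still matching, fixed-length:   ", shared_fixed)
print("segments still matching, variable-length:", shared_variable)
```

With fixed-length segments the single inserted byte shifts every boundary, so almost nothing matches. With content-defined boundaries the cut points move with the data, so the segments after the insertion line up again and only the changed region has to be sent.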
The Economics of Backup & Recovery
Avamar Example
 $90,000 Investment
 3 years all inclusive (HW, SW, Maint)
 No recurring tape spend
 All client software/agents incl.
 Offsite replication included
 All data retained on disk and all media included for the 3 years of retention
 20% growth rate of data was factored into the system
 Backup window reduced by 300%
 Restore times improved
 Time to first byte of restore within minutes
Traditional Backup Solution
 $35,000 Investment
 1 year included HW and Maint
 No software, use existing. 9k per year in maint
 $3500 per year for new clients
 $9000 for VCB SW (vRanger) + a server
 $2700 per year for additional media
 11,500 per year in offsite costs
 No data growth factored in
 New media / upgrade required year 2 at 18k. New server too?
 HW maint of 12k years 2 and 3
 No significant backup improvements
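The two cost lists can be added up over three years as a rough check. This is a sketch under stated assumptions, not a figure from the presentation: it assumes every "per year" item recurs in all three years, that hardware maintenance is 12k in each of years 2 and 3, and that the unpriced extra server is excluded. The variable names are illustrative.

```python
YEARS = 3

# Avamar example: the slide gives one all-inclusive 3-year figure.
avamar_total = 90_000

# Traditional solution: one-time items listed on the slide.
traditional_one_time = {
    "initial investment": 35_000,
    "VCB software (vRanger)": 9_000,   # the extra server is not priced
    "new media / upgrade in year 2": 18_000,
}

# Traditional solution: items the slide quotes "per year".
traditional_per_year = {
    "software maintenance": 9_000,
    "new client licenses": 3_500,
    "additional media": 2_700,
    "offsite costs": 11_500,
}

# Assumption: 12k hardware maintenance in each of years 2 and 3.
hardware_maintenance = 12_000 * 2

traditional_total = (
    sum(traditional_one_time.values())
    + YEARS * sum(traditional_per_year.values())
    + hardware_maintenance
)

print(f"Avamar, 3 years:      ${avamar_total:,}")
print(f"Traditional, 3 years: ${traditional_total:,}")
print(f"Difference:           ${traditional_total - avamar_total:,}")
```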
Avamar SOLVES issues
 Long Backup Windows
 Backup Servers
 Affecting Production
 Server Cost / Licensing
 Tape Cost
 VM Guest Proliferation
 License Cost
 Off-site storage
 Cost to use Disk Technology
 Iron Mountain / Transport
 Client Licensing
 Tape Rotation and Changes
 VMWare Resources
 Restore times
 VCB Infrastructure
 Restore complexity
 Tape Drive Failure
 Multiple Solutions
 Tape Read/Write Errors
 Remote office backup
 Tape Drive Maintenance
 DR / Business Continuity
 Intraday Restore needs
 GROWTH / TIME
Intuitive, Policy-Based Management Console
In Summary
 The File System and VMWare benefits of Source De-Dup alone justify the investment
 You can start SMALL with Avamar (single use) and grow it easily into a fully integrated enterprise solution
 Source Based De-Duplication makes the difference; beware of the values of a Target Based De-Dup
 Competitive solutions around De-Dup have value, but understand the differences. They are Band-Aids, not long term solutions
 EMC has the ONLY broad based De-Dup strategy that will grow and continue to add value as De-Dup stretches into new areas
Next Steps
 Live demos provided every FRIDAY at 11am EST
– Performed by an Avamar Engineer
– Live via Web
– Ask questions, see the product in action
 Solution Sizing
– How much data is transferred in a full backup today
– What % of data is FS/Exchange/DB/Images/VMWare
– Retention periods on disk
– Replication?
 Avamar Virtual Demo
 Configuration / Pricing / Cost Justifications
 Commonality Analysis
 Proof of Concept / Evaluations
where YOUR information should live
where PRIMARY information lives
where TIERED information lives
where VIRTUAL information lives
where BACKUP information lives
where REPLICATED information lives
where DE-DUPLICATED data lives
where ARCHIVED data lives