Transcript Hadoop - interface:systems
BigData
From Experiment to Production
Mario Vosschmidt, Consulting Systems Engineer
© 2014 NetApp, Inc. All rights reserved. NetApp Proprietary – Limited Use Only
Agenda
Big Data or Smart Data?
1) What is “Big Data”?
2) Requirements and challenges
3) Which scenarios are we focusing on?
4) What do the solution approaches look like?
5) How do I implement these solutions?
6) Summary
2
© 2014 NetApp, Inc. All rights reserved. NetApp Proprietary – Limited Use Only
The Big Data Landscape
BigData The 3V Paradigm
Variety
Multiple data sources Multiple data formats
Velocity
High speed processing Fast changing requirements
Volume
Huge amounts of data Process and persist
NetApp Confidential - Internal Use Only
Entering a New Era of Scale
Big Data Solution Portfolio
ABCs of Big Data at NetApp
Insight from extremely large datasets
Big Data
Secure boundless data storage Performance for data intensive workloads
Not Even to The “Peak”
[Figure: hype cycle, visibility over time. Phases: Technology Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity]
35 zettabytes: estimated size of the digital universe in 2020
30 billion: pieces of new content added to Facebook per month
5 billion: smartphones
80%: unstructured data
Big Data Vendor Landscape
A Lot of Hype and Buzz – Everyone is Jumping In
[Figure: Funding for Hadoop and NoSQL, Jan-08 to Nov-11 (source: 451 Research). Funding rounds shown: Cloudera series B, MapR series A, Cloudera series C, Cloudera series D, 10gen series D, MapR series B, DataStax series B, Neo Technology series A, Opera Solutions series A, Platfora series A, Couchbase series C]
Market is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015
Most firms are taking a pragmatic approach
Big data is in the very early stages of maturity
Best practices are not mature
Source: IDC Big Data Survey
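As a rough check on the market forecast above, the implied compound annual growth rate can be worked out from the two quoted figures. This is a minimal sketch; the function name `cagr` is illustrative and the only inputs are the numbers on the slide.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the slide: $3.2 billion (2010) to $16.9 billion (2015).
rate = cagr(3.2, 16.9, 2015 - 2010)
print(f"Implied annual growth: {rate:.1%}")
```

The result lands a little under 40% per year, which gives a sense of how aggressive the forecast is.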
"The Big Data market is expanding rapidly … For technology buyers, opportunities exist to use Big Data technology to improve operational efficiency and to drive innovation. Use cases are already present across industries and geographic regions."
Dan Vesset, Vice President, IDC
Data Growth Impact on Business
“Big Data” refers to datasets whose size is beyond the ability of typical tools to capture, store, manage and analyze
[Figure: data complexity and volume, 2010 to 2020, with an inflection point. Labels: Information Becomes a Propellant to Business Speed; Data Becomes a Burden to IT Infrastructure]
The Big Data Opportunities
Financial Services
Fraud detection & prevention Anti-money laundering Risk management
Government
Law enforcement Counter-terrorism Research and Education
Manufacturing
Supply chain optimization Defect tracking Root cause analysis RFID correlation
Healthcare
Drug development Patient Records Evidence-based medicine
Why Should You Care?
It’s the Value of Your Data
Top line revenue
– Leverage your data assets into business advantage
5 Billion Records Anywhere, Anytime
Faster time to market
50% Increase in Revenue
Bottom Line savings
– Lower the cost of compliance – Manage ever growing data efficiently
Over 1PB of data
Growth of 175% YOY
90 days of data within 24 hours of a failure
NetApp Big Data
Why NetApp?
Practical solutions that solve today’s problems
Get Control: NetApp helps you turn your exploding data from threat to opportunity. Manage your data effectively and affordably.
Break Through: Break through the limits. With NetApp, you can take on even the most massive and complex data projects.
Gain Insight: Turn insight to action. NetApp helps you get to clarity and insight faster and more reliably.
Experience Managing Data at Scale
NetApp’s Largest Customer
100 PB: 4 customers
50 PB: 10 customers
20 PB: 50 customers
10 PB: 100 customers
NetApp Big Data Strategy Open Best-of-Breed Choice
Best of breed storage for Big Data applications
Built on open standards with best-in-class partnerships
Validated with ecosystem leaders
Complete server, network and storage “Racks”
Delivered via trusted high-value partners
Analytics
Smart Data
Big Analytics Strategy
Smart Data
DSS/DW (traditional analytics)
Solutions partners include IBM, Oracle, Microsoft, ParAccel, Exasol and SAND
Big Analytics
Enterprise class Hadoop-based solutions MapR, Hortonworks, Cloudera
Leverage partners to complete Big Analytics stack
Solutions for validated server, network and storage
Big Analytics Solutions
Data Warehouse
Fast, space-efficient backup and recovery with storage utilization up to 90%. Less raw capacity with modular scalability
Mixed Use Database, Cubes
Optimized for IBM, Oracle and Microsoft. Simplified data management and protection. Zero down time
Hadoop
Enterprise class Hadoop with Lower total cost of ownership and based on open standards
The Value Proposition:
Some problems require an Enterprise Class Hadoop solution
Enterprise Class Hadoop
– Packaged ready-to-deploy modular compute / memory intensive Hadoop cluster
– Compute intensive applications
– Tick data analysis
– Extremely tight service level expectations
– Severe financial consequences if the analytic run is late
Enterprise Class Hadoop
– Packaged ready-to-deploy modular Hadoop cluster
– The data has intrinsic value $$$
– Usable capacity must expand faster than compute
– Higher storage performance
– Real human consequences if the system fails (threats, treatments, financial losses)
– System has to allow for asymmetric growth
White Box Hadoop
– Values associated with early adopters of Hadoop
– Social media space
– Contributors to Apache
– Strong bias to JBOD
– Skeptical of ALL vendors
Enterprise Class Hadoop
– Bounded compute algorithm / memory intensive Hadoop cluster
– Compute intensive applications
– Additional CPUs do not improve run time
– Extremely tight service level expectations
– Severe financial consequences if the analytic run is late
– Need for deeper storage per DataNode
– Storage capacity
Challenges with Hadoop in Enterprise
Availability
– NameNode is a single point of failure
– Slow recovery from disk drive failure
– Expensive process to replace failed disks online
– Most common Hadoop support issue is disk drive failure
Operations
– Requires three copies of the data: more storage, larger footprint
– Limited flexibility: storage and servers are tied together, which affects scalability
– Low cluster efficiency, higher network congestion
Implementation
– Need to keep up with the fast-paced patches and projects of an open source platform
– Need to decide on a distribution of Hadoop
– Skills are not common
– Integration with existing IT infrastructure can be difficult
– Tuning expertise needed to make Hadoop perform optimally
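The storage cost of keeping three copies can be made concrete with a little arithmetic. The sketch below assumes plain replication with a factor of 3 (the HDFS default) and ignores temporary space and file system overhead; the 100 TB data set is a hypothetical example, not a figure from the slides.

```python
def raw_capacity_tb(usable_tb: float, replication_factor: int = 3) -> float:
    """Raw disk capacity needed to hold `usable_tb` of data with N full copies."""
    return usable_tb * replication_factor


def replica_overhead(replication_factor: int = 3) -> float:
    """Fraction of raw capacity that is consumed by the extra copies."""
    return (replication_factor - 1) / replication_factor


# Hypothetical example: a 100 TB data set on a cluster with three-way replication.
needed = raw_capacity_tb(100)
print(f"Raw capacity needed: {needed:.0f} TB")
print(f"Share of raw capacity holding extra copies: {replica_overhead():.0%}")
```

With three copies, two thirds of the raw capacity holds redundant data, which is the "more storage, larger footprint" point on the slide.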
Cisco and NetApp Confidential. For Internal Use Only. Do Not Distribute.
Why Big Data and Analytics as a service is important!
FlexPod Converged Infrastructure Family
FlexPod® Express (MSB/Branch Office)
For smaller, less-dynamic requirements and VAR velocity
– Compute pool, network pool, storage pool
– Cisco UCS C-Series, Nexus® 3K, FAS2xx0
– Two fixed pod sizes
– Cisco UCS Director, VMware®, and Microsoft®
FlexPod Data Center (Enterprise/Service Provider)
Massively scalable shared virtual data center infrastructure
– Compute pool, network pool, storage pool
– Cisco UCS C-Series/B-Series, Nexus® 5k, FAS storage
– Flexible pod sizes
– FlexPod validated management and ecosystem
FlexPod Select (Dedicated)
Big data analytics, scientific, HPC
– Compute nodes, network, direct storage
– Cisco UCS C-Series; Nexus, Catalyst®, MDS; E-Series, FAS
– Reference architecture and/or designs
– Application-based management
NetApp Reference Architecture
Example: FlexPod Select with Cloudera
Cisco UCS® C-Series Rack Mount Servers
Cisco UCS Fabric Interconnect
Cisco UCS Manager
NetApp® FAS Storage Systems
NetApp E-Series Storage Array
Converged big data platform from NetApp and Cisco for Hadoop
Enterprise-class Hadoop: Innovative storage, servers, networking validated with leading Hadoop distributions
Faster time to value: Prevalidated configuration accelerates deployment
High availability: Less downtime, higher serviceability to meet tight SLAs around data applications and processes
Flexible scaling: Independently scale servers and storage; modular design for scaling as data needs grow
* NetApp 50% Storage Guarantee http://www.netapp.com/us/solutions/infrastructure/virtualization/guarantee.html
FlexPod Select with Hadoop
NetApp and Cisco deliver enterprise class Hadoop for high availability, performance, scalability
Cloudera or Hortonworks Distribution of Hadoop
[Figure: master configuration and expansion configurations]
Architected for the enterprise
– Superior NameNode protection
– Faster recovery from failover
– Lower cluster downtime
Faster time to value
– Validated, presized configurations
– Low-latency, high-bandwidth networking
– 12 DataNodes in master, 16 in expansion
Coexistence with current applications and infrastructure
– Supports existing applications from SAP, Microsoft, Oracle
– Data management and monitoring with Cloudera Manager, Cisco UCS® Manager
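The presized configurations make cluster sizing a simple sum. This is a small sketch using only the two DataNode counts quoted above; the function name and the idea of counting "expansion units" are illustrative assumptions, not part of the validated design.

```python
MASTER_DATANODES = 12     # DataNodes in the master configuration (from the slide)
EXPANSION_DATANODES = 16  # DataNodes in each expansion configuration (from the slide)


def total_datanodes(expansion_units: int) -> int:
    """DataNodes in one master configuration plus a number of expansion units."""
    if expansion_units < 0:
        raise ValueError("expansion_units must not be negative")
    return MASTER_DATANODES + EXPANSION_DATANODES * expansion_units


for units in range(4):
    print(f"master + {units} expansion unit(s): {total_datanodes(units)} DataNodes")
```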
Service-Level Expectations Around Data
High-Value Time-Sensitive Problems
Accelerate time to insights
– Fast deployment with validated, preconfigured reference designs
– Store, process, analyze all data for new opportunities and business impact
– More time to focus on data analysis rather than deal with cluster downtime
Making the Hadoop experience better
– Optimized, tuned, fully configured cluster
– Hadoop integrated with storage, compute, networking
– Monitoring and management tools with SANtricity® and from partners (Cloudera Manager, Cisco UCS® Manager)
– High density and capacity reduce data center footprint
Reduce risk in an open ecosystem
– Compatibility with existing infrastructure and applications
– Best-in-class partnerships, not entire stack from one vendor
– Future-proof against lock-in and benefit from evolving ecosystem
FlexPod Select for Hadoop with Cloudera
Ease of Setup and Deployment
Preconfigured – Pre-Validated
Use Case Example: NetApp Auto Support
Phone-home data representing information about the status of NetApp storage controllers
Correlate disk latency (hot) with disk type
– 24 billion records
– 4 weeks to run query
– Hadoop implementation: 10.5 hours
Bug detection through pattern matching
– 240 billion records: too large to run
– Hadoop implementation: 18 hours
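The AutoSupport numbers above translate into a speedup and a throughput figure. The following sketch uses only the quoted values and assumes that "4 weeks" means four full calendar weeks of run time.

```python
HOURS_PER_WEEK = 7 * 24

# Disk latency correlation: 24 billion records.
original_hours = 4 * HOURS_PER_WEEK  # assumption: four full weeks of wall-clock time
hadoop_hours = 10.5
speedup = original_hours / hadoop_hours

# Records processed per hour in the two Hadoop runs.
latency_rate = 24e9 / hadoop_hours
bug_rate = 240e9 / 18

print(f"Speedup for the latency query: {speedup:.0f}x")
print(f"Latency query throughput: {latency_rate / 1e9:.1f} billion records/hour")
print(f"Bug detection throughput: {bug_rate / 1e9:.1f} billion records/hour")
```

The bug detection job handles ten times as many records in less than twice the time, so its per-hour throughput is several times that of the latency query.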
Wireless Service Provider
Archiving & Indexing Tools
NetApp Hadoop Solution
[Figure: eight DataNodes (DN) on the Hadoop Distributed File System (HDFS) and collector servers (CS) at the central site; agent servers (AS) at the remote sites]
The solution consists of an eight-node Hadoop cluster at the core site. All data from the remote sites is transported over the WAN to the central site. The data is collected, ingested, compressed and archived into the Hadoop cluster via HDFS. The data is then categorized, put into separate containers, and indexed based on its record-keeping tags.
Telco industry: provides wireless voice and data services globally
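The compress, categorize and index steps of this archiving flow can be illustrated with a small in-memory sketch. It does not talk to HDFS; the record fields (`category`, `tags`, `site`, `payload`) and both function names are hypothetical and chosen only for the illustration.

```python
import gzip
import json
from collections import defaultdict


def archive_records(records):
    """Compress each record, file it in a per-category container, and index it by tag."""
    containers = defaultdict(list)  # category -> list of gzip-compressed records
    tag_index = defaultdict(list)   # tag -> list of (category, position) pointers
    for record in records:
        blob = gzip.compress(json.dumps(record).encode("utf-8"))
        category = record["category"]
        containers[category].append(blob)
        position = len(containers[category]) - 1
        for tag in record["tags"]:
            tag_index[tag].append((category, position))
    return containers, tag_index


def find_by_tag(containers, tag_index, tag):
    """Return the decompressed records that carry the given tag."""
    return [
        json.loads(gzip.decompress(containers[category][position]))
        for category, position in tag_index.get(tag, [])
    ]


# Hypothetical records as they might arrive from remote-site agents.
records = [
    {"site": "remote-1", "category": "voice", "tags": ["billing"], "payload": "call detail"},
    {"site": "remote-2", "category": "data", "tags": ["billing", "audit"], "payload": "session log"},
]
containers, tag_index = archive_records(records)
billing = find_by_tag(containers, tag_index, "billing")
print(len(billing), "record(s) tagged 'billing'")
```

In a real deployment the containers would be files or directories in HDFS and the index would live in a dedicated indexing tool; the dictionaries here only stand in for those components.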
Analytics & Enterprise Apps Environment
[Figure: analytics and enterprise applications environment. Data sources: mobile devices, location/GPS, logs, sensors, applications, other data sources. Components: ETL, OLTP, OLAP, reporting/dashboard/visualization. Layers: applications, analytics, data management, storage file systems, content, shared storage infrastructure, all other storage (i.e. internal DAS)]
Bandwidth
Big Bandwidth Solutions
Full Motion Video
Scalable density and performance to ingest and simultaneously analyze UAV and satellite video data
Video Storage for Surveillance
High bandwidth & density supporting hundreds or thousands of HD cameras
Media Content Management
High ingest & play-out rates with support for media and entertainment workflows
HPC: Lustre, GPFS, BeeGFS
Massively parallel distributed file system for large scale cluster computing and O&G Seismic Processing
Big Bandwidth Solutions
Applications Storage File Systems Density Reliability Modularity
E-Series Storage
Performance Efficiency Flexibility
Full-Motion Video Storage Solution
High bandwidth HD video ingest
• Satellite
• UAV
Full-Motion Video Built on E-Stack
E5460 Stack
Quantum® StorNext File System
Massively scalable single data container
Multi-stream video playout
• Processing
• Exploitation
• Analyst viewing
Turnkey solution in a 40U industry-standard rack
Single architecture for ingest, exploitation and dissemination
1.8PB raw capacity: 4000+ hours of uncompressed 720p HD video
>20 GB/s R/W performance, >30 GB/s peak performance
Scale to multiple petabytes in a single data container
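The capacity and bandwidth figures can be related to each other with a short calculation. The sketch assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and treats "4000+ hours" as exactly 4000 hours, so the per-stream rate is an upper bound.

```python
RAW_CAPACITY_BYTES = 1_800_000_000_000_000  # 1.8 PB, assuming decimal units
VIDEO_HOURS = 4000                          # "4000+ hours", taken as a lower bound
SUSTAINED_BYTES_PER_SECOND = 20_000_000_000  # ">20 GB/s R/W performance"

# Average data rate of one video stream implied by capacity and hours.
bytes_per_second = RAW_CAPACITY_BYTES / (VIDEO_HOURS * 3600)

# Streams of that rate that fit into the sustained bandwidth.
concurrent_streams = SUSTAINED_BYTES_PER_SECOND / bytes_per_second

print(f"Implied stream rate: {bytes_per_second / 1e6:.0f} MB/s")
print(f"Streams at 20 GB/s: {concurrent_streams:.0f}")
```

This ignores RAID and file system overhead, so the real number of concurrent streams per rack would be somewhat lower.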
HPC: Lustre
Performance to meet the needs of the world’s fastest supercomputers
High bandwidth & density: 1.8PB & 30GB/s per 40U rack
Highly available: no single points of failure, extensive RAS features
NetApp-provided 7x24 Lustre support
NetApp Professional Services
NetApp Confidential – Limited Use
Lawrence Livermore National Lab
Sequoia: announced as the fastest supercomputer and storage combination on the planet at ISC 2012
Supercomputer storage to support twenty thousand trillion arithmetic operations per second with access speeds up to 1 TB/sec
55PB of usable storage
Simulations for nuclear weapons viability
Counter terrorism
Energy security
Understanding climate change
Press Release: http://www.netapp.com/us/company/news/news-rel-20110928-990734.html
Video Surveillance Storage
Enhance public safety with better physical security
Industry trends are exploding storage
– Analog to digital
– SD to HD
– 7 days to 30+ days
Open platform solution
– Best of breed industry partners
– Flexible deployments
– Modular scalability
– 99.999% uptime
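To put the 99.999% uptime figure in perspective, it can be converted into allowed downtime per year. This is a minimal sketch assuming a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year a system may be down at the given availability."""
    return (100 - availability_percent) / 100 * MINUTES_PER_YEAR


five_nines = allowed_downtime_minutes(99.999)
print(f"99.999% uptime allows about {five_nines:.2f} minutes of downtime per year")
```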
Unique Out-of-Band Recording
No servers required between cameras and storage: saves hardware, software, licensing and footprint; very robust; saves a lot of network cabling; easy to scale.
Media Content Management
Industry-leading bandwidth per rack to reduce bottlenecks
Highly scalable digital repository
Consolidates collaborative production
Multi-format distribution workflows
Highest capacity density to minimize power and cooling
Single namespace for multi-petabyte repositories
Unmatched breadth of production client support
Content Management
Big Content Solutions
File Services
Multi-application workloads Non-disruptive operation Integrated data protection, efficiency
Enterprise Content Repository
Infinite container Fixed content Non-disruptive operation Integrated data protection, efficiency
Distributed Content Repository
Large, multi-site repository Policy based data management Metadata-enabled object storage
File Services
ONTAP Cluster Mode
Heterogeneous cluster: a mix of controller types in a single cluster per workload needs
– Entry, mid, and high-end platforms
– Native and third-party storage (FAS and V-Series)
Multiprotocol: NFS, pNFS, CIFS, iSCSI, FCP
Integrated data protection
Virtual storage tier: match data to disk price and performance
Manage multiple tiers in the same namespace or many
Enterprise Content Repositories
ONTAP Cluster Mode with Infinite Volume
Single large content repository
– Scales to PBs and billions of files across cluster
– Native storage efficiency
Simplified operations
– Multi-tenancy
– Simplifies application workflows
– Load balances data at ingest
– Starts small, grows granularly
High availability
– Protects against disk and hardware failures
– Snapshots & replication for quick recovery
– Manage & upgrade non-disruptively
Content Repository
Object Storage Insights
Flat namespace
– No filesystem hierarchy
Metadata separated
– Not within data space
– Metadata serve as descriptors
– Can change over time
– However, data is persistent
Objects referenced by ID
– Index
Write once read many
– Similar to library
– Objects do not change
– Single writer, multiple readers
Less data management overhead
High metadata rates
Less space management
Data are replicated across geos
Simplified rights management
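The object storage properties listed here (flat namespace, objects referenced by ID, metadata kept apart from the data, write once read many) can be shown in a toy in-memory model. This is only an illustration of the concepts; the class and method names are hypothetical and do not correspond to any NetApp or StorageGRID API.

```python
import uuid


class ObjectStore:
    """Toy object store: flat namespace, immutable data, mutable metadata."""

    def __init__(self):
        self._data = {}      # object ID -> immutable payload
        self._metadata = {}  # object ID -> descriptors, kept apart from the data

    def put(self, payload: bytes, **metadata) -> str:
        """Write an object once and return the ID it is referenced by."""
        object_id = uuid.uuid4().hex  # no path or directory, just an ID
        self._data[object_id] = bytes(payload)
        self._metadata[object_id] = dict(metadata)
        return object_id

    def get(self, object_id: str) -> bytes:
        """Read an object; there is deliberately no method to modify its data."""
        return self._data[object_id]

    def tag(self, object_id: str, **metadata) -> None:
        """Metadata may change over time while the data stays as written."""
        self._metadata[object_id].update(metadata)

    def query(self, **criteria) -> list:
        """Find object IDs whose metadata matches all given descriptors."""
        return [
            object_id
            for object_id, descriptors in self._metadata.items()
            if all(descriptors.get(key) == value for key, value in criteria.items())
        ]


store = ObjectStore()
image_id = store.put(b"image bytes", kind="scan", site="A")
store.tag(image_id, reviewed=True)
print(store.query(kind="scan", reviewed=True) == [image_id])
```

Because lookups go through metadata rather than a directory tree, the store needs no filesystem hierarchy, which is the "flat namespace" point above.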
Distributed Content Repositories
StorageGRID
Large content repository for big, unstructured data
– Billions of data sets, dozens of petabytes
Create, manage and consume content globally
– Predictable access to data independent of location
– Policy-controlled data stores at each site
Intelligent data classification and access
– Metadata-based management
StorageGRID Functional Diagram
[Figure: StorageGRID functional layers, top to bottom]
NAS I/O
Object Ingest and Retrieval
NAS Protocols (SG 9)
HTTP API / CDMI
Metadata Tagging and Query
Global Object Namespace
Object-Level Data Management
Location-Transparent Distributed Object Store
Policy-Driven Data Placement
Storage Systems
“We’ve increased the number of retail partners we work with from 2,000 to almost 20,000 in just a few years. In the past 6 years, we’ve seen a 1,900% increase in transactions. This plus the massive increase in digital images uploaded by consumers demanded a more robust and highly scalable storage infrastructure.”
Zach Wickes, Vice President of Technology, PNI
Media Content Repository PNI Digital Media
High-performance, scalable storage infrastructure built to support 17 million revenue-generating transactions annually
100% uptime even during peak holiday access, when transactions increase 6 to 10 times
3PB of rich media data
Consumer access to 950 million digital images
20,000 worldwide retail locations, online fulfillment partners and in-store kiosks
Walmart Canada, Costco, Sam’s Club, Tesco, CVS/pharmacy, and Kodak
NetApp FAS6280 and FAS3200, Data ONTAP, and Flash Cache
Health in the Cloud
STaaS offering for healthcare providers
Medical image archive cloud
Two sites with ~1PB each
2TB+ local cache at each edge site
8x growth in capacity in the last 12 months
100% uptime since start of service
“Forever” retention policies
~60% of customers use hybrid cloud model
The solution offers a proven 100% uptime with automated data movement from on-premises to off-premises public clouds, with a “keep forever” retention policy and indefinite growth
Press Release: http://www.netapp.com/us/company/news/news-rel-20111128-36413.html
Integrated Big Data Solutions and Expertise
Planning and implementation expertise for Big Data
Turn-key solution stacks and Big Data services
Big Data System Integrators
Solutions Built on NetApp®
Reference Material
FlexPod Select
[Figure: common architecture. Labels: Software Solution; Solution Rack + Appliance; Application Packaging; Visualization; Analytics; Integration; Management; Efficiency; Validated Architecture & SKUs; Infrastructure Integration & Distribution; Operational Integration & System Integrators]
Big Data Summary
Enable enterprise customers to gain business advantage
Practical solutions proven to reduce complexity, increase efficiency and lower cost of ownership
Open standards based with best-in-class partnerships
For more information: http://www.netapp.com/us/company/leadership/big-data/
Next Steps - Team with the Experts
Strategic assessment
– Business goals
– Data growth needs
– Use case discovery (partner delivery)
Consult
– Solution architecture and design (NetApp delivery)
Deploy
– Installation and implementation (NetApp delivery)
– Solution implementation (partner delivery)
Support options:
Global support available from NetApp and partners
Thank You