Transcript Hadoop
Dennis Mulder
Windows Azure Center of Excellence

Spotlight Services; Global Services Team; 10 Senior Cloud Architects; Champs 8; US, EMEA, APAC
Assessment; Pilots; Architecture and Design Guidance
Cloud Apps; Global Scale; Modern Apps
Engage: Assess; Design; Pilots; Design Sessions
Contact: Dennis Mulder, Solution Architect, [email protected]

Four megatrends will dominate the next decade

Mobility
91% of organizations expect to spend on mobile devices in 2012.
In 2012, mobile devices will outship PCs by more than 2:1 and generate more revenue than PCs for the first time.
85 billion mobile apps will be downloaded in 2012.

Social
Social networking will follow not just people but also appliances, devices and products.
1/2 of companies expect to use internal social network apps in 2012.

Cloud
>80% of new apps in 2012 will be distributed/deployed on clouds.
The strategic focus in the cloud will shift in 2012 from infrastructure to application platforms.
34% of CIOs say technology as a service (cloud) will have the most profound effect on the CIO role in the future.

Big data
49% of CIOs rank BI as the top project priority for 2012.
2/3 of mobile apps developed in 2012 will integrate with analytics offerings.
2.7 zettabytes in 2012.
32% of businesses are likely to invest in BI and analytics in 2012.

Microsoft is embracing these megatrends
(This slide repeats the four megatrends and statistics above.)

Rethinking and evolving business strategies
Mobility, Social, Cloud, Big data
How will technology megatrends enable you to save money, drive innovation, grow your business, and attract and retain customers?

[Figure: data volume plotted against velocity, variety and variability. Volume axis: Gigabytes (10^9), Terabytes (10^12), Petabytes (10^15), Exabytes (10^18). Regions: ERP / CRM, WEB 2.0, Internet of things. Data sources shown: Click Stream, Mobile, Wikis / Blogs, Sensors / RFID / Devices, Social Sentiment, Audio / Video, Advertising, eCommerce, Log Files, Collaboration, Spatial & GPS Coordinates, Digital Marketing, Data Market Feeds, Payables, Contacts, Search Marketing, Payroll, Deal Tracking, Web Logs, Inventory, Sales Pipeline, Recommendations, eGov Feeds, Weather, Text/Image.]

Storage cost per GB
1980  $190,000
1990  $9,000
2000  $15
2010  $0.07

[Figure: MapReduce walkthrough. Blocks of the Sales file in HDFS, holding (custId, zipCode, amount) records, sit on DataNode1, DataNode2 and DataNode3. Map tasks: each Mapper groups its records by zipCode into one output bucket per reduce task. Shuffle: the buckets are sent to the Reducers. Reduce tasks: each Reducer sorts its records by zipCode and computes SUM(amount). Results: 53705 $110; 10025 $155; 44313 $90; 02115 $30; 54235 $97.]

[Figure: HDFS API; Name Node and Data Nodes forming the DFS (1 Data Node per Worker Role) and Compute Cluster; Azure Blob Storage with Front end, Partition Layer and Stream Layer; Azure Storage (ASV).]

[Figure: Distributed Processing (MapReduce); Distributed Storage (HDFS); ODBC; Query (Hive). Legend: Red = Core Hadoop; Blue = Data processing; Purple = Microsoft integration points and value adds; Orange = Data Movement; Green = Packages.]

Hive, Pig, Mahout, Cascading, Scalding, Scoobi, Pegasus…
C#, F# Map/Reduce, LINQ to Hive, .NET management clients
JavaScript Map/Reduce, Browser hosted console, Node.js management clients
PowerShell, Cross Platform CLI tools

http://www.windowsazure.com/
http://hadoop.apache.org/
http://nuget.org/packages?q=hadoop
http://hadoopsdk.codeplex.com

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
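
The MapReduce walkthrough in this deck (Sales records of (custId, zipCode, amount) that are mapped, shuffled by zip code, and summed per reducer) can be sketched in plain Python. This is a minimal single-process illustration, not Hadoop code. The twelve records are read off the slide's figure; their split into three blocks and the `partition` function are assumptions made for the example, so keys may land on different reducers than in the slide, although the per-zip totals should come out the same.

```python
from itertools import groupby
from operator import itemgetter

# Sales records (custId, zipCode, amount) as read from the slide's figure.
# The split into three blocks is an assumption for illustration only.
BLOCKS = [
    [(4, "54235", 75), (7, "10025", 60), (2, "53705", 30), (1, "02115", 15)],
    [(3, "10025", 95), (8, "44313", 55), (5, "53705", 65), (0, "54235", 22)],
    [(5, "53705", 15), (6, "44313", 10), (6, "44313", 25), (9, "02115", 15)],
]
NUM_REDUCERS = 3


def map_task(block):
    """Emit (zipCode, amount) pairs; custId is not needed for the sum."""
    for _cust_id, zip_code, amount in block:
        yield zip_code, amount


def partition(zip_code, num_reducers):
    """Hypothetical partitioner: choose a reduce task from the key."""
    return int(zip_code) % num_reducers


def shuffle(mapped_blocks, num_reducers):
    """Route every pair into one bucket per reduce task."""
    buckets = [[] for _ in range(num_reducers)]
    for pairs in mapped_blocks:
        for zip_code, amount in pairs:
            buckets[partition(zip_code, num_reducers)].append((zip_code, amount))
    return buckets


def reduce_task(bucket):
    """Sort by key, then sum the amounts in each run of equal keys."""
    ordered = sorted(bucket, key=itemgetter(0))
    return {
        zip_code: sum(amount for _, amount in group)
        for zip_code, group in groupby(ordered, key=itemgetter(0))
    }


def run_job(blocks, num_reducers=NUM_REDUCERS):
    """Run map, shuffle and reduce in sequence and merge reducer outputs."""
    mapped = [list(map_task(block)) for block in blocks]
    totals = {}
    for bucket in shuffle(mapped, num_reducers):
        totals.update(reduce_task(bucket))
    return totals


if __name__ == "__main__":
    for zip_code, total in sorted(run_job(BLOCKS).items()):
        print(zip_code, f"${total}")
```

Because every pair with the same zip code goes to the same bucket, each reducer can sum its keys independently, which is what lets the reduce tasks run in parallel on a real cluster.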