Transcript Slide 1
Big Data and Business Intelligence
Virgil Dodson
Actuate Corporation © 2012

Slide 2
Today’s Agenda and Goals
• Introduction to Big Data
• Eclipse Survey Results
• Independent Survey Results
• Introduction to BIRT
• Big Data Connections
• Live Demo
• Questions

Slide 3
Big Data Definition
“Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” (Wikipedia)
Examples: web logs, RFID, sensors, social networks, Internet text, search indexes, call detail records, astronomy, atmospheric info, genomics, biogeochemical, biological, military surveillance, medical records, photographs, video, large-scale e-commerce

Slide 4
IDC 2013 Big Data Predictions
• The “Digital Universe” will expand to over 4 zettabytes, over 50% growth from 2012
• The Big Data focus will shift “up the stack”, toward analytics and discovery, and analytic applications
• Spending will reach $10 billion in 2013 and over $20 billion by 2016
Source: IDC, IDC Predictions 2013 presentation

Slide 5
Eclipse BIRT Survey – Oct/Nov 2012
• Big Data or Little Data - How Do You Display Yours? The Eclipse Foundation would like to better understand how developers are using Eclipse with big data and reporting projects.
• We ran this survey to get the pulse of which technologies were in demand in relation to Eclipse/BIRT technologies.
• The Eclipse Foundation promoted the survey.
• 60% of 518 respondents claimed to be big data users

Slide 6
Eclipse BIRT Survey – Technology Choices
What big data technologies are you using with Eclipse?
[Bar chart: share of respondents using each technology]
• Hadoop: 28.5%
• Cassandra: 7.3%
• MongoDB: 17.0%
• BIRT: 20.6%
• Hive: 10.9%
• Talend Open Studio: 7.3%
• Mahout: 7.9%
• R: 12.1%
• None: 40.0%
Note: Respondents could choose more than one option.

Slide 7
Eclipse BIRT Survey – Other Mentions
Home grown, Jasper, Greenplum, JDT, Netezza, Zend, StreamBase, Hypertable, HBase, CouchDB, Torque, Pentaho, Oozie, Sqoop, IBM InfoSphere Streams, Karmasphere, Bigtop, BerkeleyDB-JE, Next-generation sequencing (BAM)

Slide 8
Eclipse BIRT Survey – Data Visualization
How important is data visualization/reporting to your projects?
• Essential: 52%
• Sometimes important: 28%
• Occasionally useful: 13%
• Never needed: 7%

Slide 9
Report/Visualization Tools
How do you create and/or use data display tools or libraries in development?
• I use open source data reporting/visualization tools: 70.9%
• I use home grown routines or open source libraries to display data: 39.4%
• I use commercial data reporting/visualization tools: 20.0%
• My projects don't require reporting or data visualization: 7.9%
Note: Respondents could choose more than one option.

Slide 10
Independent Big Data Survey – Sept/Oct 2012
Goals:
• How many large firms (>$1B) are conducting Big Data projects
• What such companies are doing with their Big Data projects
• What benefits they expect from those Big Data initiatives
• What the inhibitors are

• King Research received 516 surveys
  • 316 completed and 200 partially completed surveys
  • Completed surveys were the primary source of analysis
  • 32% of those who completed the survey (98 respondents) work at companies with revenue of $1B or more

Slide 11
Independent Big Data Survey – Key Findings
• 26% of large companies have Big Data projects.
  40% have not evaluated Big Data or have evaluated it and decided not to proceed. The balance (34%) are either evaluating or planning such initiatives.
• “Not enough staff with expertise” and “Expected cost of Big Data initiatives” are the major inhibitors
• Major benefits expected from Big Data initiatives:
  • Make better decisions, faster
  • Gain competitive advantage
  • Improve efficiency
  • Improve customer targeting
• Major benefits realized from Big Data initiatives:
  • Gain competitive advantage
  • Improve customer targeting
  • Make better decisions, faster
  • Improve efficiency

Slide 12
Independent Big Data Survey – Big Data Usage
Does your organization have a Big Data implementation today?
[Bar chart comparing $1B+ revenue companies with the universe of respondents, by answer: No – have not evaluated Big Data; No – evaluated and decided not to proceed; Evaluating; Planning to use in the short term (less than 1 year); Planning to use in the long term (more than 1 year); Yes – we have a Big Data implementation today]
• More large companies have implemented Big Data projects (26%) than the universe of companies represented in this survey (19%)
• Conversely, far fewer respondents at large companies answered “No” to this question (40% versus 49% for the universe of respondents)

Slide 13
Independent Big Data Survey – Big Data Technologies
What Big Data technologies do you plan to use?
(evaluating/planning)
[Bar chart comparing $1B+ revenue companies with the universe of respondents: Apache Hadoop, Cloudera Hadoop, Apache Hive, Apache HBase, EMC Greenplum HD]
• We asked about their planned use of 15 technologies; the top 5, in descending order of frequency of mention, are displayed above
• Other technologies planned for use at $1B+ organizations include: Apache Cassandra, 12%; Hortonworks Hadoop, 12%; Amazon DynamoDB, 9%; Apache CouchDB, 9%; VoltDB, 9%; HyperTable, 6%; 10gen MongoDB, 3%; DataStax Cassandra, 3%

Slide 14
Independent Big Data Survey – Application Types
What are likely to be your Big Data applications? (responses from those who are evaluating or planning Big Data implementations)
[Bar chart: the 14 most frequently indicated Big Data applications]
• Our survey listed 23 frequently reported Big Data applications. Asked which of these they have evaluated or plan to use, respondents indicated an average of 4.5 apps each.
• Shown above are the 14 apps that were most frequently indicated

Slide 15
Independent Big Data Survey – Number of End Users
How many people in your organization will consume information from or use your Big Data applications?
(evaluating/planning)
[Bar chart by number of users: 1–9 people; 10–49 people; 50–99 people; 100–499 people; 500 or more people]
• Clearly, companies with revenues of $1B or greater plan to share their Big Data information with large audiences across their companies

Slide 16
Actuate Launches the BIRT Project (August 2004)
• Actuate proposed and started the BIRT (Business Intelligence and Reporting Tools) Project, a top-level Eclipse project
• Actuate joins the Eclipse Foundation as Strategic Developer and Board Member
• Adds BI and reporting as an open source project
• Professional open source
  • Primary development resources funded by Actuate
  • Contributions from many sources: IBM, Innovent Solutions and the community

Slide 17
Business Intelligence and Reporting Tools: A New Generation of Data Visualization Technology
• Makes all data-driven content development easy
• Modern, web-page design metaphor
• Open and standards-based
• Flexible with rich programmatic control
• Full support for libraries and reuse
• Foundation for a range of solutions
Simplicity that makes simple layouts easy; power to create very complex layouts

Slide 18
BIRT Release History
September 2004: BIRT Project proposal accepted, and project launched
June 2005, BIRT 1.0: Eclipse Report Designer, Report Engine, Chart Engine
December 2005, BIRT 2.0: Support for a wide variety of common layouts
June 2006, BIRT 2.1: Advanced parameters, ability to join data sets, …
June 2007, BIRT 2.2: Dynamic crosstab support, web services data source, …
June 2008, BIRT 2.3: JavaScript Debugger, BiDi Support, Charts in Crosstabs, …
June 2009, BIRT 2.5: Page aggregates, Multiple drill-downs in Charts, …
June 2010, BIRT 2.6: New charts, more chart control, developer productivity, …
June 2011, BIRT 3.7: POJO Runtime, Hive/Hadoop, Open Office emitters, …
June 2012, BIRT 4.2: Maven Support, Excel Data Source, Relative Time Periods, …
• Ground-up initiative: innovative approach to layout and design
• Developed in the open with community feedback at all stages

Slide 19
BIRT Example Key Capabilities
Very Simple to Very Complex Layouts
• Listings, cross-tab, dashboard, pixel-perfect, charts…
• Grouping, advanced aggregations, subtotals, calculations
• Multi-section and sub-reports
• Conditional sections and logic
• Full programmatic control/scripting
• Embedded images…
Re-use and Developer Productivity
• Library support for publishing and sharing components
• Leverages common standards (SQL, HTML, JavaScript, Java, XML)
• Cascading Style Sheets
• Built-in debugger…
Interactivity and Linking
• Data driven hyperlinks
• Drill-through charts and graphics…
Comprehensive Data Access
• SQL databases, Web Services, Flat Files, XML, scripted data sources…
• Multiple data sources in one design…
Usage and Productivity Aids
• Graphical layout and design
• Query & metadata editors
• Formatting Builder
• Grouping Builder
• Customizable cheat sheets and templates…
Output Formats
• HTML, PDF, Excel, Word, PowerPoint…
• Internationalization of labels and text
• Bi-Directional language display

Slide 20
Getting to Know BIRT
DEMO

Slide 21
BIRT Design Gallery
[Screenshots: Charts and Tables; Listing with Groups and Sub-Totals]

Slide 22
BIRT Design Gallery
[Screenshots: Crosstab and Charts; Crosstabs]

Slide 23
BIRT Design Gallery
[Screenshots: Forms; Calendar / Schedule]

Slide 24
BIRT Design Gallery
[Screenshots: Multi-Language and Bi-Directional; Dashboards]

Slides 25–27
BIRT Chart Gallery
[Chart screenshots]

Slide 28
High-Level BIRT Architecture
[Diagram]
• BIRT Designer: Eclipse Designer (Eclipse DTP, WTP, …), Chart Designer, Design Engine
• XML Design Document
• BIRT Engine: Generation Services, Charting Engine, Data Services, Presentation Services, fed by Data
• Output: HTML, PDF, Excel, Word, PowerPoint,
  PostScript, …

Slide 29
High-Level BIRT Architecture
• Design Engine (DE API): produces XML report, template, and library designs
• Report Engine (RE API): runs reports and produces output: PDF, HTML, DOC, XLS, PS, PPT, etc.
• Chart Engine (CE API): consumes the chart EMF model and produces chart output; supports 14 main chart types and many subtypes; outputs to PNG, JPG, BMP, SVG, PDF, SWT, and Swing; can be run outside of BIRT
• All engines can be run with or without OSGi
Core BIRT open source products: Report Designer, Chart Builder, Example Viewer

Slide 30
BIRT AJAX Based Viewer

Slide 31
BIRT Data Access
• BIRT offers many ways to get data
Standard Data Sources
• Flat File (CSV, TSV, SSV, PSV)
• Hive Data Source
• Cassandra Scripted Data Source
• JDBC: textual or graphical
• Web Service: XPath syntax
• XML: XPath syntax
• XLS/XLSX
• Scripted Data Source written in Java or JavaScript
• Open Data Access (ODA), an Eclipse DTP project
• Extensible JDBC Driver Framework
Community Contributions
GoogleDocs, XML/A, Cassandra, REST, MongoDB, Multi-Flat File, GitHub, Twitter, JSON, Search, Dropbox usage, YQL, Google Analytics, LinkedIn, Facebook FQL

Slide 32
Live Demo – New MongoDB ODA
DEMO

Slide 33
Connecting to Hadoop

Slide 34
Hive JDBC – HQL Sub Query Example

Slide 35
Hive JDBC – get_json_object UDF

Slide 36
Hive JDBC – RegExp Example

Slide 37
Hive JDBC – HQL Hints Example

Slide 38
Hive JDBC – Transform Example

Slide 39
BIRT Exchange Community Site
Centralized hub for BIRT developers
• Access demos, tutorials, tips and techniques, documentation…
• Enables developers to be more productive and build applications faster
• Marketplace for applications
Explore
• Search/sort
• Rate, comment
• Forums
Download
• Documentation
• Software
• Examples
Contribute
• BIRT designs, code
• Technical tips
• Contests
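The BIRT Data Access slide lists flat-file sources in CSV, TSV, SSV and PSV form. The sketch below is not BIRT's flat-file driver; it is a minimal Python illustration of what those formats share, assuming SSV means semicolon-separated and PSV pipe-separated values, and that the first row carries the column names. The DELIMITERS table and the read_flat_file helper are hypothetical names.

```python
import csv
import io

# Assumed mapping from flat-file type to field delimiter (illustrative only).
DELIMITERS = {
    "csv": ",",   # comma-separated values
    "tsv": "\t",  # tab-separated values
    "ssv": ";",   # semicolon-separated values (assumed meaning of SSV)
    "psv": "|",   # pipe-separated values (assumed meaning of PSV)
}


def read_flat_file(text, file_type):
    """Parse delimited text into a list of dicts keyed by the header row.

    Hypothetical helper: the first line is treated as column names and
    every value is returned as a string, with no type conversion.
    """
    delimiter = DELIMITERS[file_type.lower()]
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


if __name__ == "__main__":
    sample = "region|revenue\nEMEA|120\nAPAC|95\n"
    for row in read_flat_file(sample, "psv"):
        print(row["region"], int(row["revenue"]))
```

In this sketch only the delimiter changes between the four formats; quoting and escaping follow the csv module's defaults.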
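The Hive JDBC slides name Hive's get_json_object UDF, which extracts a value from a JSON string using a JSONPath-like expression such as '$.store.owner' and returns NULL when the input JSON is invalid. To make that behavior concrete, here is a rough Python emulation of a subset of the path syntax: the root $, child access .key, and array subscripts [n] only. Hive also accepts a * wildcard, which this sketch simply rejects. It is an illustration, not Hive's implementation, and the sample document in the usage note is made up.

```python
import json
import re

# Tokens of a simplified JSONPath: ".key" or "[index]" (a subset of Hive's syntax).
_TOKEN = re.compile(r"\.([^.\[\]]+)|\[(\d+)\]")


def get_json_object(json_text, path):
    """Emulate a subset of Hive's get_json_object(json_string, path).

    Returns None for invalid JSON, unsupported path syntax, a missing
    element, or a JSON null. Strings are returned as-is; numbers, booleans
    and nested objects/arrays are returned as compact JSON text.
    """
    try:
        node = json.loads(json_text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not path.startswith("$"):
        return None
    rest = path[1:]
    pos = 0
    while pos < len(rest):
        match = _TOKEN.match(rest, pos)
        if match is None:
            return None  # unsupported syntax, e.g. the "*" wildcard
        key, index = match.group(1), match.group(2)
        if key is not None:
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        else:
            i = int(index)
            if not isinstance(node, list) or i >= len(node):
                return None
            node = node[i]
        pos = match.end()
    if node is None or isinstance(node, str):
        return node
    return json.dumps(node, separators=(",", ":"))
```

For example, with doc = '{"store":{"fruit":[{"type":"apple","weight":8}],"owner":"amy"}}', get_json_object(doc, "$.store.fruit[0].type") gives "apple" and get_json_object(doc, "$.store.owner") gives "amy".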
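The Hive JDBC RegExp Example slide survives here only as a title, so which HiveQL regular-expression feature it showed is not known. One common choice is regexp_extract(subject, pattern, index), which returns a capture group of the first match; for example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar'. The Python sketch below mirrors that signature for illustration only. It uses Python's re module, whose syntax differs from the Java regular expressions Hive uses in a few corners, it returns an empty string when nothing matches, and the sample web-log line is made up.

```python
import re


def regexp_extract(subject, pattern, index=1):
    """Return capture group `index` of the first match of `pattern` in `subject`.

    Modeled on Hive's regexp_extract(subject, pattern, index); group 0 is the
    whole match. This sketch returns "" when there is no match or when the
    requested group did not participate in the match.
    """
    match = re.search(pattern, subject)
    if match is None:
        return ""
    group = match.group(index)
    return group if group is not None else ""


# Usage: pull fields out of a (made-up) web server log line.
LOG_LINE = '203.0.113.7 - - [10/Oct/2012:13:55:36 -0700] "GET /birt/index.html HTTP/1.1" 200 2326'
status = regexp_extract(LOG_LINE, r'" (\d{3}) ', 1)      # HTTP status code
request_path = regexp_extract(LOG_LINE, r'"GET (\S+)', 1)  # requested path
```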
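The Hive JDBC HQL Hints Example slide is only a title in this transcript. A frequently used HiveQL hint is /*+ MAPJOIN(table) */, which asks Hive to load the named small table into memory and perform the join in the map phase, so the large table does not have to be shuffled to reducers. Whether the slide used this particular hint is an assumption. The Python sketch below models only the idea behind it, a broadcast hash join over in-memory rows; the map_side_join name and the row layout are hypothetical, and nothing here calls Hive.

```python
def map_side_join(large_rows, small_rows, key):
    """Inner-join two row iterables the way a map-side (broadcast) join does.

    The small table is loaded into an in-memory hash table once; the large
    table is then streamed row by row and probed against it, so the large
    side never needs to be sorted or shuffled. On a column-name clash the
    value from the large-table row wins.
    """
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)
    for row in large_rows:
        for match in lookup.get(row[key], []):
            yield {**match, **row}


orders = [{"cust": 1, "amt": 10}, {"cust": 2, "amt": 5}, {"cust": 3, "amt": 7}]
customers = [{"cust": 1, "name": "Ann"}, {"cust": 2, "name": "Bo"}]
joined = list(map_side_join(orders, customers, "cust"))
```

The design choice the hint relies on is that the hashed side must fit in memory; the streamed side can be arbitrarily large, and rows with no match (customer 3 above) are dropped, as in an inner join.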
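The Hive JDBC Transform Example slide refers to HiveQL's TRANSFORM clause, which streams rows through an external script: by default each row reaches the script's standard input as one tab-separated line, with NULL written as the literal \N, and each tab-separated line the script prints becomes an output row. Below is a minimal Python sketch of such a script's core logic; the two-column (user_id, url) layout and the transform_lines name are assumptions for illustration.

```python
from urllib.parse import urlparse


def transform_lines(lines):
    """Turn 'user_id<TAB>url' rows into 'user_id<TAB>host' rows.

    Each input line is one row as Hive's TRANSFORM would stream it
    (tab-separated, NULL as \\N); each yielded string is one output row.
    The (user_id, url) column layout is hypothetical.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip malformed rows rather than failing the whole job
        user_id, url = fields
        host = urlparse(url).hostname or "\\N"  # emit Hive's NULL marker if no host
        yield f"{user_id}\t{host}"
```

Invoked from a query along the lines of SELECT TRANSFORM(user_id, url) USING 'python3 extract_host.py' AS (user_id, host) FROM page_views (table, column and file names hypothetical), the script would end with a loop such as for out in transform_lines(sys.stdin): print(out), after importing sys.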

Slide 40
Plug in to BIRT Spring 2013 Contest
Contest runs from March 28, 2013 to April 30, 2013
Plug-In Categories
• Open Data Access (ODA) Drivers
• Output Emitters
• Report Item Extensions
• Chart Extensions
New iPad for Top 3 Plug-Ins!
Visit BIRT Exchange for full contest details

Slide 41
Questions?
Big Data and Business Intelligence
Virgil Dodson