Transcript Slide 1

Big Data and Business Intelligence
Virgil Dodson
Actuate Corporation © 2012
Today’s Agenda and Goals
• Introduction to Big Data
• Eclipse Survey Results
• Independent Survey Results
• Introduction to BIRT
• Big Data Connections
• Live Demo
• Questions
Big Data Definition
Big data is a collection of data sets so large and complex that it
becomes difficult to process using on-hand database
management tools or traditional data processing applications.
Examples: web logs, RFID, sensors, social networks, Internet text,
search indexes, call detail records, astronomy, atmospheric info,
genomics, biogeochemical, biological, military surveillance,
medical records, photographs, video, large-scale e-commerce
- Wikipedia
IDC 2013 Big Data Predictions
• The “Digital Universe” will expand to over 4 zettabytes… Over 50%
growth from 2012
• The Big Data focus will shift “up the stack”, toward analytics and
discovery, and analytic applications
• Spending will reach $10 billion in 2013, over $20 billion by 2016
Source: IDC, IDC Predictions 2013 presentation
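As a rough back-of-the-envelope check on the figures above, the following sketch derives the growth rates they imply. It uses only the rounded slide values ($10 billion, $20 billion, 4 zettabytes, 50%), so the results are approximate bounds rather than IDC figures; the function name `cagr` is illustrative.

```python
# Implied growth rates from the IDC figures quoted above.
# Inputs are the rounded slide values, so the results are rough bounds only.

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1

# Spending: $10B in 2013 growing to (at least) $20B by 2016, i.e. 3 years.
spending_cagr = cagr(10.0, 20.0, 2016 - 2013)

# Data volume: 4 ZB in 2013 after 50% growth implies the 2012 baseline.
baseline_2012_zb = 4.0 / 1.5

print(f"Implied spending CAGR 2013-2016: {spending_cagr:.1%}")
print(f"Implied 2012 Digital Universe: {baseline_2012_zb:.2f} ZB")
```

Doubling in three years works out to roughly 26% growth per year, and a 2013 total of 4 zettabytes after 50% growth implies a 2012 baseline of roughly 2.7 zettabytes.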
Eclipse BIRT Survey – Oct/Nov 2012
• Big Data or Little Data - How Do You Display Yours?
The Eclipse Foundation would like to better understand how developers
are using Eclipse with big data and reporting projects.
• We ran this survey to gauge which technologies related to
Eclipse/BIRT were in demand.
• Eclipse promoted the survey.
• 60% of 518 responders claimed to be big data users
Eclipse BIRT Survey - Technology Choices
What big data technologies are you using with Eclipse?
• Hadoop: 28.5%
• Cassandra: 7.3%
• MongoDB: 17.0%
• BIRT: 20.6%
• Hive: 10.9%
• Talend Open Studio: 7.3%
• Mahout: 7.9%
• R: 12.1%
• None: 40.0%
Note: Responders could choose more than one option
Eclipse BIRT Survey - Other Mentions
Other Mentions
Home grown
Jasper
Greenplum
jdt
Netezza
ZEND
StreamBase
hypertable
HBase
CouchDB
torque
Pentaho
OOZIE
Sqoop
IBM InfoSphere Streams
Karmasphere
Bigtop
BerkeleyDB-JE
Next-generation-sequencing (BAM)
Eclipse BIRT Survey - Data Visualization
How Important is Data Visualization/Reporting to Your Projects?
• Essential: 52%
• Sometimes important: 28%
• Occasionally useful: 13%
• Never needed: 7%
Report/Visualization Tools
How do you create and/or use data display tools or libraries in development?
• I use open source data reporting/visualization tools: 70.9%
• I use commercial data reporting/visualization tools: 20.0%
• I use home grown routines or open source libraries to display data: 39.4%
• My projects don't require reporting or data visualization: 7.9%
Note: Responders could choose more than one option
Independent Big Data Survey – Sept/Oct 2012
Goals:
• How many large firms (>$1B) are conducting Big Data projects
• What are such companies doing with their Big Data projects
• What are the expected benefits for those Big Data initiatives
• What are the inhibitors
• King Research received 516 surveys
• 316 completed and 200 partially completed surveys
• Completed surveys were the primary source of analysis
• 32% of those who completed the survey (98 respondents) work at
companies with revenue of $1B or more
Independent Big Data Survey – Key Findings
• 26% of large companies have Big Data projects. 40% have not evaluated Big
Data or have evaluated and decided not to proceed. The balance (34%) are
either evaluating or planning such initiatives.
• “Not enough staff with expertise” and “Expected cost of Big Data initiatives” are
the major inhibitors
• Major benefits expected from Big Data initiatives are:
  • Make better decisions, faster
  • Gain competitive advantage
  • Improve efficiency
  • Improve customer targeting
• Major benefits realized from Big Data initiatives are:
  • Gain competitive advantage
  • Improve customer targeting
  • Make better decisions, faster
  • Improve efficiency
Independent Big Data Survey – Big Data Usage
Does your organization have a Big Data implementation today?
[Bar chart comparing $1B+ Revenue with Universe of Respondents. Response options:
No – Have not evaluated Big Data; No – Evaluated and decided not to proceed;
Evaluating; Planning to use in the short term – less than 1 year;
Planning to use in the long term – more than 1 year;
Yes – We have a Big Data implementation today]
• More large companies have implemented Big Data projects (26%) than the universe of
companies represented in this survey (19%)
• Conversely, far fewer respondents at large companies responded “No” to this question
(40% versus 49% for the universe of respondents)
Independent Big Data Survey – Big Data Technologies
What Big Data technologies do you plan to use? (eval/planning)
[Bar chart comparing $1B+ Revenue with Universe of Respondents. Technologies shown:
Apache Hadoop, Cloudera Hadoop, Apache Hive, Apache HBase, EMC Greenplum HD]
• We asked about their planned use of 15 technologies; the top 5, in descending order of
frequency of mention, are displayed above
• Other technologies planned for use at $1B+ organizations include: Apache Cassandra,
12%; Hortonworks Hadoop, 12%; Amazon DynamoDB, 9%; Apache CouchDB, 9%; VoltDB,
9%; HyperTable, 6%; 10gen MongoDB, 3%; Datastax Cassandra, 3%
Independent Big Data Survey – Application Types
What are likely to be your Big Data applications? (responses from those who are
evaluating or planning Big Data implementations)
[Bar chart: percentage of respondents selecting each Big Data application type]
• Our survey listed 23 frequently reported Big Data applications. Asked which of these
they had evaluated or planned to use, respondents indicated an average of 4.5 apps each.
• Shown above are the 14 apps that were most frequently indicated
Independent Big Data Survey – Number of End Users
How many people in your organization will consume information from or use your Big
Data applications? (evaluating/planning)
[Bar chart: percentage of respondents by expected number of end users. Ranges:
1 – 9 people; 10 – 49 people; 50 – 99 people; 100 – 499 people; 500 or more people]
• Clearly companies with revenues of $1B or greater plan to share their Big Data information
with large audiences across their companies
Actuate Launches the BIRT Project
Actuate proposed and started BIRT, the Business Intelligence and Reporting Tools Project,
… a top-level Eclipse project
Actuate Joins Eclipse Foundation as Strategic Developer and Board Member
Adds BI and Reporting as Open Source Project
Professional open source
• Primary development resources funded by Actuate
• Contributions from many sources: IBM, Innovent Solutions and community
August 2004
Business Intelligence and Reporting Tools
A New Generation of Data Visualization Technology
• Makes all data-driven content development easy
• Modern, web-page design metaphor
• Open and standards-based
• Flexible with rich programmatic control
• Full support for libraries and reuse
• Foundation for a range of solutions
BIRT
• Simplicity that makes simple layouts easy
• Power to create very complex layouts
BIRT Release History
• September 2004: BIRT Project proposal accepted, and project launched
• June 2005 (1.0): Eclipse Report Designer, Report Engine, Chart Engine
• December 2005 (2.0): Support for a wide variety of common layouts
• June 2006 (2.1): Advanced parameters, ability to join data sets, …
• June 2007 (2.2): Dynamic crosstab support, web services data source, …
• June 2008 (2.3): JavaScript Debugger, BiDi Support, Charts in Crosstabs, …
• June 2009 (2.5): Page aggregates, Multiple drill-downs in Charts, …
• June 2010 (2.6): New charts, more chart control, developer productivity, …
• June 2011 (3.7): POJO Runtime, Hive/Hadoop, Open Office emitters…
• June 2012 (4.2): Maven Support, Excel Data Source, Relative Time Periods…
• Ground-up initiative: Innovative approach to layout and design
• Developed in the open with community feedback at all stages
BIRT Example Key Capabilities
Very Simple to Very Complex Layouts
• Listings, cross-tab, dashboard, pixel-perfect, charts …
• Grouping, advanced aggregations, subtotals, calculations
• Multi-section and sub-reports
• Conditional sections and logic
• Full programmatic control/scripting
• Embedded images…
Re-use and Developer Productivity
• Library support for publishing and sharing components
• Leverages common standards (SQL, HTML, JavaScript, Java, XML)
• Cascading Style Sheets
• Built-in debugger…
Interactivity and Linking
• Data driven hyperlinks
• Drill-through charts and graphics…
Comprehensive Data Access
• SQL databases, Web Services, Flat Files, XML, scripted data sources …
• Multiple data sources in one design…
Multiple Usage and Productivity Aids
• Graphical layout and design
• Query & metadata editors
• Formatting Builder
• Grouping Builder
• Customizable cheat sheets and templates…
Output Formats
• HTML, PDF, Excel, Word, PowerPoint…
• Internationalization of labels and text
• Bi-Directional language display
Getting to Know BIRT
DEMO
BIRT Design Gallery
Charts and Tables
Listing with Groups and Sub-Totals
BIRT Design Gallery
Crosstab and Charts
Crosstabs
BIRT Design Gallery
Forms
Calendar / Schedule
BIRT Design Gallery
Multi-Language and Bi-Directional
Dashboards
BIRT Chart Gallery
BIRT Chart Gallery
BIRT Chart Gallery
High-Level BIRT Architecture
BIRT Designer
• Eclipse Designer
• Eclipse DTP, WTP,…
• Chart Designer
Design Engine
XML Design
Document
BIRT Engine
• Generation Services
• Charting Engine
• Data Services
• Presentation Services
Data
Output: HTML, PDF, Excel, Word, PowerPoint, PostScript, …
High Level BIRT Architecture
DE API, Design Engine: Produces XML Report, Templates, and Library Designs
RE API, Report Engine: Runs Reports and produces output – PDF, HTML, Doc, XLS, PS, PPT, etc.
CE API, Chart Engine: Consumes Chart EMF model and produces Chart Output. Supports
14 main types and many sub types. Outputs to PNG, JPG, BMP, SVG, PDF, SWT, and SWING
All Engines can be run with or without OSGi
Core BIRT Open Source Products
• Report Designer
• Chart Builder
• Example Viewer: BIRT AJAX Based Viewer
Can be run outside of BIRT
BIRT Data Access
• BIRT Offers many ways to get data
• Standard Data Sources
• Flat File (CSV, TSV, SSV, PSV)
• Hive Data Source
• Cassandra Scripted Data Source
• JDBC Textual or Graphical
• Web Service - XPath syntax
• XML - XPath syntax
• XLS/XLSX
• Scripted Data Source Written in Java or JavaScript
• Open Data Access (ODA) DTP Project
• Extensible JDBC Driver Framework
Community Contributions
GoogleDocs
XML/A
Cassandra
REST
MongoDB
Multi-Flat File
GitHub
Twitter JSON Search
Dropbox usage
YQL
Google Analytics
LinkedIn
Facebook FQL
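To make the flat-file flavours listed above concrete (comma, tab, semicolon, and pipe separated values), here is a minimal sketch in Python. It is not BIRT code and does not use the BIRT flat-file data source; the function name `read_flat_file` and the sample data are hypothetical and chosen purely for illustration.

```python
import csv
import io

# Delimiters for the four flat-file flavours named on the slide.
DELIMITERS = {"CSV": ",", "TSV": "\t", "SSV": ";", "PSV": "|"}

def read_flat_file(text: str, flavour: str) -> list:
    """Parse delimited text with a header row into a list of row dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=DELIMITERS[flavour])
    return [dict(row) for row in reader]

# Hypothetical sample data, not taken from the presentation.
sample = "city|visits\nParis|120\nOslo|45\n"
rows = read_flat_file(sample, "PSV")
print(rows)
```

All values come back as strings; a reporting tool would typically apply column types in a later step.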
Live Demo – New MongoDB ODA
DEMO
Connecting to Hadoop
Hive JDBC – HQL Sub Query Example
Hive JDBC – get_json_object UDF
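As a rough illustration of what Hive's get_json_object(json, path) UDF does for simple paths, the following Python stand-in extracts a value from a JSON string. It is a simplified sketch, not Hive code: it assumes only `$`, `.key` and `[index]` path steps, and the sample record is hypothetical.

```python
import json
import re

def get_json_object(json_text, path):
    """Simplified stand-in for Hive's get_json_object UDF.

    Handles only '$', '.key' and '[index]' steps. Returns None (Hive NULL)
    when the JSON is invalid or the path does not resolve.
    """
    if not isinstance(path, str) or not path.startswith("$"):
        return None
    try:
        node = json.loads(json_text)
    except (TypeError, ValueError):
        return None
    for key, index in re.findall(r"\.(\w+)|\[(\d+)\]", path[1:]):
        if key and isinstance(node, dict) and key in node:
            node = node[key]
        elif index and isinstance(node, list) and int(index) < len(node):
            node = node[int(index)]
        else:
            return None
    if node is None:
        return None
    # Results are strings: scalars as text, objects and arrays as JSON.
    return node if isinstance(node, str) else json.dumps(node)

# Hypothetical sample record, not taken from the presentation.
record = '{"user": {"name": "ada", "tags": ["a", "b"]}, "hits": 3}'
print(get_json_object(record, "$.user.name"))
print(get_json_object(record, "$.user.tags[1]"))
print(get_json_object(record, "$.missing"))
```

In a Hive query the function is applied per row to a string column holding JSON, which lets a report read semi-structured log data without a separate parsing step.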
Hive JDBC – RegExp Example
Hive JDBC – HQL Hints example
Hive JDBC – Transform Example
BIRT Exchange Community Site
Centralized hub for BIRT developers
• Access demos, tutorials, tips and techniques, documentation…
• Enables developers to be more productive and build applications faster
• Marketplace for applications
Explore
• Search/sort
• Rate, comment
• Forums
Download
• Documentation
• Software
• Examples
Contribute
• BIRT designs, code
• Technical tips
• Contests
Plug in to BIRT Spring 2013 Contest
Contest runs from March 28, 2013 to April 30, 2013
Plug-In Categories
Open Data Access (ODA) Drivers
Output Emitters
Report Item Extensions
Chart Extensions
New iPad for Top 3 Plug-Ins!
Visit BIRT Exchange for full contest details
Questions?
Big Data and Business Intelligence
Virgil Dodson
[email protected]