Open Source Business Intelligence


Enterprise Information Mashups:
Integrating Information, Simply
Anant Jhingran
CTO, Information Management
IBM
IBM Confidential
Acknowledgements
• Mehmet Altinel
• John Furrier of PodTech.net
• Kevin Beyer
• Volker Markl
• David Simmen
• Shivakumar Vaithyanathan
• Hamid Pirahesh
• Laura Haas
• Tina Mukai
• Allen Cypher
• Rod A Smith
• Don Ferguson
• Jerry Cuomo
• Carol Jones
• Chet Kapoor
• Chung-Sheng Li
AND ALL THE WEB AND WEB 2.0 GIANTS
Outline
• Evolution of IT and Information Management influence
• Situational Applications and Mashups as the next phase of “sustainable” IT spend
• Why Information 2.0 is a critical fuel for this sustainable growth
• Example and the research problems we see
• IBM efforts in this area
Information Technology Spend “had” been growing nicely
• 1964: S/360 debuts
• 1971: First Intel Micro
• 1981: IBM PC
• 1994: Netscape Navigator
• 2000: Dotcom collapse
Is it the end? No: Just the beginning of a “golden” age
[Figure: five technological revolutions charted across the phases Irruption, Frenzy, Crash, Synergy, and Maturity, with Innovation and Deployment labeled]
1. The Industrial Revolution: 1771 to 1829, Panic of 1797
2. Age of Steam and Railways: 1829 to 1873, Panic of 1847
3. Age of Steel, Electricity and Heavy Engineering: 1875 to 1920, Depression of 1893
4. Age of Oil, Automobiles and Mass Production: 1908 to 1974, Crash of 1929
5. Age of Information and Telecommunications: 1971 onward, Dot.com Collapse, then the current period of Institutional Adjustment around Web Style of Architectures
Adjustments listed on the chart:
• Formation of Mfg. industry
• Repeal of Corn Laws opening trade
• Standards on gauge, time
• Catalog sales companies
• Economies of scale
• Separation of savings, investment banks
• FDIC, SEC
• Build-out of Interstate highways
• IMF, World Bank, BIS
Source: “Technological Revolutions and Financial Capital”, Carlota Perez, 2002
Four Phases of Influence of Data Management Research
[Figure: influence of data management research by phase, on a timeline marked 1990, 1995, 2000]
• Traditional IT Spend: High: Separation of Logic & Data/Content
• Web 1.0: Zero, but a lot of research
• Web 1.5 (Web Services): Low to Middle: Separation of Process & Information
• Web 2.0: High? Separation of Situational Apps & Information
Over time, complexity got built into the IT systems
[Figure: densely interconnected application architecture diagram spanning Mainframe, Unix, and PC/NT applications: sales, inventory control, accounts receivable, accounting/finance, HR, vendor and third-party interfaces, EDI, and a Data Warehouse (interfaces to and from the Data Warehouse are not displayed on this diagram). Prepared by Michelle Mills.]
Actual Application Architecture for Consumer Electronics Company
Web 1.5 = SOA = Business flexibility and reuse
Traditional Business*
Today’s World-Class Business*
• Economics: globalization demands flexibility
• Business Processes: changing quickly and sometimes outsourced
• Growth: at the top of the CEO agenda
• Reusable Assets: can cut costs
• Information: greater availability
• However, most of the action is in the “process space”
*Sources: CBDi
And using Information as a Strategic Asset to
build better Architectures
[Figure: layered enterprise architecture. Labels, top to bottom: Presentation Services (Portals, Browsers, and/or Devices); Legacy, Transaction, Analytic, Discovery, Master Data, Strategic, and Tactical applications with their Application Services; Business Process Management (Process Services, Business Rules, Business Monitoring, App Server); Enterprise Service Bus; Information Integration Services (Federation, Master Data Hubs for Product, Customer, Supplier, and Location, Metadata Services, Master Data Services, Event Processing, Streaming, Batch); Collaboration, Transaction, Analytic, and Content Services over Email, Notes, OLTP, EDW, and ECW stores]
With new “Master Data” hubs being created, such
as WebSphere Customer Center
[Figure: customer master data hub. Labels include: Customer Facing Channels (Call Center, Web Self-Service, Wireless Self-Service, Distributor, IVR Self-Service, Branch / Sales Office); Internal Users (Browser-based); business functions (New Business Processing, Privacy and Data Mgmt., Marketing Insight, Compliance & Risk Mgmt., Sales and Marketing Closed Loop Campaign Mgmt., Customer Service); Customer hub properties (Unlimited Attributes, Multi-enterprise, Standards-based, Multiple Categorizations, Security and Audit); Master Data Integration; Data Stewardship & Administration (Compliance, Marketing, Account Administration, Privacy Management)]
Web 2.0 = Further Simplification
• Invented Outside the Enterprise
• It is happening around “Information”
– RSS/Atom, wikis, blogs
– Information Centric Mashups
• Web 2.0 “instant” applications
[Figure labels: RSS, Wiki, AJAX Mashup, PHP/RoR, Blog, Tagging, REST]
Assertion: The Instant Application Model, fueled by Mashups, will be the most significant driver for Web 2.0 in the Enterprise
Web 2.0 = Situational Applications in the Enterprise
• More movement towards situational/activity-based applications
• Innovation for publishing/handling content is growing out of the professional website community
• Ray Lane’s Software 2006 Keynote:
– Serve an individual need
– Seek viral, organic adoption
– Provide contextualized, personal information
– Require no data entry or training
– Deliver instantaneous value
– Utilize the community & social relationships
– Require a minimum IT footprint
However, a whole bunch of these apps are not being written today because they’re not affordable
The long-tail of application development is large
[Figure: long-tail curve of Number of users per application across the Spectrum of applications. US Estimates for 2006 (double for WW)]
• < .5M full-time application developers working on very long-term projects
• > 1.65 – 11M “ad hoc” IT staff and business professionals trying to solve day-to-day problems
Rethinking Web (2.0) Application Assumptions
How would we design middleware if we assume:
• business organizations & relationships are continually changing, therefore solutions need to be situational
• LOB teams are just IT-savvy enough to create their own services/solutions that drive their part of the business (Igniting the Phoenix: A New Vision for IT/Sapir)
• …applications are disposable
Assertion: Only with an Information 2.0 Mashup Fabric!
Mash-up: New form of Integration
Mash-up definition (a different one!) from Wikipedia: Bastard pop is a musical genre which, in its purest form, consists of the combination (usually by digital means) of the music from one song with the a cappella from another. Typically, the music and vocals belong to completely different genres. At their best, bastard pop songs strive for musical epiphanies that add up to considerably more than the sum of their parts.
[Figure: example mash-up combining a List of stores, the Google Map web service, the NOAA Weather web service, SAP order fulfillment, and an RSS feed of top-selling items, with Wiki commands to compose the application]
Web 2.0 outside, and inside an enterprise will succeed only with an Info 2.0 Mashup Fabric
Web 2.0
Info 2.0
Enables the same separation of “data” and “logic” that revolutionized the use of databases in the ’80s.
Enables the same separation of “information” and “process” that is now happening in Web 1.5
Within enterprises, it will…
• Enable connections to information that does not make it into the enterprise IT Architectures:
– Email
– Presentations and Documents
– External Data (Web)
– Spreadsheets
– Decision Support Datasets…
• And enable it to be done “quickly”, as “assembly” as opposed to as “programming”
[Figure: the layered enterprise architecture diagram from the “Information as a Strategic Asset” slide, repeated]
How the Architecture could play out…
[Figure labels: External Web; LOB Focus: Situational Apps (Web 2.0); IT Focus: Process Server/ESB; Info 2.0 Fabric; Information Integration; SaaS Model; Software Model; sources: ppt, email, doc, DB, CM, Files]
Example
(Zipcode)
Meet Pete, an insurance agent in Florida.
He sees a news report of a severe storm. What is
the company’s risk?
He needs to forward a risk summary to executives.
(HUC = Hydrological Unit Code)
http://water.usgs.gov/waterwatch/
(Geocode = Latitude/Longitude)
edc.usgs.gov/
(Geocode = Latitude/Longitude)
http://florida.maps.anant/
http://www.dotd.louisiana.gov/
Flood Risk Assessment Mashup
Report
Mashup Search
Standardization
Standardize
Screen Scraping
www.floodlevels.com
Lineage
standardize
policy XLS
water.usgs.gov
edc.usgs.gov
dotd.florida.gov
Accuracy
So how can Pete write his mashup simply?
Simplicity
So how can Pete write his mashup simply?
Procedural Code
<?php
// Get policy holders in a Policy object array
$url = "file://policies/myclients.xsl";
$content = file_get_contents($url);
$policyArr = getPolicy($content);

// Find high risk zones
$url = "http://www.floodlevels.com";
$content = file_get_contents($url);

// Do screen scraping to extract high risk zones
$zoneArr = findRiskyZones($content);

// Initialize the return array
$riskArr = array();

// Find corresponding policy holders for each city
foreach ($policyArr as $policy)
{
    // Skip policies below the 250000 threshold
    if ($policy->amount < 250000)
    {
        continue;
    }
    // Standardize the address
    $policyZone = findZone($policy->address);

    // Check whether this policy is affected
    foreach ($zoneArr as $zone)
    {
        if ($zone == $policyZone)
        {
            // This policy carries a high risk.
            // Insert into high risk array
            $riskArr[] = $policy;
        }
    }
}

// Send email to manager for high risk policies
sendEmail("[email protected]", "High risk policies", $riskArr);
?>
Simplicity
So how can Pete write his mashup simply?
Declarative Queries
sendMail("[email protected]",
  <highRiskPolicies>
  {
    for $i in url("file://policies/myclients.xsl")
    for $j in url("http://www.floodlevels.com")
    where $i//amount > 250000 and
          $i//address in $j/zone
    return <policy> {$i} </policy>
  }
  </highRiskPolicies>);
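For comparison, the same filter-and-join can be written almost declaratively in a general-purpose language. This is a minimal Python sketch, not the query language on the slide. The `Policy` class, the sample data, and the assumption that each policy already carries a standardized `zone` are all hypothetical stand-ins for the spreadsheet and the floodlevels.com data.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    holder: str
    amount: int
    zone: str  # assumed to be an already-standardized zone identifier


def high_risk_policies(policies, risky_zones, threshold=250000):
    """Return policies above the threshold whose zone is flagged as risky."""
    risky = set(risky_zones)
    return [p for p in policies if p.amount > threshold and p.zone in risky]


# Hypothetical sample data, for illustration only.
policies = [
    Policy("A", 300000, "Z1"),
    Policy("B", 100000, "Z1"),
    Policy("C", 500000, "Z2"),
]
risky_zones = ["Z1", "Z3"]
result = high_risk_policies(policies, risky_zones)
```

The comprehension states what is wanted (amount over the threshold, zone in the risky set) and leaves the iteration order to the runtime, which is the property the declarative slide is after.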
Simplicity
So how can Pete write his mashup simply?
Accuracy
GUIs, Spreadsheets, Wikis
Simplicity
Accuracy
So how can Pete write his mashup simply?
Search
Flood risk for homes in myclients.xsl worth
over 250000
Simplicity
How do we get there?
Research Agenda
• It is all about “simplicity”: do deep research and build deep technology, but make the job of the application writer much easier!
• Much of our past research is applicable (including Information Manifold and its children), but new problems exist because of new target users.
Info 2.0 Mashup Fabric needs to address these issues, over time
• How to create such a Mashup?
– Finding what exists, specifying what the user wants, and creating what is needed (expressiveness vs. ease of use – DWIS vs. DWIM)
• How to integrate the information?
– What is the minimal level of semantics that the Information 2.0 layer needs to have, and has the world evolved to make it easier now?
• How to deal with unstructured data?
• How do Mashups evolve?
How does Pete find the floodlevels.com Mashup?
• Pages on floodlevels.com are dynamically generated AJAX pages (produced by another mashup)
• Pete may have typed “Flood Levels Louisiana” into a search engine
• Similar to deep Web search problem, but now we have to deal with joins and other mashup operations, or even workflow
• Search has to understand the logic of the mashup
Web 2.0 magnifies the deep web search problem
How does Pete specify his Mashup?
• Pete is an insurance agent, not an expert JavaScript or PHP/Java/Ruby/etc. programmer
• How does Pete specify a screen scraper if needed?
• How does Pete describe the Mashup flow?
– Current mashups are a hodge-podge of application and data access
– Similarity to ETL Flow
– Is the answer an XQuery-like language for mashups, or programming by example?
Web 2.0 needs simple methods to write mashups!
Can he create the Mashup by giving an example?
Could it have been even easier?
• Could Pete’s mashup have been dynamically constructed when he searched for “flood levels for zipcodes 33101, 34106, etc.”?
– Test of Time Award: “Information Manifold”, Querying Heterogeneous Information Sources Using Source Descriptions by A. Halevy, A. Rajaraman, and J. Ordille
– automatically finding the right sources based on the query
Extend Information Manifold to dynamically create Mashups!
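To make “finding the right sources based on the query” concrete, here is a deliberately simplified Python sketch. It is not the Information Manifold algorithm: it only chains sources greedily by the attributes each one requires and provides. The source names and attribute names are hypothetical, loosely following the zipcode, geocode, HUC, flood-level chain from the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDescription:
    name: str
    provides: frozenset  # attributes the source can return
    requires: frozenset  # attributes that must be supplied as input


def select_sources(descriptions, known, wanted):
    """Greedily pick sources whose inputs are known until `wanted` is covered.

    Returns the ordered list of source names, or None if the wanted
    attributes cannot be reached from the known ones.
    """
    known = set(known)
    plan = []
    remaining = list(descriptions)
    while not set(wanted) <= known:
        usable = [d for d in remaining
                  if d.requires <= known and not d.provides <= known]
        if not usable:
            return None
        best = max(usable, key=lambda d: len(d.provides - known))
        plan.append(best.name)
        known |= best.provides
        remaining.remove(best)
    return plan


# Hypothetical catalog of source descriptions.
catalog = [
    SourceDescription("geocoder", frozenset({"latlong"}), frozenset({"zipcode"})),
    SourceDescription("huc_lookup", frozenset({"huc"}), frozenset({"latlong"})),
    SourceDescription("flood_levels", frozenset({"flood_level"}), frozenset({"huc"})),
]
plan = select_sources(catalog, known={"zipcode"}, wanted={"flood_level"})
```

Given only a zipcode, the sketch assembles the geocoder, HUC lookup, and flood-level sources in order. A real system would also have to rank alternative sources and handle partial answers.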
How does one simplify “semantics”?
• Helped by:
– Microformats growing in popularity in the open community
– Standardization services increasingly available
– Master Data Management taking off in enterprises
• Issues:
– Standardization is inherently uncertain. How is uncertainty handled?
– Quality of services differs. How to track the lineage of both data and integration services?
– Services vary in price. How to trade off price, quality, and time?
• Search shows us some ways
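One way to picture the uncertainty and lineage issues is to carry both alongside every value as it passes through standardization services. The Python sketch below is an illustration under stated assumptions: the `Value` record, the per-service `reliability` numbers, and the rule of multiplying confidences (which assumes the steps err independently) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Value:
    data: str
    confidence: float = 1.0
    lineage: list = field(default_factory=list)  # sources and services applied so far


def apply_service(value, name, fn, reliability):
    """Run one standardization step, recording lineage and discounting confidence."""
    return Value(
        data=fn(value.data),
        confidence=value.confidence * reliability,
        lineage=value.lineage + [name],
    )


# Hypothetical input and services, for illustration only.
raw = Value("  33101 miami fl ", lineage=["policy spreadsheet"])
step1 = apply_service(raw, "trim", str.strip, reliability=1.0)
step2 = apply_service(step1, "uppercase", str.upper, reliability=0.9)
```

Because each step returns a new `Value`, the full chain of services that produced a result stays inspectable, and a downstream consumer can threshold on `confidence`.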
Issues in Unstructured Data
• Everybody wants to run analytics on unstructured data, and create structured data, and then we are back in our favorite world. This poses two challenges:
– Analytics are hard and require some fundamentally new techniques.
– The extracted structured (meta-) data is inherently imprecise.
But unstructured query systems have evolved to address this!
[Figure: grid of QUERY/INTEGRATION against DATA, each structured (S) or unstructured (U), with Semantics and Analytics labeled]
Search systems simplify query interface, but…
Conventional Search
Interpreted as: Return emails that contain the keywords “beineke” and “phone”
It will miss the email shown below, which never uses the word “phone”.
Text Analytics needs to be powerful enough to extract <name, phonenumber> from email.
And can they disambiguate the “interpretations”?
Query: Mail from “Beineke” containing a ph#

Dear Owen,
One thing I forgot to add in my previous mail (re: confirmation Number 295).
If, for whatever reason you are unable to reach me, my co-author Shivakumar Vaithyanathan will be reachable at 410.555.1212.
Thank You
Phil Beineke
We are building technologies (Avatar) to help do exactly that.
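The gap between keyword search and extraction can be shown on the sample email with a few lines of Python. This is a toy sketch and not how Avatar works: the phone pattern (3-3-4 digits) and the “last non-empty line is the sender” heuristic are assumptions made for illustration.

```python
import re

EMAIL = """Dear Owen,
One thing I forgot to add in my previous mail (re: confirmation
Number 295).
If, for whatever reason you are unable to reach me, my co-author
Shivakumar Vaithyanathan will be reachable at 410.555.1212.
Thank You
Phil Beineke"""

# Assumed pattern: numbers written as 3-3-4 digits separated by
# dots, dashes, or spaces.
PHONE = re.compile(r"\b\d{3}[.\- ]\d{3}[.\- ]\d{4}\b")


def extract_phones(text):
    """Return every phone-number-shaped span in the text."""
    return PHONE.findall(text)


def sender_signoff(text):
    """Treat the last non-empty line as the sender's sign-off (a heuristic)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def mail_from_with_phone(text, sender_keyword):
    """True if the sign-off mentions the sender and the body has a phone number."""
    from_sender = sender_keyword.lower() in sender_signoff(text).lower()
    return from_sender and bool(extract_phones(text))


keyword_hit = "phone" in EMAIL.lower()   # what a keyword search would test
phones = extract_phones(EMAIL)
matches = mail_from_with_phone(EMAIL, "beineke")
```

The keyword test fails because the email never contains the word “phone”, while the extraction-based check finds the number. Note that the number belongs to the co-author, not the sender, which is exactly the kind of interpretation a real annotator would need to disambiguate.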
Text Analytics Services
Advanced processing and analytics can enable identification and indexing of more than just words…
[Figure: annotators layered over the sentence “President Bush visits shrine in Israel”. Parts of Speech Annotator: NP, VP, PP. Named Entity Annotator: Gov Official (Title: President, Person: Bush), Country (Israel). Relationship Annotator: Located At (Arg1: Entity, Arg2: Location)]
But one analysis is not sufficient, we need to be able to “chain” them.
UIMA is that chaining mechanism
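The idea of chaining can be sketched without any framework: each annotator reads the annotations left by earlier ones and adds its own. The Python below is a toy pipeline and does not use or imitate the UIMA API. The dictionary-based entity lookup and the “preceded by `in`” rule are hypothetical simplifications.

```python
def entity_annotator(doc):
    """Tag tokens found in small hand-written dictionaries (illustrative only)."""
    people = {"Bush"}
    locations = {"Israel"}
    for i, tok in enumerate(doc["tokens"]):
        if tok in people:
            doc["annotations"].append({"type": "Person", "start": i, "text": tok})
        elif tok in locations:
            doc["annotations"].append({"type": "Location", "start": i, "text": tok})
    return doc


def located_at_annotator(doc):
    """Build on entity annotations: link a Person to a later Location after 'in'."""
    ents = doc["annotations"][:]
    for person in (a for a in ents if a["type"] == "Person"):
        for loc in (a for a in ents if a["type"] == "Location"):
            if loc["start"] > person["start"] and doc["tokens"][loc["start"] - 1] == "in":
                doc["annotations"].append(
                    {"type": "LocatedAt", "arg1": person["text"], "arg2": loc["text"]})
    return doc


def run_pipeline(text, annotators):
    """Apply annotators in order over one shared document structure."""
    doc = {"tokens": text.split(), "annotations": []}
    for annotate in annotators:
        doc = annotate(doc)
    return doc


sentence = "President Bush visits shrine in Israel"
doc = run_pipeline(sentence, [entity_annotator, located_at_annotator])
relations = [a for a in doc["annotations"] if a["type"] == "LocatedAt"]
```

Running the relationship annotator alone produces nothing, since it depends on the entity annotations, which is why a chaining mechanism is needed.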
Resulting in structured data out of unstructured domains
[Figure: UIMA turning web, excel, and email sources into name/phone tables]
In the Web 2.0 sense, being able to share and build upon others’ annotators is extremely useful
In another Web 2.0 sense, how does this co-exist with and augment social tagging?
Manual Tagging – By Professionals
Pros
• Controlled vocabularies & standard taxonomies
• Higher quality
Cons
• Costly
• Human resource intensive
• Cannot keep up
Example: ?
Automated Tagging – By Machine
Pros
• Learns from professional & user tagging
• Lower human cost
Cons
• Requires training of models
• Lower quality than manual tagging
Example: Semantic tagging
Social Tagging – By Users
Pros
• User driven
• Emergent folksonomies
• Serendipitous browsing
Cons
• Ambiguity
• Uncontrolled vocabulary
• Synonyms
Examples: Del.icio.us and Flickr
[Figure: Popularity vs. Digital item curve. Labels: High-value content & enterprise data sources; Consumer content; “Long tail”: Deep archives, large personal collections]
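Mashup Evolution
[Figure: ownership (Line of Business vs. IT Dept) against criticality (Best Effort, AdHoc vs. Mission Critical) and available time (Limited Time, Immediate vs. Lots of Time). Items placed on it: Mashups, New Initiatives, Proof of Concept, DataMart, Portals, SCA, DataWarehouse]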
MAFIA – A Mashup Fabric for Intranet Applications being built @ IBM
[Figure: MAFIA layers. Presentation: HTML, XML/Atom/RSS Feed. Transformation: Feed Generation, Augmentation, Fusion, Union, Standardization. Ingestion: Screen Scraping, Web Services, XML/Atom/RSS Feeds. Also labeled: Lightweight Semantics, Atom/RSS Store. Inputs: External Data Services (Web Pages, Web Services) and Enterprise IT Services]
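The stage names on this slide (Standardization, Union, Fusion, Feed Generation) suggest a simple dataflow. The Python sketch below shows one way such stages could compose. It is not MAFIA code: every function, the `zip` join key, and the sample records are hypothetical, and the output is a minimal Atom-like fragment rather than a complete, valid Atom document.

```python
from xml.etree import ElementTree as ET


def standardize(record):
    """Normalize the zip code so records from different sources line up."""
    rest = {k: v for k, v in record.items() if k != "zip"}
    return {"zip": str(record["zip"]).strip().zfill(5), **rest}


def union(*sources):
    """Concatenate records from all sources."""
    return [rec for src in sources for rec in src]


def fuse(records, key="zip"):
    """Merge records sharing a key; later records overwrite fields on conflict."""
    fused = {}
    for rec in records:
        fused.setdefault(rec[key], {}).update(rec)
    return list(fused.values())


def to_feed(records, title):
    """Emit a minimal Atom-like feed as a string."""
    feed = ET.Element("feed")
    ET.SubElement(feed, "title").text = title
    for rec in records:
        entry = ET.SubElement(feed, "entry")
        ET.SubElement(entry, "title").text = rec["zip"]
        ET.SubElement(entry, "summary").text = ", ".join(
            f"{k}={v}" for k, v in sorted(rec.items()) if k != "zip")
    return ET.tostring(feed, encoding="unicode")


# Hypothetical ingested data, e.g. from a spreadsheet and a scraped page.
policies = [{"zip": " 33101", "policy": "P-1"}]
flood = [{"zip": 33101, "level": "high"}, {"zip": 34106, "level": "low"}]

records = fuse(union(map(standardize, policies), map(standardize, flood)))
feed_xml = to_feed(records, "Flood risk by zip code")
```

Publishing the result as a feed means the output of one mashup can be ingested by the next, which is how the slide's Atom/RSS Store and feed inputs fit together.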
Summary
• Web-style architectures represent the next “sustainable” phase of IT spend
• The database research community can make a big difference!
– Re-enable the separation of data and logic: Web 2.0 built on Info 2.0!
• New research problems exist
– Ease of use and ad-hoc integration.
– Bringing unstructured and (semi-)structured data together
• We at IBM are building such an Information 2.0 Fabric, targeting enterprise situational applications