Transcript IBM - CBS
Big Data Driven:
Official Statistics
Amish Patel, Big Data Leader for Government, Europe
Information Management
© 2011 IBM Corporation
AGENDA
Drivers for leveraging Big Data
Implications of Big Data on Official Statistics
– Challenges & Opportunities
– Industrialisation and Collaborative model
– New products and indicators
DRIVERS FOR LEVERAGING BIG DATA
The Big Data Conundrum
The economies of deletion have changed….
– Leading us into new opportunities and challenges
The percentage of available data an enterprise can analyze is decreasing in proportion to the growth of the data available to that enterprise
– Quite simply, this means as enterprises, we are getting “more naive” about our business over time
Just collecting and storing “Big Data” doesn’t drive a cent of value to an organization’s bottom line
Chart: Data AVAILABLE to an organization vs. Data an organization can PROCESS
Implications Of Big Data On Official Statistics
Challenges & Opportunity
1. Impact on Policy and Development issues
2. Methodological: bridging the gaps by combining multiple data sources
3. Technology (processing and storage)
4. Security/Privacy
5. Governance
6. Financial
1. Impact On Policy And Development Issues
Example: Leveraging Big Data for Currency of National Statistics
2. Methodological
Example: Bridging the gaps by combining multiple data sources
3. Technology – Processing and Storage
Example: Storage is key to your Infrastructure
Smarter Storage: Efficient by Design, Self-Optimizing, Cloud Agile
– Designed for ... seconds through systems built to process a variety of data at scale
– Incorporates cloud technologies to improve service quality, speed of delivery and efficiency
– Optimize performance and cost by matching workloads with the best platform to meet specific workload requirements
Data Footprint Reduction
                        Active Data     Backup Data
Real-time Compression   40-80% (Best)   20-30%
Data Deduplication      40-80%          80-95% (Best)

• Real-Time Compression is a method of reducing storage needs by changing the encoding scheme as the data is being read and written
– Short patterns for frequent data
– Longer patterns for infrequent data
– Can achieve 40 to 80 percent reduction in storage capacity
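The idea of short patterns for frequent data and longer patterns for infrequent data can be illustrated with Huffman coding, a classic variable-length encoding. This is only a sketch of the principle; the slides do not say which algorithm the product uses, and the function names below are made up for this example.

```python
import heapq
from collections import Counter


def huffman_codes(data: bytes) -> dict:
    """Return a prefix-free bit string for each byte value in data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tiebreaker, {symbol: code so far}).
    # The tiebreaker guarantees the dicts are never compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code inside them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encoded_bits(data: bytes, codes: dict) -> int:
    """Number of bits needed to store data with the given code table."""
    return sum(len(codes[b]) for b in data)


sample = b"aaaaaaaabbbc"
table = huffman_codes(sample)
print(len(sample) * 8, "bits raw,", encoded_bits(sample, table), "bits encoded")
```

In this toy input the most frequent byte gets a one-bit code, while the two rarer bytes get two-bit codes. The size of the code table itself is ignored here.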
• Data deduplication is a method of reducing storage needs by eliminating duplicate copies of data
– Store only one unique instance of the data
– Redundant data replaced with pointer
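A minimal sketch of the deduplication idea, assuming fixed-size chunks identified by their SHA-256 digest: each unique chunk is stored once, and every file is kept as a list of pointers (digests) to chunks. The class name `DedupStore` and the chunk size are assumptions for illustration, not a description of any particular product.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: one copy per unique chunk."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes (the single stored instance)
        self.files = {}   # file name -> ordered list of digests (pointers)

    def put(self, name: str, data: bytes) -> None:
        pointers = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only store the chunk if this content has not been seen before.
            self.chunks.setdefault(digest, chunk)
            pointers.append(digest)
        self.files[name] = pointers

    def get(self, name: str) -> bytes:
        # Follow the pointers to rebuild the original data.
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


store = DedupStore(chunk_size=4)
store.put("monday_backup", b"ABCDABCDEFGH")
store.put("tuesday_backup", b"ABCDEFGH")
print(store.stored_bytes(), "bytes stored for 20 bytes written")
```

Backups tend to repeat most of their content from one run to the next, which is why the pointer approach pays off most there.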
Storage Tiers – A trade-off between performance and cost
Faster performance
– Server
– Cache, Flash and Solid-State Drives
– Hard Disk Drives
– Tape
– Cloud
Lower cost
Technologies allow us to place and move data to the appropriate storage tier to balance performance and cost
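Placing and moving data between tiers can be sketched as a simple rule on how often each dataset is accessed. The thresholds, tier labels and dataset names below are hypothetical; real tiering software uses richer statistics than a single access rate.

```python
# Hypothetical thresholds: minimum accesses per day that justify each tier,
# ordered from fastest and most expensive to slowest and cheapest.
TIERS = [
    ("flash/SSD", 100.0),
    ("hard disk", 1.0),
    ("tape/cloud", 0.0),
]


def choose_tier(accesses_per_day: float) -> str:
    """Pick the fastest tier whose threshold the access rate meets."""
    for name, min_rate in TIERS:
        if accesses_per_day >= min_rate:
            return name
    return TIERS[-1][0]


def plan_moves(placement: dict, access_rates: dict) -> list:
    """Return (dataset, from_tier, to_tier) for every dataset that should move."""
    moves = []
    for dataset, current in placement.items():
        target = choose_tier(access_rates.get(dataset, 0.0))
        if target != current:
            moves.append((dataset, current, target))
    return moves


placement = {"cpi_2011": "hard disk", "census_2001": "flash/SSD"}
rates = {"cpi_2011": 450.0, "census_2001": 0.2}
for dataset, src, dst in plan_moves(placement, rates):
    print(f"move {dataset}: {src} -> {dst}")
```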
4. Security/Privacy
Need real-time data activity monitoring for security & compliance
– Continuous, policy-based, real-time monitoring of all data traffic activities, including actions by privileged users
– Database infrastructure scanning for missing patches, mis-configured privileges and other vulnerabilities
– Data protection compliance automation

Components: Data Repositories (databases, warehouses, file shares, Big Data), Host-based Probes (S-TAPs), Collector Appliance

Key Characteristics
– Single Integrated Appliance
– Non-invasive/disruptive, cross-platform architecture
– Dynamically scalable
– SOD enforcement for DBA access
– Auto discover sensitive resources and data
– Detect or block unauthorized & suspicious activity
– Granular, real-time policies: who, what, when, how
– 100% visibility including local DBA access
– Minimal performance impact
– Does not rely on resident logs that can easily be erased by attackers, rogue insiders
– No environment changes
– Prepackaged vulnerability knowledge base and compliance reports for SOX, PCI, etc.
– Growing integration with broader security and compliance management vision
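As a rough illustration of granular "who, what, when, how" policies, the sketch below evaluates each observed data access against an ordered list of rules and returns the first matching action. It is not the API of any monitoring product; the event fields, user names, table names and working hours are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DataEvent:
    who: str        # account that issued the statement
    what: str       # object touched, e.g. a table name
    when: datetime  # time the statement was observed
    how: str        # SQL verb, e.g. "SELECT" or "DROP"


# Hypothetical names, chosen only for this example.
SENSITIVE_OBJECTS = {"person_register", "income_tax"}
PRIVILEGED_USERS = {"dba_admin"}


def is_off_hours(ts: datetime) -> bool:
    return ts.hour < 7 or ts.hour >= 19


# Ordered rules: the first predicate that matches decides the action.
POLICIES = [
    ("block", lambda e: e.what in SENSITIVE_OBJECTS and e.how in {"DROP", "TRUNCATE"}),
    ("alert", lambda e: e.who in PRIVILEGED_USERS and e.what in SENSITIVE_OBJECTS),
    ("alert", lambda e: e.what in SENSITIVE_OBJECTS and is_off_hours(e.when)),
]


def evaluate(event: DataEvent) -> str:
    """Return "block", "alert" or "allow" for a single observed event."""
    for action, matches in POLICIES:
        if matches(event):
            return action
    return "allow"


event = DataEvent("dba_admin", "person_register", datetime(2011, 5, 2, 10, 0), "SELECT")
print(evaluate(event))
```

Putting the blocking rule first means a destructive statement is stopped even when a later, weaker rule would only have raised an alert.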
5. Governance
Vision for information integration & governance
Traditional Approach: Systems of Record
– Structured, analytical, logical
– Structured, Repeatable, Linear
– Traditional Sources: Transaction Data, Internal App Data, Mainframe Data, OLTP System Data, ERP data
– Data Warehouse
New Approach: Systems of Engagement
– Creative, holistic thought, intuition
– Unstructured, Exploratory, Iterative
– New Sources: Web Logs, Social Data, Text & Images, Sensor Data, RFID
– Hadoop, Streams
Information Integration, Governance & Context Accumulation
Systems Of Record and Systems Of Engagement
Governance concerns for big data customers
– How do I integrate and link my big data environment with my current one?
– How do I cleanse and validate the results of my big data analysis?
– How do I protect data in a big data environment?
– How do I create a trusted view of my customers and products for big data?
– Is a governed and auditable archive possible with big data?
Agile. Simple. Trusted Information.
Governance in an exploratory Big Data environment
1. Ensure trust & compliance
• Lineage of data as it enters and leaves the big data system
• Secure the big data systems from breaches
• Create masked dev and test analytics clusters
2. Accelerate time to value
• High performance data provisioning
• Integrated data integration and stream analytics platform
3. Lower total cost of ownership
• Simplified tooling to improve productivity of developers and testers
• Automated system security
• Complete visibility into the data movement and lifecycle

Callouts:
– Create privatized data in real time or on the cluster to ensure data protection
– High performance and high quality data loads
– Secured BigInsights to prevent any data breaches
– Low cost historical archive loaded to Hadoop for exploratory analytics
– Integration for improved segmentation of analytical data sources
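One common way to create privatized or masked data for development and test clusters is to replace identifiers with a keyed hash, so records can still be linked without exposing the original value. The sketch below uses HMAC-SHA256 from the Python standard library; the field names, the key and the 16-character truncation are assumptions for illustration, and the slides do not state which masking technique is meant.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a keyed hash; the same input gives the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability in this example


def mask_record(record: dict, sensitive_fields: set, key: bytes) -> dict:
    """Return a copy of record with the sensitive fields pseudonymized."""
    return {
        field: pseudonymize(str(value), key) if field in sensitive_fields else value
        for field, value in record.items()
    }


secret_key = b"example-key-keep-outside-the-cluster"  # hypothetical key
row = {"citizen_id": "123456782", "region": "Utrecht", "income": 41000}
print(mask_record(row, {"citizen_id"}, secret_key))
```

Because the token is deterministic for a given key, two masked datasets can still be joined on the masked identifier, while anyone without the key cannot recompute tokens from known identifiers.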
6. Financial
Engagement Model
Information (catalogue and datasets)
Invest and define
NS
Incubate and evaluate
NS co-invests
Accelerate evolution of ecosystem
Link Data
Business Model
NS-Pay
• Pay to private company for inexpensive services
• Typically cloud-based
Businesses-Pay
• Services free or discounted
• Funded by other parts of the business
• Can be non-profit organisations
Citizens-Pay
• To private company for value-added services to citizens
Motivate and educate
Services built & maintained by community on top of open-data
Industrialisation and Collaborative Model
Leverage City Forward model for National Statistics
Impact on Everyday Life
How safe is my neighborhood?
Which career is right for me?
What type of education do I need?
Sources: http://www.chicagocitycrime.com/, http://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm, http://cityforward.org
New Products and Indicators
Evolving beyond statistics to predictive analytics, sharing complementary datasets with private sector and citizens
Examples:
– Predictive models for healthcare cost reduction and outcome optimisation
– Epidemic outbreak surveillance – hotspots, progression waves
– Aligning public services (federal, regional and city level) to existing and predictive demographic data
Example: Traffic Management for Sustainability and Efficiency
Multimodal Data Streams
– GPS
– Cell-phones (location tracking)
– Public Transport (bus, docking)
– Pollution measurements
– Weather Conditions (including road conditions)
– Optical traffic flow detectors
– Travel time data based on plate recognition
– Induction loop detector data
– Accidents in network as they are being recorded
– Road closures (road work, etc)
– Still pictures from road cameras
Applications
– Real Time Traffic Monitoring & Information
– (Multimodal) Travel Planner
Processing of GPS Data Streams
– Real Time Transformation Logic
– Real Time Geo Mapping
– Real Time Speed & Heading Estimation
– Real Time Aggregates & Statistics
– Storage adapters
Outputs
– Interactive visualization (Web Server, Google Earth)
– Data Warehouse (offline statistical analysis)
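The speed and heading estimation and the running aggregates in this pipeline can be sketched as follows, assuming each GPS fix carries latitude, longitude and a timestamp in seconds. The haversine distance and initial-bearing formulas are standard; the function and class names, the window size and the sample coordinates are assumptions for illustration and are not taken from the system shown on the slide.

```python
import math
from collections import deque

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def speed_and_heading(lat1, lon1, t1, lat2, lon2, t2):
    """Return (speed in km/h, heading in degrees from north) between two fixes."""
    if t2 <= t1:
        raise ValueError("second fix must be later than the first")
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine great-circle distance in metres.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    distance_m = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    # Initial bearing, normalised to the range [0, 360).
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    heading = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return distance_m / (t2 - t1) * 3.6, heading


class RollingMean:
    """Mean of the most recent `size` values, e.g. speeds on one road segment."""

    def __init__(self, size: int):
        self.values = deque(maxlen=size)

    def add(self, value: float) -> float:
        self.values.append(value)
        return sum(self.values) / len(self.values)


segment_speed = RollingMean(size=3)
fixes = [(52.0900, 5.1200, 0), (52.0900, 5.1215, 10), (52.0900, 5.1230, 20)]
for (la1, lo1, t1), (la2, lo2, t2) in zip(fixes, fixes[1:]):
    kmh, heading = speed_and_heading(la1, lo1, t1, la2, lo2, t2)
    print(f"{kmh:.1f} km/h, heading {heading:.0f} deg, "
          f"rolling mean {segment_speed.add(kmh):.1f} km/h")
```

A fixed-size window keeps memory use constant per road segment, which matters when the aggregate is updated on every incoming fix.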
Thank You
www.sendsteps.com
Prepare to react; keep your phone ready!
Internet
1. Go to sendc.com
2. Log in with Session WS2
3. Type your answer
TXT
1. Text to +316 4250 0030
2. Type Session WS2 and your answer
Posting messages is anonymous
No additional charge per message
What kind of Use-case enabled by Big Data technology do you think will add value to your organisation for calculating official statistics?
Internet: Go to sendc.com and log in with Session WS2, then type your answer
TXT: Send to 06 4250 0030: Session WS2 and your answer