Transcript IBM - CBS

Big Data Driven: Official Statistics

Amish Patel, Big Data Leader for Government, Europe

Information Management

© 2011 IBM Corporation

AGENDA

Drivers for leveraging Big Data

Implications of Big Data on Official Statistics

Challenges & Opportunities

Industrialisation and Collaborative model

New products and indicators

DRIVERS FOR LEVERAGING BIG DATA

The Big Data Conundrum

• The economies of deletion have changed...
– Leading us into new opportunities and challenges
• The percentage of available data an enterprise can analyze is decreasing proportionately to the data available to that enterprise
– Quite simply, this means as enterprises, we are getting “more naive” about our business over time
• Just collecting and storing “Big Data” doesn’t drive a cent of value to an organization’s bottom line

Chart labels: Data AVAILABLE to an organization; Data an organization can PROCESS

Implications Of Big Data On Official Statistics

Challenges & Opportunity

1. Impact on Policy and Development issues
2. Methodological: bridging the gaps by combining multiple data sources
3. Technology (processing and storage)
4. Security/Privacy
5. Governance
6. Financial

1. Impact On Policy And Development Issues

Example: Leveraging Big Data for Currency of National Statistics

2. Methodological

Example: Bridging the gaps by combining multiple data sources

3. Technology – Processing and Storage

Example: Storage is key to your Infrastructure

Smarter Storage: Cloud Agile, Efficient by Design, Self-Optimizing

• Designed for ... seconds through systems built to process a variety of data at scale
• Incorporates cloud technologies to improve service quality, speed of delivery and efficiency
• Optimize performance and cost by matching workloads with the best platform to meet specific workload requirements

Data Footprint Reduction

                        Active Data      Backup Data
Real-time Compression   40-80% (best)    20-30%
Data Deduplication      40-80%           80-95% (best)

Real-Time Compression is a method of reducing storage needs by changing the encoding scheme as the data is being read and written.

– Short patterns for frequent data
– Longer patterns for infrequent data
– Can achieve 40 to 80 percent reduction in storage capacity

Data deduplication is a method of reducing storage needs by eliminating duplicate copies of data.

– Store only one unique instance of the data
– Redundant data replaced with pointer
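As a rough illustration of the deduplication idea on this slide (one stored instance, pointers for the repeats), here is a minimal Python sketch. The class name `DedupStore`, the fixed 8-byte chunk size and the sample backups are assumptions made for the example; they do not describe any particular IBM product.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept only once."""

    def __init__(self, chunk_size=8):
        self.chunk_size = chunk_size  # hypothetical fixed chunk size, in bytes
        self.chunks = {}              # digest -> chunk bytes (the single stored instance)
        self.files = {}               # file name -> list of digests (the "pointers")

    def put(self, name, data):
        pointers = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if it has not been seen before.
            self.chunks.setdefault(digest, chunk)
            pointers.append(digest)
        self.files[name] = pointers

    def get(self, name):
        # Rebuild the file by following its pointers.
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


store = DedupStore(chunk_size=8)
monday = b"AAAAAAAABBBBBBBBCCCCCCCC"
tuesday = b"AAAAAAAABBBBBBBBDDDDDDDD"  # same as Monday except the last chunk
store.put("backup-monday", monday)
store.put("backup-tuesday", tuesday)

logical_bytes = len(monday) + len(tuesday)
print(logical_bytes, store.stored_bytes())
```

Two nearly identical backups share most of their chunks, which is consistent with the slide showing the highest reduction for backup data.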

Storage Tiers – A trade-off between performance and cost

Tiers, from faster performance to lower cost:

– Server
– Cache, Flash and Solid-State Drives
– Hard Disk Drives
– Tape
– Cloud

Technologies allow us to place and move data to the appropriate storage tier to balance between performance and cost.
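A placement rule of this kind could be sketched as follows. This is only an illustration: the tier names follow the order on the slide, while the access-count thresholds and dataset names are invented for the example.

```python
# Tiers ordered from fastest/most expensive to slowest/cheapest,
# following the order shown on the slide.
TIERS = ["flash_ssd", "hard_disk", "tape", "cloud"]

# Hypothetical thresholds: minimum number of accesses in the last
# 30 days for a dataset to qualify for a tier.
THRESHOLDS = {"flash_ssd": 100, "hard_disk": 10, "tape": 1}


def choose_tier(accesses_last_30_days):
    """Return the fastest tier whose threshold the dataset meets."""
    for tier in TIERS[:-1]:
        if accesses_last_30_days >= THRESHOLDS[tier]:
            return tier
    return TIERS[-1]  # never accessed: lowest-cost tier


# Hypothetical datasets with their recent access counts.
datasets = {
    "current_quarter_microdata": 450,
    "last_year_tables": 25,
    "census_archive": 2,
    "raw_sensor_dump": 0,
}
placement = {name: choose_tier(count) for name, count in datasets.items()}
for name, tier in placement.items():
    print(f"{name}: {tier}")
```

Re-running the rule periodically is one simple way to move data between tiers as its access pattern changes.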

4. Security/Privacy

Need real-time data activity monitoring for security & compliance

• Continuous, policy-based, real-time monitoring of all data traffic activities, including actions by privileged users
• Database infrastructure scanning for missing patches, mis-configured privileges and other vulnerabilities
• Data protection compliance automation

Diagram labels: Data Repositories (databases, warehouses, file shares, Big Data); Host-based Probes (S-TAPs); Collector Appliance

Key Characteristics

• Single Integrated Appliance
• Non-invasive, non-disruptive, cross-platform architecture
• Dynamically scalable
• SOD (separation of duties) enforcement for DBA access
• Auto discover sensitive resources and data
• Detect or block unauthorized & suspicious activity
• Granular, real-time policies
– Who, what, when, how
• 100% visibility including local DBA access
• Minimal performance impact
• Does not rely on resident logs that can easily be erased by attackers or rogue insiders
• No environment changes
• Prepackaged vulnerability knowledge base and compliance reports for SOX, PCI, etc.
• Growing integration with broader security and compliance management vision
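To make the "who, what, when, how" policy idea concrete, here is a minimal Python sketch of a rule that classifies one data-access event. It is not the API of any monitoring product; the `Event` fields, table names, user names and office hours are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    who: str        # database user
    what: str       # object touched, e.g. a table name
    when: datetime  # time of the access
    how: str        # SQL verb, e.g. "SELECT" or "DELETE"


# Hypothetical policy inputs.
SENSITIVE_TABLES = {"tax_records", "census_persons"}
PRIVILEGED_USERS = {"dba_admin"}


def evaluate(event):
    """Return 'block', 'alert' or 'allow' for a single event."""
    sensitive = event.what in SENSITIVE_TABLES
    privileged = event.who in PRIVILEGED_USERS
    off_hours = event.when.hour < 7 or event.when.hour >= 19
    # Destructive statements on sensitive data are stopped outright.
    if sensitive and event.how in {"DROP", "DELETE"}:
        return "block"
    # Privileged or off-hours access to sensitive data is flagged.
    if sensitive and (privileged or off_hours):
        return "alert"
    return "allow"


events = [
    Event("dba_admin", "tax_records", datetime(2011, 5, 2, 10, 0), "SELECT"),
    Event("analyst", "tax_records", datetime(2011, 5, 2, 23, 0), "DELETE"),
    Event("analyst", "public_series", datetime(2011, 5, 2, 10, 0), "SELECT"),
]
decisions = [evaluate(e) for e in events]
print(decisions)
```

The rule evaluates each event as it arrives rather than reading database logs afterwards, which is the distinction the slide draws.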

5. Governance

Vision for information integration & governance

Traditional Approach: Systems of Record
– Structured, analytical, logical
– Structured, repeatable, linear
– Platform: Data Warehouse
– Traditional sources: Transaction Data, Internal App Data, Mainframe Data, OLTP System Data, ERP data

New Approach: Systems of Engagement
– Creative, holistic thought, intuition
– Unstructured, exploratory, iterative
– Platforms: Hadoop, Streams
– New sources: Web Logs, Social Data, Text & Images, Sensor Data, RFID

Linking the two: Information Integration, Governance & Context Accumulation

Governance concerns for big data customers

• How do I integrate and link my big data environment with my current one?
• How do I cleanse and validate the results of my big data analysis?
• How do I protect data in a big data environment?
• How do I create a trusted view of my customers and products for big data?
• Is a governed and auditable archive possible with big data?

Agile. Simple. Trusted Information.

Governance in an exploratory Big Data environment

1. Ensure trust & compliance
• Lineage of data as it enters and leaves the big data system
• Secure the big data systems from breaches
• Create masked dev and test analytics clusters

2. Accelerate time to value
• High performance data provisioning
• Integrated data integration and stream analytics platform

3. Lower total cost of ownership
• Simplified tooling to improve productivity of developers and testers
• Automated system security
• Complete visibility into the data movement and lifecycle

Diagram callouts:
– Create privatized data in real time or on the cluster to ensure data protection
– High performance and high quality data loads
– Secured BigInsights to prevent any data breaches
– Low cost historical archive loaded to Hadoop for exploratory analytics
– Integration for improved segmentation of analytical data sources
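One common way to create masked data for development and test clusters is keyed pseudonymization: identifying fields are replaced by a keyed hash so records can still be linked, while the original values are not exposed. The sketch below uses Python's standard `hmac` and `hashlib` modules; the key, field names and sample records are assumptions for illustration, and this is not a description of how any IBM tool masks data.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be kept outside the test cluster.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonymize(value, key=SECRET_KEY):
    """Replace a value with a stable keyed hash (same input, same token)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def mask_record(record, sensitive_fields):
    """Return a copy of the record with the sensitive fields pseudonymized."""
    return {
        field: pseudonymize(str(value)) if field in sensitive_fields else value
        for field, value in record.items()
    }


# Hypothetical production records: the same person in two years.
production = [
    {"person_id": "NL-0001", "name": "J. Jansen", "region": "Utrecht", "income": 41000},
    {"person_id": "NL-0001", "name": "J. Jansen", "region": "Utrecht", "income": 43000},
]
masked = [mask_record(r, {"person_id", "name"}) for r in production]
for row in masked:
    print(row)
```

Because the same input always yields the same token, the two masked rows can still be joined on `person_id`, so analytics on the test cluster keep working.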

6. Financial

Engagement Model

Diagram labels: Information (catalogue and datasets); Invest and define; NS; Incubate and evaluate; NS co-invests; Accelerate evolution of ecosystem; Link Data; Motivate and educate; Services built & maintained by community on top of open-data

Business Model

Citizens-Pay
• To private Company for value-added services to citizens

NS-Pay
• Pay to private Company for inexpensive services
• Typically cloud-based

Businesses-Pay
• Services free or discounted
• Funded by other parts of the business
• Can be non profit organisations

Industrialisation and Collaborative Model

Leverage City Forward model for National Statistics

Impact on Everyday Life

• How safe is my neighborhood?
• Which career is right for me?
• What type of education do I need?

Sources: http://www.chicagocitycrime.com/, http://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm, http://cityforward.org

New Products and Indicators

Evolving beyond statistics to predictive analytics, sharing complementary datasets with private sector and citizens

Examples:
• Predictive models for healthcare cost reduction and outcome optimisation
• Epidemic outbreak surveillance – hotspots, progression waves
• Aligning public services (federal, regional and city level) to existing and predictive demographic data

Example: Traffic Management for Sustainability and Efficiency

• Multimodal Data Streams
– GPS
– Cell-phones (location tracking)
– Public Transport (bus, docking)
– Pollution measurements
– Weather Conditions (including road conditions)
– Optical traffic flow detectors
– Travel time data based on plate recognition
– Induction loop detector data
– Accidents in network as they are being recorded
– Road closures (road work, etc)
– Still pictures from road cameras
• Real Time Traffic Monitoring & Information
• (Multimodal) Travel Planner

Processing pipeline (diagram labels): GPS Data Streams; Real Time Transformation Logic; Real Time Geo Mapping; Real Time Speed & Heading Estimation; Real Time Aggregates & Statistics; Storage adapters; Web Server; Google Earth (interactive visualization); Data Warehouse (offline statistical analysis)
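The "Real Time Speed & Heading Estimation" step can be illustrated with two consecutive GPS fixes from one vehicle. The sketch below uses the standard haversine distance and initial-bearing formulas. The fix format (time in seconds, latitude, longitude) and the sample coordinates are assumptions for the example; the slide does not say how the actual system computes these values.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing in degrees, clockwise from north (0 to 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def speed_and_heading(fix_a, fix_b):
    """Each fix is (time_in_seconds, lat, lon). Returns (km/h, degrees)."""
    (t1, lat1, lon1), (t2, lat2, lon2) = fix_a, fix_b
    dt = t2 - t1
    if dt <= 0:
        raise ValueError("fixes must be in increasing time order")
    speed_kmh = haversine_m(lat1, lon1, lat2, lon2) / dt * 3.6
    return speed_kmh, bearing_deg(lat1, lon1, lat2, lon2)


# Hypothetical fixes: a vehicle moving due north for 10 seconds.
speed, heading = speed_and_heading((0, 52.000, 5.0), (10, 52.001, 5.0))
print(f"{speed:.1f} km/h, heading {heading:.0f} degrees")
```

Averaging such per-vehicle speeds per road segment and time window would then feed the "Real Time Aggregates & Statistics" step.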

Thank You

www.sendsteps.com

Prepare to react; keep your phone ready!

Internet
1. Go to sendc.com
2. Log in with session WS2
3. Type your answer

TXT
1. Text to +316 4250 0030
2. Type session WS2 and your answer

Posting messages is anonymous. No additional charge per message.

What kind of Use-case enabled by Big Data technology do you think will add value to your organisation for calculating official statistics?

Internet: Go to sendc.com and log in with session WS2, then type your answer.

TXT: Send to 06 4250 0030: session WS2 and your answer.