Big Data - Chapters Site

Big Data
Stephen Head
Senior Manager, IT Risk Advisory Services
Our Time Today
• Attributes of Big Data
• Uses of Big Data
• Types of Big Data
• Big Data Tools
• Big Data Controls
• Privacy Risks
• Security and Governance
• Questions
[Figure: Big Data, Tools, Improved Decision Making]
Attributes of Big Data
Data – How Big is Big?
1 Kilobyte = 1,000 bytes
1 Megabyte = 1,000,000 bytes
1 Gigabyte = 1,000,000,000 bytes
1 Terabyte = 1,000,000,000,000 bytes
1 Petabyte = 1,000,000,000,000,000 bytes
1 Exabyte = 1,000,000,000,000,000,000 bytes
1 Zettabyte = 1,000,000,000,000,000,000,000 bytes
1 Yottabyte = 1,000,000,000,000,000,000,000,000 bytes
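The table above uses decimal prefixes, so each unit is 1,000 times the previous one. A minimal sketch of that arithmetic follows; the function names and the abbreviations (KB, MB, and so on) are illustrative choices, not part of the original slides.

```python
# Decimal storage units in ascending order; each step is a factor of 1,000.
# The abbreviations are an assumption made for this sketch.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def bytes_per_unit(unit: str) -> int:
    """Return the number of bytes in one unit, e.g. 10**15 for "PB"."""
    return 1000 ** UNITS.index(unit)


def humanize(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    index = 0
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {UNITS[index]}"


if __name__ == "__main__":
    print(bytes_per_unit("YB"))   # one yottabyte expressed in bytes
    print(humanize(5 * 10**12))   # five trillion bytes in a readable unit
```

Dividing two entries, for example `bytes_per_unit("EB") // bytes_per_unit("TB")`, gives the ratio between units (one exabyte is one million terabytes).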
Where Do Humans Fit on the Chart
The human brain consists of about one billion neurons. Each neuron
forms about 1,000 connections to other neurons, amounting to more
than a trillion connections.
If each neuron could only help store a single memory, running out of
space would be a problem. You might have only a few gigabytes of
storage space, similar to the space in an iPod or a USB flash drive.
Yet neurons combine so that each one helps with many memories at a
time, exponentially increasing the brain’s memory storage capacity to
something closer to around 2.5 petabytes.
Source: Scientific American, May/June 2010
Where Do Humans Fit on the Chart
Researchers have been able to encode a draft of an entire book into
DNA. The 5.27-megabit file contains 53,246 words, 11 JPG images, as well
as a JavaScript program, making this the largest piece of non-biological
data ever stored in DNA. The scientists published their findings in the
journal Science.
In theory, two bits of data could be incorporated per nucleotide, implying
that each gram of DNA could store 455 exabytes of data (1 exabyte is 1
million terabytes), which outstrips inorganic storage devices like flash
memory, hard disks, and even quantum-computing methods.
Source: Science Tech Daily, August 21, 2012
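To make "two bits of data per nucleotide" concrete, here is a minimal sketch that packs each byte into four bases and back again. The slide does not say which base stands for which bit pair, so the mapping below (A=00, C=01, G=10, T=11) is an assumption for illustration only, and it is not claimed to be the scheme the researchers used.

```python
# Hypothetical mapping: position in this string is the 2-bit value it encodes
# (A=00, C=01, G=10, T=11). The source only states "two bits per nucleotide".
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Invert encode(): every group of four bases rebuilds one byte."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    strand = encode(b"Big Data")
    print(strand, len(strand))   # 8 bytes become 32 bases
    print(decode(strand))
```

At two bits per base, one byte needs four nucleotides, which is the density the theoretical per-gram figure rests on.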
Definition of Big Data
Big data is where the data volume, acquisition velocity,
or data representation limits the ability to perform
effective analysis using traditional relational
approaches or requires the use of significant
horizontal scaling for efficient processing.
Source: NIST
What Attributes Define Big Data?
Gartner defines Big Data using three Vs: volume, velocity, and variety.
Uses of Big Data
• Timely insights from vast amounts of data, including data already
stored in company databases and data from external third-party
sources, the Internet, social media and remote sensors.
• Real-time monitoring and forecasting of events that affect either
business performance or operations.
• Identifying significant information that can improve decision quality.
• Mitigating risk by optimizing complex decisions about unplanned
events more rapidly.
Source: McKinsey Global Institute
Sources of Big Data
Source: Data Analytics for Information Security. © 2012 Information Security Forum Limited. All rights reserved.
Potential Value of Big Data
• $300 billion potential annual value to US health care.
• €250 billion potential annual value to Europe’s public sector
administration.
• $600 billion potential annual consumer surplus from using personal
location data.
• 60% potential increase in retailers’ operating margins.
Source: McKinsey Global Institute
Types of Big Data
Type 1: This is where a non-relational data representation is required for effective analysis.
Type 2: This is where horizontal scalability is required for efficient processing.
Type 3: This is where a non-relational data representation processed with a horizontally
scalable solution is required for both effective analysis and efficient processing.
Source: NIST
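The three types reduce to two yes/no questions: does effective analysis need a non-relational representation, and does efficient processing need horizontal scaling? A minimal sketch of that decision follows; the function name, parameter names, and the wording of the fallback label are assumptions made for illustration.

```python
def big_data_type(needs_non_relational: bool, needs_horizontal_scaling: bool) -> str:
    """Map the two conditions from the list above to the matching type label."""
    if needs_non_relational and needs_horizontal_scaling:
        return "Type 3"
    if needs_non_relational:
        return "Type 1"
    if needs_horizontal_scaling:
        return "Type 2"
    # Assumption: a workload meeting neither condition falls outside these types.
    return "none of the three types"


if __name__ == "__main__":
    print(big_data_type(True, False))
    print(big_data_type(False, True))
    print(big_data_type(True, True))
```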
Big Data Tools
• MapReduce (originally a proprietary technology of Google, but now a
term used generically) is a programming model for parallel operations
across a practically unlimited number of processors.
• Hadoop is a popular open‐source programming platform and program
library based on the same ideas.
• NoSQL (the name derived from “not Structured Query Language”) is a
set of database technologies that relaxes many of the restrictions of
traditional, “relational” databases and allows for better scalability
across the many processors in one or more data centers.
• Berkeley Data Analytics Stack (BDAS) is an open‐source platform that
is reported to outperform Hadoop on some workloads and is being used
by such companies as Foursquare, Yahoo, and Amazon Web Services.
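The MapReduce programming model named in the list above can be sketched with the classic word-count example. This is a single-process illustration of the map, shuffle, and reduce steps, assuming nothing about Hadoop's or Google's actual APIs; all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value under its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # On a cluster the map calls would run in parallel on many machines;
    # here they simply run one after another in one process.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big tools"]))
```

Because each map call sees only its own document and each reduce call sees only one key, both phases can be spread across processors without coordination, which is what makes the model scale.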
Big Data Controls
• Confidentiality
– Regulated Data
– Access Restricted
– Encrypted
• Integrity
– Reliable
– Complete
– Accurate
• Availability
– Accessible
– Resilient
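One way an auditor might test the "Complete" and "Accurate" attributes under Integrity is to compare a record count and a cryptographic digest taken before and after a data transfer. The sketch below assumes records are text lines and uses SHA-256; the function names and the choice of hash are illustrative, not something the slides prescribe.

```python
import hashlib
from typing import List, Tuple


def fingerprint(records: List[str]) -> Tuple[int, str]:
    """Return (record count, SHA-256 hex digest) for an ordered batch."""
    digest = hashlib.sha256()
    for record in records:
        encoded = record.encode("utf-8")
        # Length-prefix each record so ["ab", "c"] and ["a", "bc"] differ.
        digest.update(len(encoded).to_bytes(8, "big"))
        digest.update(encoded)
    return len(records), digest.hexdigest()


def verify(records: List[str], expected: Tuple[int, str]) -> bool:
    """True only if nothing was dropped (complete) or altered (accurate)."""
    return fingerprint(records) == expected


if __name__ == "__main__":
    sent = ["a,1", "b,2"]
    control_total = fingerprint(sent)
    print(verify(sent, control_total))            # unchanged batch
    print(verify(["a,1"], control_total))         # a record went missing
    print(verify(["a,1", "b,3"], control_total))  # a value was altered
```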
COSO Principle 13
The organization obtains or generates and uses relevant, quality
information to support the functioning of internal control.
Identifies Information Requirements—A process is in place to identify the
information required and expected to support the functioning of the other
components of internal control and the achievement of the entity’s objectives.
Captures Internal and External Sources of Data—Information systems
capture internal and external sources of data.
Processes Relevant Data into Information—Information systems process and
transform relevant data into information.
Maintains Quality throughout Processing—Information systems produce
information that is timely, current, accurate, complete, accessible, protected, and
verifiable and retained. Information is reviewed to assess its relevance in
supporting the internal control components.
Source: COSO, Internal Control––Integrated Framework Executive Summary, USA, May 2013.
Privacy Risks
“We have the capacity to send every customer an ad booklet, specifically
designed for them, that says, ‘Here’s everything you bought last week and a
coupon for it,’” one executive told me. As his computers crawled through the
data, he was able to identify about 25 products that, when analyzed together,
allowed him to assign each shopper a “pregnancy prediction” score. More
important, he could also estimate her due date to within a small window, so
RETAILER could send coupons timed to very specific stages of her pregnancy.
“With the pregnancy products, though, we learned that some women react
badly,” the executive said. “Then we started mixing in all these ads for things we
knew pregnant women would never buy, so the baby ads looked random. That
way, it looked like all the products were chosen by chance.”
“And we found out that as long as a pregnant woman thinks she hasn’t been
spied on, she’ll use the coupons. She just assumes that everyone else on her
block got the same mailer for diapers and cribs. As long as we don’t spook her, it
works.”
Source: Charles Duhigg, “How Companies Learn Your Secrets,” New York Times, February 16, 2012
Protecting Privacy
• Data anonymization/sanitization or deidentification
• Adequate, relevant, useful and current big data privacy
policies, processes, procedures and supporting structures
• Senior management buy-in and evidence of continuous
commitment to protect privacy
• Appropriate data destruction, comprehensive data
management policy, clearly defined disposal ownership and
accountability
• Compliance with legal and regulatory data requirements
Source: Privacy and Big Data, page 11. © 2013 ISACA. All rights reserved.
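The first control in the list above, de-identification, can be sketched as dropping direct identifiers and replacing the customer key with a keyed hash. The field names, the secret key handling, and the choice of HMAC-SHA-256 are all assumptions for illustration. Note that this is pseudonymization: it lowers re-identification risk but does not by itself guarantee anonymity, because the remaining fields may still single out a person.

```python
import hashlib
import hmac
from typing import Any, Dict

# Hypothetical field names treated as direct identifiers and removed outright.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash that is stable for a given key."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: Dict[str, Any], secret_key: bytes,
               id_field: str = "customer_id") -> Dict[str, Any]:
    """Drop direct identifiers and pseudonymize the record's key field."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned[id_field] = pseudonymize(str(record[id_field]), secret_key)
    return cleaned


if __name__ == "__main__":
    key = b"demo-key-keep-this-secret"  # placeholder; manage real keys securely
    record = {"customer_id": 42, "name": "Pat", "email": "pat@example.com",
              "basket_total": 19.5}
    print(deidentify(record, key))
```

Using a keyed hash rather than a plain hash means someone without the key cannot rebuild the mapping by hashing guessed customer numbers.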
Security and Governance
Security
• Security must adopt a big data view…The age of big data has
arrived in security management.
• We must collect data throughout the enterprise, not just logs.
• We must provide context and perform real time analysis.
Arthur Coviello, Chairman RSA
Source: Economist; http://searchcloudsecurity.techtarget.com/news/2240111123/Coviello-talks-about-building-a-trusted-cloud-resilient-security.
Findings from the Information Security Forum
• Big data analytics is delivering value today
• Big data analytics has the potential to reduce cyber security risk and
increase agility
• Despite its potential, big data analytics is not yet mature within
information security
• Big data analytics is challenging, but manageable
• Existing big data analytics capabilities can be leveraged to improve
information security
Information Security Uses of Big Data
• Monitoring security incidents and events
• Producing cyber intelligence
• Addressing phishing
• Keeping systems available
• Discovering a breach
• Identifying threat trends and evolution
• Detecting an embedded cyber attack
Source: Data Analytics for Information Security. © 2012 Information Security Forum Limited. All rights reserved.
Key Governance Questions
1. Can we trust our sources of big data?
2. What information are we collecting that may expose the enterprise
to legal and regulatory battles?
3. How will we protect our sources, our processes and our decisions
from theft and corruption?
4. What policies are in place to ensure that employees keep
stakeholder information confidential during and after employment?
5. What actions are we taking that create trends that can be exploited
by our rivals?
Source: Privacy and Big Data, page 10. © 2013 ISACA. All rights reserved.
Summary
The potential value of Big Data to organizations is huge.
The tools to fully exploit Big Data are in varying stages of development.
The potential risks posed by Big Data are also significant.
As auditors, you are in an ideal position to help ensure that proper controls
are put in place to mitigate these risks and realize the full potential offered
by Big Data.
Questions?
Stephen Head
704-953-6688