Airavat: Security and Privacy for MapReduce



Security and Privacy in
Cloud Computing
Ragib Hasan
Johns Hopkins University
en.600.412 Spring 2011
Lecture 8
04/04/2011
Enforcing Data Privacy in the Cloud
Goal: Examine techniques for ensuring data
privacy in computations outsourced to a cloud
Review Assignment #7: (Due 4/11)
Roy et al., Airavat: Security and Privacy for
MapReduce, NSDI 2010
Recap: Cloud Forensics (Bread &
Butter paper from ASIACCS 2010)
• Strengths?
• Weaknesses?
• Ideas?
What does privacy mean?
• Information Privacy is the interest an
individual has in controlling, or at least
significantly influencing, the handling of data
about themselves.
• Confidentiality is the legal duty of individuals
who come into the possession of information
about others, especially in the course of
particular kinds of relationships with them.
Problem of making large datasets public
Model:
– One party owns the dataset
– Another party wants to run some computations
on it
– A third party may take data from the first party,
run functions (from the second party) on the data,
and provide the results to the second party
Problem:
– How can the data provider ensure the
confidentiality and privacy of their sensitive data?
Problem of making large datasets public
• Massachusetts Insurance Database
– DB was anonymized, with only birthdate, sex, and
zip code made available to the public
– Latanya Sweeney of CMU took the DB and public voter
records, and pinpointed the MA Governor’s record
• Netflix Prize Database
– DB was anonymized, with user names replaced
with random IDs
– Narayanan et al. used the Netflix DB and IMDb data to
de-anonymize users
Differential Privacy schemes can
ensure privacy of statistical queries
• Differential privacy aims to maximize the accuracy
of queries on statistical databases while minimizing
the chances of identifying individual records.
• Informally, given the output of a computation
or a query, an attacker cannot tell whether
any particular record was in the input data set
(the formal definition is sketched below).
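
For reference, the formal definition behind this informal statement (standard background from Dwork's work, not stated explicitly on the slide): a randomized mechanism M is ε-differentially private if, for all data sets D1 and D2 that differ in a single record and for every set S of possible outputs,

Pr[ M(D1) ∈ S ] ≤ e^ε · Pr[ M(D2) ∈ S ]

A smaller ε means the two output distributions are closer together, i.e., stronger privacy.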
Securing MapReduce for Privacy and
Confidentiality
• Paper:
– Roy et al., Airavat: Security and Privacy for
MapReduce
– Goal: Secure MapReduce to provide
confidentiality and privacy assurances for sensitive
data
System Model
• Data providers: own data sets
• Computation provider: provides MapReduce
code
• Airavat Framework: Cloud provider where
the MapReduce code is run on uploaded data
Threat Model
• Assets: Sensitive data or outputs
• Attacker model:
– Cloud provider (where Airavat is run) is trustworthy
– Computation provider (user who queries, provides
Mapper and Reducer functions) can be malicious
• Functions provided by the Computation provider can be
malicious.
• Cloud provider does not perform code analysis on user-generated functions
– Data provider is trustworthy
MapReduce
• MapReduce is a widely used and deployed
distributed computation model
• Input data is divided into chunks
• Mapper nodes run a mapping function on a
chunk and output a set of <key, value> pairs
• Reducer nodes combine the values associated with
each key using a reduce function, and write the
output to a file (a toy example of the whole flow
follows below)
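
To make the data flow concrete, here is a toy word-count job in plain Python (an illustration of the model only, not Hadoop or Airavat code): the mapper emits <key, value> pairs, the pairs are grouped by key, and the reducer combines each group.

from collections import defaultdict

def mapper(chunk):
    # Emit a <word, 1> pair for every word in the input chunk.
    for word in chunk.split():
        yield (word, 1)

def reducer(key, values):
    # Combine all values emitted for one key (here: sum the counts).
    return (key, sum(values))

# Toy "framework": run the mapper on each chunk, shuffle by key, then reduce.
chunks = ["the cloud stores the data", "the mapper emits pairs"]
groups = defaultdict(list)
for chunk in chunks:
    for key, value in mapper(chunk):
        groups[key].append(value)

results = [reducer(key, values) for key, values in groups.items()]
print(results)  # e.g. [('the', 3), ('cloud', 1), ('stores', 1), ...]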
Key design concepts
• Goal: Ensure privacy of source data
• Concept used: Differential privacy – ensure
that no sensitive data is leaked.
• Method used: Add random Laplace noise
to the outputs (see the sketch below)
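
A minimal sketch of the Laplace mechanism this design builds on (an illustration of the idea, not Airavat's actual implementation): the released answer is the true answer plus noise drawn from a Laplace distribution whose scale is sensitivity/ε.

import numpy as np

def noisy_output(true_value, sensitivity, epsilon):
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_value + noise

# Example: a count query (sensitivity 1) released with privacy parameter 0.5.
print(noisy_output(true_value=42, sensitivity=1.0, epsilon=0.5))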
Key design concepts
• Goal: Prevent malicious users from preparing sensitive
functions that leak data.
• Concept used: Functional sensitivity – how much the
output changes when a single element is included in
or removed from the input
– Higher sensitivity: more information can be leaked
• How it is used:
– Airavat requires the computation provider (CP) to declare the
range of possible output values.
– This range is used to bound the sensitivity of CP-written mapper
functions (see the sketch below).
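
A hedged sketch of why the declared range matters, using a SUM-style aggregate as an example (the exact bookkeeping in Airavat differs per reducer): if every mapper output is guaranteed to lie in [min_val, max_val], then including or removing one input element changes the sum by at most max(|min_val|, |max_val|), and that bound calibrates the Laplace noise.

def sum_sensitivity(min_val, max_val):
    # Including or removing one element whose value lies in [min_val, max_val]
    # changes a SUM by at most the largest possible magnitude of that value.
    return max(abs(min_val), abs(max_val))

# Example: the CP declares that mapper outputs lie in [0, 10].
sensitivity = sum_sensitivity(0, 10)   # -> 10
noise_scale = sensitivity / 0.5        # Laplace scale for epsilon = 0.5
print(sensitivity, noise_scale)        # 10 20.0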
Key design concepts
• Goal: Prevent users from issuing many brute-force
queries in an attempt to reveal the input data.
• Concept used: Privacy budget (defined by data
provider)
• How it is used:
– Data sources set a privacy budget for their data.
– Each time a query is run, the budget is decreased.
– Once the budget is used up, the user cannot run any
more queries (see the sketch below).
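
A minimal sketch of the privacy-budget bookkeeping (class and method names here are illustrative, not Airavat's API): the data provider fixes a total ε, each query consumes a slice of it, and further queries are refused once the budget is exhausted.

class PrivacyBudget:
    def __init__(self, total_epsilon):
        # Total budget is set by the data provider when the data is uploaded.
        self.remaining = total_epsilon

    def charge(self, query_epsilon):
        # Refuse the query if it would exceed the remaining budget.
        if query_epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= query_epsilon

budget = PrivacyBudget(total_epsilon=1.0)
budget.charge(0.4)   # first query allowed, 0.6 left
budget.charge(0.4)   # second query allowed, 0.2 left
# budget.charge(0.4) would raise RuntimeError: not enough budget remains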
Airavat system design
• Mappers are provided by computation provider,
and hence are not trusted
• Reducers are provided by Airavat. They are
trusted
– Airavat only supports a small set of reducers.
• Keys must be pre-declared by CP (why?)
• Airavat adds enough noise to ensure
differential privacy of the values
• Range enforcers ensure that output values from
mappers lie within the declared range (see the sketch below)
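
A hedged sketch of what a range enforcer does conceptually (Airavat's enforcement lives inside the modified framework; clamping is used here only for illustration): mapper outputs outside the declared range are forced back into it, so a malicious mapper cannot smuggle information out through out-of-range values that the calibrated noise would not cover.

def enforce_range(value, min_val, max_val):
    # Force out-of-range mapper outputs back into the declared range so the
    # noise calibrated to that range still covers every released value.
    return max(min_val, min(max_val, value))

mapper_outputs = [3, 7, 250, -9]               # 250 and -9 violate a declared [0, 10]
safe_outputs = [enforce_range(v, 0, 10) for v in mapper_outputs]
print(safe_outputs)                            # [3, 7, 10, 0]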
Security via Mandatory Access Control
• In MAC, Operating System enforces access
control at each access
• Access control rights cannot be overridden by
users
• Airavat uses SELinux – a Linux kernel security
extension, developed by the NSA, that adds
MAC to Linux
Security via MAC
• Each data object and process is tagged
showing the trust level of the object
• Data providers can set a declassify bit for their
data, in which case the result is released only
when there is no differential privacy violation
(a rough sketch of the release check follows below)
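
A very rough conceptual sketch of the release decision (the real mechanism is SELinux labels and policy, not Python; the flag names here are made up for illustration): outputs derived from sensitive inputs stay tagged, and a tagged result can leave the system only if the data provider set the declassify bit and the differential-privacy check passed.

def may_release(output_is_tagged_sensitive, declassify_bit, dp_check_passed):
    # Untagged outputs can always be released; tagged outputs need both the
    # data provider's declassify bit and a passing differential-privacy check.
    if not output_is_tagged_sensitive:
        return True
    return declassify_bit and dp_check_passed

print(may_release(True, declassify_bit=True, dp_check_passed=True))    # True
print(may_release(True, declassify_bit=False, dp_check_passed=True))   # False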
Implementation
• Airavat was implemented on top of Hadoop and
the Hadoop Distributed File System (HDFS).
Further reading
“Cynthia Dwork defines Differential Privacy” – an interesting blog post that
gives a high-level view of differential privacy.
http://www.ethanzuckerman.com/blog/2010/09/29/cynthia-dwork-defines-differential-privacy/