Privacy-Preserving Public Auditing for Data
Storage Security in Cloud Computing
Cong Wang (1), Qian Wang (1), Kui Ren (1) and Wenjing Lou (2)
(1) Illinois Institute of Technology, (2) Worcester Polytechnic Institute
Proceedings of IEEE INFOCOM 2010
Computer Systems Lab Group Meeting
Presented by: Zakhia Abichar
February 25, 2010
Cloud Computing
• With cloud computing, users can remotely store their data
into the cloud and use on-demand high-quality applications
• Using a shared pool of configurable computing resources
• Data outsourcing: users are relieved from the burden of data
storage and maintenance
• When users put large amounts of data on the cloud, protecting
data integrity is challenging
• Enabling public auditing for cloud data storage security is
important
• Users can ask an external audit party to check the integrity of
their outsourced data
[Figure: several users store data in the cloud network; an external audit party checks the data]
Third Party Auditor (TPA)
• The external audit party is called the TPA
• The TPA helps the user to audit the data
• To introduce the TPA securely:
• 1) The TPA should audit the data in the cloud, not ask for a copy
• 2) The TPA should not create new vulnerabilities for user data privacy
• This paper presents a privacy-preserving public auditing
system for cloud data storage
[Figure: users, their data in the cloud network, and the external audit party]
Outline
• Introduction
• System and threat model
• Proposed scheme
• Security analysis & performance evaluation
Introduction
• Cloud computing gives flexibility to users
• Users pay only for what they use
• Users don’t need to set up large computers of their own
• But the operation is managed by the Cloud Service
Provider (CSP)
• Users give their data to the CSP, so the CSP has control over the
data
• Users need to make sure the data stored on the
cloud is correct
• Data integrity faces internal threats (some employee at the CSP)
and external threats (hackers)
• The CSP might behave unfaithfully
– For monetary reasons, the CSP might delete data that is rarely
accessed
– The CSP might hide data loss to protect its reputation
Introduction
• How to efficiently verify the correctness of
outsourced data?
– Having the user simply download all the data is not
practical
• TPA can do it and provide an audit report
• TPA should not read the data content
– Legal regulations: US Health Insurance Portability
and Accountability Act (HIPAA)
• This paper presents a privacy-preserving third-party auditing protocol
– First work in the literature to do this
System and Threat Model
• U: the cloud user, who has a large amount of data files to store in the cloud
• CS: the cloud server, which is managed by the CSP and has significant
data storage and computing power (CS and CSP are treated as the same in
this paper)
• TPA: the third party auditor, who has expertise and capabilities that U and
CSP don’t have. The TPA is trusted to assess the CSP’s storage security
upon request from U
A note on auditing
• What is auditing?
• Reference:
http://searchcio.techtarget.com/searchCIO/downloads/AuditTheDataOrElse.pdf
A Public Auditing Scheme
This is a framework from previous related work. It is adapted to suit the goals of this paper
• Consists of four algorithms (KeyGen, SigGen,
GenProof, VerifyProof)
• KeyGen: key generation algorithm that is run by the
user to set up the scheme
• SigGen: used by the user to generate verification
metadata, which may consist of MAC, signatures or
other information used for auditing
• GenProof: run by the cloud server to generate a proof
of data storage correctness
• VerifyProof: run by the TPA to audit the proof from
the cloud server
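Setup
-The user runs KeyGen to produce the public & secret parameters
-The user runs SigGen on File F to produce the verification metadata
Audit
-TPA issues an audit message or a challenge to CSP
-CSP runs GenProof on File F and returns a response message
-TPA runs VerifyProof on the response message using the verification metadata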
Basic Scheme 1
[Figure: the file is divided into blocks (Block 1, Block 2, …, Block n); the user applies a MAC with a key to each block to get code 1, code 2, …, code n; blocks and codes go to the cloud, the key goes to the TPA]
Setup
-File is divided into blocks
-User computes the MAC (Message Authentication Code) of every file block
-Transfers the file blocks & codes to cloud
-Shares the key with TPA
Audit
-TPA demands a number of randomly chosen
blocks and their codes from CSP
-TPA uses the key to verify the
correctness of the file blocks
Drawbacks
-The audit demands retrieval of user’s data; this is not privacy-preserving
-Communication and computation complexity are linear in the sample size
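A minimal Python sketch of Basic Scheme 1, arranged as the four algorithms of the auditing framework. It assumes HMAC-SHA256 as the MAC and a 16-byte block size; the function names, the block size, and the choice to bind the block index into each MAC are illustrative assumptions, not details taken from the slides.

```python
import hashlib
import hmac
import secrets
from typing import List, Tuple

BLOCK_SIZE = 16  # bytes per block; illustrative assumption


def key_gen() -> bytes:
    """KeyGen: the user picks a random MAC key (later shared with the TPA)."""
    return secrets.token_bytes(32)


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    """Divide the file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def mac(key: bytes, index: int, block: bytes) -> bytes:
    # Assumption: the index is included so blocks cannot be swapped unnoticed.
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()


def sig_gen(key: bytes, blocks: List[bytes]) -> List[bytes]:
    """SigGen: the user computes one code (MAC) per block."""
    return [mac(key, i, b) for i, b in enumerate(blocks)]


def gen_proof(blocks: List[bytes], codes: List[bytes],
              challenge: List[int]) -> List[Tuple[int, bytes, bytes]]:
    """GenProof: the cloud returns the requested blocks and their codes."""
    return [(i, blocks[i], codes[i]) for i in challenge]


def verify_proof(key: bytes, proof: List[Tuple[int, bytes, bytes]]) -> bool:
    """VerifyProof: the TPA recomputes each MAC and compares."""
    return all(hmac.compare_digest(mac(key, i, b), c) for i, b, c in proof)
```

Note that `gen_proof` hands the raw blocks to the TPA, and its output grows with the number of challenged blocks, which are the two drawbacks listed for this scheme.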
Basic Scheme 2
[Figure: the user holds Key 1, Key 2, …, Key s and the file blocks; each key yields its own set of codes (code 1, code 2, …, code n); the blocks go to the cloud, the keys and codes go to the TPA]
Setup
-User uses s keys and computes the MACs of the blocks under each key
-User shares the keys and MACs with TPA
Audit
-TPA gives a key (one of the s keys) to CSP and requests MACs for the blocks
-TPA compares the returned MACs with the MACs stored at the TPA
-Improvement over Scheme 1: TPA doesn’t see the data, which preserves privacy
-Drawback: a key can be used only once
-The TPA has to keep state, remembering which keys have been used
-Schemes 1 & 2 are good for static data (data doesn’t change at the cloud)
Privacy-Preserving Public Auditing Scheme
Proposed scheme
• Uses homomorphic authenticator
• Also uses a random mask achieved by a Pseudo Random Function (PRF)
Homomorphic authenticator
[Figure: Block 1, Block 2, …, Block k each have their own verification metadata; these are combined into one aggregate verification metadata]
A linear combination of data blocks can be verified by
looking only at the aggregated authenticator
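To make the homomorphic property concrete, here is a toy sketch that swaps in a simpler technique: a secret-key linear MAC of the form tag_i = alpha · m_i + PRF(i) mod p. It is not the publicly verifiable authenticator the paper relies on (the verifier here needs the secret key), and the prime, the PRF built from HMAC-SHA256, and all names are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets
from typing import List, Tuple

P = 2**127 - 1  # a Mersenne prime used as the modulus (illustrative choice)


def prf(key: bytes, index: int) -> int:
    """Pseudo-random value for a block index, derived from HMAC-SHA256."""
    digest = hmac.new(key, index.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % P


def key_gen() -> Tuple[int, bytes]:
    """Secret key: a nonzero scalar alpha and a PRF key."""
    return secrets.randbelow(P - 1) + 1, secrets.token_bytes(32)


def tag(alpha: int, prf_key: bytes, index: int, block: int) -> int:
    """Authenticator of one block (blocks are integers below P here)."""
    return (alpha * block + prf(prf_key, index)) % P


def gen_proof(blocks: List[int], tags: List[int],
              challenge: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Server: combine the challenged blocks and tags with coefficients v_i."""
    mu = sum(v * blocks[i] for i, v in challenge) % P
    sigma = sum(v * tags[i] for i, v in challenge) % P
    return mu, sigma


def verify(alpha: int, prf_key: bytes, challenge: List[Tuple[int, int]],
           mu: int, sigma: int) -> bool:
    """Verifier: check the aggregate tag against the combination only."""
    expected = (alpha * mu + sum(v * prf(prf_key, i) for i, v in challenge)) % P
    return sigma == expected
```

The check works because the sum of v_i · tag_i equals alpha times the sum of v_i · m_i plus the sum of v_i · PRF(i), so the verifier never needs the individual blocks, only their linear combination and the aggregate tag.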
Privacy-Preserving Public Auditing Scheme
Random Mask by PRF
-In addition to the aggregate authenticator,
the TPA will receive a linear combination
of file blocks: the sum of vi · mi over the challenged blocks
-vi are random numbers
-mi are file blocks
-If TPA sees many linear combinations
of the same blocks, it might be able to
infer the file blocks
-Thus, we also use a random mask
provided by the Pseudo Random
Function (PRF)
-The PRF masks the data
-The mask has the property of not affecting the
verification metadata
[Figure: the verification metadata of Block 1 with the PRF mask equals the verification metadata of Block 1; r is the mask]
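The leakage concern can be illustrated with a toy example: two unmasked linear combinations of the same two blocks form a solvable linear system. The sketch below works modulo a prime and uses a plain additive mask as a stand-in; the modulus, the function names, and the additive form of the mask are assumptions, and the sketch does not show how the paper keeps verification working under the mask.

```python
from typing import Sequence, Tuple

P = 2**61 - 1  # a Mersenne prime used as the modulus (illustrative choice)


def combine(blocks: Sequence[int], coeffs: Sequence[int]) -> int:
    """Linear combination sum(v_i * m_i) mod P, as sent to the TPA."""
    return sum(v * m for v, m in zip(coeffs, blocks)) % P


def mask(mu: int, r: int) -> int:
    """Hide a combination behind a random value r (simplified additive mask)."""
    return (mu + r) % P


def solve_two_blocks(c1: Tuple[int, int], mu1: int,
                     c2: Tuple[int, int], mu2: int) -> Tuple[int, int]:
    """What a curious TPA could do: solve the 2x2 system by Cramer's rule."""
    (a, b), (c, d) = c1, c2
    det = (a * d - b * c) % P
    if det == 0:
        raise ValueError("coefficient vectors are linearly dependent")
    inv = pow(det, -1, P)  # modular inverse, Python 3.8+
    m1 = ((mu1 * d - b * mu2) * inv) % P
    m2 = ((a * mu2 - c * mu1) * inv) % P
    return m1, m2
```

With unmasked combinations `solve_two_blocks` returns the original blocks; once each response carries a fresh mask that the TPA does not know, the same computation yields unrelated values.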
Setup
1- User runs KeyGen to generate the public and secret
parameters: public key (pk) & secret key (sk)
2- User runs SigGen: a code is generated for
each file block (σ1 for Block 1, σ2 for Block 2, …, σn for Block n)
3- The file blocks and their codes
are transmitted to the cloud
Audit
-TPA sends a challenge
message to CSP
-It contains the positions
of the blocks that will be
checked in this audit
-CSP runs GenProof on the blocks selected in the challenge
and computes their aggregate authenticator
-CSP also makes a linear combination
of selected blocks and applies a
mask. A separate PRF key is used for each
audit.
-CSP sends the aggregate authenticator &
the masked combination of blocks to TPA
-TPA runs VerifyProof: it obtains an aggregate authenticator from the
masked linear combination of requested blocks and compares it
to the one received from
CSP
Properties
• The data sent from CSP to TPA is
independent of the data size
– Linear combination with mask
• Previous work has shown that if the server
is missing 1% of the data
– The TPA needs to check 300 or 460 randomly chosen blocks to detect that with
a probability larger than 95% or 99%,
respectively
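The sample sizes can be checked with a short calculation. The sketch assumes each sampled block independently hits a missing block with probability 1%, which approximates sampling without replacement from a large file; the function names are illustrative.

```python
import math


def detection_probability(sampled: int, missing_fraction: float = 0.01) -> float:
    """Chance that at least one of the sampled blocks is a missing one."""
    return 1.0 - (1.0 - missing_fraction) ** sampled


def blocks_needed(target: float, missing_fraction: float = 0.01) -> int:
    """Smallest sample size whose detection probability reaches the target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - missing_fraction))


if __name__ == "__main__":
    for c in (300, 460):
        print(c, round(detection_probability(c), 4))
    print(blocks_needed(0.95), blocks_needed(0.99))
```

Under this approximation the exact thresholds come out just below 300 and 460, so the quoted figures read as rounded-up values. The sample size depends only on the missing fraction, not on the file size.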
More Possible Extensions
• Batch auditing
– There are K users having K files on the same cloud
– They have the same TPA
– Then, the TPA can combine their queries and save computation time
– The comparison function that compares the aggregate authenticators
has a property that allows checking multiple messages in one equation
– Instead of 2K operations, K+1 are possible
• Data dynamics
– The data on the cloud may change according to applications
– This is supported by using the Merkle Hash Tree (MHT) data structure
– With MHT, data changes in a certain way; new data is added in some
places
– There is more overhead involved; the user sends the tree root to the TPA
– This scheme is not evaluated in the paper
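For readers unfamiliar with the Merkle Hash Tree, here is a minimal sketch of computing a root and checking one block against it. It assumes SHA-256, a non-empty block list, and duplication of the last node on odd-sized levels (a common simplification); none of these details come from the slides, and the dynamic update operations are not shown.

```python
import hashlib
from typing import List, Tuple


def leaf_hash(block: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + block).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(blocks: List[bytes]) -> List[List[bytes]]:
    """Return all tree levels, leaves first; odd levels are padded in place."""
    levels = [[leaf_hash(b) for b in blocks]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2 == 1:
            cur = cur + [cur[-1]]  # duplicate the last node (simplification)
            levels[-1] = cur
        levels.append([node_hash(cur[i], cur[i + 1])
                       for i in range(0, len(cur), 2)])
    return levels


def root(levels: List[List[bytes]]) -> bytes:
    return levels[-1][0]


def proof(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag says if the sibling is on the left."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def verify(block: bytes, path: List[Tuple[bytes, bool]], expected_root: bytes) -> bool:
    node = leaf_hash(block)
    for sibling, sibling_is_left in path:
        node = node_hash(sibling, node) if sibling_is_left else node_hash(node, sibling)
    return node == expected_root
```

A verifier who holds only the root can check any single block with a logarithmic number of hashes, which is why sending the tree root to the TPA is enough to anchor later checks.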
Performance
• Reference [11] doesn’t have the privacy-preserving property
– TPA can read the information
Batch Auditing
• Number of auditing tasks increased from 1 to 200 in
multiples of 8
• Auditing time per task: total auditing time / number of tasks
Performance with Invalid Responses
• In batch auditing, true means that all of the
messages are correct
• False means at least one is wrong
– Divide the batch in half and repeat for the left and right
parts
– Binary search
[Figure: batches of 10 responses, numbered 1 to 10, with the wrong responses marked; the examples show wrong responses 1, 2, 3 and 9, 10, and wrong responses 3 and 10]
The more errors there are, the more time it takes to find them
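The divide-in-half search can be sketched as a short recursive function that also counts how many batch verifications it performs. The `batch_ok` callback stands in for the TPA's batch verification and is a hypothetical placeholder, as are the other names.

```python
from typing import Callable, List, Sequence, Tuple


def find_invalid(ids: Sequence[int],
                 batch_ok: Callable[[Sequence[int]], bool]) -> Tuple[List[int], int]:
    """Return (invalid response ids, number of batch checks performed)."""
    if not ids:
        return [], 0
    checks = 1
    if batch_ok(ids):          # true: every response in this batch is correct
        return [], checks
    if len(ids) == 1:          # false on a single response: it is the wrong one
        return [ids[0]], checks
    mid = len(ids) // 2
    left_bad, left_checks = find_invalid(ids[:mid], batch_ok)
    right_bad, right_checks = find_invalid(ids[mid:], batch_ok)
    return left_bad + right_bad, checks + left_checks + right_checks


if __name__ == "__main__":
    wrong = {3, 10}
    ok = lambda batch: not any(i in wrong for i in batch)
    print(find_invalid(list(range(1, 11)), ok))
```

A fully valid batch costs a single check, and each additional wrong response forces the search down another branch, which matches the observation that more errors take longer to find.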