A Comprehensive Framework for Secure Query Processing on Relational Data in the Cloud
Shiyuan Wang, Divyakant Agrawal, Amr El Abbadi
Department of Computer Science
University of California, Santa Barbara
SDM 2011
Data Security in the Cloud, A Big Concern!!!
{sywang, agrawal, amr}@cs.ucsb.edu
4/28/2020 2
Data Security Concerns
• Data Confidentiality
  o Ensure data cannot be seen or inferred
• Data Availability
  o Ensure data is always accessible even if compromised servers exist
• Data Integrity
  o Ensure data retrieved from the cloud is intact
• Practical Query Processing
  o Preserve practical functionalities while being secure
State of the Art Solutions
• Data Confidentiality
  o Encryption
  o Database queries on encrypted data [Haci02]: hard to balance between functionality and confidentiality
  o Private Information Retrieval (PIR) [Chor98]: too expensive
• Data Availability
  o Replication, Information Dispersal Algorithm (IDA) [Rabin89], error-correcting codes, etc.
• Data Integrity
  o Message Authentication Code, Merkle hash tree, etc.
Problems with Existing Solutions
• Lack a good balance between functionality / practicability and security
• Lack a comprehensive framework that ensures data confidentiality, availability and integrity in secure query processing
• Example Problems
  o Did not consider defending against inference attacks on query accesses
    • Range labels [Hore04]
  o Specific technique for one query, not flexible for others
    • Order-preserving encryption [Agra04]
  o Too expensive
    • PIR, homomorphic encryption [Gent09]
  o Did not consider data availability or integrity
    • Encrypted range index [Dami03]
Our Goals
• A practical solution with balanced maximum functionality and security
• A comprehensive secure query processing framework on relational data in the cloud
  o Support common database queries: exact, range, updates, inserts and deletes, etc.
  o Address the concerns of data confidentiality, availability and integrity
Model & Assumptions
• A single relational table D with N tuples
• Queries mainly on key attributes, supported by an index I
• The cloud is a shared facility with sufficient concurrent queries
• Attackers’ computation abilities are bounded by polynomial size circuits.
Our Approaches
• Leverage Information Dispersal Algorithm (IDA) and propose “salted” IDA
• Query via index, access data by column accesses on IDA encoded matrices
• Obfuscate query accesses by using trusted proxies
• Leverage index structure/data characteristics and message authentication for integrity
IDA
• IDA encodes and disperses data into n uninterpretable pieces s.t. only m (m ≤ n) pieces are required to reconstruct the data.
• Data matrix $D = \begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{pmatrix}$, secret key matrix $C = \begin{pmatrix} 1 & 3 & 5 \\ 1 & 4 & 3 \\ 1 & 5 & 2 \\ 1 & 6 & 7 \\ 1 & 7 & 6 \end{pmatrix}$, then encoded data matrix $E = C \times D = \begin{pmatrix} 8 & 6 & 7 \\ 12 & 9 & 9 \\ 13 & 10 & 8 \\ 4 & 8 & 8 \\ 5 & 11 & 9 \end{pmatrix}$, whose rows 1 to 5 are stored on Server 1 to Server 5 respectively.
IDA
• If Server 2 and Server 3 are compromised, the pieces on the remaining servers give $E^{*} = \begin{pmatrix} 8 & 6 & 7 \\ 4 & 8 & 8 \\ 5 & 11 & 9 \end{pmatrix}$ and $C^{*} = \begin{pmatrix} 1 & 3 & 5 \\ 1 & 6 & 7 \\ 1 & 7 & 6 \end{pmatrix}$ (rows 1, 4 and 5 of $E$ and $C$).
• $D = C^{*-1} \times E^{*} = \begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{pmatrix}$
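The IDA example can be checked numerically. The following is a minimal sketch, not the authors' implementation: the slides do not say which arithmetic is used, but their numbers are consistent with GF(2^4) under the reduction polynomial x^4 + x + 1, so that field is assumed here. All function names are illustrative.

```python
# Assumption: matrix entries live in GF(2^4) with polynomial x^4 + x + 1.
# This is inferred from the slide's numbers, not stated in the slides.

def gf_mul(a, b, poly=0b10011):
    """Multiply two GF(2^4) elements (carry-less multiply with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:      # degree reached 4: reduce by the polynomial
            a ^= poly
    return result

def gf_inv(a):
    """Multiplicative inverse by exhaustive search (the field has 16 elements)."""
    for x in range(1, 16):
        if gf_mul(a, x) == 1:
            return x
    raise ZeroDivisionError("0 has no inverse")

def mat_mul(A, B):
    """Matrix product over GF(2^4); addition is XOR."""
    out = [[0] * len(B[0]) for _ in A]
    for i, row in enumerate(A):
        for j in range(len(B[0])):
            for k, a in enumerate(row):
                out[i][j] ^= gf_mul(a, B[k][j])
    return out

def mat_inv(A):
    """Gauss-Jordan inversion over GF(2^4) for a square, invertible matrix."""
    n = len(A)
    M = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        inv = gf_inv(M[col][col])
        M[col] = [gf_mul(inv, v) for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v ^ gf_mul(factor, w) for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]

D = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]                        # data matrix
C = [[1, 3, 5], [1, 4, 3], [1, 5, 2], [1, 6, 7], [1, 7, 6]]  # secret key matrix
E = mat_mul(C, D)                                            # row i goes to Server i+1

available = [0, 3, 4]                  # Servers 1, 4 and 5 (2 and 3 are lost)
C_star = [C[i] for i in available]
E_star = [E[i] for i in available]
D_rec = mat_mul(mat_inv(C_star), E_star)
```

With m = 3 and n = 5, any three rows of E together with the matching rows of C should be enough, provided the chosen 3 × 3 submatrix of C is invertible.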
[Figure: system overview. Labels: I: B+-tree Index with nodes n_1, n_2, ...; ID with columns n_1, n_2, ...; IE with columns E(n_1), E(n_2), ...; D: Data Tuples t_1, t_2, ..., t_N; TD with columns tc_1, tc_2, ...; TE with columns E(tc_1), E(tc_2), ...; Cloud Servers S_1, ..., S_i, ..., S_n.]
IDA Matrix Column Access
• Leverage column accesses for query processing
• Retrieve a column: $D_{:,i} = C^{*-1} \times E^{*}_{:,i}$
• Update and encode a column: $E_{:,i} = C \times D_{:,i}$
• Locate columns to access via index
“Salted” IDA
• When encoding, add random factors (“salt”) to each column of the data matrix
• During decoding, reconstruct the exact “salt” and deduct it in order to recover the data matrix
• Producing random factors
  o A secret seed ss
  o A deterministic function fs
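The salting step can be sketched as follows. The slides name only a secret seed ss and a deterministic function fs; the choice of HMAC-SHA256 for fs, the per-entry salt derivation, and addition modulo 2^32 are all assumptions made for illustration, not the authors' construction.

```python
import hashlib
import hmac

MODULUS = 2 ** 32  # assumed value domain for matrix entries

def salt_for_column(secret_seed, column_index, num_rows):
    """Hypothetical f_s: derive one salt value per entry of a column from ss."""
    salts = []
    for row in range(num_rows):
        message = f"{column_index}:{row}".encode()
        digest = hmac.new(secret_seed, message, hashlib.sha256).digest()
        salts.append(int.from_bytes(digest[:4], "big"))
    return salts

def add_salt(column, secret_seed, column_index):
    """Applied before encoding: add the salt to every entry of the column."""
    salts = salt_for_column(secret_seed, column_index, len(column))
    return [(v + s) % MODULUS for v, s in zip(column, salts)]

def remove_salt(salted, secret_seed, column_index):
    """Applied after decoding: regenerate the same salt and deduct it."""
    salts = salt_for_column(secret_seed, column_index, len(salted))
    return [(v - s) % MODULUS for v, s in zip(salted, salts)]

seed = b"example secret seed"      # ss, known only to the client
column = [10001, 4000, 25]
salted_0 = add_salt(column, seed, 0)
salted_1 = add_salt(column, seed, 1)
restored = remove_salt(salted_0, seed, 0)
```

Since the salt is a function of the seed and the column position only, the client needs to store nothing beyond ss to undo it, and identical column contents at different positions are salted differently.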
Secure Column Access Via Proxies
• Protecting query accesses from inferences
  o Route column access requests and responses for different clients through trusted proxies
• Future work for a stronger defense
  o k-1 noisy requests
Data Access Framework
Data Table D
      Perm No   Salary   Age
t1    10001     4000     25
t2    10002     5000     28
t3    10003     4000     25
t4    10004     4000     26
t5    10005     6000     30
t6    10006     5500     28
t7    10007     6000     31
Data Access Framework
Data Tuples D (t1 to t7, as in Data Table D above) are packed into the columns of TD:
TD             1              2              3              4
           10001          10003          10005          10007
            4000           4000           6000           6000
              25             25             30             31
           10002          10004          10006              0
            5000           4000           5500              0
              28             26             28              0
    cksum(t1 t2)   cksum(t3 t4)   cksum(t5 t6)      cksum(t7)
               0              0              0              0
               0              0              0              0
TE = C · TD, dispersed over Server 1, Server 2, ..., Server n
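The TD layout on this slide can be reproduced with a short sketch. The grouping of two tuples per column and the nine-row height are read off the slide; the checksum function is not named there, so CRC-32 stands in as a placeholder, and `build_td` is a hypothetical helper name.

```python
import zlib

def cksum(tuples):
    """Placeholder checksum (CRC-32 over the tuple values); an assumption."""
    payload = ",".join(str(v) for t in tuples for v in t).encode()
    return zlib.crc32(payload)

def build_td(table, tuples_per_column=2, num_rows=9):
    """Return TD as a list of columns, each zero-padded to num_rows entries."""
    width = len(table[0])
    columns = []
    for start in range(0, len(table), tuples_per_column):
        group = table[start:start + tuples_per_column]
        column = [v for t in group for v in t]
        # Pad empty tuple slots so the checksum always sits in the same row.
        column += [0] * (tuples_per_column * width - len(column))
        column.append(cksum(group))
        # Pad the rest of the column up to the matrix height.
        column += [0] * (num_rows - len(column))
        columns.append(column)
    return columns

D = [
    (10001, 4000, 25), (10002, 5000, 28), (10003, 4000, 25),
    (10004, 4000, 26), (10005, 6000, 30), (10006, 5500, 28),
    (10007, 6000, 31),
]
TD = build_td(D)
```

Each entry of `TD` is one column, which matches the unit that is later retrieved or re-encoded on its own.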
Data Access Framework
Index I
• Node 2 (root): keys 10003, 10005; pointers to nodes 1, 3, 4
• Node 1 (leaf): 10001 → TD column 1, 10002 → TD column 1
• Node 3 (leaf): 10003 → TD column 2, 10004 → TD column 2
• Node 4 (leaf): 10005 → TD column 3, 10006 → TD column 3, 10007 → TD column 4
ID (one column per index node):
ID        1       2       3       4
          1       0       1       1
          0       0       1       3
          1       1       2       3
      10001   10003   10003   10005
          1       3       2       3
      10002   10005   10004   10006
          0       4       0       4
          0       0       0   10007
          3       0       4       0
IE = C · ID, dispersed over Server 1, Server 2, ..., Server n
Data Access Framework
[Figure: overview of the framework. The B+-tree Index on Perm No is stored as ID and the Data Table as TD, with the same contents as on the preceding slides; individual cloud servers hold the encoded pieces (IE_i, TE_i), (IE_j, TE_j), (IE_k, TE_k).]
Range Query Example
[Figure: the Index (root node 2 with keys 10003, 10005; leaf nodes 1, 3, 4 pointing to TD columns) shown next to TD (columns 1 to 4), illustrating the index nodes and TD columns accessed by a range query.]
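A range query over the example index can be sketched as below. The node contents and the key-to-column pointers come from the slides; representing nodes as Python dictionaries, the leaf "next" links, and the assumption that the needed ID and TD columns have already been retrieved and decoded are all simplifications for illustration.

```python
from bisect import bisect_right

# Index from the slides: node 2 is the root; nodes 1, 3 and 4 are leaves.
ROOT = {"keys": [10003, 10005], "children": [1, 3, 4]}
LEAVES = {
    1: {"entries": [(10001, 1), (10002, 1)], "next": 3},
    3: {"entries": [(10003, 2), (10004, 2)], "next": 4},
    4: {"entries": [(10005, 3), (10006, 3), (10007, 4)], "next": None},
}
# Decoded TD columns, reduced to the tuples they contain.
TD = {
    1: [(10001, 4000, 25), (10002, 5000, 28)],
    2: [(10003, 4000, 25), (10004, 4000, 26)],
    3: [(10005, 6000, 30), (10006, 5500, 28)],
    4: [(10007, 6000, 31)],
}

def range_query(low, high):
    """Return (TD columns accessed, matching tuples) for low <= Perm No <= high."""
    # Descend from the root to the leaf that may contain `low`.
    leaf_id = ROOT["children"][bisect_right(ROOT["keys"], low)]
    columns = []
    done = False
    # Walk the leaf chain until a key beyond `high` is seen.
    while leaf_id is not None and not done:
        leaf = LEAVES[leaf_id]
        for key, column in leaf["entries"]:
            if key > high:
                done = True
                break
            if key >= low and column not in columns:
                columns.append(column)
        leaf_id = leaf["next"]
    # Fetch only the located columns and filter their tuples.
    tuples = [t for c in columns for t in TD[c] if low <= t[0] <= high]
    return columns, tuples

columns, tuples = range_query(10002, 10005)
```

For the range 10002 to 10005 this touches TD columns 1, 2 and 3 and filters out the two tuples in those columns that fall outside the range.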
Practical Query Performance
• Support exact queries, range queries (range aggregates), data updates, inserts and deletes
• Leverage index access
• Cache partial index nodes on the client
Experimental Setup
• Implementation
  o Our framework, baseline, encrypted range index [Dami03]
  o C++, Crypto++ 5.6.0
• Data table: Item table from TPC-W benchmark
• Setup
  o B+-tree branch factor b=50
  o m=13, n=21 servers for our framework, one server for baseline and encrypted range index
  o 1000 exact queries, range queries, data updates and inserts
  o Linux servers with Intel 2.40GHz CPU, 3GB memory and Fedora Core 8 OS
Varying Data Size on Exact Queries
[Figure panels: Processing Time Breakdown; Communication Size Breakdown]
Varying Selectivity on Range Queries
[Figure panels: Client Processing Time; Data Communication Size]
Varying Cache Hit Rate on Exact Queries
[Figure panels: Client Processing Time; Data Communication Size]
Conclusion
• A comprehensive framework for practical secure query processing on relational data in the cloud
• A good balance between functionality and security
• Support efficient database queries and data updates
• Ensure data confidentiality both in storage and at access time
• Provide data availability and integrity
References
• [Haci02] H. Hacigumus et al. Executing SQL over encrypted data in the database-service-provider model. In SIGMOD, pages 216-227, 2002.
• [Chor98] B. Chor et al. Private information retrieval. J. ACM, 45(6): 965-981, 1998.
• [Rabin89] M. Rabin. Efficient dispersal of information for security, load balancing, and fault tolerance. J. ACM, 36(2): 335-348, 1989.
• [Agra04] R. Agrawal et al. Order preserving encryption for numeric data. In SIGMOD, pages 563-574, 2004.
• [Gent09] C. Gentry. Fully homomorphic encryption using ideal lattices. In STOC, pages 169-178, 2009.
• [Hore04] B. Hore et al. A privacy-preserving index for range queries. In VLDB, pages 720-731, 2004.
• [Dami03] E. Damiani et al. Balancing confidentiality and efficiency in untrusted relational DBMSs. In CCS, pages 93-102, 2003.
Effects of Varying Number of Tuples N on Inserts
[Figure panels: Client Processing Time; Data Communication Size]