
A Comprehensive Framework for Secure Query Processing on Relational Data in the Cloud

Shiyuan Wang, Divyakant Agrawal, Amr El Abbadi
Department of Computer Science
University of California, Santa Barbara
SDM 2011

Data Security in the Cloud, A Big Concern!!!

{sywang, agrawal, amr}@cs.ucsb.edu

4/28/2020 2

Data Security Concerns

• Data Confidentiality
  o Ensure data cannot be seen or inferred
• Data Availability
  o Ensure data is always accessible even if compromised servers exist
• Data Integrity
  o Ensure data retrieved from the cloud is intact
• Practical Query Processing
  o Preserve practical functionalities while being secure


State of the Art Solutions

• Data Confidentiality
  o Encryption
  o Database queries on encrypted data [Haci02]: hard to balance functionality and confidentiality
  o Private Information Retrieval (PIR) [Chor98]: too expensive
• Data Availability
  o Replication, Information Dispersal Algorithm (IDA) [Rabin89], error-correcting codes, etc.
• Data Integrity
  o Message authentication codes, Merkle hash trees, etc.


Problems with Existing Solutions

• Lack of a good balance between functionality/practicality and security
• Lack of a comprehensive framework that ensures secure query processing together with data confidentiality, availability and integrity
• Example Problems
  o Did not consider defending against inference attacks on query accesses
    • Range labels [Hore04]
  o Specific technique for one query type, not flexible for others
    • Order-preserving encryption [Agra04]
  o Too expensive
    • PIR, homomorphic encryption [Gent09]
  o Did not consider data availability or integrity
    • Encrypted range index [Dami03]


Our Goals

• A practical solution that balances functionality and security while maximizing both
• A comprehensive secure query processing framework on relational data in the cloud
  o Support common database queries: exact, range, updates, inserts, deletes, etc.
  o Address the concerns of data confidentiality, availability and integrity


Model & Assumptions

• A single relational table D with N tuples
• Queries mainly on key attributes, supported by an index I
• The cloud is a shared facility with sufficient concurrent queries
• Attackers' computation abilities are bounded by polynomial-size circuits


Our Approaches

• Leverage the Information Dispersal Algorithm (IDA) and propose "salted" IDA
• Query via index, access data by column accesses on IDA-encoded matrices
• Obfuscate query accesses by using trusted proxies
• Leverage index structure/data characteristics and message authentication for integrity


IDA

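IDA encodes and disperses data into n uninterpretable pieces such that only m (m ≤ n) pieces are required to reconstruct the data.

• Data matrix D and secret key matrix C:

\[
D = \begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{pmatrix},
\qquad
C = \begin{pmatrix} 1 & 3 & 5 \\ 1 & 4 & 3 \\ 1 & 5 & 2 \\ 1 & 6 & 7 \\ 1 & 7 & 6 \end{pmatrix}
\]

• Encoded data matrix E, one row per server:

\[
E = C \times D =
\begin{pmatrix} 8 & 6 & 7 \\ 12 & 9 & 9 \\ 13 & 10 & 8 \\ 4 & 8 & 8 \\ 5 & 11 & 9 \end{pmatrix}
\begin{matrix} \text{Server 1} \\ \text{Server 2} \\ \text{Server 3} \\ \text{Server 4} \\ \text{Server 5} \end{matrix}
\]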
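The product E = C × D on this slide is not ordinary integer arithmetic. The slide's numbers are reproduced exactly when every multiplication and addition is carried out in the finite field GF(2^4) with reduction polynomial x^4 + x + 1. That field choice is inferred from the numbers, not stated on the slides, and the helper names `gf_mul` and `gf_matmul` are hypothetical. A minimal sketch under that assumption:

```python
# Sketch only: GF(2^4) with polynomial x^4 + x + 1 is an assumption inferred
# from the slide's numbers; the slides do not name the field.

def gf_mul(a, b):
    """Multiply two GF(2^4) elements (integers 0..15) modulo x^4 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^k) is XOR
        a <<= 1
        if a & 0x10:         # degree reached 4, so reduce by 0b10011
            a ^= 0x13
        b >>= 1
    return result


def gf_matmul(x, y):
    """Matrix product over GF(2^4); x is r-by-k and y is k-by-c."""
    out = [[0] * len(y[0]) for _ in range(len(x))]
    for i in range(len(x)):
        for j in range(len(y[0])):
            acc = 0
            for t in range(len(y)):
                acc ^= gf_mul(x[i][t], y[t][j])
            out[i][j] = acc
    return out


D = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]                          # data matrix, m = 3 rows
C = [[1, 3, 5], [1, 4, 3], [1, 5, 2], [1, 6, 7], [1, 7, 6]]    # key matrix, n = 5 rows

E = gf_matmul(C, D)
for server, piece in enumerate(E, start=1):
    print(f"Server {server}: {piece}")
```

Each printed row is the piece held by one server and matches the corresponding row of E on the slide, for example `Server 1: [8, 6, 7]`.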


IDA

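If Server 2 and Server 3 are compromised, the rows held by the remaining servers (1, 4 and 5) form E* and the matching rows of C form C*:

\[
E^{*} = \begin{pmatrix} 8 & 6 & 7 \\ 4 & 8 & 8 \\ 5 & 11 & 9 \end{pmatrix},
\qquad
C^{*} = \begin{pmatrix} 1 & 3 & 5 \\ 1 & 6 & 7 \\ 1 & 7 & 6 \end{pmatrix}
\]

\[
D = C^{*-1} \times E^{*} = \begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{pmatrix}
\]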
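The recovery step can be checked numerically. The sketch below solves C* × D = E* by Gauss-Jordan elimination instead of forming the inverse explicitly. As an assumption inferred from the slide's numbers (the slides do not state the field), all arithmetic is done in GF(2^4) with polynomial x^4 + x + 1; `gf_mul`, `gf_inv` and `gf_solve` are hypothetical helper names.

```python
# Sketch only: the field GF(2^4) / x^4 + x + 1 is an inferred assumption.

def gf_mul(a, b):
    """Multiply two GF(2^4) elements modulo x^4 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x10:
            a ^= 0x13
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse by brute force (only 15 nonzero elements exist)."""
    for candidate in range(1, 16):
        if gf_mul(a, candidate) == 1:
            return candidate
    raise ZeroDivisionError("0 has no inverse in GF(2^4)")


def gf_solve(a, b):
    """Solve a * x = b over GF(2^4) for square, invertible a."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]   # augmented matrix [a | b]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = gf_inv(aug[col][col])
        aug[col] = [gf_mul(inv, v) for v in aug[col]]    # scale pivot row to 1
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                # subtraction equals XOR in characteristic 2
                aug[r] = [v ^ gf_mul(factor, p) for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


C_star = [[1, 3, 5], [1, 6, 7], [1, 7, 6]]     # key rows of servers 1, 4, 5
E_star = [[8, 6, 7], [4, 8, 8], [5, 11, 9]]    # pieces held by servers 1, 4, 5

D = gf_solve(C_star, E_star)
print(D)
```

Any m = 3 available servers would work the same way, provided their rows of C form an invertible matrix.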

[Figure: data access framework. The B+-tree index I (nodes n1, n2, ...) is laid out as matrix ID and encoded into IE = E(n1), E(n2), ...; the data tuples D (t1, t2, ..., tN) are laid out as matrix TD with columns tc1, tc2, ... and encoded into TE = E(tc1), E(tc2), ...; the encoded pieces are dispersed across cloud servers S1, ..., Si, ..., Sn.]

IDA Matrix Column Access

• Leverage column accesses for query processing
• Retrieve a column: \( D_{:,i} = C^{*-1} \times E^{*}_{:,i} \)
• Update and encode a column: \( E_{:,i} = C \times D_{:,i} \)
• Locate columns to access via index
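A column update touches exactly one entry on each server. The sketch below re-encodes a single column and writes it into the pieces from the IDA example slide. It assumes GF(2^4) arithmetic with polynomial x^4 + x + 1 (inferred from that example's numbers, not stated on the slides); `encode_column` and the new column values are hypothetical.

```python
# Sketch only: field choice and helper names are assumptions.

def gf_mul(a, b):
    """Multiply two GF(2^4) elements modulo x^4 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x10:
            a ^= 0x13
        b >>= 1
    return result


def encode_column(key_matrix, column):
    """Return key_matrix x column over GF(2^4): one encoded value per server."""
    encoded = []
    for key_row in key_matrix:
        acc = 0
        for c, d in zip(key_row, column):
            acc ^= gf_mul(c, d)
        encoded.append(acc)
    return encoded


C = [[1, 3, 5], [1, 4, 3], [1, 5, 2], [1, 6, 7], [1, 7, 6]]
E = [[8, 6, 7], [12, 9, 9], [13, 10, 8], [4, 8, 8], [5, 11, 9]]   # pieces on 5 servers

i = 2                      # zero-based index of the column being updated
new_column = [1, 2, 3]     # hypothetical new contents of D[:, i]
encoded = encode_column(C, new_column)
for server_piece, value in zip(E, encoded):
    server_piece[i] = value    # each server overwrites a single entry

print(E)
```

The other columns of E are left untouched, which is what makes per-column reads and writes cheap compared with re-encoding the whole matrix.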


“Salted” IDA

• When encoding, add random factors ("salt") to each column of the data matrix
• During decoding, reconstruct the exact "salt" and deduct it in order to recover the data matrix
• Producing random factors
  o A secret seed ss
  o A deterministic function fs
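As an illustration of how a seed ss and a deterministic function fs could yield salt that the client can regenerate at decoding time, here is a minimal sketch. The slides do not specify fs; HMAC-SHA256 is used purely as a stand-in, values are assumed to be GF(2^4) elements (where adding and deducting are both XOR), and the names `fs`, `add_salt` and `remove_salt` are hypothetical.

```python
import hashlib
import hmac


def fs(seed, column_index, length):
    """Hypothetical salt function: HMAC-SHA256 keyed by the secret seed."""
    digest = hmac.new(seed, column_index.to_bytes(8, "big"), hashlib.sha256).digest()
    if length > len(digest):
        raise ValueError("sketch supports at most 32 salt values per column")
    return [byte & 0x0F for byte in digest[:length]]   # keep 4 bits per value


def add_salt(column, seed, column_index):
    """Add the column's salt; in GF(2^4) addition is XOR."""
    salt = fs(seed, column_index, len(column))
    return [value ^ s for value, s in zip(column, salt)]


# XOR is its own inverse, so deducting the salt is the same operation.
remove_salt = add_salt

ss = b"example secret seed"          # hypothetical seed, kept by the client
column = [1, 2, 3]                   # one column of the data matrix
salted = add_salt(column, ss, 0)     # this is what would be IDA-encoded
recovered = remove_salt(salted, ss, 0)
print(salted, recovered)
```

Because the salt depends on the column index, equal plaintext columns are unlikely to receive equal salt; in a larger field the add and deduct steps would be distinct operations.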


Secure Column Access Via Proxies

• Protecting query accesses from inferences
  o Route column access requests and responses for different clients through trusted proxies
• Future work for a stronger defense
  o k-1 noisy requests
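The slides only name "k-1 noisy requests" as future work. One plausible reading, sketched here as an assumption and not as the authors' design, is that each real column request is hidden among k-1 random decoy requests so that a server sees k candidates. The function name and parameters are hypothetical.

```python
import random


def obfuscated_request(real_column, num_columns, k, rng=random):
    """Return k distinct column indices: the real one plus k-1 decoys, shuffled."""
    if not 1 <= k <= num_columns:
        raise ValueError("k must be between 1 and the number of columns")
    candidates = [c for c in range(num_columns) if c != real_column]
    batch = rng.sample(candidates, k - 1) + [real_column]
    rng.shuffle(batch)        # hide the position of the real request
    return batch


batch = obfuscated_request(real_column=2, num_columns=10, k=4, rng=random.Random(0))
print(batch)
```

The client or proxy would discard the k-1 decoy responses; the cost is k times the communication of a plain column access.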


Data Access Framework

Data Table D

      Perm No   Salary   Age
t1    10001     4000     25
t2    10002     5000     28
t3    10003     4000     25
t4    10004     4000     26
t5    10005     6000     30
t6    10006     5500     28
t7    10007     6000     31

Data Access Framework

The data tuples D (t1 ... t7) are packed two per column into matrix TD, followed by a checksum row and zero padding:

TD
    1               2               3               4
    10001           10003           10005           10007
    4000            4000            6000            6000
    25              25              30              31
    10002           10004           10006           0
    5000            4000            5500            0
    28              26              28              0
    cksum(t1 t2)    cksum(t3 t4)    cksum(t5 t6)    cksum(t7)
    0               0               0               0
    0               0               0               0

TE = C · TD, dispersed across Server 1, Server 2, ..., Server n.
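The TD layout on this slide (two tuples per column, then a checksum, then zero padding) can be sketched as follows. The slides do not say how cksum is computed, so CRC-32 is used here only as a placeholder; a keyed message authentication code would be needed to resist deliberate tampering. `cksum` as defined here and `build_td_columns` are hypothetical.

```python
import zlib

TUPLES_PER_COLUMN = 2   # as in the slide's TD layout
PADDING_ROWS = 2        # TD on the slide ends with two all-zero rows


def cksum(tuples):
    """Placeholder checksum: CRC-32 over the decimal rendering of the tuples."""
    text = ",".join(str(value) for t in tuples for value in t)
    return zlib.crc32(text.encode("ascii"))


def build_td_columns(table):
    """Pack tuples into TD columns: tuple values, checksum, zero padding."""
    width = len(table[0])
    columns = []
    for start in range(0, len(table), TUPLES_PER_COLUMN):
        group = table[start:start + TUPLES_PER_COLUMN]
        column = [value for t in group for value in t]
        column += [0] * (TUPLES_PER_COLUMN * width - len(column))   # short last group
        column.append(cksum(group))
        column += [0] * PADDING_ROWS
        columns.append(column)
    return columns


D = [
    (10001, 4000, 25), (10002, 5000, 28), (10003, 4000, 25), (10004, 4000, 26),
    (10005, 6000, 30), (10006, 5500, 28), (10007, 6000, 31),
]
TD = build_td_columns(D)     # list of 4 columns, each with 9 entries
for number, column in enumerate(TD, start=1):
    print(number, column)
```

On retrieval, the client would recompute the checksum over the decoded tuples of a column and compare it with the stored entry.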

Data Access Framework

Index I (B+-tree; each leaf entry maps a key to a TD column number)

  Node 2 (root): 10003 | 10005
  Node 1 (leaf): 10001 → 1, 10002 → 1
  Node 3 (leaf): 10003 → 2, 10004 → 2
  Node 4 (leaf): 10005 → 3, 10006 → 3, 10007 → 4

ID (one column per index node)
    1        2        3        4
    1        0        1        1
    0        0        1        3
    1        1        2        3
    10001    10003    10003    10005
    1        3        2        3
    10002    10005    10004    10006
    0        4        0        4
    0        0        0        10007
    3        0        4        0

IE = C · ID, dispersed across Server 1, Server 2, ..., Server n.

Data Access Framework

[Figure: overview combining the B+-tree index and the data table (Perm No, Salary, Age) with their matrices ID and TD; individual servers i, j and k hold the encoded pieces (IE_i, TE_i), (IE_j, TE_j) and (IE_k, TE_k).]

Range Query Example

[Figure: range query example over the index I (root node 2; leaf nodes 1, 3, 4) and the data matrix TD (columns 1 to 4) from the preceding slides.]
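To make the range query concrete, the sketch below walks the example index: descend from the root to the leaf that may contain the lower bound, then scan leaves left to right and collect the TD columns to fetch. The node contents are transcribed from the slides' index figure; the dictionary representation, the leaf "next" links and the function name are assumptions for illustration. In the framework each visited node would first be retrieved as a column of IE and decoded, which this plaintext sketch skips.

```python
from bisect import bisect_right

# Index nodes from the example. Leaf entries are (key, TD column number).
INDEX = {
    2: {"leaf": False, "keys": [10003, 10005], "children": [1, 3, 4]},
    1: {"leaf": True, "entries": [(10001, 1), (10002, 1)], "next": 3},
    3: {"leaf": True, "entries": [(10003, 2), (10004, 2)], "next": 4},
    4: {"leaf": True, "entries": [(10005, 3), (10006, 3), (10007, 4)], "next": None},
}
ROOT = 2


def range_query_columns(low, high):
    """Return (index nodes visited, TD columns to fetch) for keys in [low, high]."""
    visited = []
    node_id = ROOT
    while True:                                   # descend to the first leaf
        node = INDEX[node_id]
        visited.append(node_id)
        if node["leaf"]:
            break
        node_id = node["children"][bisect_right(node["keys"], low)]
    columns = []
    while True:                                   # scan leaves left to right
        node = INDEX[node_id]
        for key, td_column in node["entries"]:
            if key > high:
                return visited, columns
            if key >= low and td_column not in columns:
                columns.append(td_column)
        if node["next"] is None:
            return visited, columns
        node_id = node["next"]
        visited.append(node_id)


visited, columns = range_query_columns(10002, 10005)
print(visited, columns)
```

For the range [10002, 10005] this visits index nodes 2, 1, 3, 4 and requests TD columns 1, 2 and 3; the client then filters the decoded tuples, since a column may also hold keys outside the range.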

Practical Query Performance

• Support exact queries, range queries (range aggregates), data updates, inserts and deletes
• Leverage index access
• Cache partial index nodes on the client

Experimental Setup

• Implementation
  o Our framework, baseline, encrypted range index [Dami03]
  o C++, Crypto++ 5.6.0
• Data table: Item table from TPC-W benchmark
• Setup
  o B+-tree branch factor b=50
  o m=13, n=21 servers for our framework; one server for baseline and encrypted range index
  o 1000 exact queries, range queries, data updates and inserts
  o Linux servers with Intel 2.40GHz CPU, 3GB memory and Fedora Core 8 OS
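For orientation, the m and n chosen here can be read through the general IDA property stated on the IDA slide (any m of n pieces reconstruct the data). The figures below are a back-of-envelope reading, assuming each server stores one equally sized piece; they are not results reported in the slides.

\[
n - m = 21 - 13 = 8 \ \text{servers may be unavailable or discarded},
\qquad
\frac{n}{m} = \frac{21}{13} \approx 1.62 \ \text{storage blow-up relative to the plain matrix}
\]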


Varying Data Size on Exact Queries

[Figure panels: Processing Time Breakdown; Communication Size Breakdown]

Varying Selectivity on Range Queries

[Figure panels: Client Processing Time; Data Communication Size]

Varying Cache Hit Rate on Exact Queries

[Figure panels: Client Processing Time; Data Communication Size]

Conclusion

• A comprehensive framework for practical secure query processing on relational data in the cloud
• Good balance between functionality and security
• Supports efficient database queries and data updates
• Ensures data confidentiality both in storage and at access time
• Provides data availability and integrity

References

[Haci02] H. Hacigumus et al. Executing SQL over encrypted data in the database-service-provider model. In SIGMOD, pages 216-227, 2002.

[Chor98] B. Chor et al. Private information retrieval. J. ACM, 45(6):965-981, 1998.

[Rabin89] M. O. Rabin. Efficient dispersal of information for security, load balancing, and fault tolerance. J. ACM, 36(2):335-348, 1989.

[Agra04] R. Agrawal et al. Order preserving encryption for numeric data. In SIGMOD, pages 563-574, 2004.

[Gent09] C. Gentry. Fully homomorphic encryption using ideal lattices. In STOC, pages 169-178, 2009.

[Hore04] B. Hore et al. A privacy-preserving index for range queries. In VLDB, pages 720-731, 2004.

[Dami03] E. Damiani et al. Balancing confidentiality and efficiency in untrusted relational DBMSs. In CCS, pages 93-102, 2003.

Effects of Varying Number of Tuples N on Inserts

[Figure panels: Client Processing Time; Data Communication Size]