ObliviStore: High Performance Oblivious Cloud Storage

ObliviStore: High Performance Oblivious Cloud Storage
Emil Stefanov (UC Berkeley), [email protected]
Elaine Shi (UMD), [email protected]
http://www.emilstefanov.net/Research/ObliviousRam/
Cloud Storage
Examples: Dropbox, Amazon S3 and EBS, SkyDrive, EMC Atmos, Windows Azure Storage, Mozy, iCloud, Google Storage.
Data Privacy
• Data privacy is a growing concern.
• So, many organizations encrypt their data.
• Encryption is not enough.
• Access patterns leak sensitive information.
• E.g., access patterns can reveal about 80% of search queries (Islam et al.)
Oblivious Storage (ORAM)
• Goal: Conceal access patterns to remote storage.
• An observer cannot distinguish a sequence of
read/write operations from random.
[Diagram: a client issues Read(x), Write(y, data), Read(z), etc. to untrusted cloud storage.]
Proposed by Goldreich and Ostrovsky. [GO96, OS97]
Recently: [WS08, PR10, GM10, GMOT11, BMP11, SCSL11, SSS12,
GMOT12, KLO12, WR12, LPMRS13, … ]
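For concreteness, the interface such an oblivious store exposes might look like the following Python sketch. The names are hypothetical; it only fixes the read/write API, not any particular ORAM construction.

```python
from abc import ABC, abstractmethod

class ObliviousStore(ABC):
    """A block store whose physical access pattern, as seen by the
    untrusted server, looks random regardless of which logical blocks
    the client reads or writes."""

    @abstractmethod
    def read(self, block_id: int) -> bytes:
        """Return the contents of logical block `block_id`."""

    @abstractmethod
    def write(self, block_id: int, data: bytes) -> None:
        """Overwrite logical block `block_id` with `data`."""
```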
Hybrid Cloud
[Diagram: many clients connect to a lightweight Oblivious Load Balancer in the trusted private cloud (e.g., a corporate cloud, storing about 0.25% of the data); the load balancer distributes requests across heavyweight ORAM nodes in the untrusted public cloud, which offers scalability.]
Trusted Hardware in the Cloud
[Diagram: many clients connect to an Oblivious Load Balancer and ORAM nodes running on a few machines with trusted hardware; the entire storage system and the networking are untrusted.]
Contributions
• Built end-to-end oblivious storage system.
– Open source code available.
• Fully asynchronous design – no blocking on I/O
– Efficiently handles thousands of simultaneous operations.
• High performance (throughput & response time)
– High throughput over high latency connections.
– Much faster than existing systems.
• Oblivious load balancing technique for distributing
the ORAM workload.
• Optimized for both SSDs and HDDs.
Performance Challenges
Focus on exact (not asymptotic) performance:
• Client storage
• Bandwidth cost, response time, block size
• Storage I/O cost, seeks
• Scalability to multiple servers
Security Challenges
• Goals:
– Oblivious asynchronous scheduling: scheduling should not leak private information.
– Oblivious load balancing across multiple machines: load distribution should be independent of the access pattern.
• Adversary can:
– Observe raw storage locations accessed.
– Observe network traffic patterns.
– Maliciously delay storage and network I/O.
– Attempt to corrupt data.
– etc.
Overview
Partitioned ORAM
• Based on the Goldreich-Ostrovsky scheme.
• Each partition has O(log N) levels; level ℓ has 2^ℓ blocks (see the sketch below).
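A minimal Python sketch of how a partition's level capacities follow from this layout (the helper name is hypothetical):

```python
import math

def level_capacities(num_blocks: int) -> dict[int, int]:
    """Hypothetical helper: capacities of one partition's levels.

    A partition that holds up to `num_blocks` blocks has O(log N)
    levels, and level l can store 2**l blocks.
    """
    top = max(0, math.ceil(math.log2(max(num_blocks, 1))))
    return {level: 2 ** level for level in range(top + 1)}

# Example: a partition sized for ~1024 blocks has levels 0..10,
# with capacities 1, 2, 4, ..., 1024.
print(level_capacities(1024))
```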
Reading from a Partition
• Read one block from each level.
• One of them is the real block.
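A minimal sketch of this read pattern in Python. The partition layout and dummy-read convention here are assumptions for illustration, not ObliviStore's exact on-disk format.

```python
import secrets

def read_from_partition(levels, block_id, position_map):
    """Read one block from every filled level of a partition.

    `levels` maps level -> {offset: encrypted block}; the client's
    `position_map[block_id]` gives the (level, offset) of the real
    block. Exactly one fetched block is real; the others are dummy
    reads, so the server cannot tell which level held the block.
    """
    real_level, real_offset = position_map[block_id]
    result = None
    for level, blocks in levels.items():
        if not blocks:                       # skip unfilled levels
            continue
        if level == real_level:
            offset = real_offset             # location of the real block
        else:
            offset = secrets.choice(sorted(blocks))  # a dummy location
        fetched = blocks[offset]
        if level == real_level:
            result = fetched
    return result
```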
Writing to a Partition (shuffling)
• Shuffle consecutively filled levels.
• Write into the next unfilled level.
[Diagram: server before and after the client shuffles the blocks; a sketch follows below.]
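A simplified sketch of the shuffle-and-write step (Python; it reuses the hypothetical partition layout from above and omits re-encryption and dummy padding):

```python
import secrets

def write_to_partition(levels, new_block):
    """Shuffle consecutively filled levels into the next unfilled one.

    Levels 0..j-1 are consecutively filled; their blocks, plus the one
    block being written, are shuffled client-side and uploaded into
    level j. A real implementation would also re-encrypt every block
    and pad levels with dummy blocks before uploading.
    """
    buffer = [new_block]
    level = 0
    while levels.get(level):                 # collect filled levels
        buffer.extend(levels[level].values())
        levels[level] = {}                   # level is now empty
        level += 1
    shuffled = []
    while buffer:                            # random permutation
        shuffled.append(buffer.pop(secrets.randbelow(len(buffer))))
    levels[level] = dict(enumerate(shuffled))  # next unfilled level
```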
Challenge
• Parallelism
• Overlapping reading & shuffling
• Maintaining low client storage
• Preserving security
Architecture
[Architecture diagram: on the client, ORAM Main dispatches Read(blockId) / Write(blockId, block) requests; the Partition Reader, Background Shuffler, Partition States, Eviction Cache, and Storage Cache cooperate, incrementing and decrementing semaphores, and issue Fetch(addr) / Store(addr, block) calls to the partitions stored on the server. A skeleton sketch follows the bullets below.]
• ORAM Read/Write requests enter the system.
• The requests are then partitioned.
• The partition reader reads levels of the partitions.
• The background shuffler writes and shuffles levels of the partitions.
• Semaphores bound the client memory.
• The storage cache temporarily stores data for the background shuffler and helps ensure consistency.
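A structural sketch of these components as a Python skeleton. The class and method names are taken from the diagram labels; the bodies are placeholders, not ObliviStore's actual implementation.

```python
class StorageCache:
    """Temporarily caches blocks for the shuffler; helps ensure consistency."""
    def cache_in(self, addr): ...
    def cache_out(self, addr): ...
    def fetch(self, addr): ...
    def store(self, addr, block): ...

class PartitionReader:
    """Reads one block from each filled level of a partition."""
    def read_partition(self, partition, block_id): ...

class BackgroundShuffler:
    """Writes and shuffles levels, pacing itself with semaphores."""
    def run(self): ...

class EvictionCache:
    """Holds recently accessed blocks that have not yet been reshuffled."""
    def fetch(self, block_id): ...
    def store(self, partition, block): ...

class OramMain:
    """Entry point: partitions incoming ORAM Read/Write requests."""
    def read(self, block_id): ...
    def write(self, block_id, block): ...
```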
Pipelined Shuffling
Background Shuffler
• Each ORAM Read/Write operation creates a shuffling job.
• A job consists of (on average):
1. Read O(log N) blocks.
2. Shuffle locally.
3. Write the O(log N) blocks, along with 1 additional block (the block to be written, associated with the shuffle job).
Without Pipelining
• Without pipelining, shuffle jobs are latency limited:
– O(k log N) round-trips for k ORAM operations.
– Example:
• 50 ms latency, 1 TB ORAM, 4 KB blocks.
• Want to write a 1 MB file (k = 256 ORAM operations).
• Total time: about 50 ms × 256 operations × ~30 round-trips per operation = 384 seconds.
Without pipelining it would take over 6 minutes to write a 1 MB file!
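A quick back-of-the-envelope check of that number in Python (the ~30 round-trips per operation is the figure assumed on the slide):

```python
latency = 0.050                          # 50 ms simulated client latency
ops = (1 * 1024 * 1024) // (4 * 1024)    # 1 MB file / 4 KB blocks = 256 ops
round_trips = 30                         # roughly log N level accesses per op

print(ops, latency * ops * round_trips)  # 256 operations, 384.0 seconds
```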
Pipelining Across One Job
• Pipelining the IO operations of each job:
– Reduces round-trips from O(k log N) to O(k).
– Still not enough: 15 seconds for a 1 MB file even if bandwidth is plentiful.
• Actually, it is even worse:
– The distribution of job sizes is highly skewed.
• Need to pipeline IOs across multiple shuffle jobs.
Asynchronous Shuffling Pipeline
[Timeline diagram: shuffle jobs 1, 2, and 3 each proceed through "start reads → complete all reads → shuffle locally → start writes → complete all writes", and these stages overlap across jobs over time. Memory resources are reserved before reading blocks and released after writing blocks. Note: meanwhile, blocks may be read by the partition reader.]
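A minimal sketch of this overlap using Python's asyncio. It is purely illustrative: the stage timings and the single memory semaphore are stand-ins for ObliviStore's actual shuffler and semaphore accounting.

```python
import asyncio

# Single counting semaphore standing in for the client-memory bound.
memory = asyncio.Semaphore(4)

async def shuffle_job(job_id: int, blocks_needed: int = 2):
    # Reserve memory resources *before* starting the reads.
    for _ in range(blocks_needed):
        await memory.acquire()
    await asyncio.sleep(0.05)   # start reads ... complete all reads
    # shuffle locally (pure CPU work, no network round-trip)
    await asyncio.sleep(0.05)   # start writes ... complete all writes
    # Release memory resources only *after* all writes complete.
    for _ in range(blocks_needed):
        memory.release()
    print(f"shuffle job {job_id} done")

async def main():
    # Jobs run concurrently, so reads, shuffles, and writes overlap across jobs.
    await asyncio.gather(*(shuffle_job(i) for i in range(3)))

asyncio.run(main())
```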
Semaphores
• Carefully designed semaphores
– Enforce bound on client memory.
– Control de-amortized shuffling speed.
– Independent of the access pattern.
• Eviction
– Unshuffled blocks that were recently accessed.
• Early cache-ins
– Blocks read during shuffling of a partition.
• Shuffling buffer
– Blocks currently being shuffled.
• Shuffling I/O
– Pending work for the shuffler.
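As an illustration only, these four semaphores could be represented as follows (Python/asyncio; the initial values and the wrapper are arbitrary placeholders, not ObliviStore's tuned bounds):

```python
import asyncio

# One counting semaphore per resource class; each bounds how many blocks
# of that class may occupy client memory (or pending shuffler I/O work).
semaphores = {
    "eviction":         asyncio.Semaphore(64),   # unshuffled, recently accessed blocks
    "early_cache_ins":  asyncio.Semaphore(64),   # blocks read while their partition shuffles
    "shuffling_buffer": asyncio.Semaphore(128),  # blocks currently being shuffled
    "shuffling_io":     asyncio.Semaphore(128),  # pending work for the shuffler
}

async def with_resource(name: str, coro):
    # Decrement before the work, increment when it finishes; the counter
    # values depend only on scheduling, never on which block was requested.
    async with semaphores[name]:
        return await coro
```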
Security
• Secure in the malicious model.
• Adversary only observes (informally):
– The behavior of the synchronous system (i.e., without ObliviStore's optimizations), which is proven secure.
– Semaphore values, which are independent of the access pattern.
– Timings, which are independent of the access pattern.
• Security proof in the full online paper.
Evaluation
Performance
1 node (2x1TB SSD) (300 GB ORAM)
(50ms simulated client latency)
Speed: 3.2 MB/s
Scalability
10 nodes (2x1TB SSD each) (3 TB ORAM)
(50ms simulated client latency)
Speed: 31.5 MB/s
Response time: 66ms (full load)
HDD “Friendly”
4 to 10 seeks per operation
Works well on both SSDs and HDDs
Comparison to other
ORAM implementations
About 17 times higher throughput than PrivateFS
(under a very similar configuration)
Comparison to other
ORAM implementations
• Lorch et al. also implemented ORAM.
• Built on top of real-world secure processors.
• Lots of overhead from limitations of secure
processors
– Very limited I/O bandwidth
– Very limited computation capabilities
• Many other ORAM constructions exist, but not
many full end-to-end implementations.
Conclusion
• Fully asynchronous.
• High performance.
• Full end-to-end implementation (open source).
– Has already been used for mining biometric data (Bringer et al.).
Thank you!
http://www.emilstefanov.net/Research/ObliviousRam/