The Learnability of Quantum States

Scott Aaronson
University of Waterloo
Outline
A Quantum Occam’s Razor Theorem
- Why you should want it to be true
- Why it is true
- Application to quantum communication
- Application to quantum advice
Sneak Preview: Quantum Software Copy-Protection
- What it has to do with learning
- Why it might be possible
Why do we believe the sun will rise tomorrow?
The hypothesis that it will rise every day until tomorrow
is equally compatible with evidence…
David Hume (1711-1776)
The Sun
In my view, a branch of CS called computational
learning theory has pretty much solved this Humean
Problem of Induction, insofar as it has a solution…
Occam’s Razor Theorem
(Valiant, Vapnik, Blumer et al…)
“If the possible hypotheses have sufficiently fewer bits
than the data you’ve collected, and if one of those
hypotheses succeeds in explaining your data, then that
hypothesis will probably also explain most of the data
you haven’t collected”
In particular: If you want to output a hypothesis from set H
that explains at least a 1-ε fraction of future data with
probability at least 1-δ, then
$$m = O\left(\frac{1}{\varepsilon}\log\frac{|H|}{\delta}\right)$$
data points suffice.
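As a rough illustration of the bound above, the sketch below evaluates the standard finite-class form m ≥ (1/ε)(ln|H| + ln(1/δ)), which is one concrete instance of the O(·) statement. The function name `occam_sample_size` and the choice of natural logarithm are assumptions made for this example.

```python
import math

def occam_sample_size(num_hypotheses: int, epsilon: float, delta: float) -> int:
    """Number of samples sufficient for a consistent learner over a finite
    hypothesis class, using m >= (ln|H| + ln(1/delta)) / epsilon."""
    if num_hypotheses < 1:
        raise ValueError("need at least one hypothesis")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(num_hypotheses) + math.log(1 / delta)) / epsilon)

# Hypotheses describable in 20 bits, 10% error, 95% confidence.
print(occam_sample_size(2 ** 20, epsilon=0.1, delta=0.05))
```

The sample size grows with the number of bits needed to write down a hypothesis (log|H|), not with the number of hypotheses itself.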
Trouble in QuantumLand
To describe a quantum state of n qubits takes ~2^n
classical bits
Indeed, traditional quantum state tomography requires
Ω(2^{2n}) measurements on copies of the state
Does this mean that a generic 10,000-particle state can
never be “learned” within the lifetime of the universe?
If so, would call into question the operational status of
many-body quantum states themselves…
Fear not, physicists! Why would he even be raising this
“dilemma” if he wasn’t gonna demolish it on the very next slide?
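To put the 10,000-particle question in numbers, the short sketch below compares the order of magnitude of 2^(2n) with the number of seconds in the age of the universe. The figure of 13.8 billion years and the helper name `log10_tomography_measurements` are assumptions introduced for this illustration.

```python
import math

def log10_tomography_measurements(n_qubits: int) -> float:
    """Base-10 logarithm of 2^(2n), the measurement scaling quoted for
    traditional tomography on n qubits."""
    return 2 * n_qubits * math.log10(2)

# Assumed age of the universe: about 13.8 billion years, converted to seconds.
log10_seconds_in_universe = math.log10(13.8e9 * 365.25 * 24 * 3600)

print(f"2^(2n) for n=10,000 has about {log10_tomography_measurements(10_000):.0f} decimal digits")
print(f"seconds elapsed so far: about 10^{log10_seconds_in_universe:.1f}")
```

Even at one measurement per second since the beginning of time, the gap is thousands of orders of magnitude.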
[Figure: Hilbert space, with an “operationally meaningful subset” marked inside it]
The Quantum Occam’s Razor Theorem
Let ρ be an n-qubit mixed state. Let D be a distribution
over two-outcome measurements. Suppose we draw
m measurements E1,…,Em independently from D, and
then output a “hypothesis state” σ such that
$$|\mathrm{Tr}(E_i\sigma) - \mathrm{Tr}(E_i\rho)| \le \eta$$
for all i. Then provided η ≤ γε/10 and
$$m = \Omega\left(\frac{1}{\gamma^2\varepsilon^2}\left(\frac{n}{\gamma^2\varepsilon^2}\log^2\frac{1}{\gamma\varepsilon} + \log\frac{1}{\delta}\right)\right),$$
we’ll have
$$\Pr_{E\in D}\left[\,|\mathrm{Tr}(E\sigma) - \mathrm{Tr}(E\rho)| \le \gamma\,\right] \ge 1-\varepsilon$$
with probability at least 1-δ over E1,…,Em
Upshot for Experimentalists
You can do “pretty good tomography” on an arbitrary
entangled state of n spins, using a number of
measurements that scales only linearly (!) with n
Here “pretty good” means with respect to any fixed
distribution over observables
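The linear scaling in n can be seen by evaluating the expression inside the theorem's bound, read here as (1/(γ²ε²))·((n/(γ²ε²))·log²(1/(γε)) + log(1/δ)). This is only a sketch: the hidden constant is set to 1, the logarithm is taken to be natural, and the function name `measurement_count` is hypothetical.

```python
import math

def measurement_count(n: int, gamma: float, eps: float, delta: float) -> float:
    """Expression inside the Quantum Occam's Razor bound, with the hidden
    constant assumed to be 1 and natural logarithms assumed."""
    if not (0 < gamma < 1 and 0 < eps < 1 and 0 < delta < 1):
        raise ValueError("gamma, eps and delta must lie strictly between 0 and 1")
    scale = 1.0 / (gamma ** 2 * eps ** 2)
    return scale * (n * scale * math.log(1.0 / (gamma * eps)) ** 2
                    + math.log(1.0 / delta))

# Doubling the number of qubits roughly doubles the count; compare with 2^(2n).
for n in (10, 100, 1000):
    print(n, f"{measurement_count(n, 0.1, 0.1, 0.01):.3e}")
```

The dependence on the accuracy parameters γ and ε is steep, so "pretty good" is doing real work in the statement; only the dependence on n is mild.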
Q: But what if I can’t estimate the Tr(Eρ)’s? What if for
each measurement E, all I get is a bit that’s 1 with
probability Tr(Eρ) and 0 with probability 1-Tr(Eρ)?
A: In that case you need this many measurements:
$$O\left(\frac{1}{\gamma^4\varepsilon^2}\left(\frac{n}{\gamma^4\varepsilon^2}\log^2\frac{1}{\gamma\varepsilon} + \log\frac{1}{\delta}\right)\right)$$
To prove the theorem, we need a notion
introduced by Kearns and Schapire called
Fat-Shattering Dimension
Let C be a class of functions from S to [0,1]. We say a set
{x1,…,xk}⊆S is γ-shattered by C if there exist reals a1,…,ak
such that, for all 2^k possible statements of the form
f(x1)≤a1-γ ∧ f(x2)≥a2+γ ∧ … ∧ f(xk)≤ak-γ,
there’s some f∈C that satisfies the statement.
Then fat_C(γ), the γ-fat-shattering dimension of C, is the
size of the largest set γ-shattered by C.
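The definition can be checked by brute force on a small finite class. The sketch below enumerates all 2^k above/below patterns and looks for a function in the class realizing each one; the function name `is_gamma_shattered` and the toy class are made up for this example.

```python
from itertools import product

def is_gamma_shattered(functions, points, witnesses, gamma):
    """Return True if `points` is gamma-shattered by `functions`
    using the reals `witnesses` (one a_i per point)."""
    for pattern in product((False, True), repeat=len(points)):
        def satisfies(f):
            # True entries demand f(x) >= a + gamma, False entries f(x) <= a - gamma.
            return all(
                (f(x) >= a + gamma) if high else (f(x) <= a - gamma)
                for x, a, high in zip(points, witnesses, pattern)
            )
        if not any(satisfies(f) for f in functions):
            return False
    return True

def make_f(bits):
    """Toy function on S = {0, 1}: value 0.9 where the bit is set, else 0.1."""
    return lambda x: 0.9 if bits[x] else 0.1

functions = [make_f(bits) for bits in product((0, 1), repeat=2)]
print(is_gamma_shattered(functions, [0, 1], [0.5, 0.5], gamma=0.25))  # wide enough gap
print(is_gamma_shattered(functions, [0, 1], [0.5, 0.5], gamma=0.45))  # gap too large
```

The check is exponential in k, which is fine for illustration but not a practical way to compute fat-shattering dimension.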
Small Fat-Shattering Dimension
Implies Small Sample Complexity
Proof uses a 1996 result of Bartlett and Long
Let C be a class of functions from S to [0,1], and let f∈C.
Suppose we draw m elements x1,…,xm independently from
some distribution D, and then output a hypothesis h∈C
such that |h(xi)-f(xi)| ≤ η for all i. Then provided η ≤ γε/7 and
$$m = \Omega\left(\frac{1}{\gamma^2\varepsilon^2}\left(\mathrm{fat}_C\!\left(\frac{\gamma\varepsilon}{35}\right)\log^2\frac{1}{\gamma\varepsilon} + \log\frac{1}{\delta}\right)\right),$$
we’ll have
$$\Pr_{x\in D}\left[\,|h(x) - f(x)| \le \gamma\,\right] \ge 1-\varepsilon$$
with probability at least 1-δ over x1,…,xm.
Upper-Bounding the Fat-Shattering
Dimension of Quantum States
Proof uses Ashwin Nayak’s lower bound for “quantum
random access codes,” which in turn uses Holevo’s
Theorem on quantum channel capacity
Let S be the set of two-outcome measurements on n
qubits. Let Cn be the set of functions f:S→[0,1] defined
by f(E)=Tr(Eρ) for some n-qubit mixed state ρ.
Then
$$\mathrm{fat}_{C_n}(\gamma) \le \frac{\ln 2}{2}\cdot\frac{n}{\gamma^2}.$$
No need to thank me!
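A sketch of where a bound of the form (ln 2 / 2)·n/γ² can come from, under a simplifying assumption: take every threshold a_i equal to 1/2, so that a γ-shattered set of k measurements behaves like a quantum random access code for k bits with success probability 1/2+γ. Nayak's bound then says n ≥ (1 - H(1/2+γ))·k, where H is the binary entropy, and 1 - H(1/2+γ) ≥ (2/ln 2)·γ². The function names below are hypothetical, and the general case with arbitrary a_i is not handled here.

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def nayak_max_bits(n_qubits: int, gamma: float) -> float:
    """Largest k allowed by n >= (1 - H(1/2 + gamma)) * k."""
    return n_qubits / (1 - binary_entropy(0.5 + gamma))

def quadratic_bound(n_qubits: int, gamma: float) -> float:
    """The closed form (ln 2 / 2) * n / gamma^2."""
    return (math.log(2) / 2) * n_qubits / gamma ** 2

for gamma in (0.01, 0.1, 0.25, 0.4):
    print(gamma, nayak_max_bits(100, gamma), quadratic_bound(100, gamma))
```

For small γ the two columns nearly coincide, since the quadratic term dominates the Taylor expansion of 1 - H(1/2+γ).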
Quantum Occam’s Razor Theorem
is then just plug & chug…
Simple Application of Quantum Occam’s
Razor Theorem to Communication Complexity
[Figure: Alice Walker, holding x, and Bob Dylan, holding y, computing f(x,y)]
f: Boolean function mapping Alice’s N-bit string x and
Bob’s M-bit string y to a binary output
D^1(f), R^1(f), Q^1(f): Deterministic, randomized, and
quantum one-way communication cost of f
How much can quantum communication save?
• It’s known that D^1(f)=O(M Q^1(f)) for all total f
• In 2004 I showed that for all f,
D^1(f)=O(M Q^1(f) log Q^1(f))
Theorem: R^1(f)=O(M Q^1(f))
for all f, partial or total
Proof: By Yao’s minimax principle, Alice can consider a
worst-case distribution Dx over Bob’s input y
Alice’s classical message will consist of y1,…,yT drawn
from Dx, together with f(x,y1),…,f(x,yT)
Here T=Θ(Q^1(f))
Bob searches for a quantum message σ that yields the
right answers on y1,…,yT (certainly such a σ exists)
By the Quantum Occam’s Razor Theorem, with high
probability such a σ yields the right answers on most y
drawn from Dx
Computational Complexity of
Learning Quantum States
I showed that, if you find a state σ that explains O(n)
measurements drawn from D, with high probability that
σ will correctly explain most future measurements
drawn from D.
This says nothing about the computational problem of
finding σ!
Indeed, if σ can always be prepared by a polynomial-time
quantum algorithm, then no one-way function is
secure against quantum attack.
To say more, we need to visit the bestiary…
[Figure: diagram of the complexity classes PostBQP/poly, BQP/qpoly, QMA/poly, YQP/poly, BQP/poly, QMA, YQP, BQP]
YQP: Yaroslav Quantum Polynomial-Time
Class of problems solvable efficiently on a quantum computer,
with the help of polynomial-size untrusted quantum advice
Theorem: AvgBQP/qpoly = AvgYQP/poly
Or in English: We can use trusted classical advice
to verify that untrusted quantum advice will work
on most inputs.
Proof Idea: The classical advice will consist of “training
inputs” x1,…,xm, as well as whether xi∈L for all 1≤i≤m
Given a purported advice state |ψ⟩, first check that |ψ⟩
yields the right answers on x1,…,xm, and only then use
it on the x you care about
By Quantum Occam’s Razor Theorem, m=O(poly(n)) is
enough to ensure |ψ⟩ will work on most inputs w.h.p.
The technical part is to do the verification without
damaging |ψ⟩ too badly
Quantum Copy-Protection
We say a program P is copy-protected if there’s no
efficient algorithm that, given P’s source code, outputs
two programs with the same input/output behavior as P
Classically, copy-protection is trivially impossible
(tell that to Sony/BMG…)
Quantumly: well, it’s called the “No-Cloning Theorem”
for a reason…
Connection to learning: If P can be learned from
input/output behavior, then it can’t be copy-protected
A Weird Example
Let G be a finite group, such that we can efficiently
prepare |G⟩ (a uniform superposition over g∈G)
Let H≤G be a subgroup with |H| ≥ |G|/polylog|G|
Let f(g)=1 if g∈H and f(g)=0 otherwise
Given |H⟩ (a uniform superposition over H), Watrous
showed that we can efficiently compute f
Test whether |H⟩ and |gH⟩ are equal or orthogonal
Conversely, given a black box that computes f, we can
efficiently prepare |H⟩
First prepare |G⟩, then postselect on f(g)=1
So any program for f can be pirated—but (apparently)
only in an indirect, quantum way
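Both directions of this example can be mimicked with a classical state-vector simulation on a toy group. The sketch below assumes G = Z_12 under addition and H = {0, 3, 6, 9}; states are dictionaries from group elements to amplitudes, and the overlap is computed directly rather than by an actual quantum test. All names are hypothetical.

```python
import math

N = 12                      # toy group G = Z_12 under addition (an assumption)
G = list(range(N))
H = [0, 3, 6, 9]            # subgroup generated by 3

def uniform_superposition(elements):
    """Uniform superposition over the given elements, as {element: amplitude}."""
    amp = 1 / math.sqrt(len(elements))
    return {g: amp for g in elements}

def shift(state, g):
    """Apply the group element g to a state: |h> -> |g + h>."""
    return {(g + h) % N: amp for h, amp in state.items()}

def inner_product(u, v):
    return sum(a * v.get(g, 0.0) for g, a in u.items())

def f_from_state(g, state_H):
    """Membership in H from |H> alone: |H> and |gH> are equal (overlap 1)
    when g is in H, and orthogonal (overlap 0) otherwise."""
    return round(inner_product(state_H, shift(state_H, g)))

def prepare_H_from_black_box(f):
    """Start from the uniform superposition over G, postselect on f(g) = 1."""
    return uniform_superposition([g for g in G if f(g) == 1])

state_H = uniform_superposition(H)
print([f_from_state(g, state_H) for g in G])
print(prepare_H_from_black_box(lambda g: f_from_state(g, state_H)) == state_H)
```

The round trip from |H⟩ to f and back to |H⟩ is the "indirect, quantum" piracy route: each direction needs only the other as a resource.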
The Pirate’s Nightmare
In the quantum world, can any program that
can’t be learned be copy-protected?
Main Result: There exists a “quantum oracle”
relative to which the answer is yes
Upshot: Even if the answer is no, we can’t prove it
without using “quantumly nonrelativizing
techniques”
Handwaving Proof Idea
For each circuit C, choose a “meaningless quantum
label” |ψ_C⟩ according to the Haar measure
The quantum oracle will map |ψ_C⟩|x⟩|0⟩ to |ψ_C⟩|x⟩|C(x)⟩,
as well as |C⟩|0⟩ to |C⟩|ψ_C⟩
Intuitively, then, being given |ψ_C⟩ is “no better” than being
given a black box for C
To prove this, we need to simulate an algorithm that
prepares |ψ_C⟩ given another copy of |ψ_C⟩, by an algorithm
that prepares |ψ_C⟩ given only black-box access to C
Strategy: Mimic the copying algorithm, by “mocking up”
a random pure state |φ⟩ that plays the same role as |ψ_C⟩
Problem: “Mocking up” a random pure state takes
exponential time
Solution: Pseudorandom States
$$|\psi_p\rangle = \frac{1}{\sqrt{2^n}}\sum_{x\in GF(2^n)} (-1)^{p_0(x)}\,|x\rangle$$
where p is a degree-d univariate polynomial over GF(2^n)
for some d=poly(n), and p0(x) is the “leading bit” of p(x)
Clearly the |ψ_p⟩’s can be prepared in polynomial time
Lemma: If p is chosen uniformly at random, then |ψ_p⟩
“looks like” it was chosen under the Haar measure
- Even if we get polynomially many copies of |ψ_p⟩
- Even if we query the quantum oracle, which depends on |ψ_p⟩
So the simulator can use |ψ_p⟩’s in place of |ψ_C⟩’s
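A minimal sketch of the amplitudes of such a state for a tiny field, assuming GF(2^3) represented with the irreducible polynomial x^3 + x + 1 and taking the “leading bit” to mean the most significant bit of the field element. Both choices, and all function names, are assumptions made for illustration.

```python
import math

N_BITS = 3
MODULUS = 0b1011  # x^3 + x + 1, irreducible over GF(2)

def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^N_BITS): carry-less product with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> N_BITS:      # degree reached N_BITS, reduce modulo MODULUS
            a ^= MODULUS
    return result

def poly_eval(coeffs, x: int) -> int:
    """Horner evaluation of p(x); coeffs run from highest degree to constant."""
    acc = 0
    for c in coeffs:
        acc = gf_mul(acc, x) ^ c
    return acc

def leading_bit(value: int) -> int:
    """Most significant bit of a field element (assumed meaning of p0)."""
    return (value >> (N_BITS - 1)) & 1

def pseudorandom_state(coeffs):
    """Amplitudes (-1)^{p0(x)} / sqrt(2^n), indexed by x = 0 .. 2^n - 1."""
    norm = 1 / math.sqrt(2 ** N_BITS)
    return [norm * (-1) ** leading_bit(poly_eval(coeffs, x))
            for x in range(2 ** N_BITS)]

state = pseudorandom_state([1, 0])   # p(x) = x
print(state)
print(sum(a * a for a in state))     # squared norm, should be close to 1
```

Each amplitude costs one polynomial evaluation over the field, which is the classical counterpart of the claim that these states can be prepared in polynomial time.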

Open Problems

Can we tighten the Quantum Occam’s Razor Theorem?
The best lower bounds I can prove go like Ω(n/γ²), or Ω(n/γ⁴) in the case
where each measurement is applied only once
Does BQP/qpoly = YQP/poly?
I.e., can we use classical advice to verify quantum advice in the worst-case setting?
Is D^1(f) = O(M Q^1(f))? Or even O(M+Q^1(f))?
Even more ambitiously, could learning theory techniques help us show
that R^1(f)=O(Q^1(f)) for all total f?
In the real world, are there nontrivial programs that can
be quantumly copy-protected?
What about point functions (f(x)=1 if x equals a secret password s;
otherwise f(x)=0)?