New Capacity Bounds for the Binary Deletion Channel

Download Report

Transcript New Capacity Bounds for the Binary Deletion Channel

New Results and Open Problems for
Insertion/Deletion Channels
Michael Mitzenmacher
Harvard University
Much is joint work with
Eleni Drinea
The Most Basic Channels
• Binary erasure channel.
– Each bit replaced by a ? with probability p.
• Binary symmetric channel.
– Each bit flipped with probability p.
• Binary deletion channel.
– Each bit deleted with probability p.
M. Mitzenmacher
2
The Most Basic Channels
• Binary erasure channel.
– Each bit replaced by a ? with probability p.
1101110010
01  1? 0111? 01? 01
• Binary symmetric channel.
– Each bit flipped with probability p.
1101110010
01  1001111011
01
• Binary deletion channel.
– Each bit deleted with probability p.
1101110010
01  101110101
M. Mitzenmacher
3
The Most Basic Channels
• Binary erasure channel.
– Each bit is replaced by a ? with probability p.
– Very well understood.
• Binary symmetric channel.
– Each bit flipped with probability p.
– Very well understood.
• Binary deletion channel.
– Each bit deleted with probability p.
– We don’t even know the capacity!!!
M. Mitzenmacher
4
Motivation
This seems a disturbing, sad state
of affairs. It bothers me greatly.
M. Mitzenmacher
5
Motivation
This seems a disturbing, sad state
of affairs. It bothers me greatly.
And there may be applications…
M. Mitzenmacher
6
Motivation
This seems a disturbing, sad state
of affairs. It bothers me greatly.
And there may be applications…
Hard disks, pilot tones, etc.
M. Mitzenmacher
7
What’s the Problem?
• Erasure and error channels have pleasant
symmetries; deletion channels do not.
• Example:
– Delete one bit from 1010101010.
– Delete one bit from 0000000000.
• Understanding this asymmetry seems
fundamental.
• Requires deep understanding of combinatorics of
random sequences and subsequences.
– Not a historical strength of coding theorists.
– But it is for this audience….
M. Mitzenmacher
8
In This Talk
• Main result: capacity of binary deletion channel is
at least (1- p)/9.
– Compare to capacity (1- p) for erasure channel.
– First within constant factor result.
– Still not tight….
• We describe path to this result.
– Generally, we’ll follow the history chronologically.
• We describe recent advances on related problems.
– Insertion channels, more limited models.
• We describe many related open problems.
– What do random subsequences of random sequences
look like?
M. Mitzenmacher
9
Capacity Lower Bounds
Shannon-based approach:
1. Choose a random codebook.
2. Define “typical” received sequences.
3. Construct a decoding algorithm.
M. Mitzenmacher
10
Capacity Lower Bounds: Erasures
1. Choose a random codebook.
–
Each bit chosen uniformly at random.
2. Define “typical” received sequences.
–
No more than (p + e) fraction of erasures.
3. Construct a decoding algorithm.
–
Find unique matching codeword.
M. Mitzenmacher
11
Capacity Lower Bounds: Errors
1. Choose a random codebook.
–
Each bit chosen uniformly at random.
2. Define “typical” received sequences.
–
Between (p – e),(p + e) fraction of errors.
3. Construct a decoding algorithm.
–
Find unique matching codeword.
M. Mitzenmacher
12
Capacity Lower Bounds: Deletions
1. Choose a random codebook.
–
Each bit chosen uniformly at random.
2. Define “typical” received sequences.
–
No more that (p + e) fraction of deletions.
3. Construct a decoding algorithm.
–
Find unique matching codeword.
Yields poor bounds, and no bound for p > 0.5.
M. Mitzenmacher
13
GREEDY Subsequence Algorithm
• Is S a subsequence of T?
– Start from leftmost point of S and T
– Move right on T until match next character of S
– Move to next character of T
T 000110010
S 01010
M. Mitzenmacher
14
Basic Failure Argument
• When codeword X of length n is sent, and
p just greater than 0.5, received sequence R
has approx. n/2 bits.
• Is R a subsequence of another codeword Y?
• Consider GREEDY algorithm
– If Y is chosen u.a.r., on average two bits of Y
are needed to cover each bit of R.
– So most other codewords match!
M. Mitzenmacher
15
Deletions: Diggavi/Grossglauser
1. Choose a random codebook.
–
Codeword sequences chosen by a symmetric
first order Markov chain.
2. Define “typical” received sequences.
–
No more that (p + e) fraction of deletions.
3. Construct a decoding algorithm.
–
Find unique matching codeword.
M. Mitzenmacher
16
Symmetric First Order Markov Chain
1– q/1
q/0
0
q/1
1
1– q/0
0’s tend to be followed by 0’s,
1’s tend to be followed by 1’s
M. Mitzenmacher
17
Intuition
• To send a 0 bit, if deletions are likely, send
many copies in a block.
– Lowers the rate by a constant factor.
– But makes it more likely that the bit gets
through.
• First order Markov chain gives natural
blocks.
M. Mitzenmacher
18
Diggavi/Grossglauser Results
• Calculate distribution of number of bits required
for GREEDY to cover each bit of received
sequence R using “random” codeword Y.
– If R is a subsequence of Y, GREEDY algorithm will
show it!
– Received sequence R also behaves like a sym. first
order Markov chain, with parameter q’.
• Use Chernoff bounds to determine how many
codewords Y of length n are needed before R is
covered.
• Get a lower bound on capacity!
M. Mitzenmacher
19
The Block Point of View
• Instead of thinking of codewords being
randomly chosen bit by bit:
0, 00, 000, 0001, 00011, 000110, 0001101….
• Think of codewords as being a sequence of
maximal blocks:
000, 00011, 000110, ….
M. Mitzenmacher
20
Improvements, Random Codebook
1. Choose a random codebook.
–
Codeword sequences chosen by laying out
blocks according to a given distribution.
2. Define “typical” received sequences.
–
No more that (p + e) fraction of deletions, and
number of blocks of each length close to the
expectation.
3. Construct a decoding algorithm.
–
Find unique matching codeword.
M. Mitzenmacher
21
Changing the Codebook
• Fix a distribution Z on positive integers.
– Probability of j is Zj.
• Start sequence with 0’s.
• First block of 0’s has length given by Z.
Then block of 1’s has length given by Z.
And so on.
– Generalizes previous work: first order Markov
chains lead to geometric distributions Z.
M. Mitzenmacher
22
Choosing a Distribution
• Intuition: when a mismatch between
received sequence and random codeword
occurs under GREEDY, want it to be long
lasting with significant probability.
• (a,b,q)-distributions:
– A short block a with probability q, long block b
with probability 1– q.
– Like Morse code.
M. Mitzenmacher
23
Results So Far
M. Mitzenmacher
24
So Far…
• Decoding algorithm has always been
GREEDY.
• Can’t we do better?
• For bigger capacity improvements, it seems
we need better decoding.
• Best algorithm: maximum likelihood.
– Find the most likely codeword given the
received sequence.
M. Mitzenmacher
25
Maximum Likelihood
• Pick the most likely codeword.
• Given codeword X and received sequence R, count
the number of ways R is obtained as a subsequence
of X. Most likely = biggest count.
– Via dynamic programming.
– Let C(j,k) = number of ways first k characters of R are
subsequence of first j characters of X.
C ( j, k )  C ( j  1, k )  I [ X j  Rk ]C ( j  1, k  1)
• Potentially exponential time, but we just want
capacity bounds.
M. Mitzenmacher
26
The Recurrence
C ( j, k )  C ( j  1, k )  I [ X j  Rk ]C ( j  1, k  1)
• I would love to analyze this recurrence when:
– Y is independent of X
– Y is obtained from X by random deletions.
• If the I[Xj = Rk] values were all independent,
would be possible.
• But dependence in both cases makes analysis
challenging.
• I bet someone here can do it. Let’s talk.
M. Mitzenmacher
27
Maximum Likelihood
• Standard union bound argument:
– Let sequence R be obtained from codeword X via a
binary deletion channel; let S be a random sequence
obtained from another random codeword Y.
– Let C(R) = # of ways R is a subsequence of X.
• Similarly C(S) = # of ways S is a subsequence of X.
– What are the distributions of C(R), C(S)?
• Unknown; guess is a lognormal or power law type
distribution. Also C(S) is often 0 for many parameters.
– Want C(R) > C(S) with suitably high probability.
M. Mitzenmacher
28
Conclude: Maximum Likelihood
• This is really the holy grail.
– As far as capacity arguments.
• Questions:
– What is the distribution of the number of times
a small “random” sequence appears as a
subsequence of a larger “random” sequence?
– Same question, when the smaller “random”
sequence is derived from the larger through a
deletion process.
M. Mitzenmacher
29
Better Decoding
• Maximum likelihood – haven’t got it yet…
• An “approximation”, intuitively like mutual
information:
– Consider a received block of 0’s (or 1’s). What block(s)
did it arise from?
– Call that sequence a type.
– For random codewords and deletions, number of
(type,block) pairs for each type/block combination is
highly concentrated around its expectation.
M. Mitzenmacher
30
Type Examples
1111000011000100011100110011000
1111000011000100011100110011000
Received sequence:
1100000111100
(type,block) pairs:
(1 1 1 1 , 1 1)
(0 0 0 0 1 1 0 0 0 1 0 0 0 , 0 0 0 0 0)
(1 1 1 0 0 1 1 0 0 1 1 , 1 1 1 1)
(0 0 0 , 0 0)
M. Mitzenmacher
31
New Decoding Algorithm
1. Choose a random codebook.
–
Codeword sequences chosen by laying out blocks
according to a given distribution.
2. Define “typical” received sequences.
–
No more that (p + e) fraction of deletions, and has
near the expected number of (type,block) occurrences
for each (type,block) pair.
3. Construct a decoding algorithm.
–
Find unique codeword that could be derived from the
“expected” number of (type,block) occurrences given
the received sequence.
M. Mitzenmacher
32
Jigsaw Puzzle Decoding
Received sequence: … 0 0 1 1 0 0 1 1 0 1 …
Jigsaw puzzle pieces
00000
0000
0110
1011
11
111
00
00
00
11
11
11
M. Mitzenmacher
33
Jigsaw Puzzle Decoding : Examples
Received sequence: … 0 0 1 1 0 0 1 1 …
00000 1011 0110 11
00
11
0000
111
00
11
00000 111
00
M. Mitzenmacher
11
00
11
…000001011011011…
…0 0 1 1 0 0 1 1…
0 0 0 0 0 1 0 1 1 …0000111000001011…
00
11
0000 11
00
11
…0 0 1 1 0 0 1 1…
…00000111000011…
…0 0 1 1 0 0 1 1…
34
Formal Argument
• Calculate upper bound on number of possible
jigsaw puzzle coverings. Get lower bound on
capacity.
– Challenge 1: Don’t get exactly the expected number of
pieces for each (type,block) pair; just close.
– Challenge 2: For very rare pieces, might not even be
close.
• End result: an expression that can be numerically
computed to give a lower bound, given input
distribution.
M. Mitzenmacher
35
Calculations
• All done by computer.
• Numerical precision – not too challenging
for moderate deletion probabilities.
– Terms in sums become small quickly.
– Fairly smooth.
– We guarantee our output is a lower bound.
• Computations become time-consuming for
large deletion probabilities.
M. Mitzenmacher
36
Improved Results
M. Mitzenmacher
37
Ullman’s Bound
• Ullman has an upper bound for synchronization
channels.
– For insertions of a specific form.
– Zero-error probability.
• Does not apply to this channel – although it has
been used as an upper bound!
• We are the first to show Ullman’s bound does not
hold for this case.
• What is a (non-trivial) upper bound for this
channel?
– We have some initial results.
M. Mitzenmacher
38
Insertion/Deletion Channels
• Our techniques apply for some
insertion/deletion channels.
– GREEDY decoding cannot; depends on received
sequence being a subsequence of the original
codeword.
• Specifically, the case of duplications: 0
becomes 000….
– Maintains block structure.
M. Mitzenmacher
39
Poisson Channels
• Recall discrete Poisson distribution with mean m.
Pr[ X  k ]  e  m m k / k!
• Consider a channel that replaces each bit with a
Poisson number of copies.
• Call this a Poisson channel.
• Poisson channels can be studied using our
insertion/deletion analysis.
• Capacity when m = 1 is approx. 0.1171.
– From numerical calculations.
M. Mitzenmacher
40
Reduction!
• A code for a Poisson channel gives a code for a
deletion channel.
– To send codeword over deletion channel with deletion
probability p, use a codeword X for the Poisson channel
code, but independently replace each bit by a Poisson
distributed number of bits with mean 1/(1 – p).
– At output, each bit of X appears as a Poisson distributed
number of copies (with mean 1) – a Poisson channel.
– Decode for the Poisson channel.
M. Mitzenmacher
41
Code Picture
Take codeword X for
Poisson channel
Randomly expand to X’ for
deletion channel using a
Poisson number of copies per bit
Expands by
1/(1– p) factor
Send X’ over
deletion channel
Receive R
M. Mitzenmacher
Decode R using the
Poisson channel codebook
42
Capacity Result
• Input to the deletion channel is 1/(1 – p) factor
larger than for Poisson channel.
• Implies capacity for the deletion channel is at least
0.1171(1 – p) > (1 – p) / 9.
– Deletion channel capacity is within a constant factor of
the erasure channel (1 – p).
– First result of this type that we know of.
– Best result (using a different mean) is 0.1185(1 – p).
M. Mitzenmacher
43
More New Directions
• Sticky channels
• Segmented deletion/insertion channels
M. Mitzenmacher
44
Sticky Channels
• Motivation: insertion/deletion channels are hard.
So what is the easiest such channel we can study?
• Sticky channels: each symbol duplicated a
number of times.
– Like a sticky keyboard! xxxxxxxxxxxxx
– Examples: each bit duplicated with probability p, each
bit replaced by a geometrically distributed number of
copies.
• Key point: no deletions.
• Intuitively easy: block structure at sender
completely preserved at the receiver.
M. Mitzenmacher
45
Sticky Channels : Results
• New work: numerical method that give
near-tight bounds on the capacity of such
channels.
– Key idea: symbols are block lengths.
• 000 becomes 3.
– Capacity for original channel becomes capacity
per unit cost in this channel.
– Use techniques for capacity per unit cost.
M. Mitzenmacher
46
Segmented Channels
• Motivation: what about deletions makes
them so hard?
– Can we restrict deletions and make them easy?
• Segmented deletion channel: at most one
deletion per segment.
– Example: At most 1 deletion per original byte.
M. Mitzenmacher
47
Segmented Channels : Results
• New work: 0-error, deterministic
algorithms for segmented deletion/insertion
channels.
– With reasonable rates.
– Simple computationally.
M. Mitzenmacher
48
Open Questions
• Capacity lower bounds: Improvements to argument.
– What is the best distribution for codewords?
– Can even more general (type,block) pairs be usefully used?
– Avoid overcounting jigsaw solutions that appear multiple
times?
– Specific better lower bounds for Poisson channel, translate
immediately into better general bounds!
• Upper bounds:
– Tighter upper bound for capacity of binary deletion channel?
• Maximum likelihood arguments:
– How do random subsequences of random sequences behave?
– Look for threshold behaviors, heavy-tailed distributions.
M. Mitzenmacher
49
Open Questions : Coding
• All this has been on lower bounds –
almost nothing about coding!
– Except segmented deletion channel.
• There has been some experimental work
done, but very, very limited results so far.
– And little good theory.
• Can we take the insight here and use it to
develop good codes?
M. Mitzenmacher
50
Specific Code Challenges
• Find a good code for the Poisson channel.
– Code for the Poisson channel immediately
gives codes for the deletion channel!
• Find good codes for basic sticky channels.
– Easiest channels with block structure :
no deletions, just duplicates.
– May yield insight for other channels.
– Low-density parity-check coding techniques
seem applicable.
M. Mitzenmacher
51
The Goals
• Simple, clear, tight bounds for capacity for
binary deletion channels, and practical
codes that are close to capacity.
• It’s been done for erasure channels and
error-correcting channels, why not deletion
channels too?
M. Mitzenmacher
52