DCSP-7: Information
Jianfeng Feng
Department of Computer Science Warwick
Univ., UK
[email protected]
http://www.dcs.warwick.ac.uk/~feng/dsp.html
http://cs.gmu.edu/cne/modules/dau/stat/dau2_frm.html
Information and coding theory
Information theory is concerned with the
description of information sources, the
representation of the information from a
source, and the transmission of this
information over channel.
This might be the best example to
demonstrate how a deep mathematical
theory could be successfully applied to
solving engineering problems.
Information theory is a discipline in applied
mathematics involving the quantification of
data with the goal of enabling as much
data as possible to be reliably stored on a
medium and/or communicated over a
channel.
The measure of data, known as information
entropy, is usually expressed by the
average number of bits needed for storage
or communication.
The field is at the crossroads of
mathematics,
statistics,
computer science,
physics,
neurobiology,
and electrical engineering.
Its impact has been crucial to the success of
• the Voyager missions to deep space,
• the invention of the CD,
• the feasibility of mobile phones,
• the development of the Internet,
• the study of linguistics and of human perception,
• the understanding of black holes,
and numerous other fields.
Information theory is generally considered to
have been founded in 1948 by Claude Shannon
in his seminal work,
A Mathematical Theory of Communication.
The central paradigm of classic information
theory is the engineering problem of the
transmission of information over a noisy
channel.
An avid chess player, Professor Shannon built a chess-playing computer years
before IBM's Deep Blue came along. While on a trip to Russia in 1965, he
challenged world champion Mikhail Botvinnik to a match.
He lost in 42 moves, considered an excellent showing.
The most fundamental results of this theory are
1. Shannon's source coding theorem
which establishes that, on average, the number of
bits needed to represent the result of an uncertain
event is given by its entropy;
2. Shannon's noisy-channel coding theorem
which states that reliable communication is possible
over noisy channels provided that the rate of
communication is below a certain threshold called
the channel capacity.
The channel capacity can be approached by using
appropriate encoding and decoding systems.
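As a concrete illustration of channel capacity (a sketch in Python; the binary symmetric channel and the formula C = 1 - H2(e) are standard textbook facts assumed here, not given on these slides, and H2 is the binary entropy defined later in this lecture):

from math import log2

def binary_entropy(p):
    # H2(p) = -p log2 p - (1-p) log2 (1-p), with 0 log2 0 taken as 0
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(e):
    # Capacity of a binary symmetric channel with crossover probability e,
    # in bits per channel use (standard result, assumed here for illustration)
    return 1.0 - binary_entropy(e)

for e in (0.0, 0.01, 0.1, 0.5):
    print(f"crossover {e}: capacity {bsc_capacity(e):.3f} bits per use")

Shannon's theorem says reliable communication is possible at any rate below these capacities; at e = 0.5 the output is independent of the input and the capacity drops to zero.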
Consider predicting the activity of the Prime Minister tomorrow.
This prediction is an information source.
The information source has two outcomes:
• He will be in his office,
• He will be naked and run 10 miles in London.
Clearly, the outcome of 'in office' contains
little information; it is a highly probable
outcome.
The outcome 'naked run', however, contains
considerable information; it is a highly
improbable event.
In information theory, an information source is a
probability distribution, i.e. a set of probabilities
assigned to a set of outcomes.
"Nothing is certain, except death and taxes"
This reflects the fact that the information
contained in an outcome is determined not only
by the outcome, but by how uncertain it is.
An almost certain outcome contains little
information.
A measure of the information contained in an
outcome was introduced by Hartley in 1927.
He defined the information contained in an
outcome x as
I(x) = - log2 P(x)
This measure satisfies our requirement that the
information contained in an outcome is
proportional to its uncertainty.
If P(x)=1, then I(x)=0, telling us that a certain
event contains no information.
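A minimal numerical sketch of Hartley's measure (the probabilities below are illustrative, not taken from the slides):

from math import log2

def information(p):
    # Hartley's measure: I(x) = -log2 P(x), in bits
    return -log2(p)

print(information(1.0))    # a certain event: 0 bits
print(information(0.99))   # 'in office', almost certain: ~0.014 bits
print(information(0.01))   # 'naked run', highly improbable: ~6.64 bits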
The definition above also satisfies the requirement
that the total information in independent events
should add.
Clearly, our prime minister prediction for two days
contains twice as much information as for one day.
For two independent outcomes xi and xj,
I(xi and xj) = - log2 P(xi and xj)
= - log2 [P(xi) P(xj)]
= - log2 P(xi) - log2 P(xj)
= I(xi) + I(xj)
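A quick numerical check of this additivity (illustrative probabilities):

from math import log2, isclose

def information(p):
    return -log2(p)

p_i, p_j = 0.2, 0.5          # two independent outcomes
p_joint = p_i * p_j          # independence: P(xi and xj) = P(xi) P(xj)
assert isclose(information(p_joint),
               information(p_i) + information(p_j))   # 3.32 = 2.32 + 1.00 bits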
Hartley's measure defines the information in a
single outcome.
The measure entropy H(X) defines the information
content of the source X as a whole.
It is the mean information provided by the source.
We have
H(X) = Σi P(xi) I(xi) = - Σi P(xi) log2 P(xi)
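A minimal sketch of this computation (the distributions below are illustrative):

from math import log2

def entropy(probs):
    # H(X) = - sum_i P(xi) log2 P(xi); outcomes with P(xi) = 0 contribute nothing
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))                 # 1.0 bit: a fair coin
print(entropy([0.99, 0.01]))               # ~0.081 bits: the prime minister source
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits: four equally likely outcomes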
A binary symmetric source (BSS) is a source with
two outputs whose probabilities are p and 1-p
respectively.
The prime minister example discussed above is a BSS.
The entropy of the source is
H(X) = -p log2 p - (1-p) log2 (1-p)
The function takes the value zero when p=0.
When one outcome is certain, so is the other, and
the entropy is zero.
As p increases, so too does the entropy, until it
reaches a maximum when p = 1-p = 0.5.
When p is greater than 0.5, the curve declines
symmetrically to zero, reached when p=1.
We conclude that the average information in
the BSS is maximised when both
outcomes are equally likely.
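Tabulating the binary entropy for a few values of p (a sketch using the formula above) confirms this:

from math import log2

def H(p):
    # H(p) = -p log2 p - (1-p) log2 (1-p), with 0 log2 0 taken as 0
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p = {p:.1f}  H = {H(p):.3f} bits")

The printed values rise from 0 at p = 0 to a maximum of 1 bit at p = 0.5 and fall symmetrically back to 0 at p = 1.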
The entropy is measuring the average
uncertainty of the source.
(The term entropy is borrowed from thermodynamics. There too it is a measure of the uncertainty or disorder of a system.)