Fano's inequality


In information theory, Fano's inequality (also known as the Fano converse and the Fano lemma) relates the average information lost in a noisy channel to the probability of the categorization error. It was derived by Robert Fano in the early 1950s while teaching a Ph.D. seminar in information theory at MIT, and later recorded in his 1961 textbook.

It is used to find a lower bound on the error probability of any decoder, as well as lower bounds for minimax risks in density estimation.

Let the random variables $X$ and $Y$ represent input and output messages with a joint probability $P(x, y)$. Let $e$ represent an occurrence of error; i.e., that $X \neq \tilde{X}$, with $\tilde{X} = f(Y)$ being an approximate version of $X$. Fano's inequality is

$$H(X \mid Y) \leq H_b(e) + P(e)\,\log\bigl(|\mathcal{X}| - 1\bigr),$$

where $\mathcal{X}$ denotes the support of $X$,

$$H(X \mid Y) = -\sum_{i,j} P(x_i, y_j) \log P(x_i \mid y_j)$$

is the conditional entropy,

$$P(e) = P(X \neq \tilde{X})$$

is the probability of the communication error, and

$$H_b(e) = -P(e) \log P(e) - (1 - P(e)) \log(1 - P(e))$$

is the corresponding binary entropy.
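
As a concrete illustration (a minimal numerical sketch, not part of the original article), the Python snippet below evaluates both sides of the inequality for a uniform binary source sent through a binary symmetric channel with crossover probability 0.1, decoded by a maximum a posteriori rule; the channel and all numeric values are illustrative assumptions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a probability vector (0 log 0 := 0)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Assumed setup: uniform binary input through a binary symmetric
# channel (BSC) with crossover probability eps.
eps = 0.1
joint = np.array([[0.5 * (1 - eps), 0.5 * eps],    # P(X=0, Y=0), P(X=0, Y=1)
                  [0.5 * eps, 0.5 * (1 - eps)]])   # P(X=1, Y=0), P(X=1, Y=1)

# Conditional entropy H(X|Y) = H(X,Y) - H(Y).
H_X_given_Y = entropy(joint.flatten()) - entropy(joint.sum(axis=0))

# MAP decoder: for each y, guess the x that maximizes P(x|y).
x_hat = joint.argmax(axis=0)
P_e = sum(joint[x, y] for y in (0, 1) for x in (0, 1) if x != x_hat[y])

H_b = entropy(np.array([P_e, 1 - P_e]))  # binary entropy of the error
rhs = H_b + P_e * np.log(2 - 1)          # |supp(X)| = 2

print(f"H(X|Y) = {H_X_given_Y:.4f} <= H_b(e) + P(e) log(|X|-1) = {rhs:.4f}")
assert H_X_given_Y <= rhs + 1e-12
```

For the binary symmetric channel the bound happens to be tight: $|\mathcal{X}| = 2$ makes the $\log(|\mathcal{X}| - 1)$ term vanish, and $H(X \mid Y)$ equals the binary entropy of the crossover probability.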


Proof

Define an indicator random variable $E$ that indicates the event that our estimate $\tilde{X} = f(Y)$ is in error:

$$E := \begin{cases} 1, & \text{if } \tilde{X} \neq X, \\ 0, & \text{if } \tilde{X} = X. \end{cases}$$

Consider $H(E, X \mid \tilde{X})$. We can use the chain rule for entropies to expand this in two different ways:

$$H(E, X \mid \tilde{X}) = H(X \mid \tilde{X}) + H(E \mid X, \tilde{X}) = H(E \mid \tilde{X}) + H(X \mid E, \tilde{X}).$$

Since $E$ is a deterministic function of $X$ and $\tilde{X}$, the term $H(E \mid X, \tilde{X}) = 0$. Equating the two expansions therefore gives

$$H(X \mid \tilde{X}) = H(E \mid \tilde{X}) + H(X \mid E, \tilde{X}).$$

Expanding the rightmost term,

$$H(X \mid E, \tilde{X}) = H(X \mid E = 0, \tilde{X})\,P(E = 0) + H(X \mid E = 1, \tilde{X})\,P(E = 1).$$

Since $E = 0$ means $X = \tilde{X}$, being given the value of $\tilde{X}$ allows us to know the value of $X$ with certainty; this makes the term $H(X \mid E = 0, \tilde{X}) = 0$. On the other hand, $E = 1$ means that $X \neq \tilde{X}$, hence given the value of $\tilde{X}$ we can narrow $X$ down to one of $|\mathcal{X}| - 1$ different values, allowing us to upper bound the conditional entropy: $H(X \mid E = 1, \tilde{X}) \leq \log(|\mathcal{X}| - 1)$. Hence

$$H(X \mid E, \tilde{X}) \leq P(e)\,\log(|\mathcal{X}| - 1).$$

The other term satisfies $H(E \mid \tilde{X}) \leq H(E)$, because conditioning reduces entropy. Because of the way $E$ is defined, $H(E) = H_b(e)$, meaning that $H(E \mid \tilde{X}) \leq H_b(e)$. Putting it all together,

$$H(X \mid \tilde{X}) \leq H_b(e) + P(e)\,\log(|\mathcal{X}| - 1).$$

Because $X \to Y \to \tilde{X}$ is a Markov chain, we have $I(X; \tilde{X}) \leq I(X; Y)$ by the data processing inequality, and hence $H(X \mid Y) \leq H(X \mid \tilde{X})$, giving us

$$H(X \mid Y) \leq H_b(e) + P(e)\,\log(|\mathcal{X}| - 1).$$
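
The intermediate bound $H(X \mid \tilde{X}) \leq H_b(e) + P(e)\log(|\mathcal{X}| - 1)$ holds for any joint distribution of $X$ and $\tilde{X}$ over a common alphabet. As an added sanity check (a sketch in which the randomly drawn distribution is the only assumption), it can be verified numerically:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats; zero cells are ignored."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)

# Assumed setup: a random joint distribution P(X, X~) over a
# 4-symbol alphabet, rows indexing X and columns indexing X~.
n = 4
joint = rng.random((n, n))
joint /= joint.sum()

H_X_given_Xhat = entropy(joint.flatten()) - entropy(joint.sum(axis=0))
P_e = 1.0 - np.trace(joint)              # P(X != X~): off-diagonal mass
H_b = entropy(np.array([P_e, 1 - P_e]))  # binary entropy of the error

bound = H_b + P_e * np.log(n - 1)
print(f"H(X|X~) = {H_X_given_Xhat:.4f} <= {bound:.4f}")
assert H_X_given_Xhat <= bound + 1e-12
```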

Alternative formulation

Let $X$ be a random variable with density equal to one of $r + 1$ possible densities $f_1, \ldots, f_{r+1}$. Furthermore, the Kullback–Leibler divergence between any pair of densities cannot be too large,

$$D_{KL}(f_i \parallel f_j) \leq \beta \quad \text{for all } i \neq j.$$

Let $\psi(X) \in \{1, \ldots, r + 1\}$ be an estimate of the index. Then

$$\sup_i P_i(\psi \neq i) \geq 1 - \frac{\beta + \log 2}{\log r},$$

where $P_i$ is the probability induced by $f_i$.
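
To see what the bound yields in a concrete case (an illustrative example added here; the Gaussian family and all numbers are assumptions, not from the article), take $r + 1$ unit-variance Gaussian densities with means $i\delta$: their pairwise KL divergences equal half the squared mean difference, so $\beta = (r\delta)^2 / 2$.

```python
import numpy as np

# Assumed family: f_i = N(i * delta, 1), i = 1, ..., r + 1.
# KL(N(a,1) || N(b,1)) = (a - b)^2 / 2, so the largest pairwise
# divergence is beta = (r * delta)^2 / 2.
r, delta = 15, 0.1
beta = (r * delta) ** 2 / 2

# Fano lower bound on the worst-case error of ANY index estimator psi.
lower_bound = 1 - (beta + np.log(2)) / np.log(r)
print(f"beta = {beta:.3f}, sup_i P_i(psi != i) >= {lower_bound:.3f}")
```

The bound is informative only when $\beta$ is small relative to $\log r$, i.e., when the densities are numerous but statistically hard to tell apart.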

Generalization

The following generalization is due to Ibragimov and Khasminskii (1979), Assouad and Birge (1983).

Let $\mathbf{F}$ be a class of densities with a subclass of $r + 1$ densities $f_\theta$ such that for any $\theta \neq \theta'$,

$$\|f_\theta - f_{\theta'}\|_{L_1} \geq \alpha,$$

$$D_{KL}(f_\theta \parallel f_{\theta'}) \leq \beta.$$

Then in the worst case the expected estimation error is bounded from below,

$$\sup_{f \in \mathbf{F}} \mathbb{E}\,\|f_n - f\|_{L_1} \geq \frac{\alpha}{2}\left(1 - \frac{n\beta + \log 2}{\log r}\right),$$

where $f_n$ is any density estimator based on a sample of size $n$.
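
As a usage sketch (the subclass parameters below are purely hypothetical assumptions), the following evaluates the lower bound for several sample sizes, showing how the guarantee decays with $n$ for a fixed subclass:

```python
import numpy as np

def minimax_lower_bound(alpha, beta, r, n):
    """Ibragimov-Khasminskii / Assouad-Birge lower bound on the
    worst-case expected L1 error of any density estimator."""
    return alpha / 2 * (1 - (n * beta + np.log(2)) / np.log(r))

# Hypothetical subclass: r + 1 densities, pairwise L1-separated by
# alpha, with pairwise KL divergence at most beta (assumed values).
alpha, beta, r = 0.2, 0.01, 63
for n in (10, 50, 100, 200):
    lb = minimax_lower_bound(alpha, beta, r, n)
    print(f"n = {n:4d}: sup_f E||f_n - f||_1 >= {lb:.4f}")
```

For large $n$ the bound becomes vacuous, which is why in minimax arguments the subclass, and with it $\alpha$ and $\beta$, is chosen as a function of $n$.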

References

  • P. Assouad, "Deux remarques sur l'estimation", Comptes Rendus de l'Académie des Sciences de Paris, Vol. 296, pp. 1021–1024, 1983.
  • L. Birge, "Estimating a density under order restrictions: nonasymptotic minimax risk", Technical report, UER de Sciences Économiques, Universite Paris X, Nanterre, France, 1983.
  • T. Cover, J. Thomas (1991). Elements of Information Theory. pp. 38–42. ISBN 978-0-471-06259-2.
  • L. Devroye, A Course in Density Estimation. Progress in probability and statistics, Vol 14. Boston, Birkhauser, 1987. ISBN 0-8176-3365-0, ISBN 3-7643-3365-0.
  • Fano, Robert (1968). Transmission of information: a statistical theory of communications. Cambridge, Mass: MIT Press. ISBN 978-0-262-56169-3. OCLC 804123877.
    • also: Cambridge, Massachusetts, M.I.T. Press, 1961. ISBN 0-262-06001-9
  • R. Fano, "Fano inequality", Scholarpedia, 2008.
  • I. A. Ibragimov, R. Z. Has′minskii, Statistical estimation, asymptotic theory. Applications of Mathematics, vol. 16, Springer-Verlag, New York, 1981. ISBN 0-387-90523-5