Bayes classifier

In statistical classification, the Bayes classifier is the classifier that minimizes the probability of misclassification; no other classifier using the same set of features can achieve a lower risk.[1]

Definition

Suppose a pair $(X, Y)$ takes values in $\mathbb{R}^d \times \{1, 2, \dots, K\}$, where $Y$ is the class label of $X$. This means that the conditional distribution of $X$, given that the label $Y$ takes the value $r$, is given by

$$(X \mid Y = r) \sim P_r \quad \text{for } r = 1, 2, \dots, K,$$

where "$\sim$" means "is distributed as", and where $P_r$ denotes a probability distribution.

A classifier is a rule that assigns to an observation $X = x$ a guess or estimate of what the unobserved label $Y = r$ actually was. In theoretical terms, a classifier is a measurable function $C : \mathbb{R}^d \to \{1, 2, \dots, K\}$, with the interpretation that $C$ classifies the point $x$ to the class $C(x)$. The probability of misclassification, or risk, of a classifier $C$ is defined as

$$\mathcal{R}(C) = \operatorname{P}\{C(X) \neq Y\}.$$

The Bayes classifier is

$$C^{\text{Bayes}}(x) = \underset{r \in \{1, 2, \dots, K\}}{\operatorname{argmax}} \operatorname{P}(Y = r \mid X = x).$$

In practice, as in most of statistics, the difficulties and subtleties are associated with modeling the probability distributions effectively, in this case the posterior $\operatorname{P}(Y = r \mid X = x)$. The Bayes classifier is a useful benchmark in statistical classification.
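As a concrete illustration, the rule can be evaluated directly whenever the priors $\operatorname{P}(Y = r)$ and the class-conditional densities are known, since by Bayes' theorem the posterior is proportional to their product. The following Python sketch assumes a hypothetical two-class model with Gaussian class-conditional distributions; all parameter values are illustrative, not part of the definition.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical model (illustrative values only): K = 2 classes with
# known priors P(Y = r) and Gaussian class-conditional densities P_r.
priors = np.array([0.6, 0.4])
means = np.array([0.0, 2.0])
stds = np.array([1.0, 1.0])

def bayes_classifier(x):
    """Return argmax_r P(Y = r | X = x).

    By Bayes' theorem the posterior is proportional to
    P(Y = r) * p_r(x), so the shared normaliser P(X = x) can be dropped.
    """
    joint = priors * norm.pdf(x, loc=means, scale=stds)
    return int(np.argmax(joint))

print(bayes_classifier(0.3))  # 0: the class-0 term dominates here
print(bayes_classifier(1.8))  # 1: the class-1 term dominates here
```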

The excess risk of a general classifier $C$ (possibly depending on some training data) is defined as $\mathcal{R}(C) - \mathcal{R}(C^{\text{Bayes}}).$ This non-negative quantity is important for assessing the performance of different classification techniques. A classifier is said to be consistent if the excess risk converges to zero as the size of the training data set tends to infinity.[2]
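Under a known generative model, the risk and excess risk of any candidate rule can be estimated by Monte Carlo. A minimal sketch, continuing the hypothetical two-Gaussian model above and comparing a deliberately suboptimal threshold rule against the Bayes rule:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
priors = np.array([0.6, 0.4])          # same hypothetical model as above
means, stds = np.array([0.0, 2.0]), np.array([1.0, 1.0])

# Sample (X, Y) pairs from the assumed generative model.
n = 200_000
y = rng.choice(2, size=n, p=priors)
x = rng.normal(means[y], stds[y])

def bayes_rule(x):
    """Vectorised argmax_r P(Y = r) p_r(x)."""
    joint = priors * norm.pdf(x[:, None], loc=means, scale=stds)
    return np.argmax(joint, axis=1)

naive_rule = lambda x: (x > 0.0).astype(int)  # deliberately suboptimal cutoff

r_bayes = np.mean(bayes_rule(x) != y)   # Monte Carlo estimate of the Bayes risk
r_naive = np.mean(naive_rule(x) != y)   # Monte Carlo estimate of R(naive)
print(f"Bayes risk ~ {r_bayes:.3f}, naive risk ~ {r_naive:.3f}, "
      f"excess risk ~ {r_naive - r_bayes:.3f}")  # excess risk >= 0
```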

Proof of Optimality

Proof that the Bayes classifier is optimal and that the Bayes error rate is minimal proceeds as follows.

Define the variables: risk $R(h)$, Bayes risk $R^*$, and the set of all possible classes to which a point can be classified, $\{0, 1\}$. Let the posterior probability of a point belonging to class 1 be $\eta(x) = \Pr(Y = 1 \mid X = x)$. Define the classifier $h^*$ as

$$h^*(x) = \begin{cases} 1 & \text{if } \eta(x) \geq \tfrac{1}{2}, \\ 0 & \text{otherwise.} \end{cases}$$

Then we have the following results:

(a) $R(h^*) = R^*$, i.e. $h^*$ is a Bayes classifier;

(b) for any classifier $h$, the excess risk satisfies $R(h) - R^* = 2 \, \mathbb{E}_X\!\left[ \left| \eta(X) - \tfrac{1}{2} \right| \mathbb{1}_{\{h(X) \neq h^*(X)\}} \right];$

(c) $R^* = \mathbb{E}_X\!\left[ \min(\eta(X), 1 - \eta(X)) \right].$


Proof of (a): For any classifier $h$, we have

$$R(h) = \mathbb{E}_{XY}\!\left[ \mathbb{1}_{\{h(X) \neq Y\}} \right] = \mathbb{E}_X \, \mathbb{E}_{Y \mid X}\!\left[ \mathbb{1}_{\{h(X) \neq Y\}} \right] = \mathbb{E}_X\!\left[ \eta(X) \mathbb{1}_{\{h(X) = 0\}} + (1 - \eta(X)) \mathbb{1}_{\{h(X) = 1\}} \right],$$

where the second equality uses the tower property of conditional expectation. Notice that the integrand is minimised pointwise by taking $h(x) = 1$ whenever $\eta(x) \geq 1 - \eta(x)$ and $h(x) = 0$ otherwise, which is exactly the rule $h^*(x)$. Therefore the minimum possible risk is the Bayes risk, $R^* = R(h^*)$.


Proof of (b):

$$\begin{aligned} R(h) - R^* &= R(h) - R(h^*) \\ &= \mathbb{E}_X\!\left[ \eta(X) \mathbb{1}_{\{h(X) = 0\}} + (1 - \eta(X)) \mathbb{1}_{\{h(X) = 1\}} - \eta(X) \mathbb{1}_{\{h^*(X) = 0\}} - (1 - \eta(X)) \mathbb{1}_{\{h^*(X) = 1\}} \right] \\ &= \mathbb{E}_X\!\left[ |2\eta(X) - 1| \, \mathbb{1}_{\{h(X) \neq h^*(X)\}} \right] = 2 \, \mathbb{E}_X\!\left[ \left| \eta(X) - \tfrac{1}{2} \right| \mathbb{1}_{\{h(X) \neq h^*(X)\}} \right]. \end{aligned}$$

The integrand vanishes on the event $\{h(X) = h^*(X)\}$, and on $\{h(X) \neq h^*(X)\}$ the two risk contributions differ by exactly $|2\eta(X) - 1|$, since by definition $h^*$ always picks the class with the larger posterior.
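Identity (b) can be sanity-checked numerically: under the hypothetical two-Gaussian model used earlier, a Monte Carlo estimate of the excess risk of an arbitrary rule should agree with the right-hand side of (b). A sketch:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
priors = np.array([0.6, 0.4])  # hypothetical model from the Definition section
means, stds = np.array([0.0, 2.0]), np.array([1.0, 1.0])

n = 500_000
y = rng.choice(2, size=n, p=priors)
x = rng.normal(means[y], stds[y])

# Posterior eta(x) = P(Y = 1 | X = x) under the assumed model.
p0 = priors[0] * norm.pdf(x, means[0], stds[0])
p1 = priors[1] * norm.pdf(x, means[1], stds[1])
eta = p1 / (p0 + p1)

h_star = (eta >= 0.5).astype(int)  # the Bayes rule h*
h = (x > 0.0).astype(int)          # an arbitrary (suboptimal) rule h

lhs = np.mean(h != y) - np.mean(h_star != y)            # R(h) - R*
rhs = 2.0 * np.mean(np.abs(eta - 0.5) * (h != h_star))  # right-hand side of (b)
print(f"lhs ~ {lhs:.4f}, rhs ~ {rhs:.4f}")  # the two estimates should agree
```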


Proof of (c):

$$R^* = R(h^*) = \mathbb{E}_X\!\left[ \eta(X) \mathbb{1}_{\{h^*(X) = 0\}} + (1 - \eta(X)) \mathbb{1}_{\{h^*(X) = 1\}} \right] = \mathbb{E}_X\!\left[ \min(\eta(X), 1 - \eta(X)) \right],$$

since $h^*(X) = 1$ exactly when $\eta(X) \geq 1 - \eta(X)$.
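Result (c) likewise admits a direct numerical check: integrating $\min(\eta(x), 1 - \eta(x))$ against the (assumed) marginal density of $X$ reproduces the Bayes risk. A sketch under the same hypothetical model:

```python
import numpy as np
from scipy.stats import norm

priors = np.array([0.6, 0.4])  # hypothetical model from the Definition section
means, stds = np.array([0.0, 2.0]), np.array([1.0, 1.0])

def marginal(x):
    """Marginal density of X: sum_r P(Y = r) p_r(x)."""
    return (priors[0] * norm.pdf(x, means[0], stds[0])
            + priors[1] * norm.pdf(x, means[1], stds[1]))

def eta(x):
    """Posterior eta(x) = P(Y = 1 | X = x)."""
    return priors[1] * norm.pdf(x, means[1], stds[1]) / marginal(x)

# R* = E_X[min(eta(X), 1 - eta(X))], approximated on a fine grid.
xs = np.linspace(-8.0, 10.0, 100_001)
integrand = np.minimum(eta(xs), 1.0 - eta(xs)) * marginal(xs)
r_star = np.sum(integrand) * (xs[1] - xs[0])  # Riemann sum over the grid
print(f"R* ~ {r_star:.4f}")  # about 0.15 for these illustrative parameters
```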


The general case, that the Bayes classifier minimises classification error when each element can belong to any of $n$ categories, proceeds by the tower property of expectations as follows:

$$\mathbb{E}_Y\!\left[ \mathbb{1}_{\{\hat{y} \neq Y\}} \right] = \mathbb{E}_X \, \mathbb{E}_{Y \mid X}\!\left[ \mathbb{1}_{\{\hat{y} \neq Y\}} \mid X = x \right] = \mathbb{E}_X\!\left[ \Pr(Y = 1 \mid X = x) \mathbb{1}_{\{\hat{y} \neq 1\}} + \Pr(Y = 2 \mid X = x) \mathbb{1}_{\{\hat{y} \neq 2\}} + \dots + \Pr(Y = n \mid X = x) \mathbb{1}_{\{\hat{y} \neq n\}} \right].$$

This is minimised by classifying

$$\hat{y} = h^*(x) = \underset{r \in \{1, \dots, n\}}{\operatorname{argmax}} \Pr(Y = r \mid X = x)$$

for each observation $x$, since that choice drops the largest term from the sum.
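The $n$-category rule is the same posterior argmax; a minimal Python sketch with three hypothetical Gaussian classes (all parameters illustrative):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical n = 3 class model (illustrative parameters).
priors = np.array([0.5, 0.3, 0.2])
means = np.array([-2.0, 0.0, 3.0])
stds = np.array([1.0, 0.5, 1.5])

def bayes_multiclass(x):
    """argmax_r P(Y = r | X = x); the normaliser P(X = x) is shared and dropped."""
    joint = priors * norm.pdf(np.asarray(x, dtype=float)[..., None],
                              loc=means, scale=stds)
    return np.argmax(joint, axis=-1)

print(bayes_multiclass([-2.5, 0.1, 4.0]))  # [0 1 2] for these parameters
```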

References

  1. ^ Devroye, L.; Györfi, L.; Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer. ISBN 0-387-94618-7.
  2. ^ Faragó, A.; Lugosi, G. (1993). "Strong universal consistency of neural network classifiers". IEEE Transactions on Information Theory. 39 (4): 1146–1151. doi:10.1109/18.243433.