The Mathematics of Communication
The next step is to conclude that a character occurring with probability $p$ should cost $\log_2(1/p) = -\log_2 p$ bits, no more and no less, irrespective of the probabilities of the other characters in the alphabet. This criterion defines an optimal code.
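As a small illustration of this criterion, the following Python sketch evaluates the ideal cost for a few probabilities. The function name `optimal_cost_bits` and the sample probabilities are assumptions made for the example; they are powers of 1/2 so that the costs come out as whole numbers of bits.

```python
import math


def optimal_cost_bits(p: float) -> float:
    """Ideal cost, in bits, of a character that occurs with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in the interval (0, 1]")
    return -math.log2(p)


# Hypothetical probabilities, chosen only to make the arithmetic easy to follow.
for p in (0.5, 0.25, 0.125):
    print(f"p = {p}: {optimal_cost_bits(p)} bits")
```

A character that turns up half the time costs one bit, and each halving of the probability adds one more bit to the cost.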
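The average cost per character of such a code can be computed by observing that out of $N$ characters, about $pN$ will be a character of probability $p$, and each of those will cost $-\log_2 p$ bits. So for $m$ characters with probabilities $p_1, p_2, \ldots, p_m$ the total cost of $N$ characters is $-N(p_1\log_2 p_1 + p_2\log_2 p_2 + \cdots + p_m\log_2 p_m)$ bits, and the average cost per character will be

$$-p_1\log_2 p_1 - p_2\log_2 p_2 - \cdots - p_m\log_2 p_m.$$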
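The averaging argument can be checked numerically. The sketch below weights each character's ideal cost by its probability; the function name `average_cost_bits` and the four-character alphabet are hypothetical, chosen so the sum can be verified by hand.

```python
import math


def average_cost_bits(probs):
    """Average cost per character, in bits, for the given probabilities.

    Each character of probability p costs -log2(p) bits and occurs a
    fraction p of the time, so the average is the sum of -p * log2(p).
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # A character with p = 0 never occurs and contributes nothing.
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Hypothetical four-character alphabet (not taken from the text).
probs = [0.5, 0.25, 0.125, 0.125]
# By hand: 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3 = 1.75 bits per character.
print(average_cost_bits(probs))
```

For comparison, four equally likely characters would cost 2 bits each, so the skewed distribution above is cheaper to encode on average.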
For English text the relative frequencies of letters have been tabulated. Here they are taken from Cryptography Theory and Practice by D. R. Stinson, CRC Press, Boca Raton, Florida 1996.
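Relative frequencies of this kind can also be estimated directly from a text. The sketch below counts letters and then evaluates the average cost per letter for the estimated frequencies. The helper names and the sample sentence are assumptions made for illustration; the sample is far too short to be representative of English, and the output is not the tabulated data cited above.

```python
import math
from collections import Counter


def letter_frequencies(text):
    """Relative frequency of each letter a-z in text, ignoring case."""
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    if not letters:
        raise ValueError("text contains no letters a-z")
    counts = Counter(letters)
    total = len(letters)
    return {ch: n / total for ch, n in counts.items()}


def entropy_bits(freqs):
    """Sum of -p * log2(p) over the given frequencies, in bits per letter."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


# Hypothetical sample text, used only to exercise the functions.
sample = "the quick brown fox jumps over the lazy dog"
freqs = letter_frequencies(sample)
print(sorted(freqs.items(), key=lambda item: -item[1])[:5])
print(entropy_bits(freqs))
```

With 26 letters the result can never exceed log2(26), roughly 4.7 bits per letter, which is the value reached when all letters are equally frequent.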
Information and Entropy. The quantity $H = -p_1\log_2 p_1 - p_2\log_2 p_2 - \cdots - p_m\log_2 p_m$ was introduced in this context by Claude Shannon in A Mathematical Theory of Communication (Bell System Technical Journal, July 1948). He called it the "entropy" of the set of probabilities $p_1, p_2, \ldots, p_m$, in analogy with its interpretation in statistical mechanics. But he thought of it as "a quantity which will measure, in some sense, how much information is produced by ... a process, or better, at what rate information is produced." It is now usually called the information per character. It is an extremely useful measure in determining, for example, how many messages a given communications channel can carry. But the word "information" has led to some confusion since, as Bar-Hillel puts it, "it turned out to be humanly impossible not to believe that one has got some hold of this important and doubtless rather difficult concept on the basis of such a simple procedure as, say, counting frequencies of letter occurrences in English." (Language and Information, Addison-Wesley 1964, p. 285).