The Mathematics of Communication

Web resources on information theory range from Shannon's 1948 classic A Mathematical Theory of Communication to the somewhat more accessible Primer on Information Theory by Thomas Schneider of the NIH Laboratory of Molecular Biology (interesting focus on the genetic code) and the Basics of Information Theory by Dave Touretzky at Carnegie Mellon. These notes are adapted from a Feature Column on the American Mathematical Society website, November 2000.

## 1. Numbers, bases and logarithms

Take a number, say 565937. This number has an existence of its own, independent of the numbering system used to represent it. It could be, for example, the population of a mid-sized city or the salary of a mid-level executive.

The mathematics of communication addresses the problem of determining the cost of recording or transmitting this number and, beyond numbers, any conceivable message.

Staying with 565937, suppose we used for our numbering system a base different from 10. Here are some of the possibilities:

| base | possible digits | 565937 in this base | number of digits |
|------|-----------------|---------------------|------------------|
| 16 | 0 1 2 3 4 5 6 7 8 9 A B C D E F | 8A2B1 | 5 |
| 10 | 0 1 2 3 4 5 6 7 8 9 | 565937 | 6 |
| 5 | 0 1 2 3 4 | 121102222 | 9 |
| 3 | 0 1 2 | 1001202022122 | 13 |
| 2 | 0 1 | 10001010001010110001 | 20 |
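The table can be reproduced with a few lines of Python. This is a sketch of my own (the function name `to_base` is not from the article): it converts 565937 to each base by repeated division and reports the digit count.

```python
# Convert a positive integer to its representation in base b (2 <= b <= 16)
# by repeated division, collecting remainders as digits.

DIGITS = "0123456789ABCDEF"

def to_base(n, b):
    """Return the base-b representation of a positive integer n as a string."""
    digits = []
    while n > 0:
        digits.append(DIGITS[n % b])
        n //= b
    return "".join(reversed(digits))

for base in (16, 10, 5, 3, 2):
    rep = to_base(565937, base)
    print(f"base {base:>2}: {rep} ({len(rep)} digits)")
```

Running this confirms the trade-off in the table: the smaller the base, the longer the representation.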

The trade-off between size of base and length of representation manifests what is called the amount of information carried by the number.

Information always depends on the context in which a signal is to be understood. In many cases, however, it is not practical to make up a different encoding system for each context. For example, in encoding numbers, one usually makes the hypothesis that in each place all digits are equally likely. So in writing a six-digit decimal number we are choosing one of the one million possibilities. On the other hand writing the same number in base 5 would imply that we are choosing one of the 1953125 possible 9-digit base-5 numbers.
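As a quick check on those counts (a sketch of mine, not part of the original article), the two message spaces are easy to enumerate in Python:

```python
# Under the hypothesis that every digit in every place is equally likely,
# a six-digit decimal number is one choice out of 10**6 possibilities,
# and a nine-digit base-5 number is one choice out of 5**9.
decimal_choices = 10 ** 6
base5_choices = 5 ** 9
print(decimal_choices, base5_choices)  # 1000000 1953125
```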

To get a standard measure of this information, and for practical reasons as well, we use base 2. It takes 20 base-2 digits to write the number, and consequently we say it carries 20 bits (binary digits) of information.

Since the decimal (base-10) representation gives the same information in 6 decimal digits, each digit in 565937 carries 20/6 ≈ 3.33 bits of information. In general, the length of the base-2 representation of a number n is very nearly log2(n); similarly, the length of its base-10 representation is very nearly log10(n). For long strings of digits these estimates become exact, and the number of bits per decimal digit will be log2(n) / log10(n) = log2(10) = 3.3219...

Here is where logarithms come into the picture. More generally, for any number n and base b, where each digit is one of b equally likely possibilities, the number of bits per digit will be

log2(n) / logb(n) = log2(b).
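A small numerical check of this identity (my own sketch): for any fixed n, the ratio log2(n) / logb(n) collapses to log2(b), independent of n.

```python
# Verify that log2(n) / log_b(n) equals log2(b) for the bases in the table.
import math

n = 565937
for b in (16, 10, 5, 3, 2):
    ratio = math.log2(n) / math.log(n, b)  # log2(n) / log_b(n)
    print(f"base {b:>2}: bits per digit = {ratio:.4f}  (log2({b}) = {math.log2(b):.4f})")
```

In particular, for b = 10 the ratio is 3.3219..., the bits-per-decimal-digit figure computed above.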

This is the basic observation that underlies the mathematical theory of communication.

Tony Phillips
Stony Brook
February 14, 2009