The Mathematics of Communication
4. Constructing an optimal code
Here are Stinson's data modified to take account of _ (space)
with probability .166.
A  .068  F  .018  K  .007  P  .015
 U  .023  Z  .0008

B  .0125  G  .016  L  .033  Q  .0008
 V  .008  _  .166

C  .023  H  .051  M  .020  R  .05
 W  .019  

D  .043  I  .070  N  .067  S  .063
 X  .001  

E  .106  J  .0016  O  .0625  T  .075
 Y  .016   
There is an algorithm for constructing a binary code for a set
of characters with different probabilities, with cost per character
as close as possible
to the theoretical minimum  p_{1}log_{2}p_{1}  p_{2}log_{2}p_{2}  ... 
p_{m}log_{2}p_{m}.
The minimum can only be exactly
attained if all the p_{i} are exactly powers of 1/2.
 List the characters in order of decreasing probability.
 Put the characters at the ends of the branches of a binary
tree in such a way that at each fork the total probability on
one side is as close as possible to the total probability on the
other.
 Start at the base of the tree. All characters on the
left branch have first code element 0. All characters on the
right branch have first element 1.
 Continue through the higher branchings, each time determining
the next digit in the code for all the characters on one side and on
the other.
Here is the process illustrated for English, with the probabilities
from the table above (optimum=4.13).
_ 000
_E <
/ E 001
_ETAO < T 010
< TAO< A 0110
\ AO<
\ O 0111
\ I 1000
\ IN<
\ / N 1001
\ INSHR< S 1010
< SHR< H 10110
\ HR<
\ R 10111
\ D 11000
\ DL<
\ / L 11001
\ DLUC< U 11010
< UC<
\ C 11011
\ M 11100
\ MWF< W 111010
< WF<
\ F 111011
\ Y 111100
\ YG<
< G 111101
\ P 111110
<
\ B 1111110
<
\ V 11111110
<
\ K 111111110
< J 11111111100
\ JX<
< X 11111111101
\ Q 11111111110
QZ<
Z 11111111111

The cost for this code is 4.167 bits/character.
@ Copyright 2000, American Mathematical Society.