Next: Defining functions with proc; Up: fsqFsHn sGGousG Previous: Introduction to Cryptography

Subsections

Simple substitution
The Caesar cipher, and the ASCII encoding
- Treating characters as numbers
- Render unto Caesar

Simple Ciphers

Simple substitution

One of the most common (and very easy to crack) ciphers is substitution. One sometimes sees these in a newspaper somewhere near the crossword puzzle. This scheme is also the one used in Poe's story ``The Gold Bug'', and the Sherlock Holmes ``Mystery of the Dancing Men''. The idea is straightforward: choose a rearrangement of the letters of the alphabet, and replace each letter in the plaintext by its corresponding one.

Such substitutions are very simple to implement in Maple. We will use the CharacterMap function from the StringTools package^4.2which does exactly what we want. First, we define two ``alphabets''- one for our plain text, and one for our encrypted text. The second is a permutation of the first.

> 
  with(StringTools):
  Alphabet :="abcdefghijklmnopqrstuvwxyz":
  Cryptabet:="THEQUICKBROWNFXJMPSVLAZYDG":

Now we define a pair of functions^4.3to encrypt and decrypt our text.

> 
  Scramble  := msg  -> CharacterMap(Alphabet, Cryptabet, msg):
  Unscramble:= code -> CharacterMap(Cryptabet,Alphabet, code):

> 
  Scramble("please do not read this");

$\begin{maplelatex} \begin{displaymath}\mbox{\lq\lq JWUTSU QX FXV PUTQ VKBS''} \end{displaymath}\end{maplelatex}$

> 
  Unscramble(

$\begin{maplelatex} \begin{displaymath}\mbox{\lq\lq please do not read this''} \end{displaymath}\end{maplelatex}$

Note that our message contains a spaces which are preserved in the encryption process, because the CharacterMap function only modifies those characters which are found in the first string. If a character isn't found, it is left alone. Thus, if we try to scramble a message containing punctuation or upper-case letters, they aren't modified by Scramble. Since our Cryptabet is all upper-case, a message which began with mixed case will be distorted.

> 
  Scramble("He said I need direction; I head for the door.");
  Unscramble(

$\begin{maplelatex} \begin{displaymath}\mbox{\lq\lq HU STBQ I FUUQ QBPUEVBXF; I KUTQ I... ...said f need direction; f head for the door.''} \end{displaymath}\end{maplelatex}$

It is traditional in pen-and-paper cryptography to use a 26-letter alphabet (that is, to remove spaces and punctuation and ignore any distinction between upper and lower case letters). If you wish to do this, it could be done^4.4using LowerCase (which translates all letters to lower case), and a combination of Select and IsLower to remove anything that isn't a letter.

> 
  Select(IsLower,LowerCase("I don't mind suggestions. Don't give me any more."));

$\begin{maplelatex} \begin{displaymath}\mbox{\lq\lq idontmindsuggestionsdontgivemeanymore''} \end{displaymath}\end{maplelatex}$

> 
  Scramble(  Unscramble(

$\begin{maplelatex} \begin{displaymath}\mbox{\lq\lq BXFVCNBFQSLCCUSVBXFSQXFVCBAUNUTFDN... ...box{\lq\lq idontmindsuggestionsdontgivemeanymore''} \end{displaymath}\end{maplelatex}$

In general we won't adhere to this convention in this chapter. Rather, we will treat spaces and punctuation like any other character. This is not a common approach, but one we happen to like.

There are two big drawbacks to such a substitution cipher. One primary problem is that the key for enciphering and deciphering is the permutation: you need to remember an ordering of the entire alphabet.

A second problem, shared with all such monoalphabetic substitutions (i.e., those that substitute one letter for another), is that they aren't that hard to break. If you have a long enough message, you can use the fact that certain letters occur much more frequently than others to help tease out the original text. In English text the letter ``e'' is the occurs most frequently, followed by ``t'', ``a'', ``n'', ``i'', and so on. Spaces occur about twice as often as the letter ``e'', assuming they are encrypted, rather than just omitted. Once you guess a few letters, common words like ``the'' and ``and'' fall into place, giving more clues.

We will examine some polyalphabetic substitution ciphers in §5.

The Caesar cipher, and the ASCII encoding

A variation on the arbitrary permutation discussed above is the ``Caesar cipher'', named after Julius Caesar, who supposedly invented it himself. This is also what was done by a ``Captain Midnight Secret Decoder Ring'', popular in the 1930s and 40s. Here we convert our alphabet to numeric equivalents (with, say A=0, B=1, and so on), add an offset to each numeric equivalent (legend has it that Caesar used an offset of 3), then re-encode the numbers as letters. We do the addition modulo the length of the alphabet; that is, when we shift a Y by 3, we get B, since 24 + 3 = 27, and 27 $\equiv$ 1 (mod 26). However, we won't always restrict ourselves to a 26 letter alphabet-- we may want punctuation, numbers, and to preserve upper- and lower-case distinctions.^4.5

While we could easily implement the Caesar cipher without converting our text to numbers, doing so makes it easier to understand and sets us up for more complicated ciphers we will need later.

Treating characters as numbers

Computers already have a built-in numeric representation of the alphabet-- the ASCII encoding.^4.6At their lowest level, digital computers operate only on numeric quantities-- they just manipulate ones and zeros. The smallest piece of information is the state of a switch: it is off (0) or it is on (1). This is called a bit, which is short for binary digit. A group of 8 bits is called a byte, and is a single ``character'' of information.^4.7A byte can have 2⁸ = 256 different states:

00000000, 00000001, 00000010, 00000011, ..., 11111111

which are generally interpreted either as representing the integers 0 through 255, or the highest bit is often taken to represent the sign, in which case we can represent integers between -128 and 127. (We get an ``extra'' negative number because we don't need -0, so the byte 10000000 can mean -128 instead.) For notational convenience, bytes are often written in hexadecimal (that is, in base 16), grouping the bits into two blocks of 4 bits (sometimes called a nybble, half a byte), and using the characters A-F for the integers 10-15. Thus, the byte 01011111 would be written as 5F in hexadecimal, since 0101 means 2² + 2⁰ = 5, and 1111 is 2³ + 2² + 2¹ + 2⁰ = 15.

So how does a computer represent characters? There is an agreed upon standard encoding of characters, punctuation, and control codes into the first 127 integers. This is called ASCII,^4.8an acronym for American Standard Code for Information Interchange. Extended ASCII encodes more characters into the range 128-255 (this is where you would find characters for å and ±, for example), although the characters found in this range vary a bit. Not all programs can deal successfully with characters in the extended range^4.9

Maple has a built-in routine to allow us to convert characters to and from their ASCII equivalents. If it is called with a list of numbers, it returns a string of characters, and if called with a character string, out comes a list of numbers. For example,

> 
  convert("How do I work this?",bytes);

$\begin{maplelatex} \begin{displaymath}[72, 111, 119, 32, 100, 111, 3... ... 107, 32, 116, 104, 105, 115, 63] \end{displaymath}\end{maplelatex}$

> 
  convert([72, 117, 104, 63],bytes);

$\begin{maplelatex} \begin{displaymath}\mbox{\lq\lq Huh?''} \end{displaymath}\end{maplelatex}$

We can use this to make a list of the ASCII codes. Maple prints a $\Box$ when the ASCII code is not a printable character. We make the table for characters up through 127 because the meanings in extended ASCII differ depending on what character set you use. Feel free to make your own table of all 255 characters. (In the interests of making a nicely formatted table, we have used a somewhat complicated command. The much simpler seq([i,convert([i],bytes)],i=0..127); would have produced the same information, just not as neatly.)

> 
   matrix([ [_,seq(i, i=0..9)],
           seq([10*j, seq(convert([i+10*j],bytes), i=0..9)], j=0..12)]);

Note that some of the characters (such as ASCII code 10) print as a backslash and a letter, like \n. These are codes for special control characters: \n indicates a newline, \t is a tab, and so on. Since Maple indicates a string by printing it inside double quotes, it indicates the double quote^4.10character (ASCII 34) using \'', and the backslash character (ASCII 92) with a pair of backslashes. While it doesn't print out in Maple, ASCII 127 is the control sequence DEL (sometimes printed as ^?). This sometimes is interpreted as ``delete the previous character''; if your output device interprets it that way, characters can seem to mysteriously vanish.

Another special character is ASCII code 0: note that it prints as nothing, not even a blank space (ASCII 32 is the space character). This is the null character, and is typically used internally to indicate the end of a string. If a null occurs in the middle of a string, usually what comes after it won't print out unless you take special steps. To see this for yourself, try a command like convert([73,116,39,115,0,103,111,110,101,33],bytes). You'll see that the characters following the 0 don't print out.

The StringTools package also contains Char which gives the character corresponding to a decimal number, and Ord which gives the number corresponding to a character. We'll generally use convert, although it makes little difference.

Render unto Caesar

Now we are nearly ready to write a version of the Caesar cipher. The basic idea is that for each character of the plaintext, we convert it a numerical representation, add a pre-specified offset modulo 127, then convert back to characters. Unlike the Scramble function we wrote in §2.1, we needn't declare an alphabet explicitly because we will use the ASCII codes. However, in addition to the Maple command convert( ,bytes), we will use another new command, map.

The map command allows us to apply a function to every element of a vector or list in one call. For example, to compute the square root of each of the first 10 integers, we could do

> 
  map(sqrt,[1,2,3,4,5,6,7,8,9,10]);

$\begin{maplelatex} \begin{displaymath}[1, \sqrt{2}, \sqrt{3}, 2, \sqrt{5... ... \sqrt{ 7}, 2 \sqrt{2}, 3, \sqrt{10}] \end{displaymath}\end{maplelatex}$

If the function we wish to apply takes two variables, we can give the second one as a third argument to map. For example, the command

> 
  map(modp,[1,2,3,4,5,6,7,8,9,10],3);

$\begin{maplelatex} \begin{displaymath}[1, 2, 0, 1, 2, 0, 1, 2, 0, 1] \end{displaymath}\end{maplelatex}$

computes n mod 3 as n ranges over the list [1, 2,..., 10]. (Recall that n mod p is the remainder after dividing n by p.)

We finally are ready. See if you can understand it before reading the explanation below. Remember to read from the middle outward. Note that we are defining the function we are mapping directly as the first argument.

> 
  Caesar:= (plain, shift) -> 
           convert(map(x -> (x+shift-1) mod 127 + 1,
                       convert(plain,bytes)),
                   bytes):

Let's give it a try:

> 
  Caesar("Et tu, Brute?",3);

$\begin{maplelatex} \begin{displaymath} \mbox{\lq\lq Hw\char93 wx/\char93 EuxwhB''} \end{displaymath}\end{maplelatex}$

If you don't see what is going on in Caesar, try to think of it as a composition of several functions, each of which you DO understand. That is, Caesar works as follows

First, we convert the input string (called plain) into a list of numbers, using the call convert(plain,bytes). Note that this is done on the third line, the innermost function.
Next, we apply the function x ->(x+shift-1) mod 127 + 1 to each of numbers in the resulting list using map. This call to map has two arguments: the definition of the shift, and what to apply it to (the innermost function convert(plain,bytes)).
Finally, we convert the list back to characters again. This is the outermost function: it begins on the second line (immediately after the -> and ends with the final parenthesis.

Viewing it as a composition, we have

$\displaystyle \tt Caesar$ : = $\displaystyle \mverb$ convert^-1o $\displaystyle \mverb$ mapo $\displaystyle \mverb$ convert.

Note that we didn't really compute $(x + \mbox{\mverb{shift}}) \mod 127$ ; instead we subtracted one from the sum, computed mod 127, then added 1 again. Why? Consider what would happen if, for example, our character was ``k'' (ASCII code 107) and we shifted by 20. Then (107 + 20) mod 127 gives 0, which is the code for the null byte, indicating the end of a string. This would be a problem. So instead, we add 1 afterward to ensure that our answer is in the range 1,..., 127. And just to avoid confusing ourselves, we subtract the 1 first, so that shifting by 0 doesn't change anything.

Aside from the fact that this function is a bit confusing to read and offers essentially no cryptographic security, what else is wrong with it? Consider the following example:

> 
   Caesar("Who put the benzedrine in Mrs. Murphy's Ovaltine?", 86);

$\begin{maplelatex} \begin{displaymath} \mbox{\lq\lq .?FvGLKvK?$<$v9$<$EQ$<$;I@E$<$v@Ev\$IJ$\Box $v\$LIG?P\}Jv\&M8CK@E$<\Box $''} \end{displaymath}\end{maplelatex}$

In addition to the fact that the result looks funny, if we were to try to decipher this from printed output, we would have a little (but not much) trouble, because the characters ``.'' and ``?'', after encoding, print out as $\Box$ .^4.11 This happens because the numeric values of these characters, after shifting by 86, are 5 and 22, respectively. Neither of those corresponds to a printing character, so we can't tell them apart. Of course, if we had no intention of dealing with our encoded message in printed form, this would be no problem. The actual character values are different, they just print out the same.

How can we get around this? We can restrict ourselves to printing characters using a variation of the idea in Scramble of §2.1, that is, giving an explicit Alphabet that we will deal in.

Before doing this, however, it will be helpful to discuss the proc command, which allows us to construct functions that consist of more than one Maple command.

Footnotes

... package ^4.2: The StringTools package was introduced in Maple 7, and the string data type was introduced in Maple V release 5. Everything we do in this chapter can easily be implemented without using StringTools- in fact, the original version of this chapter was written for Maple Vr4 using names rather than strings. Since Maple 6 is still in widespread use (Spring 2002), we'll try to include alternatives to using this package when practical.
... functions ^4.3: In Maple Vr5 and Maple 6, we could define CharacterMap as
CharacterMap := (old,new,txt)->cat(seq(new[SearchText(txt[i],old)],i=1..length(txt)));
This will only work if all characters in the txt correspond to ones in old, so a space would have to be added to both Alphabet and Cryptabet to get this example to work.
... done ^4.4: How to accomplish this without StringTools is left to the seriously motivated reader.
... distinctions.^4.5: For example, the title of this chapter (fsqFsHn sGGousG) is actually the result of using a Caesar cipher on its original title. A 53-character alphabet consisting of all the upper-case letters, a space, and all the lower-case letters was used, so the space in the middle may not correspond to a space in the original. The three Gs that occur should be a good hint to decoding it.
... encoding.^4.6: In fact, while most computers these days use ASCII, it is not the only such arrangement. IBM mainframes, for example, primarily use another encoding called EBCDIC. But this makes no real difference to our discussion.
... information.^4.7: The next larger unit in a computer is a word. However, different makes of computers have different sizes of words. Most computers today have 4 byte words (32 bits), but word sizes of 2 bytes (16 bits) and 8 bytes (64 bits) are not uncommon. 8-bit bytes are now essentially universal although in the early days of computing, there were computers which used other sizes, such as 6 bit bytes.
...ASCII,^4.8: ASCII is not the only such standard, although it is currently the most commonly used. Prior to the widespread use of personal computers in the 1980s, EBCDIC (Extended Binary Coding for Data Interchange and Communication) tended to be in more common use, because it was what IBM mainframes used. There are many other such assignments.
... range ^4.9: In the late 1980s and early 1990s, there was a big push to make programs ``8-bit clean'', that is, to be able to handle these ``unusual'' characters properly. There are a number of distinct assignments of characters to the upper 128 positions. One common one is Latin-1 (ISO 8859-1), which contains the characters used in the majority of western European languages. Other assignments include characters for Scandinavian languages, Arabic, Cyrillic, Greek, Hebrew, Thai, and so on. Since the late 1990s, a multi-byte character encoding called Unicode (ISO 10646) has been in use; this universal character set allows computers to handle alphabets (such as Chinese) that need more than 255 characters. While use of Unicode is becoming more widespread, single byte character sets like ASCII will likely predominate for quite a long while.
... quote ^4.10: The ASCII code does not have distinct codes for the open-quote (``) and close-quote ('') characters used in typesetting. Neither do most computer keyboards. However, the apostrophe (') and grave accent or backquote (`) characters have distinct ASCII codes (39 and 96, respectively).
....^4.11: Of course, it is easy to make examples where significant parts of the message all come out as $\Box$ . One such is the same message with a shift of 50.

Next: Defining functions with proc; Up: fsqFsHn sGGousG Previous: Introduction to Cryptography

Translated from LaTeX by Scott Sutherland
2002-08-29