Next: Defining functions with proc; Up: fsqFsHn sGGousG Previous: Introduction to Cryptography

Subsections

Simple Ciphers

Simple substitution

One of the most common (and very easy to crack) ciphers is substitution. One sometimes sees these in a newspaper somewhere near the crossword puzzle. This scheme is also the one used in Poe's story ``The Gold Bug'', and the Sherlock Holmes ``Mystery of the Dancing Men''. The idea is straightforward: choose a rearrangement of the letters of the alphabet, and replace each letter in the plaintext by its corresponding one.

As an example, let's implement this in Maple. Recall that we need to use a backquote (`) around character strings in Maple.^3.2

> Alphabet :=`abcdefghijklmnopqrstuvwxyz `:
  Cryptabet:=`THEQUICKBROWNFXJMPSVLAZYDG `:
  

  Scramble:= msg ->
     cat(seq(substring(Cryptabet,
                       searchtext(substring(msg,i),
                                  Alphabet)),
             i=1..length(msg)));

$\begin{maplelatex}\begin{eqnarray*} \lefteqn{\mathit{Scramble} := \mathit{msg}\r... ...rm{length}(\mathit{msg}))) \mbox{\hspace{215pt}} \end{eqnarray*}\end{maplelatex}$

> Scramble(`please do not read this`);

$\begin{maplelatex}\begin{displaymath} \mathit{JWUTSU\ QX\ FXV\ PUTQ\ VKBS} \end{displaymath} \end{maplelatex}$

Probably some comments are in order. First, we tell maple what our permutation is by defining the Alphabet that our plaintext is written in, and the corresponding Cryptabet. Our encoding function Scramble operates as follows (read from the middle out)

substring(msg,i) extracts the i-th character from the input string msg.
searchtext looks up where the character is in Alphabet.
substring is used again, to extract the corresponding character from Cryptabet.
this is done for each i ranging from 1 to the length of msg, giving a sequence of characters.
finally, all the characters are smashed together again into one long string using cat.

Note that our ``alphabet'' contains a space, which corresponds to a space in our key. We didn't need to do this, but it left the message in words of the same length. This, of course, makes the message easier to guess. Also, notice that we used searchtext, which ignores case in the search (that is, ``A'' and ``a'' are the same). Had we wished to, we could have distinguished between upper an lower case using SearchText in which case we would also need to use a 53 letter alphabet (and a 53 letter key). In this implementation, we also had to limit our input message to characters occurring in our alphabet. Had our message contained numbers or punctuation, the function Scramble would have failed with an error.

The usual way to break such a code is a combination of ``frequency analysis'' (that is, knowing that the most commonly occurring letter in the English language is ``E'', followed by ``T''), plus a bit of trial and error. Leaving the spaces between words helps the cracker greatly.

The Caesar cipher, and the ASCII encoding

A variation on the arbitrary permutation discussed above is the ``Caesar cipher'', named after Julius Caesar, who supposedly invented it himself. This is also what was done by a ``Captain Midnight Secret Decoder Ring'', popular in the 1930s and 40s. Here we convert our alphabet to numeric equivalents (with, say A=0, B=1, and so on), add an offset to each numeric equivalent (Caesar used 3), then re-encode the numbers as letters. We do the addition modulo the length of the alphabet; that is, when we shift a Y by 3, we get B, since 24 + 3 = 27, and 27 $\equiv$ 1 (mod 26). However, we won't always restrict ourselves to a 26 letter alphabet-- we may want punctuation, numbers, and to preserve upper and lower case distinctions.

Treating characters as numbers

Computers already have a built-in numeric representation of the alphabet-- the ASCII encoding.^3.3 At their lowest level, digital computers operate only on numeric quantities-- they just manipulate ones and zeros. The smallest piece of information is the state of a switch: it is off (0) or it is on (1). This is called a bit, which is short for binary digit. A group of 8 bits is called a byte, and is a single ``character'' of information. A byte can have 2⁸ = 256different states:

00000000, 00000001, 00000010, 00000011,..., 11111111

which are generally interpreted either as representing the integers 0 through 255, or the highest bit is often taken to represent the sign, in which case we can represent integers between -128 and 127. (We get an ``extra'' negative number because we don't need -0, so the byte 10000000 can mean -128 instead.)^3.4 For notational convenience, bytes are often written in hexadecimal (that is, in base 16), grouping the bits into two blocks of 4 bits (sometimes called a nybble, half a byte), and letting the characters A-F stand for the integers 10-15. Thus, the byte 01011111 would be written as 5F in hexadecimal. So how does a computer represent characters? There is an agreed upon standard encoding of characters, punctuation, and control codes into the first 127 integers. This is called ASCII, an acronym for American Standard Code for Information Interchange. ``Extended ASCII'' encodes more characters into the range 128-255 (this is where you would find characters for å and $\pm$ , for example), although the characters found in this range vary a bit. Not all programs can deal successfully with characters in the extended range.^3.5

Maple has a built-in routine to allow us to convert characters to and from their ASCII equivalents. If it is called with a list of numbers, it returns a string of characters, and if called with a character string, out comes a list of numbers. For example,

> convert(`How do I work this¿,bytes);

$\begin{maplelatex}\begin{displaymath}[72, \,111, \,119, \,32, \,100, \,111, \,32... ...,107, \,32, \,116, \,104, \,105, \,115, \,63] \end{displaymath} \end{maplelatex}$

> convert([72, 117, 104, 63],bytes);

$\begin{maplelatex}\begin{displaymath} \mbox{Huh?} \end{displaymath} \end{maplelatex}$

We can use this to make a list of the ASCII codes. Maple prints a $\mbox{\rlap{$\sqcup$ }\raise .2ex\hbox{$\sqcap$ }}$ when the ASCII code is not a printable character.

> seq([i,convert([i],bytes)],i=0..127);

$\begin{maplelatex}\begin{eqnarray*} & & [0, ], \,[1, \,\mbox{\rlap{$\sqcup$ }\ra... ...ox{\rlap{$\sqcup$ }\raise .2ex\hbox{$\sqcap$ }}] \end{eqnarray*}\end{maplelatex}$

Render unto Caesar

Now we are nearly ready to write a version of the Caesar cipher. The basic idea is that for each character of the plaintext, we convert it to its ASCII encoding, add a pre-specified offset modulo 127, then convert back to characters. It will look very similar to the Scramble function we wrote in §2.1, except we needn't declare an alphabet explicitly. However, in addition to the maple commands convert( ,bytes), we will use another new command, map.

The map command allows us to apply a function to every element of a vector or list in one call. For example, to compute the square root of each of the first 10 integers, we could do

> map(sqrt,[1,2,3,4,5,6,7,8,9,10]);

$\begin{maplelatex}\begin{displaymath}[1, \,\sqrt{2}, \,\sqrt{3}, \,2, \,\sqrt{5}... ...\,\sqrt{ 7}, \,2\,\sqrt{2}, \,3, \,\sqrt{10}] \end{displaymath} \end{maplelatex}$

If the function we wish to apply takes two variables, we can give the second one as a third argument to map. For example, the command

> map(modp,[1,2,3,4,5,6,7,8,9,10],3);

$\begin{maplelatex}\begin{displaymath}[1, \,2, \,0, \,1, \,2, \,0, \,1, \,2, \,0, \,1] \end{displaymath} \end{maplelatex}$

computes n mod 3 as n ranges over the list [1, 2,..., 10]. (Recall that n mod p is the remainder after dividing n by p.

We finally are ready. Remember to read from the middle outward. Note that we are defining the function we are mapping directly.

> Caesar:= (plain, shift) -> 
    convert(map((x,y)->(x+y-1) mod 127 + 1,
                convert(plain,bytes),
                shift),
            bytes);

$\begin{maplelatex}\begin{eqnarray*} \lefteqn{\mathit{Caesar} := (\mathit{plain},... ...t{bytes}) , \,\mathit{shift}), \,\mathit{bytes}) \end{eqnarray*}\end{maplelatex}$

Let's give it a try:

> Caesar(`Hello there`,3);

$\begin{maplelatex}\begin{displaymath} \mbox{Khoor\char93 wkhuh} \end{displaymath} \end{maplelatex}$

It seems to work. Note that we didn't really compute $(x + \mbox{\tt shift}) \mbox{mod}\,127$ ; instead we subtracted one from the sum, computed mod 127, then added 1 again. Why? Consider what would happen if, for example, our character was ``k'' (ASCII code 107) and we shifted by 20. Then 107 + 20 mod 127gives 0, which is the code for the null byte, indicating the end of a string. This would be a problem. So instead, we add 1 afterwards to ensure that our answer is in the range 1,..., 127. And just to avoid confusing ourselves, we subtract the 1 first, so that shifting by 0 doesn't change anything.

Also, you might wonder why we can't use

> map(x -> (x+shift-1) mod 127 + 1, convert(plain,bytes))

in the middle of the function. If you try it, you'll see it does not work. This is because maple tries to fill in the value of shift at the wrong time.

Now, aside from the fact that this function is a bit confusing to read and offers essentially no cryptographic security, what else is wrong with it? Consider the following example:

> Caesar(`Who put the benzedrine in Mrs. Murphy's Ovaltine¿, 86);

$\begin{maplelatex}\begin{displaymath} \mbox{.?FvGLKvK?$<$ v9$<$ EQ$<$ ;I@E$<$ v@... ...lap{$\sqcup$ }\raise .2ex\hbox{$\sqcap$ }}$ } \end{displaymath} \end{maplelatex}$

In addition to the fact that the result looks funny, if we were to try to decipher this from printed output, we would have a little (but not much) trouble, because the characters ``.'' and ``?'', after encoding, print out as $\mbox{\rlap{$\sqcup$ }\raise .2ex\hbox{$\sqcap$ }}$ .^3.6 This happens because the numeric values of these characters, after shifting by 86, are 5 and 22, respectively. Neither of those corresponds to a printing character, so we can't tell them apart. Of course, if we had no intention of dealing with our encoded message in printed form, this would be no problem.

How can we get around this? We can restrict ourselves to printing characters using a variation of the idea in Scramble of §2.1, that is, giving an explicit Alphabet that we will deal in.

Before doing this, however, it will be helpful to discuss the proc command, which allows us to construct functions that consist of more than one maple command.

Footnotes

... Maple.^3.2: In Maple V release 5 and later, strings may be surrounded by double quotes ("), which is a more common way of denoting character strings. However, using a backquote works in all versions.
... encoding.^3.3: In fact, while most computers these days use ASCII, it is not the only such arrangement. IBM mainframes, for example, primarily use another encoding called EBCDIC. But this makes no real difference to our discussion.
... instead.)^3.4: The next larger unit in a computer is a word. However, different makes of computers have different sizes of words. Most computers today have 4 byte words (32 bits), but word sizes of 2 bytes (16 bits) and 8 bytes (64 bits) are not uncommon. 8-bit bytes are now essentially universal although in the early days of computing, there were computers which used other sizes, such as 6 bit bytes.
... range.^3.5: In the late 1980s and early 1990s, there was a big push to make programs ``8-bit clean'', that is, to be able to handle these ``unusual'' characters. Currently (late 1990s), work is being done to integrate the handling of alphabets (such as Chinese) that need more than 255 characters.
... $\mbox{\rlap{$\sqcup$ }\raise .2ex\hbox{$\sqcap$ }}$ .^3.6: Of course, it is easy to make examples where significant parts of the message all come out as $\mbox{\rlap{$\sqcup$ }\raise .2ex\hbox{$\sqcap$ }}$ . One such is the same message with a shift of 50.

Next: Defining functions with proc; Up: fsqFsHn sGGousG Previous: Introduction to Cryptography

Translated from LaTeX by Scott Sutherland
1999-12-08