Next: Reading and Writing from Up: fsqFsHn sGGousG Previous: Caesar cipher redux

Subsections

The Vignère cipher
One-time pads
Multi-character alphabets

Improved Caesar-like ciphers

Certainly the Caesar cipher offers no cryptographic security at all: if you know the alphabet the message was encoded in, you need only guess one character to crack the code. Even if you don't know the alphabet, guessing the correspondence is not very hard with a little patience.

In this section, we will discuss a few approaches to improving the security, while retaining the basic idea of character shifting.

The Vignère cipher

One way to make a Caesar cipher a bit harder to break is to use different shifts at different positions in the message. For example, we could shift the first character by 25, the second by 14, the third by 17, and the fourth by 10. Then we repeat the pattern, shifting the fifth character by 25, the sixth by 14, and so on, until we run out of characters in the plaintext. Such a scheme is called a Vignère cipher^4.14, which was first used around 1600, and was popularly believed to be unbreakable.^4.15 This cipher is called a polyalphabetic substitution cipher, because several different substitutions are made depending on the position of the character within the text.

In our first example, the key consists of the four shifts [25, 14, 17, 10], which are the numerical equivalents of the string ``ZORK'' in a 26-letter alphabet consisting of the letters A-Z. It is common practice to think of our key as plaintext letters, rather than their numerical equivalents, but either will do. We can encode the string ``CRYPTOGRAPH'' as

C	R	Y	P	T	O	G	R	A	P	H
+25	+14	+17	+10	+25	+14	+17	+10	+25	+14	+17
B	F	P	Z	S	C	X	B	Z	D	Y

Note that in the above, the letter ``R'' in the plaintext encodes to both ``F'' and ``B'' in the crypttext, depending on its position. Similarly, the two ``Z''s in the crypttext come from different plain characters.

Now we implement this in Maple. This is very similar to the Caesar cipher, just with the extra complication of multiple shifts, and letting our key be a string.

First, we set our Alphabet to the usual one. We also use the conversion functions from §4.

> 
  Alphabet := cat("", Select(IsPrintable,convert([seq(i,i=1..255)],bytes))):

> 
  Vignere:= proc(plaintext::string, key::string)
    local textnum,codenum,i,p,offsets,keylen;
    global Alphabet;
  

  p       := length(Alphabet);
    offsets := StringToList(key);
    keylen  := length(key);
    textnum := StringToList(plaintext);
    codenum := [seq( modp(textnum[i] + offsets[modp(i-1,keylen)+1], p),
                    i=1..length(plaintext)) ];  
    ListToString(codenum);
  end:

To try it out, we'll the same text as in the previous section. Notice how much harder it is to pick out the word boundaries in the resulting ciphertext.

> 
  coded:=Vignere(text,"Prufrock");

$\begin{maplelatex} \mapleinline{inert}{2d}{coded := ''\{t^HiUeT6ThKtdLQR\lq [Y\lq QMPD... ...ckslash\backslash$PmD$\backslash\backslash$dNtdSm$ >$X\$''} } } \end{maplelatex}$

We can make the decoding function from the original (let's call it unVignere) by changing exactly one + sign to a -.^4.16We omit the change here so perhaps you will figure it out for yourself. But we will test it, to show you that it does work.

> 
  printf(unVignere(coded,"Prufrock"));

$\begin{maplettyout} I have heard the mermaids singing, each to each. I do not think that they will sing to me. \end{maplettyout}$

Even though this scheme looks quite daunting, it is not so very hard to crack if you use a computer or have a very large supply of perseverance. If we know that the key is of a certain length, say 4, and our plaintext is sufficiently long, then we can perform frequency analysis on every fourth letter. Even if we don't know the key length, it is not too hard to write a computer program to try all the lengths less than, say, 10, and pick the one that looks best.

One-time pads

Note that the longer the key is in the Vignère cipher, the harder it is to break. If the key were as longer than the text, then it might seem at first that analyzing the frequency of letters in the encrypted text would be of no help, since each letter would be shifted by a different amount. This is almost true. If the key is an passage of text in English, then the shifts will occur with a predictable frequency. Of course, the problem gets very difficult, but cryptanalysts are persistent people.

But what if there were no predictability within the key, having the shifts come at random? The result (a Vignère cipher with an infinitely long, completely random key) is an cryptosystem that cannot be broken. Since the shifts are random, if you manage to decipher part of the message, this gives you no clue about the rest. Furthermore, any plaintext of a given length can encrypt to any ciphertext (with a different key, of course). For example, the ciphertext ``=5nwhn KDNO?uWTBC-XA'' might have come from the phrase ``Let's have dinner.'' (with the key ``pOyntmbbXYtrjSTGe1''), or it might be the encryption of ``Attack at midnight'' with the key ``{@y4#!Jbz>&moSYEoL''. Since any message could encrypt to any other, there is no way to break such a code unless you know the key.

But that is the problem: the key is infinitely long. Infinitely long, truly random sequences of numbers tend to be somewhat unwieldy. And to decode the message, you must know what random sequence the message was encoded with.

Such a system is called a one-time pad, and was used regularly by spies and military personnel. Agents were furnished with codebooks containing pages and pages of random characters. Then the key to the encryption is given by the page on which to begin. It is, of course, important that each page be used only once (hence the name ``one-time pad''), because otherwise if a codebreaker were able to intercept a message and (via some other covert means) its corresponding translation, that could be used to decipher messages encoded with the same page. This sort of setup makes sense if an agent in the field is communicating with central command (but not with each other). Each agent could be given his own codebook (so that if he is captured, the whole system is not compromised), and he uses one page per message. Central command has on file the books for each agent.

A variation on this theme is the Augustus cipher,^4.17where instead of a random sequence of shifts, a phrase or passage from a text which is as long as the plaintext is used. The trouble with this is that, because of the regularities in the key, a statistical analysis of the crypttext allows one to break the cipher.

Another issue is that to be truly unbreakable, the random sequence must be truly random, with no correlation among the characters. This is harder than it sounds-- real randomness is hard to come by. If the random sequence has some predictability, the resulting stream can be attacked. A number of attacks on cryptosystems have been made not by breaking the encryption scheme directly, but because the underlying random-number generator was predictable.

We can easily modify our Vignere program to be a one-time pad system, using Maple's random number generator to make our one-time pad.^4.18You might think that generating a random sequence of numbers would be inherently unreproducible, so the message would be indecipherable unless we record the stream of random numbers. However, computers can not usually generate a truly random sequence. Instead, they generate a pseudo-random sequence s₁, s₂, s₃,... where the pattern of the numbers s_i is supposed to be unpredictable, no matter how many of the values of the sequence you already know. However, to start the sequence off, the pseudo-random number generator requires a seed-- whenever it is started with the same seed, the same sequence results. We can use this to our advantage, taking the seed to be our key.^4.19Note that in order to decode the message by knowing the key (the seed), the recipient must use the same pseudo-random number generator.

Maple's random number generator gives different results when called in different ways. If called as rand(), it gives a pseudo-random, non-negative 12 digit integer. When called as rand(p), it gives a procedure that can be called later to generate a random integer between 0 and p; it is this second version that we will use. The seed for Maple's random number generators is the global variable _seed.^4.20

> 
  OneTimePad := proc(plaintext::string, keynum::posint)
    local textnum,codenum,i,p,randnum;
    global Alphabet,_seed;
  

  p       := length(Alphabet);
    randnum := rand(p);
    _seed   := keynum;
    textnum := StringToList(plaintext);
    codenum := [seq((textnum[i] + randnum()) mod p, i=1..length(plaintext))];
    ListToString(codenum);
  end:

In this implementation, it is assumed that the key is a positive integer (it can be as large as you like). It would be easy to change it to use a string of characters, however, by converting the string to a number first. One way to do that is discussed in the next section.

In most descriptions of one-time pad systems, one takes the exclusive-or (XOR) of the random number and the integer of the plaintext, rather than adding them as we have done. This does not significantly change the algorithm if the random sequence is truly random, since adding a random number is the same as XOR'ing a different random number. However, the XOR approach has the advantage that the enciphering transformation is its own inverse. That is, if we produce the crypttext using crypt:=OneTimePadXOR(plain,key), then OneTimePadXOR(crypt,key) will give the decryption with the same key. This is not the case for the version given above; to make a decryption procedure, we would need to modify the above by changing the + to a -.

Multi-character alphabets

We can also improve security a bit by treating larger chunks of text as the characters of our message. For example, if we start with the usual 26-letter alphabet A-Z, we can turn it into a 676-letter alphabet by treating pairs of letters as a unit (such pairs are called digraphs), or a 26³-letter alphabet by using trigraphs, or triples of letters. This makes frequency analysis much harder, and is quite easy to combine with the other crytptosystems already discussed. We will use 99-graphs on a 256-letter alphabet (the ASCII code) when we implement the RSA cryptosystem in §11.2. While frequency analysis is still possible (charts of digraph frequencies are readily available, trigraphs less so), the analysis is much more complex.

To convert the digraph ``HI'' to an integer (using a length 26² alphabet of digraphs), one simple way is to just treat it as a base-26 number. That is, ``HI'' becomes 7×26 + 8 = 190, assuming the correspondence of H=7, I=8. To convert back, we look at the quotient and remainder when dividing by 26. For example, 300 = 26×11 + 14, yielding ``LO''.

Either we can do this arithmetic ourselves directly, or we can use the convert(,base) command. This command takes a list of numbers in one base and converts it to another. One slightly confusing fact is that in this conversion, the least significant figure comes first in the list,^4.21 instead of the usual method of writing numbers with the most significant digit first.

For example, 128 = 1×10² + 2×10¹ + 8×10⁰ would be written as [8,2,1]. To convert this to base 16, we would note that 128 = 8×16¹ + 0×16⁰, so in base 16, it is written as 80.

Doing this calculation in Maple, we have

> 
  convert([8,2,1], base, 10, 16);

$\begin{maplelatex} \begin{displaymath}[0,8]\end{displaymath}\end{maplelatex}$

Below is one way to implement the conversion of text to numeric values for k-graphs. We assume our usual functions StringToList and ListToString are defined (see §4), as well as the global Alphabet. The routine below converts text into a list of integers, treating each block of k letters as a unit. A block of k characters c₁c₂c₃...c_k is assigned the numeric value $\sum_{i=0}^{k-1}$ x_ip^k, where x_i is the numeric equivalent of c_{i + 1} assigned by StringToList.

> 
  StringToKgraph := proc(text::string, k::posint)
     local p;
     global Alphabet;
     p:= length(Alphabet);
     convert(StringToList(text), base, p, p^k);
  end:

> 
  KgraphToString := proc(numlist::list(nonnegint), k::posint)
     local p;
     global Alphabet;
     p:=length(Alphabet);
     ListToString( convert(numlist, base, p^k, p));
  end:

In the examples below, we are using our usual 97-character alphabet. Of course, this will work on any alphabet, although the specific numbers will differ.

> 
  StringToKgraph("two by two",2);

$\begin{maplelatex} \begin{displaymath}[8719, \,275, \,8895, \,8344, \,7946]\end{displaymath}\end{maplelatex}$

In our alphabet, ``t'' is character number 86 and ``w'' is number 89, so the digraph ``tw'' encodes as 86 + 89×97 = 8719 (remember we are using a little-endian representation). Similarly, ``o '' gives 2×97 + 81, and so on. Notice that the two occurrences of ``two'' give different numbers, because one begins on an even character and the other starts on an odd one. The first corresponds to the digraphs ``tw'' and ``o '', while the second is `` t'' and ``wo''.

We can also encode with 4-graphs, if we like.

> 
  StringToKgraph("two by two",4);

$\begin{maplelatex}[2596194, \,78517591, \,7946] \end{maplelatex}$

One advantage (for us) of Maple's use of the little-endian order is that we needn't worry whether the length of our text is divisible by k. In the above example, the last 4-graph is ``wo'', which is encoded as though it had two more characters on the end with numeric code of 0. The disadvantage of this is that if our text ends with the character with code 0 (in our standard 97-character alphabet, this is a newline), that character will be lost.

Another way to treat multiple characters together is to think of them as vectors. For example, the digraph ``by'' might correspond to the vector [68, 91]. We will treat this approach in §8.

Footnotes

... cipher ^4.14: This cipher takes its name after Blaise de Vignère, although it is actually a corruption of the one he introduced in 1585. Vignère's original cipher changed the shift amount each letter based on the result of the last encoding, and never repeated. This scheme is much harder to break. However, one reason for its lack of popularity was probably due to the fact that a single error renders the rest of the message undecipherable. More details van be found in [Kahn].
... unbreakable.^4.15: In fact, as late as 1917, this cipher was described as ``impossible of translation'' in a respected journal (Scientific American), even though the means to break it had been well known among cryptographers for at least 50 years.
....^4.16: Note that we could also use the Vignere routine, but with the inverse of the key. For example, in the preceding example, the inverse of ``Prufrock'' is ``M+(7+.:2'': the numeric code of P plus the numeric code of M is 97 (the length of the alphabet), similarly for r and +, u and (, and so on.
... cipher,^4.17: Sometimes a Caesar cipher with a shift of +1 is also called an ``Augustus Cipher'', even though these are very different ciphers.
... pad.^4.18: Technically speaking, this is not a one-time pad, but a one-time stream. The distinction is subtle, and we will ignore it here.
... key.^4.19: Pseudo-random number generators appropriate for cryptography are rare. Most implementations (including Maple's) are good enough for everyday use, but not enough to be cryptographically secure. By analyzing the output of a typical random number generator, a good cryptanalyst can usually determine the pattern. For example, Maple's rand function (and that of most computer languages, such as C, Fortran, and Java) gives the result of a affine sequence of numbers, reduced to some modular base. That is, x_i = ax_{i - 1} + b and s_i = x_i mod n for some fixed choices of a, b, and n. In this setting, the seed is x₀. We shall ignore the problem that this sequence is guessable, but if you want real security, you cannot.
..._seed.^4.20: We can choose a ``random'' seed (based on the computer's clock) using the function randomize().
... list,^4.21: Such a representation is called little-endian, as opposed to a big-endian one. Some computers represent numbers internally in big-endian format (Sun SPARC, Power Macintosh), and others use a little-endian representation (those using Intel processors, which is what MS-Windows and most versions of Linux run on). This name comes from Gulliver's Travels, where in the land of Blefuscu there is war between the big-endians (who eat their eggs big end first), and the little-endians (who eat eggs starting from the little end).

Next: Reading and Writing from Up: fsqFsHn sGGousG Previous: Caesar cipher redux

Translated from LaTeX by Scott Sutherland
2002-08-29