Improved Caesar-like ciphers

Certainly the Caesar cipher offers no cryptographic security at all: if you know the alphabet the message was encoded in, you need only guess one character to crack the code. Even if you don't know the alphabet, guessing the correspondence is not very hard with a little patience.

In this section, we will discuss a few approaches to improving the security, while retaining the basic idea of character shifting.

One way to make a Caesar cipher a bit harder to break is to use different
shifts at different positions in the message. For example, we could shift
the first character by 25, the second by 14, the third by 17, and the fourth
by 10. Then we repeat the pattern, shifting the fifth character by 25, the
sixth by 14, and so on, until we run out of characters in the plaintext.
Such a scheme is called a Vignère cipher^{4.14},
which was first used around 1600, and was popularly believed to be
unbreakable.^{4.15}
This cipher is called a polyalphabetic substitution cipher, because
several different substitutions are made depending on the position of
the character within the text.

In our first example, the key consists of the four shifts [25, 14, 17, 10], which are the numerical equivalents of the string ``ZORK'' in a 26-letter alphabet consisting of the letters A-Z. It is common practice to think of our key as plaintext letters, rather than their numerical equivalents, but either will do. We can encode the string ``CRYPTOGRAPH'' as

C |
R |
Y |
P |
T |
O |
G |
R |
A |
P |
H |

+25 | +14 | +17 | +10 | +25 | +14 | +17 | +10 | +25 | +14 | +17 |

B |
F |
P |
Z |
S |
C |
X |
B |
Z |
D |
Y |

Note that in the above, the letter ``R'' in the plaintext encodes to both ``F'' and ``B'' in the crypttext, depending on its position. Similarly, the two ``Z''s in the crypttext come from different plain characters.

Now we implement this in Maple. This is very similar to the Caesar cipher,
just with the extra complication of multiple shifts, and letting our key be a
string.

First, we set our Alphabet to the usual one. We also use the conversion functions from §4.

> Alphabet := cat("", Select(IsPrintable,convert([seq(i,i=1..255)],bytes))):

> Vignere:= proc(plaintext::string, key::string) local textnum,codenum,i,p,offsets,keylen; global Alphabet;p := length(Alphabet); offsets := StringToList(key); keylen := length(key); textnum := StringToList(plaintext); codenum := [seq( modp(textnum[i] + offsets[modp(i-1,keylen)+1], p), i=1..length(plaintext)) ]; ListToString(codenum); end:

To try it out, we'll the same text as in the previous section. Notice how much harder it is to pick out the word boundaries in the resulting ciphertext.

> coded:=Vignere(text,"Prufrock");

We can make the decoding function from the original (let's call it
unVignere) by changing exactly one + sign to a -.^{4.16}We omit the change here so perhaps you will figure it out for yourself. But
we will test it, to show you that it does work.

> printf(unVignere(coded,"Prufrock"));

Even though this scheme looks quite daunting, it is not so very hard to crack
if you use a computer or have a very large supply of perseverance.
If we know that the key is of a certain length, say 4, and our plaintext is
sufficiently long, then we can perform frequency analysis on every fourth
letter. Even if we don't know the key length, it is not too hard to write a
computer program to try all the lengths less than, say, 10, and pick the one
that looks best.

Note that the longer the key is in the Vignère cipher, the harder it is to
break. If the key were as longer than the text, then it might seem at first
that analyzing the frequency of letters in the encrypted text would be of no
help, since each letter would be shifted by a different amount. This is
*almost* true. If the key is an passage of text in English, then the
shifts will occur with a predictable frequency. Of course, the problem gets
very difficult, but cryptanalysts are persistent people.

But what if there were no predictability within the key, having the shifts
come at random? The result (a Vignère cipher with an infinitely long,
completely random key) is an cryptosystem that *cannot be broken*.
Since the shifts are random, if you manage to decipher part of the message,
this gives you no clue about the rest. Furthermore, any plaintext of a given
length can encrypt to any ciphertext (with a different key, of course).
For example, the ciphertext ``=5nwhn KDNO?uWTBC-XA'' might have come
from the phrase ``Let's have dinner.'' (with the key
``pOyntmbbXYtrjSTGe1''), or it might be the encryption of ``Attack at
midnight'' with the key ``{@y4#!Jbz>&moSYEoL''. Since any message
could encrypt to any other, there is no way to break such a code unless you
know the key.

But that is the problem: the key is infinitely long. Infinitely long, truly random sequences of numbers tend to be somewhat unwieldy. And to decode the message, you must know what random sequence the message was encoded with.

Such a system is called a one-time pad, and was used regularly by spies and military personnel. Agents were furnished with codebooks containing pages and pages of random characters. Then the key to the encryption is given by the page on which to begin. It is, of course, important that each page be used only once (hence the name ``one-time pad''), because otherwise if a codebreaker were able to intercept a message and (via some other covert means) its corresponding translation, that could be used to decipher messages encoded with the same page. This sort of setup makes sense if an agent in the field is communicating with central command (but not with each other). Each agent could be given his own codebook (so that if he is captured, the whole system is not compromised), and he uses one page per message. Central command has on file the books for each agent.

A variation on this theme is the Augustus cipher,^{4.17}where instead of a
random sequence of shifts, a phrase or passage from a text which is as long
as the plaintext is used. The trouble with this is that, because of the
regularities in the key, a statistical analysis of the crypttext allows one
to break the cipher.

Another issue is that to be truly unbreakable, the random sequence must be truly random, with no correlation among the characters. This is harder than it sounds-- real randomness is hard to come by. If the random sequence has some predictability, the resulting stream can be attacked. A number of attacks on cryptosystems have been made not by breaking the encryption scheme directly, but because the underlying random-number generator was predictable.

We can easily modify our Vignere program to be a one-time pad system,
using Maple's random number generator to make our one-time pad.^{4.18}You might think that generating a random sequence of numbers would be
inherently unreproducible, so the message would be indecipherable unless we
record the stream of random numbers. However, computers can not
usually generate a truly random sequence. Instead, they generate a
pseudo-random sequence
*s*_{1}, *s*_{2}, *s*_{3},... where the pattern
of the numbers *s*_{i} is supposed to be unpredictable, no matter how
many of the values of the sequence you already know. However, to
start the sequence off, the pseudo-random number generator requires a
seed-- whenever it is started with the same
seed, the same sequence results. We can use this to our advantage, taking
the seed to be our key.^{4.19}Note that in order to decode the message by knowing the key (the seed), the
recipient must use the *same* pseudo-random number generator.

Maple's random number generator gives different results when called in
different ways. If called as rand(), it gives a pseudo-random,
non-negative 12 digit integer. When called as rand(p), it gives a
procedure that can be called later to generate a random integer
between 0 and p; it is this second version that we will use.
The seed for Maple's random number generators is the global variable
_seed.^{4.20}

> OneTimePad := proc(plaintext::string, keynum::posint) local textnum,codenum,i,p,randnum; global Alphabet,_seed;p := length(Alphabet); randnum := rand(p); _seed := keynum; textnum := StringToList(plaintext); codenum := [seq((textnum[i] + randnum()) mod p, i=1..length(plaintext))]; ListToString(codenum); end:

In this implementation, it is assumed that the key is a positive integer (it can be as large as you like). It would be easy to change it to use a string of characters, however, by converting the string to a number first. One way to do that is discussed in the next section.

In most descriptions of one-time pad systems, one takes the exclusive-or (XOR) of the random number and the integer of the plaintext, rather than adding them as we have done. This does not significantly change the algorithm if the random sequence is truly random, since adding a random number is the same as XOR'ing a different random number. However, the XOR approach has the advantage that the enciphering transformation is its own inverse. That is, if we produce the crypttext using crypt:=OneTimePadXOR(plain,key), then OneTimePadXOR(crypt,key) will give the decryption with the same key. This is not the case for the version given above; to make a decryption procedure, we would need to modify the above by changing the + to a -.

We can also improve security a bit by treating larger chunks of text as the
characters of our message. For example, if we start with the usual
26-letter alphabet A-Z, we can turn it into a 676-letter alphabet by
treating pairs of letters as a unit (such pairs are called digraphs),
or a 26^{3}-letter alphabet by using trigraphs, or triples of letters. This
makes frequency analysis much harder, and is quite easy to combine with the
other crytptosystems already discussed. We will use 99-graphs on a
256-letter alphabet (the ASCII code) when we implement the RSA cryptosystem in
§11.2. While frequency analysis is still
possible (charts of digraph frequencies are readily available,
trigraphs less so), the analysis is much more complex.

To convert the digraph ``HI'' to an integer (using a length 26^{2} alphabet
of digraphs), one simple way is to just treat it as a base-26 number. That
is, ``HI'' becomes
7×26 + 8 = 190, assuming the correspondence of
H=7, I=8. To convert back, we look at the quotient and remainder when
dividing by 26. For example,
300 = 26×11 + 14, yielding ``LO''.

Either we can do this arithmetic ourselves directly, or we can use
the convert(,base) command. This command takes a list of
numbers in one base and converts it to another. One slightly confusing
fact is that in this conversion, the least significant figure comes
first in the list,^{4.21}
instead of the usual method of writing numbers with the most
significant digit first.

For example,
128 = 1×10^{2} + 2×10^{1} + 8×10^{0} would be
written as [8,2,1]. To convert this to base 16, we would note
that
128 = 8×16^{1} + 0×16^{0}, so in base 16, it is written
as 80.

Doing this calculation in Maple, we have

> convert([8,2,1], base, 10, 16);

Below is one way to implement the conversion of text to numeric values
for k-graphs. We assume our usual functions StringToList and
ListToString are defined (see §4), as
well as the global Alphabet. The routine below converts
text into a list of integers, treating each block of k
letters as a unit.
A block of k characters
*c*_{1}*c*_{2}*c*_{3}...*c*_{k} is assigned the
numeric value
*x*_{i}*p*^{k}, where *x*_{i} is the numeric
equivalent of *c*_{i + 1} assigned by StringToList.

> StringToKgraph := proc(text::string, k::posint) local p; global Alphabet; p:= length(Alphabet); convert(StringToList(text), base, p, p^k); end:

> KgraphToString := proc(numlist::list(nonnegint), k::posint) local p; global Alphabet; p:=length(Alphabet); ListToString( convert(numlist, base, p^k, p)); end:

In the examples below, we are using our usual 97-character alphabet. Of course, this will work on any alphabet, although the specific numbers will differ.

> StringToKgraph("two by two",2);

In our alphabet, ``t'' is character number 86 and ``w'' is number 89, so the digraph ``tw'' encodes as 86 + 89×97 = 8719 (remember we are using a little-endian representation). Similarly, ``o '' gives 2×97 + 81, and so on. Notice that the two occurrences of ``two'' give different numbers, because one begins on an even character and the other starts on an odd one. The first corresponds to the digraphs ``tw'' and ``o '', while the second is `` t'' and ``wo''.

We can also encode with 4-graphs, if we like.

> StringToKgraph("two by two",4);

One advantage (for us) of Maple's use of the little-endian order is
that we needn't worry whether the length of our text is divisible by
*k*. In the above example, the last 4-graph is ``wo'', which is
encoded as though it had two more characters on the end with numeric
code of 0. The disadvantage of this is that if our text ends with the
character with code 0 (in our standard 97-character alphabet, this is a
newline), that character will be lost.

Another way to treat multiple characters together is to think of them as
vectors. For example, the digraph ``by'' might correspond to the vector
[68, 91]. We will treat this approach in §8.

- ... cipher
^{4.14} - This cipher takes its name after Blaise de Vignère,
although it is actually a corruption of the one he introduced in 1585.
Vignère's original cipher changed the shift amount each letter based on
the result
of the last encoding, and never repeated. This scheme is
*much*harder to break. However, one reason for its lack of popularity was probably due to the fact that a single error renders the rest of the message undecipherable. More details van be found in [Kahn]. - ...
unbreakable.
^{4.15} - In fact, as late as 1917, this cipher was described as ``impossible of translation'' in a respected journal (Scientific American), even though the means to break it had been well known among cryptographers for at least 50 years.
- ....
^{4.16} - Note that we could also use the Vignere routine, but with the inverse of the key. For example, in the preceding example, the inverse of ``Prufrock'' is ``M+(7+.:2'': the numeric code of P plus the numeric code of M is 97 (the length of the alphabet), similarly for r and +, u and (, and so on.
- ... cipher,
^{4.17} - Sometimes a Caesar cipher with a shift of +1 is also called an ``Augustus Cipher'', even though these are very different ciphers.
- ... pad.
^{4.18} - Technically speaking, this is not a one-time pad, but a one-time stream. The distinction is subtle, and we will ignore it here.
- ... key.
^{4.19} - Pseudo-random number generators appropriate for cryptography are rare.
Most implementations (including Maple's) are good enough for everyday
use, but not enough to be cryptographically secure. By analyzing the
output of a typical random number generator, a good cryptanalyst
can usually determine the pattern. For example, Maple's
rand function (and that of most computer languages,
such as C, Fortran, and Java)
gives the result of a affine sequence of numbers, reduced to
some modular base. That is,
*x*_{i}=*ax*_{i - 1}+*b*and*s*_{i}=*x*_{i}mod*n*for some fixed choices of*a*,*b*, and*n*. In this setting, the seed is*x*_{0}. We shall ignore the problem that this sequence is guessable, but if you want real security, you cannot. - ..._seed.
^{4.20} - We can choose a ``random'' seed (based on the computer's clock) using the function randomize().
- ... list,
^{4.21} -
Such a representation is called little-endian, as opposed to
a big-endian one. Some computers represent numbers
internally in big-endian format (Sun SPARC, Power Macintosh), and
others use a little-endian representation (those using Intel
processors, which is what MS-Windows and most versions of
Linux run on). This name comes from Gulliver's Travels, where in
the land of
*Blefuscu*there is war between the big-endians (who eat their eggs big end first), and the little-endians (who eat eggs starting from the little end).

2002-08-29