One of the most common (and very easy to crack) ciphers is substitution. One sometimes sees these in a newspaper somewhere near the crossword puzzle. This scheme is also the one used in Poe's story ``The Gold Bug'', and the Sherlock Holmes ``Mystery of the Dancing Men''. The idea is straightforward: choose a rearrangement of the letters of the alphabet, and replace each letter in the plaintext by its corresponding one.
As an example, let's implement this in Maple. Recall that we need to use a backquote (`) around character strings in Maple.3.2
Scramble:= msg ->
cat(seq(substring(Cryptabet,
searchtext(substring(msg,i),
Alphabet)),
i=1..length(msg)));
> Alphabet :=`abcdefghijklmnopqrstuvwxyz `:
Cryptabet:=`THEQUICKBROWNFXJMPSVLAZYDG `:
> Scramble(`please do not read this`);
Probably some comments are in order. First, we tell maple what our
permutation is by defining the Alphabet
that our plaintext is
written in, and the corresponding Cryptabet
. Our encoding function
Scramble
operates as follows (read from the middle out)
substring(msg,i)
extracts the i-th
character from the input string msg
.
searchtext
looks up where the character is in
Alphabet
.
substring
is used again, to extract the corresponding
character from Cryptabet
.
msg
, giving a sequence of characters.
cat
.
Note that our ``alphabet'' contains a space, which corresponds to a
space in our key. We didn't need to do this, but it left the message
in words of the same length. This, of course, makes the message
easier to guess. Also, notice that we used searchtext
, which ignores
case in the search (that is, ``A'' and ``a'' are the same).
Had we wished to, we could have distinguished between upper an lower case
using SearchText
in which case we would also need
to use a 53 letter alphabet (and a 53 letter key). In this
implementation, we also had to limit our input message to characters
occurring in our alphabet. Had our message contained numbers or
punctuation, the function Scramble
would have failed with an
error.
The usual way to break such a code is a combination of ``frequency analysis'' (that is, knowing that the most commonly occurring letter in the English language is ``E'', followed by ``T''), plus a bit of trial and error. Leaving the spaces between words helps the cracker greatly.
A variation on the arbitrary permutation discussed above is the ``Caesar
cipher'', named after Julius Caesar, who supposedly invented it himself.
This is also what was done by a ``Captain Midnight Secret Decoder Ring'',
popular in the 1930s and 40s. Here we convert our alphabet to numeric
equivalents (with, say A=0, B=1, and so on), add an offset to each numeric
equivalent (Caesar used 3), then re-encode the numbers as letters. We do
the addition modulo the length of the alphabet; that is, when we shift a Y
by 3, we get B, since
24 + 3 = 27, and
27
1 (mod 26). However,
we won't always restrict ourselves to a 26 letter alphabet-- we may want
punctuation, numbers, and to preserve upper and lower case distinctions.
Computers already have a built-in numeric representation of the alphabet-- the ASCII encoding.3.3 At their lowest level, digital computers operate only on numeric quantities-- they just manipulate ones and zeros. The smallest piece of information is the state of a switch: it is off (0) or it is on (1). This is called a bit, which is short for binary digit. A group of 8 bits is called a byte, and is a single ``character'' of information. A byte can have 28 = 256different states:
Maple has a built-in routine to allow us to convert characters to and from
their ASCII equivalents. If it is called with a list of numbers, it returns
a string of characters, and if called with a character string, out comes a
list of numbers. For example,
> convert(`How do I work this¿,bytes);
> convert([72, 117, 104, 63],bytes);
We can use this to make a list of the ASCII codes. Maple prints a
when the ASCII code is not a printable character.
> seq([i,convert([i],bytes)],i=0..127);
Now we are nearly ready to write a version of the Caesar cipher.
The basic idea is that for each character of the plaintext, we convert it to
its ASCII encoding, add a pre-specified offset modulo 127, then convert back
to characters. It will look very similar to the Scramble
function we wrote in §2.1, except we needn't declare
an alphabet explicitly. However, in addition to the maple commands
convert( ,bytes)
, we will use another new command, map
.
The map
command allows us to apply a function to every element of a
vector or list in one call. For example, to compute the square root
of each of the first 10 integers, we could do
> map(sqrt,[1,2,3,4,5,6,7,8,9,10]);
If the function we wish to apply takes two variables, we can
give the second one as a third argument to map
. For example, the
command
> map(modp,[1,2,3,4,5,6,7,8,9,10],3);
computes n mod 3 as n ranges over the list [1, 2,..., 10]. (Recall that n mod p is the remainder after dividing n by p.
We finally are ready. Remember to read from the middle outward.
Note that we are defining the function we are map
ping directly.
> Caesar:= (plain, shift) ->
convert(map((x,y)->(x+y-1) mod 127 + 1,
convert(plain,bytes),
shift),
bytes);
Let's give it a try:
> Caesar(`Hello there`,3);
It seems to work. Note that we didn't really compute
; instead we subtracted one
from the sum, computed
mod 127, then added 1 again. Why?
Consider what would happen if, for example, our character was ``k''
(ASCII code 107) and we shifted by 20. Then
107 + 20 mod 127gives 0, which is the code for the null byte, indicating the end of
a string. This would be a problem. So instead, we add 1 afterwards
to ensure that our answer is in the range
1,..., 127. And just
to avoid confusing ourselves, we subtract the 1 first, so that
shifting by 0 doesn't change anything.
Also, you might wonder why we can't use
> map(x -> (x+shift-1) mod 127 + 1, convert(plain,bytes))
in the middle of the function. If you try it, you'll see it does not work.
This is because maple tries to fill in the value of shift
at the
wrong time.
Now, aside from the fact that this function is a bit confusing to read and offers essentially no cryptographic security, what else is wrong with it? Consider the following example:
> Caesar(`Who put the benzedrine in Mrs. Murphy's Ovaltine¿, 86);
In addition to the fact that the result looks funny, if we were to try to
decipher this from printed output, we would have a little (but not much)
trouble, because the characters ``.'' and ``?'', after encoding, print out
as
.3.6 This happens because the numeric values of these
characters, after shifting by 86, are 5 and 22, respectively. Neither
of those corresponds to a printing character, so we can't tell them apart.
Of course, if we had no intention of dealing with our encoded message in
printed form, this would be no problem.
How can we get around this? We can restrict ourselves to printing
characters using a variation of the idea in Scramble
of
§2.1, that is, giving an explicit Alphabet
that we will deal in.
Before doing this, however, it will be helpful to discuss the proc
command, which allows us to construct functions that consist of more than
one maple command.