Implementing a Basic Substitution Cipher in Python

Written by Dan Sackett on January 15, 2015

As I progress in my Cryptography class, I find myself very immersed in things.

In fact, I think crypto work is one of my new muses. It's fun to think about how to take a plaintext string and convert that to a ciphertext string and then back again. Today, I wanted to do the first of many crypto posts on the simple substitution cipher.

A substitution cipher is a simple idea where we create a map of keys and then randomize those keys and assign them back as values. With those, we would then be able to go letter by letter in a plaintext message and substitute the current letter for the new mapped letter. This new word is a ciphertext and looks like random gibberish.

So for instance if we took all of the lowercase letters from a-z then we could create a mapping like so:

a = z
b = y
c = t
d = f
...
z = h

By doing this, we now have a way to convert a plaintext message to a ciphertext message. Let's try one to get the idea of how the encryption works.

alphabet = abcdefghijklmnopqrstuvwxyz
key      = ixangqjmlkbfsepdyuozcrhwvt

plaintext  = hello world
ciphertext = mgffp hpufn

As we can see, we took each letter from "hello world" and matched that to the appropriate value in our key to form a new word. Now, if we choose to decrypt this message, all we need to know is the key and we can go letter by letter doing the opposite that you did before until we get back our plaintext value.

See, nice and simple.

Unfortunately for us, doing something this simple raises red flags. If we look at the possible number of solutions to this then we will see that we have 26!(factorial) possible solutions which in terms of computing is a tiny range. Even worse is the fact that this type of cipher easy falls victim to a letter frequencies distribution. For instance, since the letter "e" is the most widely used letter in English (12.7% occurence rate), an attacker can attempt to find an "e" to slowly crack the mapping. After many iterations, they can have the entire mapping (key) and decrypt your message and any other message that you use your key to encrypt.

So how do we avoid this?

The best answer of course is to use a stronger cipher. There are many better implementations out there and this is more for a fun puzzle situation honestly. We can however take small steps to make cracking our key harder. For instance, what if we masked the spaces better? As of now, it's easy to see where words are and therefore much easier to guess.

We can add a space to our characters like so:

alphabet = abcdefghijklmnopqrstuvwxy z
key      = ixangqjmlkbfsepdyuozcrhw vt

plaintext  = hello world
ciphertext = mgffpvhpufn

Now we have one long word. This may seem like a small victory but what if we have a sentence with a lot of spaces? Based on frequencies, we can assume where the space is and have one leg in the door. Still, we expanded our potential keys by adding a space so what if we created a much larger space with uppercase letters, lowercase letters, numbers, and special characters?

Well by doing that we have actually increased our key space from 26! to 63!. This makes an attack take more time and allows us to obscure our message even more. I will admit however, the substitution cipher still isn't unsolvable. In the same sense as before, all the attack has to do is use word frequencies to make logical guesses to the key.

Still, let's play with this. I wrote a python script to show how we could do this in code:

import random

chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' + \
        'abcdefghijklmnopqrstuvwxyz' + \
        '0123456789' + \
        ':.;,?!@#$%&()+=-*/_<> []{}`~^"\'\\'

def generate_key():
    """Generate an key for our cipher"""
    shuffled = sorted(chars, key=lambda k: random.random())
    return dict(zip(chars, shuffled))

def encrypt(key, plaintext):
    """Encrypt the string and return the ciphertext"""
    return ''.join(key[l] for l in plaintext)

def decrypt(key, ciphertext):
    """Decrypt the string and return the plaintext"""
    flipped = {v: k for k, v in key.items()}
    return ''.join(flipped[l] for l in ciphertext)

def show_result(plaintext):
    """Generate a resulting cipher with elements shown"""
    key = generate_key()
    encrypted = encrypt(key, plaintext)
    decrypted = decrypt(key, encrypted)

    print 'Key: %s' % key
    print 'Plaintext: %s' % plaintext
    print 'Encrytped: %s' % encrypted
    print 'Decrytped: %s' % decrypted

Starting from the top, I defined all of the characters that we want to use in our key. This will allow us to create the random mapping that we need. I then have a function which does the actual generation of the key. I simply create a shuffled copy of the characters and then zip them into a dict with the original characters and we have a mapping ready.

The encryption function is next and it's very straightforward. All I do is generate a new string taking each character of the plaintext and replacing it with the value it maps to. My decryption function does essentially the same thing except I have to reverse the mapping dict so I can get the correct mapping back to the regular alphabet.

I then provided a utility print function to show that the result worked including the actual key used. We can run the function like so:

show_result('Hello, world!')

This then returns:

Key: {' ': 'I', '$': 'V', '(': '7', ',': 'i', '0': '!', '4': '}', '8': ')', '<': '#', '@': '-', 'D': ']', 'H': 'R', 'L': 'T', 'P': 'X', 'T': '@', 'X': 'S', '\\': 'o', '`': 'Z', 'd': '5', 'h': '3', 'l': 'a', 'p': 'c', 't': 'z', 'x': 'k', '#': 'x', "'": 'm', '+': 'r', '/': '^', '3': 'j', '7': '~', ';': 'G', '?': 'e', 'C': 'v', 'G': 's', 'K': 'l', 'O': 'C', 'S': '(', 'W': 'b', '[': 'W', '_': 'B', 'c': '"', 'g': '>', 'k': 'A', 'o': 'f', 's': '+', 'w': 'N', '{': '2', '"': '8', '&': ';', '*': 'M', '.': 'd', '2': 'U', '6': '1', ':': 'y', '>': 'h', 'B': 'J', 'F': 'Q', 'J': '<', 'N': 'F', 'R': 'g', 'V': 'P', 'Z': '*', '^': '6', 'b': ',', 'f': '4', 'j': '9', 'n': '[', 'r': ' ', 'v': '0', 'z': 'E', '~': '`', '!': 'L', '%': '=', ')': '%', '-': '?', '1': '$', '5': 'q', '9': 'p', '=': 'O', 'A': 'Y', 'E': '/', 'I': '.', 'M': 'K', 'Q': 'D', 'U': '{', 'Y': '&', ']': 'H', 'a': 't', 'e': ':', 'i': 'n', 'm': 'w', 'q': "'", 'u': '_', 'y': '\\', '}': 'u'}
Plaintext: Hello, world!
Encrytped: R:aafiINf a5L
Decrytped: Hello, world!

Every time you run this, the key will change so your output will differ from mine. As you can see from the result though, we have a working cipher!

Conclusion

See, crypto can be fun! While it is strongly recommended that you use time tested functions to do your encrypting, it is fun to find out how these ciphers actually work behind the scenes. I'll be doing some more posts on ciphers as the days go on and as I code them up. For now, you can see what I'm working on or get code from my Github repo.

As a parting gift for those that want a challenge, I'll leave you a message if you want to try and solve it. It is generated using the code that I listed here.

Mi Zk+}Z$%-RZi;$i lZ(}ZikZ Yh'Z(idHbZi-RZk+}Zlh-Zi;$i lZhKY-Z Yh'Z@id}bZi-RZ1i Zk+}Z$%-tlZY@ZR}lk%- Zdi'' Z YhZi;Y@kZkYZRi-d}Z$%k+Zk+}Zlki'l

python cryptography

comments powered by Disqus