Implementing a Basic Vigenere Cipher in Python

Written by Dan Sackett on January 19, 2015

Taking the Caesar cipher from my last post, we can build a more complex, yet still brittle, cipher known as a Vigenere cipher.

When we talk about naive ciphers, the Vigenere cipher is likely the the most secure. It builds on the principle of the Caesar cipher yet provides a decent way to avoid the easy to solve shift problems. The basic gist of this cipher is we have both a message and a key. The key can be any length, but you must repeat the key for the length of our message to get this to work. This can be seen here:

alpha    = ABCDEFGHIJKLMNOPQRSTUVWXYZ
message  = IAMTHEWALRUS
key      = HELLOHELLOHE

Our key is actually "HELLO", but we expanded it to the length of our message giving us the repeated nature we see. Once we have these defined, we go character by character performing a pseudo-Caesar cipher.

m1 = I = 9
k1 = H = 8
9 + 8 = 17 % 26 = 17 = Q
c1 = Q

Looking at this, we see that the first character of our message is "I" which is the ninth letter in the alphabet. We then look at the first character of the key which is "H" or the eighth letter in the alphabet. We add those two numbers and modulo 26 giving us 17 which points to the seventeenth letter in the alphabet: "Q". We now know that the first letter of our ciphertext is "Q".

We then repeat this method for each character in our message until we have the ciphertext. This can be better shown as the algorithm:

Let m be our message and k be our key:
E(m) = ((m1 + k1) % 26, (m2 + k2) % 26, ..., (mi + ki) % 26)
D(m) = ((c1 - k1) % 26, (c2 - k2) % 26, ..., (ci - ki) % 26)

This is much like the Caesar cipher except instead of defining a fixed rotation, we allow our key's character index to be the rotation. As you can see, this is why the Vigenere cipher can be considered a string of Caesar ciphers. Pretty cool when you actually see it.

So how do we attack this then?

Well, the problem with this cipher is the fact that the key repeats itself. When you have a repeating key, it's common to see patterns in the ciphertext that completely match each other. By recognizing those patterns, you can determine the block size of the key and from there you simply do a Caesar brute force shift on each block until the plaintext appears.

On relatively short messages, this is harder to crack (as with any short ciphertext) but if encrypting a uniformly distributed text then you can really start to pick up on these things.

Now let's code this up in Python and see how how we can automate this:

from itertools import cycle

ALPHA = 'abcdefghijklmnopqrstuvwxyz'


def encrypt(key, plaintext):
    """Encrypt the string and return the ciphertext"""
    pairs = zip(plaintext, cycle(key))
    result = ''

    for pair in pairs:
        total = reduce(lambda x, y: ALPHA.index(x) + ALPHA.index(y), pair)
        result += ALPHA[total % 26]

    return result.lower()


def decrypt(key, ciphertext):
    """Decrypt the string and return the plaintext"""
    pairs = zip(ciphertext, cycle(key))
    result = ''

    for pair in pairs:
        total = reduce(lambda x, y: ALPHA.index(x) - ALPHA.index(y), pair)
        result += ALPHA[total % 26]

    return result


def show_result(plaintext, key):
    """Generate a resulting cipher with elements shown"""
    encrypted = encrypt(key, plaintext)
    decrypted = decrypt(key, encrypted)

    print 'Key: %s' % key
    print 'Plaintext: %s' % plaintext
    print 'Encrytped: %s' % encrypted
    print 'Decrytped: %s' % decrypted

Starting from the top, I'm importing the cycle() function from the itertools library. You will see why in a minute but it's important here. I then define our alphabet in order to get character indexes correctly. In my encryption function, I'm taking in a plaintext value and a key and from those I build a list of tuples. Since I'm doing this with zip() and zip() is a terminating sequence which ends after the shorter string is exhausted, I use cycle() to repeat the letters of my key for the entire length of the plaintext.

I then use a reduce statement so I can get the index of our characters to add them. I could use the sum here as well, but I would need to remember that I need the indexes of the letters and not the letters themselves or else we'll get a value error. I finally get the new letter after a modulo of 26 and append that to our resulting ciphertext string.

The decryption function is exactly the same but instead of adding numbers I'm subtracting them. As with the other programs, I provided a print function to see this in action. We can run:

show_result('ihadadream', 'boom')

Where "ihadadream" is the message and "boom" is the key. We then get back:

Key: boom
Plaintext: ihadadream
Encrytped: jvopbrfqba
Decrytped: ihadadream

Just as we expected! This cipher is fun to use and is somewhat smart in its construction. It can still be targeted by key frequencies, but it's not as easy as some of the others to crack that way.

Conclusion

This ends my first three ciphers which were all naive ciphers. You can see the code for all of these on my Github repo. They're all trivial to solve yet they laid the groundwork for better ciphers. These concepts of substitution and rotation are smart ways to obfuscate messages and help you have more secure communication. While not practical in a real world setting, they're good to know when you want to stump a lot of people in a cheap way.

Stay tuned as I prepare next week to get into stream ciphers which are both fast and more secure alternatives.

As with the last posts, I'll leave you with a message encoded on a Vigenere cipher:

lfrhlcuegwybksbsrrwtllfrhlcuegwy

python cryptography

comments powered by Disqus