eulerbooks/notebooks/problem0059.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4a3c53da",
   "metadata": {},
   "source": [
    "# [XOR Decryption](https://projecteuler.net/problem=59)\n",
    "\n",
    "The encryption scheme described in the problem is the [Vigenère cipher](https://en.wikipedia.org/wiki/Vigen%C3%A8re_cipher). Until the 19th century, there were no known methods to break it. (In a seemingly rare exception, SageMath has some [tooling for Vigenère ciphers](https://doc.sagemath.org/html/en/reference/cryptography/sage/crypto/classical.html), but it's not that useful for this problem.)\n",
    "\n",
    "First things first, let's read in our ciphertext."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "3ba17bb2",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"txt/0059_cipher.txt\") as f:\n",
    "    ciphertext = bytes(int(n) for n in f.read().split(\",\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1130e9c",
   "metadata": {},
   "source": [
    "To understand how to break the Vigenère cipher, it's helpful to first study a much simpler (but related) cipher.\n",
    "\n",
    "## Caesar cipher\n",
    "The [Caesar cipher](https://en.wikipedia.org/wiki/Caesar_cipher) works by XORing each character in the plaintext with the *same byte* - put another way, it's like the Vigenère cipher with a key of length 1. This cipher is trivial to crack via brute force, but it can also be broken using [frequency analysis](https://en.wikipedia.org/wiki/Frequency_analysis). Since each character is encrypted with the same byte, the distribution of characters in the plaintext will perfectly match the distribution of bytes in the ciphertext. This allows us to make an educated guess about the key.\n",
    "\n",
    "For example, the most common character in English text is the space character. This means if you encrypt English text with a Caesar cipher, the most common byte in the ciphertext is most likely to be an encrypted space character. This allows you to guess the key by XORing the most common encrypted byte with the space character. You can then check by XORing your key guess with the whole ciphertext and just eyeballing if it's now readable.\n",
    "\n",
    "This function applies the frequency analysis approach. It returns possible keys for the Caesar cipher, ordered by most to least likely, assuming that the most common character in the plaintext is the space character."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "aacec89a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "\n",
    "def break_caesar(ciphertext):\n",
    "    frequencies = Counter(ciphertext)\n",
    "    return (b ^^ ord(\" \") for (b, _) in frequencies.most_common())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1bfb343f",
   "metadata": {},
   "source": [
    "## Vigenère cipher\n",
    "What makes the Vigenère cipher trickier is that it's possible that two instances of the same character get XORed with *different* characters from the key, resulting in two distinct bytes in the ciphertext. This causes the distribution of characters in the ciphertext to change from the plaintext, so a direct frequency analysis is ineffective. However, if we're a bit more clever, we can still apply frequency analysis.\n",
    "\n",
    "The first step is to determine the likely length of the key. Fortunately for us, this was already given as 3 in the problem, but there are techniques for deducing the key length yourself, too, such as Kasiski examination or the Friedman test.\n",
    "\n",
    "Since the key length is 3, we know that the 0th, 3rd, 6th, ... characters were all XORed with the same byte - likewise for the 1st, 4th, 7th, ... and 2nd, 5th, 8th, ... characters. In other words, each batch of characters was encrypted with a *separate Caesar cipher*. This means we can find each character of the overall key by just applying frequency analysis on each batch separately.\n",
    "\n",
    "So, to tackle this problem, we can generate all the possible keys from the Cartesian product of the possible keys in each Caesar cipher."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "c48a1bd6",
   "metadata": {},
   "outputs": [],
   "source": [
    "from itertools import product\n",
    "\n",
    "caesar_keys = (break_caesar(ciphertext[i::3]) for i in (0, 1, 2))\n",
    "keys = (bytes(k) for k in product(*caesar_keys))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7fa589d2",
   "metadata": {},
   "source": [
    "And here's a simple function for decrypting (and encrypting) text with the given key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "7dff4c4c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from itertools import cycle\n",
    "\n",
    "def decrypt(ciphertext, key):\n",
    "    return bytes(x ^^ y for (x, y) in zip(ciphertext, cycle(key)))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c2e0417",
   "metadata": {},
   "source": [
    "We can then iterate through all our possible keys and try decrypting with each key to see if we get something sensible. Since our technique of breaking each Caesar cipher is probabilistic, it's possible in the general case that we have to try multiple keys, if it turns out that the space character is not the most common character in one or more of our batches.\n",
    "\n",
    "Fortunately for us, the first key we generate ends up being the one we're looking for."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "a712bebb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b'An extract taken from the introduction of one of Euler\\'s most celebrated papers, \"De summis serierum reciprocarum\" [On the sums of series of reciprocals]: I have recently found, quite unexpectedly, an elegant expression for the entire sum of this series 1 + 1/4 + 1/9 + 1/16 + etc., which depends on the quadrature of the circle, so that if the true sum of this series is obtained, from it at once the quadrature of the circle follows. Namely, I have found that the sum of this series is a sixth part of the square of the perimeter of the circle whose diameter is 1; or by putting the sum of this series equal to s, it has the ratio sqrt(6) multiplied by s to 1 of the perimeter to the diameter. I will soon show that the sum of this series to be approximately 1.644934066842264364; and from multiplying this number by six, and then taking the square root, the number 3.141592653589793238 is indeed produced, which expresses the perimeter of a circle whose diameter is 1. Following again the same steps by which I had arrived at this sum, I have discovered that the sum of the series 1 + 1/16 + 1/81 + 1/256 + 1/625 + etc. also depends on the quadrature of the circle. Namely, the sum of this multiplied by 90 gives the biquadrate (fourth power) of the circumference of the perimeter of a circle whose diameter is 1. And by similar reasoning I have likewise been able to determine the sums of the subsequent series in which the exponents are even numbers.'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "key = next(keys)\n",
    "plaintext = decrypt(ciphertext, key)\n",
    "plaintext"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "c2cbe5a5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b'exp'"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "key"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66847037",
   "metadata": {},
   "source": [
    "They're so clever over at Project Euler, aren't they?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "111c2bf6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "129448"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sum(plaintext)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e8832df2",
   "metadata": {},
   "source": [
    "#### Copyright (C) 2025 filifa\n",
    "\n",
    "This work is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International license](https://creativecommons.org/licenses/by-sa/4.0/) and the [BSD Zero Clause license](https://spdx.org/licenses/0BSD.html)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "SageMath 9.5",
   "language": "sage",
   "name": "sagemath"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}