eulerbooks/notebooks/problem0003.ipynb

134 lines
6.4 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"id": "1a8d8609",
"metadata": {},
"source": [
"# [Largest Prime Factor](https://projecteuler.net/problem=3)\n",
"\n",
"The [fundamental theorem of arithmetic](https://en.wikipedia.org/wiki/Fundamental_theorem_of_arithmetic) says that every integer greater than 1 has a unique prime factorization.\n",
"\n",
"There are numerous factorization algorithms. Some of the easiest to implement are [trial division](https://en.wikipedia.org/wiki/Trial_division), [wheel factorization](https://en.wikipedia.org/wiki/Wheel_factorization) and [Pollard's rho algorithm](https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm).\n",
"\n",
"In general, as an integer gets larger, [factoring it gets harder](https://en.wikipedia.org/wiki/Integer_factorization). We don't have any efficient algorithm for factoring in general, but somewhat strangely, we don't know for certain that no such algorithm exists.\n",
"\n",
"Fortunately for us, this number is not too large for any of these algorithms, or for SageMath."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d2f8884c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[71, 839, 1471, 6857]"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"prime_factors(600851475143)"
]
},
{
"cell_type": "markdown",
"id": "d4343a42",
"metadata": {},
"source": [
"If we wanted to implement our own factorization algorithm, though, Pollard's rho is relatively easy to implement and more efficient (on average) than trial division. It's also got some interesting math at its foundation, so let's take a look at it.\n",
"\n",
"## Pollard's rho algorithm\n",
"Let $n=pq$ be a positive integer where $p$ and $q$ are non-trivial factors of $n$ (i.e. not 1 or $n$), and let $\\{x_i\\}$ be a sequence such that $x_0$ is some random initial value and\n",
"$$x_{i+1} = (x_i^2 - 1) \\bmod{n}$$\n",
"Since the number of values this sequence can take on is finite, and each term is determined entirely by the previous term, it must ultimately become periodic, although there may be some initial terms before we reach a term that will repeat (refer to [this image](https://en.wikipedia.org/wiki/File:Pollard_rho_cycle.svg) for a visual aid - the fact this diagram looks like the Greek letter $\\rho$ is where the algorithm gets its name).\n",
"\n",
"How many terms will we see before the sequence repeats? If we assume the sequence is random, the [birthday paradox](https://en.wikipedia.org/wiki/Birthday_problem) implies that it will take an average of $O(\\sqrt{n})$ iterations before it repeats (this is using [Big O notation](https://en.wikipedia.org/wiki/Big_O_notation)). Of course, the sequence is not truly random, but it *is* [pseudorandom](https://en.wikipedia.org/wiki/Pseudorandomness), and experimental results seem to corroborate this analysis despite the somewhat faulty assumption.\n",
"\n",
"Now consider the related sequence $\\{x_i \\bmod p\\}$. Obviously, we can't calculate this sequence directly since we don't know the value of $p$, but similarly to $\\{x_i\\}$, this sequence is eventually periodic, with an average of $O(\\sqrt{p})$ iterations before a repetition. In other words, this sequence is expected to repeat earlier than $\\{x_i\\}$.\n",
"\n",
"When $\\{x_i \\bmod p\\}$ cycles, it implies that $x_j \\equiv x_k \\pmod{p}$ for some $j \\neq k$. This implies that $p | (x_j - x_k)$. Since $p$ also divides $n$, this means that $\\gcd(x_j - x_k, n)$ will be some multiple of $p$.\n",
"\n",
"This all means that we simply need to iterate through terms in $\\{x_i\\}$ until we find $x_j$ and $x_k$ such that $\\gcd(x_j - x_k, n)$ equals something other than 1. When it does, by definition this will be a factor of $n$. However, this factor may be $n$ itself, which is trivial. In this case the algorithm fails, but it can be attempted again with a different value for $x_0$.\n",
"\n",
"To facilitate finding $x_j$ and $x_k$, [Brent's cycle-finding algorithm](https://en.wikipedia.org/wiki/Cycle_detection) is used. Briefly, two variables are used to iterate through $\\{x_i\\}$ at different rates. Because of this, the two variables will never have the same index, but eventually $x_i \\equiv x_j \\pmod{p}$, which we will detect with the GCD."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d12993bf",
"metadata": {},
"outputs": [],
"source": [
"from itertools import count\n",
"\n",
"def factorization(n):\n",
" x = randint(0, n-1)\n",
" y = x\n",
" k = 2\n",
" for i in count(1):\n",
" x = (x ** 2 - 1) % n\n",
" d = gcd(y - x, n)\n",
" \n",
" assert d != n\n",
" \n",
" if d != 1:\n",
" return d\n",
" \n",
" if i == k:\n",
" y = x\n",
" k *= 2"
]
},
{
"cell_type": "markdown",
"id": "569b615c",
"metadata": {},
"source": [
"Note that this algorithm gives us a factor, but not necessarily a *prime* factor. However, we could just repeatedly apply the algorithm until we only have prime factors left. But how do we know when a factor is prime? We'll need a [primality test](https://en.wikipedia.org/wiki/Primality_test) like the Miller-Rabin algorithm - see [problem 7](https://projecteuler.net/problem=7) for a discussion of that algorithm."
]
},
{
"cell_type": "markdown",
"id": "12233560",
"metadata": {},
"source": [
"## Relevant sequences\n",
"* Greatest prime factors: [A006530](https://oeis.org/A006530)\n",
"\n",
"#### Copyright (C) 2025 filifa\n",
"\n",
"This work is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International license](https://creativecommons.org/licenses/by-sa/4.0/) and the [BSD Zero Clause license](https://spdx.org/licenses/0BSD.html)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "SageMath 9.5",
"language": "sage",
"name": "sagemath"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}