
Simon Spicer: How to efficiently enumerate prime numbers

There's a part in my code that requires me to evaluate a certain sum
$$ \sum_{p\text{ prime }< \,M} h_E(p) $$
where $h_E$ is a function related to a specified elliptic curve that can be evaluated efficiently, and $M$ is a given bound that I know. That is, I need to evaluate the function $h_E$ at all the prime numbers less than $M$, and then add all those values up.

The question I hope to address in this post is: how can we do this efficiently as $M$ gets bigger and bigger? Specifically, what is the best way to compute a sum over all prime numbers up to a given bound when that bound can be very large?

[For those who have read my previous posts (you can skip this paragraph if you haven't - it's not the main thrust of this post), what I want to compute is, for an elliptic curve $E/\mathbb{Q}$, the analytic rank bounding sum $ \sum_{\gamma} \text{sinc}^2(\Delta \gamma) $ over the zeros of $L_E(s)$ for positive parameter $\Delta$; this requires us to evaluate the sum $ \sum_{n < \exp(2\pi\Delta)} c_n\cdot(2\pi\Delta-\log n)$. Here the $c_n$  are the logarithmic derivative coefficients of the completed $L$-function of $E$. Crucially $c_n = 0$ whenever $n$ isn't a prime power, and we can lump together all the terms coming from the same prime; we can therefore express the sum in the form you see in the first paragraph.]

As with so many things in mathematical programming, there is a simple but inefficient way to do this, and then there are more complicated and ugly ways that will be much faster. And as has been the case with other aspects of my code, I've initially gone with the first option to make sure that my code is mathematically correct, and then gone back later and reworked the relevant methods in an attempt to speed things up.

METHOD 1: SUCCINCT BUT STUPID


Here's a Python function that will evaluate the sum over primes. The function takes two inputs: a function $h_E$ and an integer $M$, and returns a value equal to the sum of $h_E(p)$ for all primes less than $M$. We're assuming here that the primality testing function is_prime() is predefined.
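In sketch form, it looks something like this:

def prime_sum_v1(h_E, M):
    y = 0
    for n in range(2, M):    # every integer below the bound
        if is_prime(n):      # keep only the primes
            y += h_E(n)
    return y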


As you can see, we can achieve the desired outcome in a whopping six lines of code. Nothing mysterious going on here: we simply iterate over all integers less than our bound and test each one for primality; if that integer is prime, then we evaluate the function h_E at that integer and add the result to y. The variable y is then returned at the end.

Why is this a bad way to evaluate the sum? Because there are far more composite integers than there are primes. According to the prime number theorem, the proportion of integers up to $M$ that are prime is approximately $\frac{1}{\log M}$. For my code I want to compute with bounds on the order of $M = e^{8\pi} \sim 10^{11}$; the proportion of integers up to this bound that are prime is correspondingly about $\frac{1}{8\pi} \sim 0.04$. That is, 96% of the integers we iterate over aren't prime, and every cycle spent testing them is wasted.
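As a quick sanity check of that estimate (a throwaway snippet, not part of the prime_sum() code):

from math import exp, log, pi
M = exp(8 * pi)      # roughly 8.2 * 10^10
print(1 / log(M))    # 0.03978..., i.e. about 4% of integers up to M are prime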

Just how inefficient this method is of course depends on how quickly we can evaluate the primality testing function is_prime(). The best known deterministic primality testing algorithm has running time that scales with (at most) the 6th power of $\log n$, where $n$ is the number being tested. This places primality testing in the class of polynomial-time algorithms, meaning its runtime scales relatively well with the size of the input. What kills us here, however, is the sheer number of times we have to call is_prime() - on every integer up to our bound $M$ - so even if it ran in constant time, prime_sum()'s running time would still scale linearly with the magnitude of $M$.

METHOD 2: SKIP THOSE $n$ WE KNOW ARE COMPOSITE


We can speed things up considerably by noting that apart from 2, all prime numbers are odd. We are therefore wasting a huge amount of time running primality tests on integers that we know a priori are composite. Assuming is_prime() takes a similar time to execute as our coefficient function h_E(), we could therefore roughly halve the runtime of the prime sum function by skipping the even integers and only checking odd numbers for primality.

We can go further. Apart from 2 and 3, every prime leaves a remainder of 1 or 5 when divided by 6 (because every prime except 2 is 1 modulo 2, and every prime except 3 is 1 or 2 modulo 3). We can therefore skip all integers that are 0, 2, 3 or 4 modulo 6, which means we only have to run primality checks on one third of the integers less than $M$.

Here's a second version of the prime_sum() function that does this:
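A sketch of what this looks like, with 2 and 3 handled separately and the remaining candidates visited six integers at a time:

def prime_sum_v2(h_E, M):
    y = 0
    # 2 and 3 are the only primes not congruent to 1 or 5 modulo 6
    if M > 2:
        y += h_E(2)
    if M > 3:
        y += h_E(3)
    # every other prime is 5 or 1 modulo 6, i.e. of the form 6k-1 or 6k+1
    n = 5
    while n < M:
        if is_prime(n):          # n is 6k - 1
            y += h_E(n)
        if n + 2 < M and is_prime(n + 2):   # n + 2 is 6k + 1
            y += h_E(n + 2)
        n += 6
    return y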


Of course we could push this technique even further by looking at remainders modulo more primes and combining the results: for example, every prime other than 2, 3 and 5 must leave a remainder of 1, 7, 11, 13, 17, 19, 23 or 29 modulo 30. However, the further you go the more special cases you need to handle, and the uglier your code becomes; as you can see, just looking at cases modulo 6 requires a function about three times as long as before. This method can therefore only take us so far before the code becomes too unwieldy to be practical.
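Incidentally, these admissible residues are exactly the integers less than 30 that are coprime to 30, so you can generate them in one line (again a throwaway snippet, not part of the prime_sum() code):

from math import gcd
print([r for r in range(1, 30) if gcd(r, 30) == 1])
# [1, 7, 11, 13, 17, 19, 23, 29]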

METHOD 3: PRIME SIEVING...


This second prime_sum() version is a rudimentary example of a technique called prime sieving. The idea is to use quick computations to eliminate a large percentage of integers from consideration in a way that doesn't involve direct primality testing, since that is computationally expensive. Sieving techniques are an entire field of research in their own right, so I thought I'd give as an example one of the most famous methods: the Sieve of Eratosthenes (named after the ancient Greek mathematician who is thought to have first come up with the idea). This takes as input a positive bound $M$ and returns a list of all prime numbers less than $M$. The method goes as follows:
  1. Start with a list of boolean flags indexed by the numbers 2 through $M$, and set all of them to True. 
  2. Let $i$ be the smallest index not yet considered whose entry is still True (initially $i = 2$); this $i$ is prime. Set the entries at all proper multiples of $i$ to False.
  3. Repeat step 2, moving on to the next such $i$ each time, until $i > \sqrt{M}$.
  4. Return a list of all integers $i$ such that the entry at index $i$ is True.
This is definitely a case where a (moving) picture is worth a thousand words:
[Animation: the Sieve of Eratosthenes generating all primes less than 121. Courtesy Wikipedia: "Sieve of Eratosthenes animation". Licensed under CC BY-SA 3.0 via Wikimedia Commons.]
How and why does this work? At each iteration the chosen index $i$ must be prime: by induction it hasn't been flagged False by any smaller prime, so it has no prime factor smaller than itself. Any multiple of $i$ is by definition composite, so we can walk along the list and flag those entries as not prime. Wash, rinse, repeat. We can stop once $i$ exceeds $\sqrt{M}$, since every composite number at most $M$ in magnitude must have at least one prime factor at most $\sqrt{M}$ in size.

Here is a third version of our prime_sum() function that utilizes the Sieve of Eratosthenes:
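In sketch form (starting the crossing-off at $i^2$, since smaller multiples of $i$ have already been flagged by smaller primes):

def prime_sum_v3(h_E, M):
    flags = [True] * M           # flags[n] will end up True iff n is prime
    flags[0] = flags[1] = False
    i = 2
    while i * i < M:
        if flags[i]:
            # i is prime, so every proper multiple of i is composite
            for j in range(i * i, M, i):
                flags[j] = False
        i += 1
    # sum h_E over the indices that survived the sieve
    return sum(h_E(n) for n in range(2, M) if flags[n])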

Let's see how the three versions stack up against each other time-wise in the Sage terminal. I've saved the three functions in a file called prime_sum_functions.py, which I then import up front (if you want to do the same yourself, you'll need to import or define appropriate is_prime() and sqrt() functions at the top of the file). I've also defined a sample toy function h_E() and bound M:

sage: from prime_sum_functions import *
sage: def h_E(n): return sin(float(n))/float(n)
sage: M = 10000
sage: prime_sum_v1(h_E,M)
0.19365326958140347
sage: prime_sum_v2(h_E,M)
0.19365326958140347
sage: prime_sum_v3(h_E,M)
0.19365326958140347
sage: %timeit prime_sum_v1(h_E,M)
1 loops, best of 3: 363 ms per loop
sage: %timeit prime_sum_v2(h_E,M)
1 loops, best of 3: 206 ms per loop
sage: %timeit prime_sum_v3(h_E,M)
10 loops, best of 3: 86.8 ms per loop

Good news! All three functions (thankfully) produce the same result. And we see version 2 is about 1.8 times faster than version 1, while version 3 is four times as fast. These ratios remained roughly the same when I timed the functions on larger bounds, which indicates that the three versions have the same or similar asymptotic scaling - this should be expected, since no matter what we do we will always have to check something at each integer up to the bound.

METHOD 4: ...AND BEYOND


It should be noted, however, that the Sieve of Eratosthenes as implemented above would be a terrible choice for my GSoC code. This is because in order to enumerate the primes up to $M$ we need to create a list in memory of size $M$. That isn't an issue when $M$ is small, but for my code I need $M \sim 10^{11}$; an array of that many boolean flags, even at one bit per entry, would take up about 12 gigabytes of memory, and any speedup we get from not having to test for primality would be completely obliterated by read/write slowdowns from working with an array that size. In other words, while the Sieve of Eratosthenes has great time complexity, it has abysmal space complexity.

Thankfully, more memory-efficient sieving methods exist that drastically cut down the space requirements. The best of these - for example, the Sieve of Atkin - need about $\sqrt{M}$ space. For $M \sim 10^{11}$ this translates to only about 40 kilobytes; much more manageable.
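The Sieve of Atkin itself is too intricate to sketch here, but even a segmented Sieve of Eratosthenes - sieving the range in blocks of roughly $\sqrt{M}$ integers using only the primes up to $\sqrt{M}$ - achieves that memory bound. Something like the following (illustrative only; not the code I'm actually using):

from math import isqrt

def prime_sum_segmented(h_E, M):
    # Stage 1: ordinary sieve to find the primes up to sqrt(M)
    limit = isqrt(M - 1) + 1
    flags = [True] * limit
    flags[0] = flags[1] = False
    small_primes = []
    for i in range(2, limit):
        if flags[i]:
            small_primes.append(i)
            for j in range(i * i, limit, i):
                flags[j] = False
    total = sum(h_E(p) for p in small_primes)
    # Stage 2: sieve [limit, M) in blocks of size ~sqrt(M), crossing off
    # multiples of the small primes and summing h_E over the survivors
    lo = limit
    while lo < M:
        hi = min(lo + limit, M)
        seg = [True] * (hi - lo)
        for p in small_primes:
            first = max(p * p, ((lo + p - 1) // p) * p)  # first multiple of p in [lo, hi)
            for j in range(first, hi, p):
                seg[j - lo] = False
        total += sum(h_E(lo + k) for k in range(hi - lo) if seg[k])
        lo = hi
    return total

Each block reuses a buffer of the same small size, so the peak memory footprint is governed by $\sqrt{M}$ no matter how large $M$ gets.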

Of course, there's always a downside: bleeding-edge prime enumeration methods are finicky and intricate, and there is a plethora of ways to get them wrong in implementation. At some point squeezing an extra epsilon of speedup out of your code is no longer worth the time and effort it takes to get there. For now I've implemented a more optimized version of the second prime_sum() function (the one that skips integers that are obviously composite) in my code; that's my happy middle ground. If I have time at the end of the project I'll revisit the issue of efficient prime enumeration and try to implement a more optimized sieving method, but that's a tomorrow problem.
