The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Math::Prime::FastSieve - Generate a list of all primes less than or equal to $n. Do it quickly.

While we're at it, supply a few additional tools that a Prime Sieve facilitates.

VERSION

Version 0.13

DESCRIPTION

This module provides an optimized implementation of the Sieve of Eratosthenes, and uses it to return a reference to an array all primes up to any integer specified, within the limitations of addressable memory.

Additionally the module provides access to other Prime-related functions that are facilitated as a by-product of having a really fast Prime Sieve.

At the time of writing, the primes function will return all primes up to and including $n faster than any other module I can find on CPAN (including Math::Prime::XS). While a segmented sieve (which this isn't) would extend the range of primes accessible, the fact that this module uses a bit-sieve means that primes over a billion are easily within reach of most modern systems.

SYNOPSIS

    # ----------    The simple (and fastest) interface:    ----------
    use Math::Prime::FastSieve qw( primes );

    # Obtain an reference to an array containing all primes less than or
    # equal to 5 Million.
    my $aref = primes( 5_000_000 );



    # ----------   The object (far more flexible) interface.   ----------

    # Create a new sieve and flag all primes less or equal to n.
    my $sieve = Math::Prime::FastSieve::Sieve->new( 5_000_000 );

    # Obtain a ref to an array containing all primes <= 5 Million.
    my $aref = $sieve->primes( 5_000_000 );

    # Obtain a ref to an array containing all primes >= 5, and <= 16.
    my $aref = $sieve->ranged_primes( 5, 16 );

    # Query the sieve: Is 17 prime? Return a true or false value.
    my $result = $sieve->isprime( 17 );

    # Get the value of the nearest prime less than or equal to 42.
    my $less_or_equal = $sieve->nearest_le( 42 );

    # Get the value of the nearest prime greater than or equal to 42.
    my $greater_or_equal = $sieve->nearest_ge( 42 );

    # Count how many primes exist within the sieve (ie, count all primes less
    # than or equal to 5 Million, assuming we instantiated the sieve with
    # Math::Prime::FastSieve::Sieve->new( 5_000_000 );.
    my $num_found = $sieve->count_sieve();

    # Count how many primes fall between 1 and 42 inclusive.
    my $quantity_le = $sieve->count_le( 42 );

    # Return the value of the 42nd prime number.
    my $fourty_second_prime = $sieve->nth_prime( 42 );

EXPORT

Exports nothing by default. If you ask nicely it will export the single subroutine, primes.

    use Math::Prime::FastSieve qw( primes );  # Import primes().
    use Math::Prime::FastSieve;               # No imports.

SUBROUTINES/METHODS

This module provides two modus operandi. The simplest also happens to be the fastest way of generating a list of all primes up to and including $n. That is via a direct call to the primes($n) function.

The more powerful way to use this module is via its object oriented interface. With that approach, the constructor new($n) creates a prime sieve flagging all primes from 2 to $n inclusive, and returns a Sieve object. That object may then be queried by way of accessor methods to get at any of the following:

  • primes() A list of all primes within the sieve.

  • ranged_primes( $lower, $upper ) A list of all primes >= $lower, and <= $upper.

  • isprime($n) A primality test for any integer within the bounds of the sieve.

  • nearest_le($n) Find the nearest prime less than or equal to $n within the bounds of the sieve.

  • nearest_ge($n) Find the nearest prime greater or equal to $n within the bounds of the sieve.

  • count_sieve() A count of all primes in the sieve.

  • count_le($n) A a count of all primes less than or equal to $n within the bounds of the sieve.

  • nth_prime($n) Return the $nth prime, within the bounds of the sieve.

Because the sieve is created when the object is instantiated, many of the tests you might call on the sieve object will execute quite quickly. Each of the subs and methods documented below will also attempt to describe the computational and memory complexity of the function.

The Standard Interface

Objective

Provide a fast and simple means of generating a big list of prime numbers.

primes()

This is a regular function (ie, not an object method).

Pass a positive integer as its parameter. Returns a reference to an array of all prime numbers 2 .. $n.

This function is implemented in C++ using Inline::CPP, which in turn binds it to Perl via XS. The method used is the Sieve of Eratosthenes. The sieve is one of the fastest known methods of generating a list of primes up to $n.

This implementation makes several optimizations beyond the cannonical Sieve, including:

  • Uses a bit vector as a sieve: A memory optimization that allows the sieve to scale well without consuming so much memory that your system grinds to a stand-still for large $ns (where "large" depends on your system, of course).

  • The sieve's bits only represent odd numbers (memory optimization).

  • Only sifts and tests odd integers. (2 is handled as a special case.)

  • Returns an array-ref rather than a potentially huge (slow) list.

  • Other micro-optimizations to squeeze a few cycles out here and there.

The result is a prime number generator that is...fast. It operates in O( n log log n ) time, with a O(n) memory growth rate.

The Object Oriented Interface

Objective

Where the standard interface wraps the sieve constructor and the sieve accessor in a single function called primes(), the object oriented interface places the sieve constructor in Math::Prime::FastSieve::Sieve-new()>, and leaves the interpretation of the sieve's results to separate accessor methods. The advantage is that if you don't really need "a big list", but instead need other functionality that may be computationally (and memory) cheaper to obtain from the sieve, you can get at those results without constructing a huge list of primes.

The standard interface is slightly faster if you just want a big list. But the object interface is still very fast, and provides greater flexibility, including the ability to use ranged_primes() to process primes in smaller, more memory efficient chunks.

new()

Class method of Math::Prime::FastSieve::Sieve Requires a single integer parameter that represents the largest value to be held in the sieve. For example:

    my $sieve = Math::Prime::FastSieve::Sieve->new( 1_000_000_000 );

This will create a Sieve object that flags all integers from 2 to 1 billion that are prime.

Calling new() is an O(n log log n) operation. The memory growth is at a rate that is 1/8th the rate of growth of $n.

primes()

This works just like the standard primes() function, except that it is a member function of your Sieve object, and also (behind the scenes) it doesn't need to create a sieve of prime flags since new() already did that for you. You must call primes() with an integer parameter. The integer must be less than or equal to the integer value previously passed to the new() constructor. primes() returns a reference to an array of all prime numbers from 2 to the integer passed to it.

Passing an out-of-bounds integer will result in a return value of an array ref pointing to an empty array.

    my $primes_ref = $sieve->primes( 1_000_000_000 );
    my $primes_ref = $sieve->primes( 50 );

The primes() method is an O(n) operation for both time and memory.

ranged_primes()

This behaves exactly like the primes() method, except that you specify a lower and upper limit within the bounds of the sieve. The return value is a reference to an array holding all prime numbers greater than or equal to the lower limit, and less than or equal to the upper limit.

The purpose of this method is to allow you to create a sieve ( with new() ), and then get results in a segmented form. The reasons this may be desirable are two-fold: First, you may only need a subset. Second, this gives you finer control over how much memory is gobbled up by the list returned. For a huge sieve the sieve's memory footprint is much smaller than the list of integers that are flagged as prime. By dealing with that list in chunks you can have a sieve of billions of prime flags, but never hold that big of a list of integers in memory all at once.

    my $primes_ref = $sieve->ranged_primes( 5, 16 );
    # $primes_ref now holds [ 5, 7, 11, 13 ].

The time complexity of this method is O(n) where 'n' is the upper limit minus the lower limit. So a range of 5_000_000 .. 5_000_010 consumes as much time as 100 .. 110.

isprime()

Pass a parameter consisiting of a single integer within the range of the Sieve object. Returns true if the integer is prime, false otherwise. If the integer is out of the Sieve object's bounds, the result will be false.

    if( $sieve->isprime(42) ) {
        say "42 is prime.";
    } else {
        say "42 isn't prime.";
    }

This is an O(1) operation.

nearest_le()

The nearest_le() method returns the closest prime that is less than or equal to its integer parameter. Passing an out of bounds parameter will return a 0.

    my $nearest_less_or_equal = $sieve->nearest_le( 42 );

Since the nearest prime is never very far away, this is an O( n / ( log n ) ) operation.

nearest_ge()

Like the nearest_le() method, but this method returns the prime that is greater than or equal to the input parameter. If the input param. is out of range, or if the next prime is out of range, a 0 is returned.

    my $nearest_greater_or_equal = $sieve->nearest_ge( 42 );

This is also an O( n / ( log n ) ) operation.

By adding one to the return value and passing that new value as a parameter to the nearest_ge() method again and again in a loop it is easy to iterate through a list of primes without generating them all at once. Of course it's not going to be as fast as getting the big list all at once, but you can't have everything in life, now can you?

count_sieve()

Takes no input parameter. Counts all of the primes in the sieve and returns the count. The first time this is called on a Sieve object the count takes O(n) time. Subsequent calls benefit from the first run being cached, and thus become O(1) time.

count_le()

Pass an integer within the range of the sieve as a parameter. Return value is a count of how many primes are less than or equal to that integer. If the value passed as a parameter is the same as the size of the sieve, the results are cached for future calls to count_sieve().

This is an O(n) operation.

nth_prime()

This method returns the n-th prime, where $n is the cardinal index in the sequence of primes. For example:

    say $sieve->nth_prime(1) # prints 2: the first prime is 2.
    say $sieve->nth_prime(3) # prints 5: the third prime is 5.

If there is no nth prime in the bounds of the sieve 0 is returned.

Implementation Notes

The sieve is implemented as a bit sieve using a C++ vector<bool>. All odd integers from 3 to $n are represented based on their index within the sieve. The only even prime, 2, is handled as a special case. A bit sieve is highly efficient from a memory standpoint because obviously it only consumes one byte per eight integers. This sieve is further optimized by reperesenting only odd integers in the sieve. So a sieve from 0 .. 1_000_000_000 only needs 500_000_000 bits, or 59.6 MB.

So, while a bit sieve was used for memory efficiency, just about every other optimization favored reducing time complexity.

Furthermore, all functions and methods are implemented in C++ by way of Inline::CPP. In fact, the sieve itself is never exposed to Perl (this decision is both a time and memory optimization).

A side effect of using a bit sieve is that the sieve itself actually requires far less memory than the integer list of primes sifted from it. That means that the memory bottleneck with the primes() function, as well as with the primes() object method is not, in fact, the sieve, but the list passed back to Perl via an array-ref.

If you find that your system memory isn't allowing you to call primes with as big an integer as you wish, use the object oriented interface. new will generate a sieve up to the largest integer your system. Then rather than calling the primes method, use ranged_primes, or nearest_ge to iterate over the list. Of course this is slightly slower, but it beats running out of memory doesn't it?

NOTE: As of version 0.10, support for long ints is built into Math::Prime::FastSieve. If your version of Perl was built with support for long ints, you can create sieve sizes well over the 2.14 billion limit that standard ints impose.

DEPENDENCIES

You will need: Inline, Inline::C (which is packaged with Inline), Parse::RecDescent, Inline::MakeMaker, ExtUtils::MakeMaker (core), and Test::More (core).

CONFIGURATION AND ENVIRONMENT

In addition to the module dependencies listed in the previous section, your system must have a C++ compiler that is compatible with the compiler used to build Perl. You may need to install one. Additionally, if your Perl was built with long integer support, this module will take advantage of that support.

While it may sound like there are a lot of dependencies and configuration requirements, in practice, most users should be able to install this module with the familiar incantations:

    cpan Math::Prime::FastSieve

Or download and unpack the tarball, and...

    perl Makefile.PL
    make
    make test
    make install

Using the cpan shell, cpanm, or cpanplus will have the added benefit of pulling in and building the dependencies automatically.

DIAGNOSTICS

If you encounter an installation failure, please email me a transcript of the session.

INCOMPATIBILITIES

This module can only be built using systems that have a C++ compiler, and that are able to cleanly install Inline::CPP. That is going to cover the majority of potential users. If you're one of the unlucky few, please send me an email. Since I also maintain Inline::CPP, your feeback may help me to sort out the few remaining installation problems with that great module.

AUTHOR

David Oswald, <davido at cpan.org>

BUGS AND LIMITATIONS

Since the Sieve of Eratosthenes requires one 'bit' per integer in the sieve, the memory requirements can be high for large tests. Additionally, as the result set is generated, because Perl's scalars take up a lot more space than one bit, it's even more likely the entire result set won't fit into memory.

The OO interface does provide functions that don't build a big huge memory-hungry list.

Please report any bugs or feature requests to bug-math-prime-FastSieve at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Math-Prime-FastSieve. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Math::Prime::FastSieve

You can also look for information at:

ACKNOWLEDGEMENTS

This module is made possible by Inline::CPP, which wouldn't be possible without the hard work of the folks involved in the Inline and Inline::C project. There are many individuals who have contributed and continue to contribute to the Inline project. I won't name them all here, but they do deserve thanks and credit.

Dana Jacobsen provided several optimizations that improved even further on the speed and memory performance of this module. Dana's contributions include reducing the memory footprint of the bit sieve in half, and trimming cycles by cutting in half the number of iterations of an inner loop in the sieve generator. This module started fast and got even faster (and more memory efficient) with Dana's contributions.

LICENSE AND COPYRIGHT

Copyright 2011 David Oswald.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.