NAME

Math::Prime::Util - Utilities related to prime numbers, including fast sieves and factoring

VERSION

Version 0.10

SYNOPSIS

  # Normally you would just import the functions you are using.
  # Nothing is exported by default.  List the functions, or use :all.
  use Math::Prime::Util ':all';


  # Get a big array reference of many primes
  my $aref = primes( 100_000_000 );

  # All the primes between 5k and 10k inclusive
  my $aref = primes( 5_000, 10_000 );

  # If you want them in an array instead
  my @primes = @{primes( 500 )};


  # is_prime and is_prob_prime return 0 (composite), 2 (prime), or
  # 1 (probably prime).  For non-bigints the result is always 0 or 2.
  say "$n is prime"  if is_prime($n);
  say "$n is ", (qw(composite maybe_prime? prime))[is_prob_prime($n)];

  # Strong pseudoprime test with multiple bases, using Miller-Rabin
  say "$n is a prime or 2/7/61-psp" if is_strong_pseudoprime($n, 2, 7, 61);

  # Strong Lucas-Selfridge test
  say "$n is a prime or slpsp" if is_strong_lucas_pseudoprime($n);

  # step to the next prime (returns 0 if not using bigints and we'd overflow)
  $n = next_prime($n);

  # step back (returns 0 if given input of 2 or less)
  $n = prev_prime($n);


  # Return Pi(n) -- the number of primes <= n.
  $primepi = prime_count( 1_000_000 );
  $primepi = prime_count( 10**14, 10**14+1000 );  # also does ranges

  # Quickly return an approximation to Pi(n)
  my $approx_number_of_primes = prime_count_approx( 10**17 );

  # Lower and upper bounds.  lower <= Pi(n) <= upper for all n
  die unless prime_count_lower($n) <= prime_count($n);
  die unless prime_count_upper($n) >= prime_count($n);


  # Return p_n, the nth prime
  say "The ten thousandth prime is ", nth_prime(10_000);

  # Return a quick approximation to the nth prime
  say "The one trillionth prime is ~ ", nth_prime_approx(10**12);

  # Lower and upper bounds.   lower <= nth_prime(n) <= upper for all n
  die unless nth_prime_lower($n) <= nth_prime($n);
  die unless nth_prime_upper($n) >= nth_prime($n);


  # Get the prime factors of a number
  @prime_factors = factor( $n );

  # Get all factors
  @divisors = all_factors( $n );

  # Euler phi (aka the totient) on a large number
  use bigint;  say euler_phi( 801294088771394680000412 );

  # Moebius function used to calculate Mertens
  $sum += moebius($_) for (1..200); say "Mertens(200) = $sum";

  # Ei, li, and Riemann R functions
  my $ei = ExponentialIntegral($x);    # $x a real: $x != 0
  my $li = LogarithmicIntegral($x);    # $x a real: $x >= 0
  my $R  = RiemannR($x);               # $x a real: $x > 0


  # Precalculate a sieve, possibly speeding up later work.
  prime_precalc( 1_000_000_000 );

  # Free any memory used by the module.
  prime_memfree;

  # Alternate way to free.  When this leaves scope, memory is freed.
  my $mf = Math::Prime::Util::MemFree->new;


  # Random primes
  my $small_prime = random_prime(1000);      # random prime <= limit
  my $rand_prime = random_prime(100, 10000); # random prime within a range
  my $rand_prime = random_ndigit_prime(6);   # random 6-digit prime
  my $rand_prime = random_nbit_prime(128);   # random 128-bit prime
  my $rand_prime = random_maurer_prime(256); # random 256-bit provable prime

DESCRIPTION

A set of utilities related to prime numbers. These include multiple sieving methods, is_prime, prime_count, nth_prime, approximations and bounds for the prime_count and nth prime, next_prime and prev_prime, factoring utilities, and more.

The default sieving and factoring are intended to be (and currently are) the fastest on CPAN, including Math::Prime::XS, Math::Prime::FastSieve, Math::Factor::XS, Math::Prime::TiedArray, Math::Big::Factors, and Math::Primality (when the GMP module is available). For numbers in the 10-20 digit range, it is often orders of magnitude faster. Typically it is faster than Math::Pari for 64-bit operations, with the exception of factoring 16+ digit semiprimes.

The main development of the module has been for working with Perl UVs, so 32-bit or 64-bit. Bignum support is still experimental. One advantage is that it requires no external software (e.g. GMP or Pari). For much faster performance for bigints, install the Math::Prime::Util::GMP module. If you're doing a lot of big number operations, look into Math::GMPz and Math::Pari as well.

The module is thread-safe and allows concurrency between Perl threads while still sharing a prime cache. It is not itself multithreaded. See the Limitations section if you are using Win32 and threads in your program.

BIGNUM SUPPORT

By default all functions support bigints. The module will not turn on bigint support for you -- you will need to use bigint, use bignum, or pass in a Math::BigInt object as your input. The functions take some care to perform all bignum operations using the same class as was passed in, allowing the module to work properly with Calc, FastCalc, GMP, Pari, etc. You should try to install Math::Prime::Util::GMP if you plan to use bigints with this module, as it will make it run much faster.

Some of the functions, notably:

  factor
  is_prime
  is_prob_prime
  is_strong_pseudoprime
  next_prime
  prev_prime
  prime_count
  nth_prime

work very fast (under 1 microsecond) on small inputs, but the wrappers for input validation and bigint support take more time than the function itself. Using the flag '-bigint', e.g.:

  use Math::Prime::Util qw(-bigint);

will turn off bigint support for those functions. Those functions will then go directly to the XS versions, which will speed up very small inputs a lot. This is useful if you're using the functions in a loop, but since the difference is less than a millisecond, it's really not important in general (also, a future implementation may find a way to speed this up without the option).

If you are using bigints, there are two performance suggestions. The first is to install Math::Prime::Util::GMP, as that will vastly increase the speed of many of the functions. This does require that the GMP library be installed on your system, but it increasingly comes pre-installed or is easily available through the OS vendor's package installation tool. If you do not want to use that, the second suggestion is to install Math::BigInt::GMP or Math::BigInt::Pari and then write use bigint try => 'GMP,Pari'. Large modular exponentiation is much faster using the GMP or Pari backends. This is not so important if you installed Math::Prime::Util::GMP, but it can still speed up large random Maurer primes.

I have run these functions on many versions of Perl. If you use bignums a lot and are running anything older than Perl 5.14, I would recommend you upgrade. There are some brittle behaviors on 5.12.4 and earlier with bignums.

FUNCTIONS

is_prime

  print "$n is prime" if is_prime($n);

Returns 2 if the number is prime, 0 if not. For numbers larger than 2^64 it will return 0 for composite and 1 for probably prime, using a strong BPSW test. Also note there are probabilistic prime testing functions available.

primes

Returns all the primes between the lower and upper limits (inclusive), with a lower limit of 2 if none is given.

An array reference is returned (with large lists this is much faster and uses less memory than returning an array directly).

  my $aref1 = primes( 1_000_000 );
  my $aref2 = primes( 1_000_000_000_000, 1_000_000_001_000 );

  my @primes = @{ primes( 500 ) };

  print "$_\n" for (@{primes( 20, 100 )});

Sieving will be done if required. The algorithm used will depend on the range and whether a sieve result already exists. Possibilities include trial division (for ranges with only one expected prime), a Sieve of Eratosthenes using wheel factorization, or a segmented sieve.
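
To make the sieving idea concrete, here is a minimal pure-Perl sketch of a Sieve of Eratosthenes over odd numbers only. It is not the module's implementation (which uses wheel factorization and segmenting in XS); the name simple_primes is hypothetical and used only for illustration. Like primes, it returns an array reference and treats a single argument as the upper limit.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): return an array reference
# of all primes in [$lo, $hi] using a Sieve of Eratosthenes over odd numbers.
sub simple_primes {
    my ($lo, $hi) = @_;
    ($lo, $hi) = (2, $lo) unless defined $hi;   # one argument: lower limit is 2
    return [] if $hi < 2 || $lo > $hi;

    # $composite[$i] represents the odd number 2*$i + 1.
    my @composite;
    my $max_index = int(($hi - 1) / 2);
    for (my $p = 3; $p * $p <= $hi; $p += 2) {
        next if $composite[($p - 1) / 2];
        # Start at p*p and step by 2p so only odd multiples are visited.
        for (my $m = $p * $p; $m <= $hi; $m += 2 * $p) {
            $composite[($m - 1) / 2] = 1;
        }
    }

    # 2 is the only even prime, so it is handled separately.
    my @primes = $lo <= 2 ? (2) : ();
    for my $i (1 .. $max_index) {
        my $n = 2 * $i + 1;
        push @primes, $n if $n >= $lo && !$composite[$i];
    }
    return \@primes;
}

print join(" ", @{ simple_primes(20, 100) }), "\n";
```

Storing only odd numbers halves the memory of the basic sieve; a wheel or a segmented sieve reduces it further.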

next_prime

  $n = next_prime($n);

Returns the next prime greater than the input number. If the input is not a bigint, then 0 is returned if the next prime is larger than a native integer type (the last representable primes being 4,294,967,291 in 32-bit Perl and 18,446,744,073,709,551,557 in 64-bit).

prev_prime

  $n = prev_prime($n);

Returns the largest prime smaller than the input number. 0 is returned if the input is 2 or lower.

prime_count

  my $primepi = prime_count( 1_000 );
  my $pirange = prime_count( 1_000, 10_000 );

Returns the Prime Count function Pi(n), also called primepi in some math packages. When given two arguments, it returns the inclusive count of primes in the range (e.g. (13,17) returns 2, (14,17) and (13,16) return 1, and (14,16) returns 0).

The current implementation relies on sieving to find the primes within the interval, so will take some time and memory. It uses a segmented sieve so is very memory efficient, and also allows fast results even with large base values. The complexity for prime_count(a, b) is approximately O(sqrt(a) + (b-a)), where the first term is typically negligible below ~ 10^11. Memory use is proportional only to sqrt(a), with total memory use under 1MB for any base under 10^14.
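
The O(sqrt(a) + (b-a)) behavior can be illustrated with a small pure-Perl sketch: find the base primes up to the square root of the upper limit, then mark composites only inside the requested window. This is a simplified single-segment version, not the module's code, and range_prime_count is a hypothetical name. It assumes inputs small enough that floating-point division stays exact (well below 2^53).

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): count primes in [$lo, $hi]
# with a single sieve segment.  Base primes up to sqrt($hi) are found first,
# then only the window [$lo, $hi] is marked.
sub range_prime_count {
    my ($lo, $hi) = @_;
    $lo = 2 if $lo < 2;
    return 0 if $hi < $lo;

    # Step 1: base primes up to sqrt($hi) by a small ordinary sieve.
    my $root = int(sqrt($hi));
    $root++ while ($root + 1) * ($root + 1) <= $hi;
    $root-- while $root * $root > $hi;
    my (@small, @base);
    for my $p (2 .. $root) {
        next if $small[$p];
        push @base, $p;
        for (my $m = $p * $p; $m <= $root; $m += $p) { $small[$m] = 1 }
    }

    # Step 2: mark multiples of each base prime inside the window only.
    my @composite;                                   # index 0 represents $lo
    for my $p (@base) {
        my $start = $p * int(($lo + $p - 1) / $p);   # first multiple >= $lo
        $start = $p * $p if $start < $p * $p;        # never mark $p itself
        for (my $m = $start; $m <= $hi; $m += $p) {
            $composite[$m - $lo] = 1;
        }
    }

    # Step 3: whatever is unmarked in the window is prime.
    return scalar grep { !$composite[$_] } 0 .. $hi - $lo;
}

print "primes in [13,17]: ", range_prime_count(13, 17), "\n";
```

The window array has b-a+1 entries regardless of how large a is, which is why a segmented approach stays memory efficient for large base values.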

A later implementation may work on improving performance for large values, both in reducing memory use (the current maximum is 140MB at 2^64) and improving speed. Possibilities include a hybrid table approach, using an explicit formula with li(x) or R(x), or one of the Meissel, Lehmer, or Lagarias-Miller-Odlyzko-Deleglise-Rivat methods. For any use with inputs over 1,000 million or so, think about whether an approximation or bounds would work, as they will be much faster.

prime_count_upper

prime_count_lower

  my $lower_limit = prime_count_lower($n);
  my $upper_limit = prime_count_upper($n);
  #   $lower_limit  <=  prime_count(n)  <=  $upper_limit

Returns an upper or lower bound on the number of primes at or below the input number. These are analytical routines, so will take a fixed amount of time and no memory. The actual prime_count will always be equal to or between these numbers.

A common place these would be used is sizing an array to hold the first $n primes. It may be desirable to use a bit more memory than is necessary, to avoid calling prime_count.

These routines use verified tight limits for inputs up to at least 2^35, and use the Dusart (2010) bounds of

    x/logx * (1 + 1/logx + 2.000/log^2x) <= Pi(x)

    x/logx * (1 + 1/logx + 2.334/log^2x) >= Pi(x)

above that range. These bounds do not assume the Riemann Hypothesis.
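
As a quick numeric illustration, the two formulas can be evaluated directly. The sketch below uses a hypothetical helper dusart_bound and checks the result against Pi(10^10) = 455,052,511, the value quoted in the PERFORMANCE section. Note that at this size the module itself would use its tighter verified limits; the sketch only evaluates the formulas as written.

```perl
use strict;
use warnings;

# Hypothetical helper: evaluate x/log(x) * (1 + 1/log(x) + c/log(x)^2)
# for a given constant $c (2.000 for the lower bound, 2.334 for the upper).
sub dusart_bound {
    my ($x, $c) = @_;
    my $l = log($x);
    return $x / $l * (1 + 1 / $l + $c / ($l * $l));
}

my $x     = 10**10;
my $lower = dusart_bound($x, 2.000);
my $upper = dusart_bound($x, 2.334);
my $pi_x  = 455_052_511;             # Pi(10^10)

printf "%.0f <= %d <= %.0f\n", $lower, $pi_x, $upper;
```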

prime_count_approx

  print "there are about ",
        prime_count_approx( 10 ** 18 ),
        " primes below one quintillion.\n";

Returns an approximation to the prime_count function, without having to generate any primes. The current implementation uses the Riemann R function which is quite accurate: an error of less than 0.0005% is typical for input values over 2^32. A slightly faster (0.1ms vs. 1ms), but much less accurate, answer can be obtained by averaging the upper and lower bounds.

nth_prime

  say "The ten thousandth prime is ", nth_prime(10_000);

Returns the prime that lies in index n in the array of prime numbers. Put another way, this returns the smallest p such that Pi(p) >= n.

This relies on generating primes, so can require a lot of time and space for large inputs. A segmented sieve is used for large inputs, so it is memory efficient. On my machine it will return the 203,280,221st prime (the largest that fits in 32-bits) in 2.5 seconds. The 10^9th prime takes 15 seconds to find, while the 10^10th prime takes nearly four minutes. As with prime_count, think carefully about whether a bound or an approximation would be acceptable.

If the bigint or bignum module is not in use, this will generate an overflow exception if the number requested would result in a prime that cannot fit in a native type. If bigints are in use, then the calculation will proceed, though it will be exceedingly slow. A later version of Math::Prime::Util::GMP may include this functionality which would help for 32-bit machines.
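
To show why nth_prime needs to generate primes, here is a minimal pure-Perl sketch that sieves up to an upper bound and counts. It is an illustration only: simple_nth_prime is a hypothetical name, and the limit uses the classical bound p_n < n*(log n + log log n) for n >= 6 rather than the tighter bounds the module uses.

```perl
use strict;
use warnings;

# Hypothetical helper: find the nth prime by sieving up to an upper bound.
# Assumes the classical bound p_n < n*(log n + log log n) for n >= 6.
sub simple_nth_prime {
    my ($n) = @_;
    return 0 if $n < 1;
    my $limit = $n < 6 ? 13 : int($n * (log($n) + log(log($n)))) + 1;

    my @composite;
    my $count = 0;
    for my $p (2 .. $limit) {
        next if $composite[$p];
        return $p if ++$count == $n;          # found the nth prime
        for (my $m = $p * $p; $m <= $limit; $m += $p) { $composite[$m] = 1 }
    }
    die "bound too small for n=$n";           # not expected for valid input
}

print "The ten thousandth prime is ", simple_nth_prime(10_000), "\n";
```

Both time and memory grow with the size of the answer, which is the reason to prefer a bound or approximation when an exact value is not required.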

nth_prime_upper

nth_prime_lower

  my $lower_limit = nth_prime_lower($n);
  my $upper_limit = nth_prime_upper($n);
  #   $lower_limit  <=  nth_prime(n)  <=  $upper_limit

Returns an analytical upper or lower bound on the Nth prime. These are very fast as they do not need to sieve or search through primes or tables. An exact answer is returned for tiny values of n. The lower limit uses the Dusart 2010 bound for all n, while the upper bound uses one of the two Dusart 2010 bounds for n >= 178974, a Dusart 1999 bound for n >= 39017, and a simple bound of n * (logn + 0.6 * loglogn) for small n.

nth_prime_approx

  say "The one trillionth prime is ~ ", nth_prime_approx(10**12);

Returns an approximation to the nth_prime function, without having to generate any primes. Uses the Cipolla 1902 approximation with two polynomials, plus a correction for small values to reduce the error.

is_strong_pseudoprime

  my $maybe_prime = is_strong_pseudoprime($n, 2);
  my $probably_prime = is_strong_pseudoprime($n, 2, 3, 5, 7, 11, 13, 17);

Takes a positive number as input and one or more bases. The bases must be greater than 1. Returns 1 if the input is a prime or a strong pseudoprime to all of the bases, and 0 if not.

If 0 is returned, then the number really is a composite. If 1 is returned, then it is either a prime or a strong pseudoprime to all the given bases. Given enough distinct bases, the chances become very, very strong that the number is actually prime.

This is usually used in combination with other tests to make either stronger tests (e.g. the strong BPSW test) or deterministic results for numbers less than some verified limit (e.g. it has long been known that no more than three selected bases are required to give correct primality test results for any 32-bit number). Given the small chances of passing multiple bases, there are some math packages that just use multiple MR tests for primality testing.

Even numbers other than 2 will always return 0 (composite). While the algorithm does run with even input, most sources define it only on odd input. Returning composite for all non-2 even input makes the function match most other implementations including Math::Primality's is_strong_pseudoprime function.
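
The test itself is short enough to sketch in pure Perl. The version below is an illustration, not the module's code: the names powmod and simple_strong_pseudoprime are hypothetical, and it assumes a 64-bit Perl with inputs below 2^32 so that intermediate products remain exact integers. Skipping a base that is a multiple of the input is a choice made for this sketch.

```perl
use strict;
use warnings;

# Modular exponentiation by repeated squaring.  Assumes $m < 2**32 on a
# 64-bit Perl so that intermediate products stay exact integers.
sub powmod {
    my ($base, $exp, $m) = @_;
    my $result = 1;
    $base %= $m;
    while ($exp > 0) {
        $result = ($result * $base) % $m if $exp & 1;
        $base = ($base * $base) % $m;
        $exp >>= 1;
    }
    return $result;
}

# Hypothetical helper: returns 1 if $n is prime or a strong pseudoprime
# to every base given, 0 if $n is certainly composite.
sub simple_strong_pseudoprime {
    my ($n, @bases) = @_;
    return 0 if $n < 2;
    return 1 if $n == 2;
    return 0 if $n % 2 == 0;            # even input other than 2 is composite

    # Write n-1 as d * 2^s with d odd.
    my ($d, $s) = ($n - 1, 0);
    while ($d % 2 == 0) { $d >>= 1; $s++ }

    BASE: for my $base (@bases) {
        next BASE if $base % $n == 0;   # sketch choice: such a base tells us nothing
        my $x = powmod($base, $d, $n);
        next BASE if $x == 1 || $x == $n - 1;
        for (2 .. $s) {
            $x = ($x * $x) % $n;
            next BASE if $x == $n - 1;
        }
        return 0;                       # this base proves n composite
    }
    return 1;
}

# 2047 = 23 * 89 passes base 2 but fails base 3.
print "2047 base 2: ", simple_strong_pseudoprime(2047, 2), "\n";
print "2047 base 3: ", simple_strong_pseudoprime(2047, 3), "\n";
```

A commonly cited result is that the bases 2, 7, and 61 together give a correct answer for every input below 4,759,123,141, which covers all 32-bit numbers.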

miller_rabin

An alias for is_strong_pseudoprime. This name is being deprecated.

is_strong_lucas_pseudoprime

Takes a positive number as input, and returns 1 if the input is a strong Lucas pseudoprime using the Selfridge method of choosing D, P, and Q (some sources call this a strong Lucas-Selfridge pseudoprime). This is one half of the BPSW primality test (the Miller-Rabin strong pseudoprime test with base 2 being the other half).

is_prob_prime

  my $prob_prime = is_prob_prime($n);
  # Returns 0 (composite), 2 (prime), or 1 (probably prime)

Takes a positive number as input and returns back either 0 (composite), 2 (definitely prime), or 1 (probably prime).

For 64-bit input (native or bignum), this uses a tuned set of Miller-Rabin tests such that the result will be deterministic. Either 2, 3, 4, 5, or 7 Miller-Rabin tests are performed (no more than 3 for 32-bit input), and the result will then always be 0 (composite) or 2 (prime). A later implementation may change the internals, but the results will be identical.

For inputs larger than 2^64, a strong Baillie-PSW primality test is performed (aka BPSW or BSW). This is a probabilistic test, so only 0 (composite) and 1 (probably prime) are returned. There is a possibility that a composite may be returned marked as probably prime, but since the test was published in 1980, not a single BPSW pseudoprime has been found, so a number that passes is extremely likely to be prime. While we believe (Pomerance 1984) that an infinite number of counterexamples exist, there is a weak conjecture (Martin) that none exist under 10000 digits.

moebius

  say "$n is square free" if moebius($n) != 0;
  $sum += moebius($_) for (1..200); say "Mertens(200) = $sum";

Returns the Möbius function (also called the Moebius, Mobius, or MoebiusMu function) for a positive integer input. This function is 1 if n = 1, 0 if n is not square free (i.e. n has a repeated factor), and (-1)^t if n is a product of t distinct primes. This is an important function in prime number theory.
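
The definition translates directly into a short trial-division routine. This is a minimal sketch for small inputs, with the hypothetical name simple_moebius; it is not how the module computes the function.

```perl
use strict;
use warnings;

# Hypothetical helper: Moebius function by trial division.
sub simple_moebius {
    my ($n) = @_;
    return 1 if $n == 1;
    my $result = 1;
    for (my $p = 2; $p * $p <= $n; $p++) {
        next if $n % $p;                # p does not divide n
        $n /= $p;
        return 0 if $n % $p == 0;       # p^2 divides the input: not square free
        $result = -$result;             # one more distinct prime factor
    }
    $result = -$result if $n > 1;       # a remaining prime factor
    return $result;
}

# Mertens function: running sum of the Moebius function.
my $mertens = 0;
$mertens += simple_moebius($_) for 1 .. 100;
print "Mertens(100) = $mertens\n";
```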

euler_phi

  say "The Euler totient of $n is ", euler_phi($n);

Returns the Euler totient function (also called Euler's phi or phi function) for an integer value. This is an arithmetic function that counts the number of positive integers less than or equal to n that are relatively prime to n. Given the definition used, euler_phi will return 0 for all n < 1. This follows the logic used by SAGE. Mathematica/WolframAlpha also returns 0 for input 0, but returns euler_phi(-n) for n < 0.
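
For small inputs the totient can be computed from the product formula phi(n) = n * product of (1 - 1/p) over the distinct primes p dividing n. The sketch below uses trial division and the hypothetical name simple_euler_phi; it returns 0 for n < 1, matching the convention described above.

```perl
use strict;
use warnings;

# Hypothetical helper: Euler's totient via the product formula,
# finding the distinct prime factors by trial division.
sub simple_euler_phi {
    my ($n) = @_;
    return 0 if $n < 1;
    my $phi = $n;
    for (my $p = 2; $p * $p <= $n; $p++) {
        next if $n % $p;
        $n /= $p while $n % $p == 0;    # strip every copy of p
        $phi -= $phi / $p;              # multiply phi by (1 - 1/p)
    }
    $phi -= $phi / $n if $n > 1;        # one prime factor may remain
    return $phi;
}

print "The Euler totient of 36 is ", simple_euler_phi(36), "\n";
```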

random_prime

  my $small_prime = random_prime(1000);      # random prime <= limit
  my $rand_prime = random_prime(100, 10000); # random prime within a range

Returns a pseudo-randomly selected prime that will be greater than or equal to the lower limit and less than or equal to the upper limit. If no lower limit is given, 2 is implied. Returns undef if no primes exist within the range. The rand function is called one or more times for selection.

The goal is to return a uniform distribution of the primes in the range, meaning for each prime in the range, the chances are equally likely that it will be seen.

The current algorithm does a random index selection for small numbers, which is deterministic. For larger numbers, this slows down, so for 32-bit ranges, the obvious Monte Carlo method is used, where random numbers in the range are selected until one is prime. For even larger ranges, a method similar to that of Fouque and Tibouchi (2011) algorithm A1 is used.
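
The Monte Carlo method mentioned above is easy to sketch: draw integers uniformly from the range and keep the first one that is prime. Because every integer in the range is equally likely to be drawn, every prime in the range is equally likely to be returned. The names is_small_prime and monte_carlo_prime are hypothetical, and the up-front scan for an empty range is a simplification that only suits small ranges.

```perl
use strict;
use warnings;

# Simple trial-division primality check, adequate for small illustrative ranges.
sub is_small_prime {
    my ($n) = @_;
    return 0 if $n < 2;
    for (my $d = 2; $d * $d <= $n; $d++) {
        return 0 if $n % $d == 0;
    }
    return 1;
}

# Hypothetical helper: pick integers uniformly from [$lo, $hi] until one is prime.
sub monte_carlo_prime {
    my ($lo, $hi) = @_;
    ($lo, $hi) = (2, $lo) unless defined $hi;   # one argument: lower limit is 2
    $lo = 2 if $lo < 2;
    # Make sure the loop can terminate: return undef if the range has no primes.
    return undef unless grep { is_small_prime($_) } $lo .. $hi;
    while (1) {
        my $candidate = $lo + int(rand($hi - $lo + 1));
        return $candidate if is_small_prime($candidate);
    }
}

print "random prime in [100, 10000]: ", monte_carlo_prime(100, 10000), "\n";
```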

Perl's rand function is normally called, but if the sub main::rand exists, it will be used instead. When called with no arguments it should return a float value between 0 and 1-epsilon, with 31 bits of randomness. Examples:

  # Use Mersenne Twister
  use Math::Random::MT::Auto qw/rand/;

  # Use a custom random function
  sub rand { ... }

If you want cryptographically secure primes, at minimum a better source of random numbers should be used, e.g. Crypt::Random. Until this module has more testing, I would point the user to Crypt::Primes for production use.

random_ndigit_prime

  say "My 4-digit prime number is: ", random_ndigit_prime(4);

Selects a random n-digit prime, where the input is an integer number of digits between 1 and the maximum native type (10 for 32-bit, 20 for 64-bit, 10000 if bigint is active). One of the primes within that range (e.g. 1000 - 9999 for 4-digits) will be uniformly selected using the rand function as described above.

random_nbit_prime

  use bigint;  my $bigprime = random_nbit_prime(512);

Selects a random n-bit prime, where the input is an integer number of bits between 2 and the maximum representable bits (32, 64, or 100000 for native 32-bit, native 64-bit, and bigint respectively). A prime with the nth bit set will be uniformly selected, with randomness supplied via calls to the rand function as described above.

Since this uses the random_prime function, all uniformity properties of that function apply to this. The n-bit range is partitioned into nearly equal segments less than 2^31, a segment is randomly selected, then the trivial Monte Carlo algorithm is used to select a prime from within the segment. This gives a nearly uniform distribution, does not consume excessive random data, and can be very fast. When used with bigints, having the Math::Prime::Util::GMP module installed will make it run much faster.

random_maurer_prime

  use bigint;  my $bigprime = random_maurer_prime(512);

Construct an n-bit provable prime, using the algorithm of Ueli Maurer (1995). This is the same algorithm used by Crypt::Primes.

The differences between this function and that in Crypt::Primes include (1) the current version of C::P has been in use for 9 years, while M::P::U is new and relatively untested; (2) no external libraries are needed for this module, while C::P requires Math::Pari; (3) C::P is quite fast for all sizes -- M::P::U is really fast for native bit sizes, so-so for large bit sizes when Math::Prime::Util::GMP is installed, but ridiculously slow when using native Perl bigints for large bit sizes; (4) C::P uses a modified version of final acceptance criteria (q < n**(1/3) without the rest of Lemma 2), while this module uses the original set; (5) C::P has some useful options for cryptography; (6) C::P is hardcoded to use Crypt::Random, while this function will use whatever you set rand to (this is more flexible but also prone to misuse).

Any feedback on this function would be greatly appreciated.

UTILITY FUNCTIONS

prime_precalc

  prime_precalc( 1_000_000_000 );

Let the module prepare for fast operation up to a specific number. It is not necessary to call this, but it gives you more control over when memory is allocated and gives faster results for multiple calls in some cases. In the current implementation this will calculate a sieve for all numbers up to the specified number.

prime_memfree

  prime_memfree;

Frees any extra memory the module may have allocated. As with prime_precalc, it is not necessary to call this, but if you're done making calls, or want things cleaned up, you can use this. The object method might be a better choice for complicated uses.

Math::Prime::Util::MemFree->new

  my $mf = Math::Prime::Util::MemFree->new;
  # perform operations.  When $mf goes out of scope, memory will be recovered.

This is a more robust way of making sure any cached memory is freed, as it will be handled by the last MemFree object leaving scope. This means if your routines were inside an eval that died, things will still get cleaned up. If you call another function that uses a MemFree object, the cache will stay in place because you still have an object.
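
The behavior described here is the usual scope-guard pattern. The following self-contained sketch shows the idea with a stand-in cache; the packages My::Cache and My::Cache::Guard are hypothetical and are not part of this module. The cache is released only when the last guard object is destroyed, including when an eval block dies.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a module-level cache; names are illustrative only.
package My::Cache;
our @sieve;                 # pretend cached data
our $guards = 0;            # number of live guard objects

package My::Cache::Guard;
sub new {
    my ($class) = @_;
    $My::Cache::guards++;
    return bless {}, $class;
}
sub DESTROY {
    # Free the cache only when the last guard goes away.
    @My::Cache::sieve = () if --$My::Cache::guards == 0;
}

package main;

@My::Cache::sieve = (2, 3, 5, 7);
my $during;
{
    my $outer = My::Cache::Guard->new;
    eval {
        my $inner = My::Cache::Guard->new;
        die "something went wrong\n";       # $inner is still cleaned up
    };
    # $outer is alive, so the cache is still in place here.
    $during = scalar @My::Cache::sieve;
}
# Both guards are gone, so the cache has been emptied.
my $after = scalar @My::Cache::sieve;
print "entries while guarded: $during, after: $after\n";
```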

prime_get_config

  my $cached_up_to = prime_get_config->{'precalc_to'};

Returns a reference to a hash of the current settings. The hash is a copy of the configuration, so changing it has no effect. The settings include:

  precalc_to      primes up to this number are calculated
  maxbits         the maximum number of bits for native operations
  xs              0 or 1, indicating the XS code is available
  gmp             0 or 1, indicating GMP code is available
  maxparam        the largest value for most functions, without bigint
  maxdigits       the max digits in a number, without bigint
  maxprime        the largest representable prime, without bigint
  maxprimeidx     the index of maxprime, without bigint

FACTORING FUNCTIONS

factor

  my @factors = factor(3_369_738_766_071_892_021);
  # returns (204518747,16476429743)

Produces the prime factors of a positive number input, in numerical order. The special cases of n = 0 and n = 1 will return n, which guarantees multiplying the factors together will always result in the input value, though those are the only cases where the returned factors are not prime.

The current algorithm for non-bigints is a sequence of small trial division, a few rounds of Pollard's Rho, SQUFOF, Hart's one line factorization, a long run of Pollard's Rho, and finally trial division if anything survives. This process is repeated for each non-prime factor. In practice, it is very rare to require more than the first Rho + SQUFOF to find a factor.

Factoring bigints works with pure Perl, and can be very handy on 32-bit machines for numbers just over the 32-bit limit, but it can be very slow for "hard" numbers. Installing the Math::Prime::Util::GMP module will speed up bigint factoring a lot, and all future effort on large number factoring will be in that module. If you do not have that module for some reason, use the GMP or Pari version of bigint if possible (e.g. use bigint try => 'GMP,Pari'), which will run 2-3x faster (though still 100x slower than the real GMP code).

all_factors

  my @divisors = all_factors(30);   # returns (2, 3, 5, 6, 10, 15)

Produces all the divisors of a positive number input. 1 and the input number are excluded (which implies that an empty list is returned for any prime number input). The divisors are a power set of multiplications of the prime factors, returned as a uniqued sorted list.
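
The relationship between the prime factors and the divisor list can be sketched in a few lines of pure Perl. The helpers simple_factor and simple_all_factors are hypothetical names, use plain trial division, and are only meant for small inputs.

```perl
use strict;
use warnings;

# Hypothetical helper: prime factors by trial division, in ascending order.
# Inputs 0 and 1 are returned unchanged.
sub simple_factor {
    my ($n) = @_;
    return ($n) if $n < 2;
    my @factors;
    for (my $p = 2; $p * $p <= $n; $p++) {
        while ($n % $p == 0) { push @factors, $p; $n /= $p }
    }
    push @factors, $n if $n > 1;
    return @factors;
}

# Hypothetical helper: all divisors except 1 and the input itself.
sub simple_all_factors {
    my ($n) = @_;
    my %seen = (1 => 1);
    # Multiply every divisor found so far by each prime factor in turn.
    # The hash keeps the results unique.
    for my $p (simple_factor($n)) {
        $seen{$_ * $p} = 1 for keys %seen;
    }
    delete @seen{1, $n};
    return sort { $a <=> $b } keys %seen;
}

print join(", ", simple_all_factors(30)), "\n";
```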

trial_factor

  my @factors = trial_factor($n);

Produces the prime factors of a positive number input. The factors will be in numerical order. The special cases of n = 0 and n = 1 will return n, while with all other inputs the factors are guaranteed to be prime. For large inputs this will be very slow.

fermat_factor

  my @factors = fermat_factor($n);

Produces factors, not necessarily prime, of the positive number input. The particular algorithm is Knuth's algorithm C. For small inputs this will be very fast, but it slows down quite rapidly as the number of digits increases. It is very fast for inputs with a factor close to the midpoint (e.g. a semiprime p*q where p and q are the same number of digits).
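
The underlying idea of Fermat's method is to search for a value x with x^2 - n a perfect square y^2, which gives n = (x - y)(x + y). The sketch below is the basic textbook form, not Knuth's algorithm C, and the names isqrt and simple_fermat_factor are hypothetical. It handles odd input only.

```perl
use strict;
use warnings;

# Integer square root, corrected for floating-point rounding.
sub isqrt {
    my ($n) = @_;
    my $r = int(sqrt($n));
    $r-- while $r * $r > $n;
    $r++ while ($r + 1) * ($r + 1) <= $n;
    return $r;
}

# Hypothetical helper: basic Fermat factorization of an odd number.
# Searches for x with x^2 - n = y^2, which gives n = (x - y) * (x + y).
sub simple_fermat_factor {
    my ($n) = @_;
    return ($n) if $n < 3 || $n % 2 == 0;   # this sketch handles odd input only
    my $x = isqrt($n);
    $x++ if $x * $x < $n;                   # start at ceil(sqrt(n))
    while (1) {
        my $y2 = $x * $x - $n;
        my $y  = isqrt($y2);
        if ($y * $y == $y2) {
            return ($n) if $x - $y == 1;    # only the trivial split: n is prime
            return ($x - $y, $x + $y);
        }
        $x++;
    }
}

print join(" * ", simple_fermat_factor(5959)), "\n";
```

The search starts at the square root, so it finishes quickly when the two factors are close together and slowly when they are far apart.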

holf_factor

  my @factors = holf_factor($n);

Produces factors, not necessarily prime, of the positive number input. An optional number of rounds can be given as a second parameter. It is possible the function will be unable to find a factor, in which case a single element, the input, is returned. This uses Hart's One Line Factorization with no premultiplier. It is an interesting alternative to Fermat's algorithm, and there are some inputs it can rapidly factor. In the long run it has the same advantages and disadvantages as Fermat's method.
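
Hart's method is compact enough to sketch in pure Perl. For i = 1, 2, 3, ... it takes s = ceil(sqrt(n*i)) and checks whether s^2 mod n is a perfect square t^2; if so, gcd(n, s - t) is a candidate factor. The names below (gcd, isqrt, simple_holf_factor) and the default round count are hypothetical, and the sketch assumes inputs small enough that n*i stays an exact integer.

```perl
use strict;
use warnings;

# Greatest common divisor by the Euclidean algorithm.
sub gcd {
    my ($x, $y) = @_;
    ($x, $y) = ($y, $x % $y) while $y;
    return $x;
}

# Integer square root, corrected for floating-point rounding.
sub isqrt {
    my ($n) = @_;
    my $r = int(sqrt($n));
    $r-- while $r * $r > $n;
    $r++ while ($r + 1) * ($r + 1) <= $n;
    return $r;
}

# Hypothetical helper: Hart's One Line Factorization without a premultiplier.
sub simple_holf_factor {
    my ($n, $rounds) = @_;
    $rounds = 100_000 unless defined $rounds;   # illustrative default
    for my $i (1 .. $rounds) {
        my $s = isqrt($n * $i);
        $s++ if $s * $s < $n * $i;              # s = ceil(sqrt(n*i))
        my $m = ($s * $s) % $n;
        my $t = isqrt($m);
        next unless $t * $t == $m;              # need m to be a perfect square
        my $f = gcd($n, $s - $t);
        return sort { $a <=> $b } ($f, $n / $f) if $f > 1 && $f < $n;
    }
    return ($n);                                # no factor found
}

print join(" * ", simple_holf_factor(5959)), "\n";
```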

squfof_factor

  my @factors = squfof_factor($n);

Produces factors, not necessarily prime, of the positive number input. An optional number of rounds can be given as a second parameter. It is possible the function will be unable to find a factor, in which case a single element, the input, is returned. This function typically runs very fast.

prho_factor

pbrent_factor

pminus1_factor

  my @factors = prho_factor($n);

  # Use a very small number of rounds
  my @factors = prho_factor($n, 1000);

Produces factors, not necessarily prime, of the positive number input. An optional number of rounds can be given as a second parameter. These attempt to find a single factor using one of the probabilistic algorithms of Pollard Rho, Brent's modification of Pollard Rho, or Pollard's p - 1. These are more specialized algorithms usually used for pre-factoring very large inputs, or checking very large inputs for naive mistakes. If the input is prime or they run out of rounds, they will return the single input value. On some inputs they will take a very long time, while on others they succeed in a remarkably short time.
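
As an illustration of the first of these, here is a minimal pure-Perl Pollard Rho using f(x) = x^2 + 1 and Floyd cycle detection. It is a textbook sketch rather than the module's code; the names gcd and simple_prho_factor, the fixed starting value, and the default round count are all assumptions. It assumes a 64-bit Perl and inputs below 2^32 so that x*x + 1 stays an exact integer.

```perl
use strict;
use warnings;

# Greatest common divisor by the Euclidean algorithm.
sub gcd {
    my ($x, $y) = @_;
    ($x, $y) = ($y, $x % $y) while $y;
    return $x;
}

# Hypothetical helper: Pollard's Rho with f(x) = x^2 + 1 and Floyd cycle
# detection.  Assumes $n < 2**32 on a 64-bit Perl.
sub simple_prho_factor {
    my ($n, $rounds) = @_;
    $rounds = 1_000_000 unless defined $rounds;   # illustrative default
    my ($x, $y) = (2, 2);
    for (1 .. $rounds) {
        $x = ($x * $x + 1) % $n;                  # tortoise: one step
        $y = ($y * $y + 1) % $n;                  # hare: two steps
        $y = ($y * $y + 1) % $n;
        my $g = gcd($n, abs($x - $y));
        next if $g == 1;
        last if $g == $n;                         # cycle closed without a factor
        return sort { $a <=> $b } ($g, $n / $g);
    }
    return ($n);                                  # prime input or out of rounds
}

print join(" * ", simple_prho_factor(8051)), "\n";
```

As described above, a prime input or an exhausted round count yields the single input value.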

MATHEMATICAL FUNCTIONS

ExponentialIntegral

  my $Ei = ExponentialIntegral($x);

Given a non-zero floating point input x, this returns the real-valued exponential integral of x, defined as the integral of e^t/t dt from -infinity to x. Depending on the input, the integral is calculated using continued fractions (x < -1), rational Chebyshev approximation (-1 < x < 0), a convergent series (small positive x), or an asymptotic divergent series (large positive x).

Accuracy should be at least 14 digits.

LogarithmicIntegral

  my $li = LogarithmicIntegral($x);

Given a positive floating point input, returns the floating point logarithmic integral of x, defined as the integral of dt/ln t from 0 to x. If given a negative input, the function will croak. The function returns 0 at x = 0, and -infinity at x = 1.

This is often known as li(x). A related function is the offset logarithmic integral, sometimes known as Li(x) which avoids the singularity at 1. It may be defined as Li(x) = li(x) - li(2).

This function is implemented as li(x) = Ei(ln x) after handling special values.
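
The identity li(x) = Ei(ln x) can be tried out with the convergent series Ei(x) = gamma + ln(x) + sum over k >= 1 of x^k / (k * k!), where gamma is the Euler-Mascheroni constant. The sketch below covers only x > 1 and makes no attempt to match the module's accuracy or its choice of method for each input range; series_ei and series_li are hypothetical names.

```perl
use strict;
use warnings;

use constant EULER_GAMMA => 0.5772156649015329;

# Hypothetical helper: Ei(x) for x > 0 via the convergent series
#   Ei(x) = gamma + ln(x) + sum_{k>=1} x^k / (k * k!)
# Suitable for small to moderate positive x only.
sub series_ei {
    my ($x) = @_;
    die "series_ei: x must be positive" unless $x > 0;
    my ($sum, $term) = (0, 1);
    for my $k (1 .. 200) {
        $term *= $x / $k;                 # term is now x^k / k!
        my $add = $term / $k;
        $sum += $add;
        last if $add < 1e-17 * $sum;      # further terms no longer matter
    }
    return EULER_GAMMA + log($x) + $sum;
}

# li(x) = Ei(ln x), here for x > 1 only.
sub series_li {
    my ($x) = @_;
    die "series_li: x must be greater than 1" unless $x > 1;
    return series_ei(log($x));
}

printf "li(2) = %.12f\n", series_li(2);
```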

Accuracy should be at least 14 digits.

RiemannR

  my $r = RiemannR($x);

Given a positive non-zero floating point input, returns the floating point value of Riemann's R function. Riemann's R function gives a very close approximation to the prime counting function.

Accuracy should be at least 14 digits. The current implementation isn't correctly storing constants as big floats, so is not giving increased accuracy with big numbers like it should.
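
One standard way to evaluate R(x) is the Gram series, R(x) = 1 + sum over k >= 1 of (ln x)^k / (k * k! * zeta(k+1)). The sketch below is an illustration only and is not claimed to be how this module computes it. The helper names zeta and gram_riemann_r are hypothetical, and the zeta values come from a short Euler-Maclaurin estimate good to roughly 7 digits, so the result is far less precise than the module's.

```perl
use strict;
use warnings;

# zeta(s) for real s >= 2: direct sum of the first 19 terms plus an
# Euler-Maclaurin estimate of the tail starting at N = 20.
sub zeta {
    my ($s) = @_;
    my $N   = 20;
    my $sum = 0;
    $sum += $_ ** -$s for 1 .. $N - 1;
    return $sum
         + $N ** (1 - $s) / ($s - 1)      # integral of the tail
         + 0.5 * $N ** -$s                # half of the first tail term
         + $s * $N ** (-$s - 1) / 12;     # first derivative correction
}

# Hypothetical helper: Riemann's R function via the Gram series.
sub gram_riemann_r {
    my ($x) = @_;
    die "gram_riemann_r: x must be positive" unless $x > 0;
    my $lx = log($x);
    my ($sum, $term) = (1, 1);
    for my $k (1 .. 1000) {
        $term *= $lx / $k;                          # (ln x)^k / k!
        my $add = $term / ($k * zeta($k + 1));
        $sum += $add;
        last if abs($add) < 1e-16 * abs($sum);      # converged
    }
    return $sum;
}

printf "R(10^10) is about %.0f\n", gram_riemann_r(10**10);
```

Comparing the printed value with Pi(10^10) = 455,052,511 from the PERFORMANCE section gives a feel for how close the approximation is.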

EXAMPLES

Print pseudoprimes base 17:

    perl -MMath::Prime::Util=:all -E 'my $n=$base|1; while(1) { print "$n " if is_strong_pseudoprime($n,$base) && !is_prime($n); $n+=2; } BEGIN {$|=1; $base=17}'

Print some primes above 64-bit range:

    perl -MMath::Prime::Util=:all -Mbigint -E 'my $start=100000000000000000000; say join "\n", @{primes($start,$start+1000)}'
    # Similar code using Pari:
    # perl -MMath::Pari=:int,PARI,nextprime -E 'my $start = PARI "100000000000000000000"; my $end = $start+1000; my $p=nextprime($start); while ($p <= $end) { say $p; $p = nextprime($p+1); }'

LIMITATIONS

I have not completed testing all the functions near the word size limit (e.g. 2^32 for 32-bit machines). Please report any problems you find.

Perl versions earlier than 5.8.0 have issues with 64-bit that show up in the factoring tests. The test suite will try to determine if your Perl is broken. If you use later versions of Perl, or Perl 5.6.2 32-bit, or Perl 5.6.2 64-bit and keep numbers below ~ 2^52, then everything works. The best solution is to update to a more recent Perl.

The module is thread-safe and should allow good concurrency on all platforms that support Perl threads except Win32 (Cygwin works). With Win32, either don't use threads or make sure prime_precalc is called before using primes, prime_count, or nth_prime with large inputs. This is only an issue if you use non-Cygwin Win32 and call these routines from within Perl threads.

PERFORMANCE

Counting the primes to 10^10 (10 billion), with time in seconds. Pi(10^10) = 455,052,511.

   External programs in C / C++:

       1.9  primesieve 3.6 forced to use only a single thread
       2.2  yafu 1.31
       3.8  primegen (optimized Sieve of Atkin, conf-word 8192)
       5.6  Tomás Oliveira e Silva's unoptimized segmented sieve v2 (Sep 2010)
       6.7  Achim Flammenkamp's prime_sieve (32k segments)
       9.3  http://tverniquet.com/prime/ (mod 2310, single thread)
      11.2  Tomás Oliveira e Silva's unoptimized segmented sieve v1 (May 2003)
      17.0  Pari 2.3.5 (primepi)

   Small portable functions suitable for plugging into XS:

       5.3  My segmented SoE used in this module
      15.6  My Sieve of Eratosthenes using a mod-30 wheel
      17.2  A slightly modified version of Terje Mathisen's mod-30 sieve
      35.5  Basic Sieve of Eratosthenes on odd numbers
      33.4  Sieve of Atkin, from Praxis (not correct)
      72.8  Sieve of Atkin, 10-minute fixup of basic algorithm
      91.6  Sieve of Atkin, Wikipedia-like

Perl modules, counting the primes to 800_000_000 (800 million), in seconds:

  Time (s)   Module                      Version  Notes
  ---------  --------------------------  -------  -----------
       0.36  Math::Prime::Util           0.09     segmented mod-30 sieve
       0.9   Math::Prime::Util           0.01     mod-30 sieve
       2.9   Math::Prime::FastSieve      0.12     decent odd-number sieve
      11.7   Math::Prime::XS             0.29     "" but needs a count API
      15.0   Bit::Vector                 7.2
      59.1   Math::Prime::Util::PP       0.09     Perl (fastest I know of)
     170.0   Faster Perl sieve (net)     2012-01  array of odds
     548.1   RosettaCode sieve (net)     2012-06  simplistic Perl
  ~11000     Math::Primality             0.04     Perl + Math::GMPz
  >20000     Math::Big                   1.12     Perl, > 26GB RAM used

is_prime: my impressions:

   Module                    Small inputs   Large inputs (10-20dig)
   -----------------------   -------------  ----------------------
   Math::Prime::Util         Very fast      Pretty fast
   Math::Prime::XS           Very fast      Very, very slow if no small factors
   Math::Pari                Slow           OK
   Math::Prime::FastSieve    Very fast      N/A (too much memory)
   Math::Primality           Very slow      Very slow

The differences are in the implementations:

Math::Prime::FastSieve only works in a sieved range, which is really fast if you can do it (M::P::U will do the same if you call prime_precalc). Larger inputs just need too much time and memory for the sieve.
Math::Primality uses GMP for all work. Under ~32 bits it uses 2 or 3 Miller-Rabin (MR) tests, while above 4759123141 it performs a BPSW test. This is fantastic for bigints over 2^64, but it is significantly slower than native precision tests. With 64-bit numbers it is generally an order of magnitude or more slower than any of the others. Once bigints are being used, its performance is quite good. It is an order of magnitude or more faster than this module by default, but installing the Math::Prime::Util::GMP module makes this code run slightly faster.
Math::Pari has some very effective code, but it has some overhead to get to it from Perl. That means for small numbers it is relatively slow: an order of magnitude slower than M::P::XS and M::P::Util (though arguably this is only important for benchmarking since "slow" is ~2 microseconds). Large numbers transition over to smarter tests so don't slow down much.
Math::Prime::XS does trial divisions, which is wonderful if the input has a small factor (or is small itself). But it can take 1000x longer if given a large prime.
Math::Prime::Util looks in the sieve for a fast bit lookup if that exists (default up to 30,000, but it can be expanded, e.g. with prime_precalc), uses trial division for numbers higher than this but not too large (0.1M on 64-bit machines, 100M on 32-bit machines), a deterministic set of Miller-Rabin tests for 64-bit and smaller numbers, and a BPSW test for bigints.
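
As a rough illustration of the tiered approach described above, here is a minimal pure-Perl sketch. It is not the module's XS code. It only does cheap trial division by a few small primes and then falls back to strong pseudoprime (Miller-Rabin) tests. The function names (powmod, is_strong_psp, sketch_is_prime) are hypothetical, and the sketch assumes a Perl built with 64-bit integers and inputs below 2^32, so that every intermediate product fits in a native unsigned integer. The base set 2, 7, 61 is the one shown in the SYNOPSIS and is known to be deterministic below 4759123141, the threshold mentioned for Math::Primality.

```perl
use strict;
use warnings;

# ($base ** $exp) % $mod by repeated squaring.  Assumes $mod < 2**32 and a
# 64-bit Perl, so each product below stays an exact unsigned integer.
sub powmod {
    my ($base, $exp, $mod) = @_;
    my $result = 1;
    $base %= $mod;
    while ($exp > 0) {
        $result = ($result * $base) % $mod if $exp & 1;
        $base   = ($base * $base) % $mod;
        $exp >>= 1;
    }
    return $result;
}

# One strong pseudoprime (Miller-Rabin) round for odd $n > 2.
# Returns 1 if $n is prime or a strong pseudoprime to $base, else 0.
sub is_strong_psp {
    my ($n, $base) = @_;
    $base %= $n;
    return 1 if $base == 0;          # base is a multiple of n: no information
    my ($d, $s) = ($n - 1, 0);
    while (($d & 1) == 0) { $d >>= 1; $s++ }    # n-1 = d * 2**s with d odd
    my $x = powmod($base, $d, $n);
    return 1 if $x == 1 || $x == $n - 1;
    for (2 .. $s) {
        $x = ($x * $x) % $n;
        return 1 if $x == $n - 1;
    }
    return 0;
}

# Hypothetical tiered test: small-prime trial division, then MR rounds.
sub sketch_is_prime {
    my ($n) = @_;
    die "sketch_is_prime: n must be below 2**32\n" if $n >= 4294967296;
    return 0 if $n < 2;
    for my $p (2, 3, 5, 7, 11, 13) {
        return 1 if $n == $p;
        return 0 if $n % $p == 0;
    }
    return 1 if $n < 289;            # no factor <= 13 and n < 17*17
    for my $base (2, 7, 61) {
        return 0 unless is_strong_psp($n, $base);
    }
    return 1;
}

print "2147483647 is ", (sketch_is_prime(2147483647) ? "prime" : "composite"), "\n";
```

The real module replaces the first tier with a sieve bit lookup and uses a BPSW test once inputs exceed native precision. Neither is shown here.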

Factoring performance depends on the input, and the algorithm choices used are still being tuned. Math::Factor::XS is very fast when given input with only small factors, but it slows down rapidly as the smallest factor increases in size. For numbers larger than 32 bits, Math::Prime::Util can be 100x or more faster (a number with only very small factors will be nearly identical, while a semiprime with large factors will be the extreme end). Math::Pari's underlying algorithms and code are much more mature than this module, and for 20+ digit numbers will typically be a better choice. Small numbers factor much, much faster with Math::Prime::Util. Pari passes M::P::U in speed somewhere in the 16-digit range and rapidly increases its lead. Without the Math::Prime::Util::GMP module, almost all actions on numbers greater than native scalars will be much faster in Pari.

The presentation here: http://math.boisestate.edu/~liljanab/BOISECRYPTFall09/Jacobsen.pdf has a lot of data on 64-bit and GMP factoring performance I collected in 2009. Assuming you do not know anything about the inputs, trial division and optimized Fermat or Lehman work very well for small numbers (<= 10 digits), while native SQUFOF is typically the method of choice for 11-18 digits (I've seen claims that a lightweight QS can be faster for 15+ digits). Some form of Quadratic Sieve is usually used for inputs in the 19-100 digit range, and beyond that is the General Number Field Sieve. For serious factoring, I recommend looking at yafu, msieve, gmp-ecm, GGNFS, and Pari.
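
To make the two small-number methods named above concrete, here is a minimal pure-Perl sketch of plain trial division and an unoptimized Fermat split. It is illustrative only and is not the code used in this module. The function names (isqrt, trial_factor, fermat_split) and the step cap are hypothetical, and the sketch assumes inputs small enough (roughly 10 to 12 digits) that all intermediate values are exact in native arithmetic.

```perl
use strict;
use warnings;

# Integer square root, corrected for floating-point rounding.
sub isqrt {
    my ($n) = @_;
    my $r = int(sqrt($n));
    $r-- while $r * $r > $n;
    $r++ while ($r + 1) * ($r + 1) <= $n;
    return $r;
}

# Trial division: strip factors of 2 and 3, then try candidates 6k-1, 6k+1.
# Returns the prime factors in ascending order (empty list for n < 2).
sub trial_factor {
    my ($n) = @_;
    return () if $n < 2;
    my @factors;
    for my $p (2, 3) {
        while ($n % $p == 0) { push @factors, $p; $n = int($n / $p) }
    }
    my ($f, $step) = (5, 2);
    while ($f * $f <= $n) {
        while ($n % $f == 0) { push @factors, $f; $n = int($n / $f) }
        $f += $step;
        $step = 6 - $step;           # alternate +2, +4: 5, 7, 11, 13, 17, ...
    }
    push @factors, $n if $n > 1;     # whatever remains is prime
    return @factors;
}

# Fermat's method for odd n: search for x with x*x - n a perfect square y*y,
# so that n = (x - y) * (x + y).  Gives up after $max_steps candidates and
# returns an empty list if no nontrivial split was found.
sub fermat_split {
    my ($n, $max_steps) = @_;
    my $x = isqrt($n);
    $x++ if $x * $x < $n;
    for (1 .. $max_steps) {
        my $y2 = $x * $x - $n;
        my $y  = isqrt($y2);
        if ($y * $y == $y2) {
            return $x - $y > 1 ? ($x - $y, $x + $y) : ();
        }
        $x++;
    }
    return;
}

print join(" x ", fermat_split(5959, 100)), "\n";   # prints "59 x 101"
print join(" x ", trial_factor(360)), "\n";         # prints "2 x 2 x 2 x 3 x 3 x 5"
```

Fermat's method finds a split quickly when the two factors are close to the square root of n, which is why it complements trial division: trial division is fastest when a factor is small, and Fermat is fastest when the factors are nearly equal.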

AUTHORS

Dana Jacobsen <dana@acm.org>

ACKNOWLEDGEMENTS

Eratosthenes of Cyrene provided the elegant and simple algorithm for finding the primes.

Terje Mathisen, A.R. Quesada, and B. Van Pelt all had useful ideas which I used in my wheel sieve.

Tomás Oliveira e Silva has released the source for a very fast segmented sieve. The current implementation does not use these ideas, but future versions likely will.

The SQUFOF implementation being used is my modifications to Ben Buhrow's modifications to Bob Silverman's code. I may experiment with some other implementations (Ben Buhrow and Jason Papadopoulos both have published excellent versions in the public domain).

REFERENCES

Pierre Dusart, "Estimates of Some Functions Over Primes without R.H.", preprint, 2010. http://arxiv.org/abs/1002.0442/
Gabriel Mincu, "An Asymptotic Expansion", Journal of Inequalities in Pure and Applied Mathematics, v4, n2, 2003. A very readable account of Cipolla's 1902 nth prime approximation. http://www.emis.de/journals/JIPAM/images/153_02_JIPAM/153_02.pdf
Vincent Pegoraro and Philipp Slusallek, "On the Evaluation of the Complex-Valued Exponential Integral".
William H. Press et al., "Numerical Recipes", 3rd edition.
W. J. Cody and Henry C. Thacher, Jr., "Rational Chebyshev Approximations for the Exponential Integral E_1(x)".
Ueli M. Maurer, "Fast Generation of Prime Numbers and Secure Public-Key Cryptographic Parameters", 1995. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.2151
Pierre-Alain Fouque and Mehdi Tibouchi, "Close to Uniform Prime Number Generation With Fewer Random Bits", 2011. http://eprint.iacr.org/2011/481

COPYRIGHT

Copyright 2011-2012 by Dana Jacobsen <dana@acm.org>

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.