The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Math::Prime::Util - Utilities related to prime numbers, including fast sieves and factoring

VERSION

Version 0.03

SYNOPSIS

  # Normally you would just import the functions you are using.
  # Nothing is exported by default.
  use Math::Prime::Util ':all';


  # Get a big array reference of many primes
  my $aref = primes( 100_000_000 );

  # All the primes between 5k and 10k inclusive
  my $aref = primes( 5_000, 10_000 );

  # If you want them in an array instead
  my @primes = @{primes( 500 )};


  # is_prime returns 0 for composite, 1 for prime
  say "$n is prime"  if is_prime($n);


  # step to the next prime
  $n = next_prime($n);

  # step back (returns 0 if given input less than 2)
  $n = prev_prime($n);


  # Return Pi(n) -- the number of primes E<lt>= n.
  my $number_of_primes = prime_count( 1_000_000 );

  # Quickly return an approximation to Pi(n)
  my $approx_number_of_primes = prime_count_approx( 10**17 );

  # Lower and upper bounds.  lower <= Pi(n) <= upper for all n
  die unless prime_count_lower($n) <= prime_count($n);
  die unless prime_count_upper($n) >= prime_count($n);


  # Return p_n, the nth prime
  say "The ten thousandth prime is ", nth_prime(10_000);

  # Return a quick approximation to the nth prime
  say "The one trillionth prime is ~ ", nth_prime_approx(10**12);

  # Lower and upper bounds.   lower <= nth_prime(n) <= upper for all n
  die unless nth_prime_lower($n) <= nth_prime($n);
  die unless nth_prime_upper($n) >= nth_prime($n);


  # Get the prime factors of a number
  @prime_factors = factor( $n );


  # Precalculate a sieve, possibly speeding up later work.
  prime_precalc( 1_000_000_000 );

  # Free any memory used by the module.
  prime_free;

  # Alternate way to free.  When this leaves scope, memory is freed.
  my $mf = Math::Prime::Util::MemFree->new;

DESCRIPTION

A set of utilities related to prime numbers. These include multiple sieving methods, is_prime, prime_count, nth_prime, approximations and bounds for the prime_count and nth prime, next_prime and prev_prime, factoring utilities, and more.

All routines currently work in native integers (32-bit or 64-bit). Bignum support may be added later. If you need bignum support for these types of functions inside Perl now, I recommend Math::Pari.

The default sieving and factoring are intended to be (and currently are) the fastest on CPAN, including Math::Prime::XS, Math::Prime::FastSieve, and Math::Factor::XS. Math::Pari is slower in some things, faster in others.

FUNCTIONS

is_prime

  print "$n is prime" if is_prime($n);

Returns 2 if the number is prime, 0 if not. Also note there are probabilistic prime testing functions available.

primes

Returns all the primes between the lower and upper limits (inclusive), with a lower limit of 2 if none is given.

An array reference is returned (with large lists this is much faster and uses less memory than returning an array directly).

  my $aref1 = primes( 1_000_000 );
  my $aref2 = primes( 1_000_000_000_000, 1_000_000_001_000 );

  my @primes = @{ primes( 500 ) };

  print "$_\n" for (@{primes( 20, 100 )});

Sieving will be done if required. The algorithm used will depend on the range and whether a sieve result already exists. Possibilities include trial division (for ranges with only one expected prime), a Sieve of Eratosthenes using wheel factorization, or a segmented sieve.

next_prime

  $n = next_prime($n);

Returns the next prime greater than the input number. 0 is returned if the next prime is larger than a native integer type (the last primes being 4,294,967,291 in 32-bit Perl and 18,446,744,073,709,551,557 in 64-bit).

prev_prime

  $n = prev_prime($n);

Returns the prime smaller than the input number. 0 is returned if the input is 2 or lower.

prime_count

  my $number_of_primes = prime_count( 1_000 );

Returns the Prime Count function Pi(n). The current implementation relies on sieving to find the primes within the interval, so will take some time and memory. It uses a segmented sieve if the primary sieve is not large enough, so is very memory efficient. However, time growth is slightly more than linear with n, so it can take a very long time. A hybrid table approach is the usual means taken to speed this up, which a later version may do. For very large inputs the methods of Meissel, Lehmer, or Lagarias-Miller-Odlyzko-Deleglise-Rivat may be appropriate.

prime_count_upper

prime_count_lower

  my $lower_limit = prime_count_lower($n);
  die unless prime_count($n) >= $lower_limit;

  my $upper_limit = prime_count_upper($n);
  die unless prime_count($n) <= $upper_limit;

Returns an upper or lower bound on the number of primes below the input number. These are analytical routines, so will take a fixed amount of time and no memory. The actual prime_count will always be on or between these numbers.

A common place these would be used is sizing an array to hold the first $n primes. It may be desirable to use a bit more memory than is necessary, to avoid calling prime_count.

These routines use hand-verified tight limits below a range at least 2^32, and fall back to the proven Dusart bounds of

    x/logx * (1 + 1/logx + 1.80/log^2x) <= Pi(x)

    x/logx * (1 + 1/logx + 2.51/log^2x) >= Pi(x)

above that range.

prime_count_approx

  print "there are about ",
        prime_count_approx( 10 ** 18 ),
        " primes below one quintillion.\n";

Returns an approximation to the prime_count function, without having to generate any primes. The results are very close for small numbers, but less so with large ranges. The current implementation is 0.00033% too small for the example, but takes under a microsecond and no memory to get the result.

nth_prime

  say "The ten thousandth prime is ", nth_prime(10_000);

Returns the prime that lies in index n in the array of prime numbers. Put another way, this returns the smallest p such that Pi(p) >= n.

This relies on generating primes, so can require a lot of time and space for large inputs. A segmented sieve is used for large inputs, so it is memory efficient. On my machine it will return the 203,280,221st prime (the largest that fits in 32-bits) in 2.5 seconds. The 10^9th prime takes 15 seconds to find, while the 10^10th prime takes nearly four minutes.

nth_prime_upper

nth_prime_lower

  my $lower_limit = nth_prime_lower($n);
  die unless nth_prime($n) >= $lower_limit;

  my $upper_limit = nth_prime_upper($n);
  die unless nth_prime($n) <= $upper_limit;

Returns an analytical upper or lower bound on the Nth prime. This will be very fast. The lower limit uses the Dusart 1999 bounds for all n, while the upper bound uses one of the two Dusart 1999 bounds for n >= 27076, the Robin 1983 bound for n >= 7022, and the simple bound of n * (logn + loglogn) for n < 7022.

nth_prime_approx

  say "The one trillionth prime is ~ ", nth_prime_approx(10**12);

Returns an approximation to the nth_prime function, without having to generate any primes. Uses the Cipolla 1902 approximation with two polynomials, plus a correction term for small values to reduce the error.

miller_rabin

  my $maybe_prime = miller_rabin($n, 2);
  my $probably_prime = miller_rabin($n, 2, 3, 5, 7, 11, 13, 17);

Takes a positive number as input and one or more bases. The bases must be between 2 and n - 2. Returns 2 is n is definitely prime, 1 if n is probably prime, and 0 if n is definitely composite. Since this is just the Miller-Rabin test, a value of 2 is only returned for inputs of 2 and 3, which are shortcut. If 0 is returned, then the number really is a composite. If 1 is returned, there is a good chance the number is prime (depending on the input and the bases), but we cannot be sure.

This is usually used in combination with other tests to make either stronger tests (e.g. the strong BPSW test) or deterministic results for numbers less than some verified limit (such as the is_prob_prime function in this module).

is_prob_prime

  my $prob_prime = is_prob_prime($n);
  # Returns 0 (composite), 2 (prime), or 1 (probably prime)

Takes a positive number as input and returns back either 0 (composite), 2 (definitely prime), or 1 (probably prime).

This is done with a tuned set of Miller-Rabin tests such that the result will be deterministic for 64-bit input. Either 2, 3, 4, 5, or 7 Miller-Rabin tests are performed (no more than 3 for 32-bit input), and the result will then always be 0 (composite) or 2 (prime). A later implementation may switch to a BPSW test, depending on speed.

UTILITY FUNCTIONS

prime_precalc

  prime_precalc( 1_000_000_000 );

Let the module prepare for fast operation up to a specific number. It is not necessary to call this, but it gives you more control over when memory is allocated and gives faster results for multiple calls in some cases. In the current implementation this will calculate a sieve for all numbers up to the specified number.

prime_free

  prime_free;

Frees any extra memory the module may have allocated. Like with prime_precalc, it is not necessary to call this, but if you're done making calls, or want things cleanup up, you can use this. The object method might be a better choice for complicated uses.

Math::Prime::Util::MemFree->new

  my $mf = Math::Prime::Util::MemFree->new;
  # perform operations.  When $mf goes out of scope, memory will be recovered.

This is a more robust way of making sure any cached memory is freed, as it will be handled by the last MemFree object leaving scope. This means if your routines were inside an eval that died, things will still get cleaned up. If you call another function that uses a MemFree object, the cache will stay in place because you still have an object.

FACTORING FUNCTIONS

factor

  my @factors = factor(3_369_738_766_071_892_021);

Produces the prime factors of a positive number input. They may not be in numerical order. The special cases of n = 0 and n = 1 will return n, which guarantees multiplying the factors together will always result in the input value, though those are the only cases where the returned factors are not prime.

The current algorithm is to use trial division for small numbers, while large numbers go through a sequence of small trials, SQUFOF, and Pollard's Rho. This process is repeated for each non-prime factor.

trial_factor

  my @factors = trial_factor($n);

Produces the prime factors of a positive number input. The factors will be in numerical order. The special cases of n = 0 and n = 1 will return n, while with all other inputs the factors are guaranteed to be prime. For large inputs this will be very slow.

fermat_factor

  my @factors = fermat_factor($n);

Produces factors, not necessarily prime, of the positive number input. The particular algorithm is Knuth's algorithm C. For small inputs this will be very fast, but it slows down quite rapidly as the number of digits increases. It is very fast for inputs with a factor close to the midpoint (e.g. a semiprime p*q where p and q are the same number of digits).

holf_factor

  my @factors = holf_factor($n);

Produces factors, not necessarily prime, of the positive number input. An optional number of rounds can be given as a second parameter. It is possible the function will be unable to find a factor, in which case a single element, the input, is returned. This uses Hart's One Line Factorization with no premultiplier. It is an interesting alternative to Fermat's algorithm, and there are some inputs it can rapidly factor. In the long run it has the same advantages and disadvantages as Fermat's method.

squfof_factor

  my @factors = squfof_factor($n);

Produces factors, not necessarily prime, of the positive number input. An optional number of rounds can be given as a second parameter. It is possible the function will be unable to find a factor, in which case a single element, the input, is returned. This function typically runs very fast.

prho_factor

pbrent_factor

pminus1_factor

  my @factors = prho_factor($n);

  # Use a very small number of rounds
  my @factors = prho_factor($n, 1000);

Produces factors, not necessarily prime, of the positive number input. An optional number of rounds can be given as a second parameter. These attempt to find a single factor using one of the probabilistic algorigthms of Pollard Rho, Brent's modification of Pollard Rho, or Pollard's p - 1. These are more specialized algorithms usually used for pre-factoring very large inputs, or checking very large inputs for naive mistakes. If the input is prime or they run out of rounds, they will return the single input value. On some inputs they will take a very long time, while on others they succeed in a remarkably short time.

LIMITATIONS

I have not completed testing all the functions near the word size limit (e.g. 2^32 for 32-bit machines). Please report any problems you find.

The extra factoring algorithms are mildly interesting but really limited by not being big number aware.

Perl versions earlier than 5.8.0 have issues with 64-bit. The test suite will try to determine if your Perl is broken. This will show up in factoring tests. Perl 5.6.2 32-bit works fine, as do later versions with 32-bit and 64-bit.

PERFORMANCE

Counting the primes to 10^10 (10 billion), with time in seconds. Pi(10^10) = 455,052,511.

       1.9  primesieve 3.6 forced to use only a single thread
       5.6  Tomás Oliveira e Silva's segmented sieve v2 (Sep 2010)
       6.6  primegen (optimized Sieve of Atkin)
      11.2  Tomás Oliveira e Silva's segmented sieve v1 (May 2003)
      17.0  Pari 2.3.5 (primepi)

       5.9  My segmented SoE
      15.6  My Sieve of Eratosthenes using a mod-30 wheel
      17.2  A slightly modified verion of Terje Mathisen's mod-30 sieve
      35.5  Basic Sieve of Eratosthenes on odd numbers
      33.4  Sieve of Atkin, from Praxis (not correct)
      72.8  Sieve of Atkin, 10-minute fixup of basic algorithm
      91.6  Sieve of Atkin, Wikipedia-like

Perl modules, counting the primes to 800_000_000 (800 million), in seconds:

  Time (s)  Module                      Version
  --------  --------------------------  ------
       0.4  Math::Prime::Util           0.02
       0.9  Math::Prime::Util           0.01
       2.9  Math::Prime::FastSieve      0.12
      11.7  Math::Prime::XS             0.29
      15.0  Bit::Vector                 7.2
   [hours]  Math::Primality             0.04

is_prime is slightly faster than M::P::XS for small inputs, but is much faster with larger ones (5x faster for 8-digit, over 50x for 14-digit). Math::Pari is relatively slow for small inputs, but becomes faster in the 13-digit range.

Factoring performance depends on the input, and the algorithm choices used are still being tuned. Compared to Math::Factor::XS, it is a tiny bit faster for most input under 10M or so, and rapidly gets faster. For numbers larger than 32 bits it's 10-100x faster (depending on the number -- a power of two will be identical, while a semiprime with large factors will be on the extreme end). Pari's underlying algorithms and code are very sophisticated, and will always be more so than this module, and of course supports bignums which is a huge advantage. Small numbers factor much, much faster with Math::Prime::Util. Pari passes M::P::U in speed somewhere in the 16 digit range and rapidly increases its lead.

The presentation here: http://math.boisestate.edu/~liljanab/BOISECRYPTFall09/Jacobsen.pdf has a lot of data on 64-bit and GMP factoring performance I collected in 2009. Assuming you do not know anything about the inputs, trial division and optimized Fermat work very well for small numbers (<= 10 digits), while native SQUFOF is typically the method of choice for 11-18 digits (I've seen claims that a lightweight QS can be faster for 15+ digits). Some form of Quadratic Sieve is usually used for inputs in the 19-100 digit range, and beyond that is the Generalized Number Field Sieve. For serious factoring, I recommend looking info yafu, msieve, Pari, and GGNFS.

AUTHORS

Dana Jacobsen <dana@acm.org>

ACKNOWLEDGEMENTS

Eratosthenes of Cyrene provided the elegant and simple algorithm for finding the primes.

Terje Mathisen, A.R. Quesada, and B. Van Pelt all had useful ideas which I used in my wheel sieve.

Tomás Oliveira e Silva has released the source for a very fast segmented sieve. The current implementation does not use these ideas, but future versions likely will.

The SQUFOF implementation being used is my modifications to Ben Buhrow's modifications to Bob Silverman's code. I may experiment with some other implementations (Ben Buhrows and Jason Papadopoulos both have published excellent versions in the public domain).

COPYRIGHT

Copyright 2011-2012 by Dana Jacobsen <dana@acm.org>

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.