The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Math::Prime::Util - Utilities related to prime numbers, including fast sieves and factoring

VERSION

Version 0.09

SYNOPSIS

  # Normally you would just import the functions you are using.
  # Nothing is exported by default.
  use Math::Prime::Util ':all';


  # Get a big array reference of many primes
  my $aref = primes( 100_000_000 );

  # All the primes between 5k and 10k inclusive
  my $aref = primes( 5_000, 10_000 );

  # If you want them in an array instead
  my @primes = @{primes( 500 )};


  # is_prime returns 0 for composite, 2 for prime
  say "$n is prime"  if is_prime($n);

  # is_prob_prime returns 0 for composite, 2 for prime, and 1 for maybe prime
  say "$n is ", qw(composite maybe_prime? prime)[is_prob_prime($n)];


  # step to the next prime (returns 0 if the next one is more than ~0)
  $n = next_prime($n);

  # step back (returns 0 if given input less than 2)
  $n = prev_prime($n);


  # Return Pi(n) -- the number of primes E<lt>= n.
  $primepi = prime_count( 1_000_000 );
  $primepi = prime_count( 10**14, 10**14+1000 );  # also does ranges

  # Quickly return an approximation to Pi(n)
  my $approx_number_of_primes = prime_count_approx( 10**17 );

  # Lower and upper bounds.  lower <= Pi(n) <= upper for all n
  die unless prime_count_lower($n) <= prime_count($n);
  die unless prime_count_upper($n) >= prime_count($n);


  # Return p_n, the nth prime
  say "The ten thousandth prime is ", nth_prime(10_000);

  # Return a quick approximation to the nth prime
  say "The one trillionth prime is ~ ", nth_prime_approx(10**12);

  # Lower and upper bounds.   lower <= nth_prime(n) <= upper for all n
  die unless nth_prime_lower($n) <= nth_prime($n);
  die unless nth_prime_upper($n) >= nth_prime($n);


  # Get the prime factors of a number
  @prime_factors = factor( $n );


  # Precalculate a sieve, possibly speeding up later work.
  prime_precalc( 1_000_000_000 );

  # Free any memory used by the module.
  prime_memfree;

  # Alternate way to free.  When this leaves scope, memory is freed.
  my $mf = Math::Prime::Util::MemFree->new;


  # Random primes
  my $small_prime = random_prime(1000);      # random prime <= limit
  my $rand_prime = random_prime(100, 10000); # random prime within a range
  my $rand_prime = random_ndigit_prime(6);   # random 6-digit prime

DESCRIPTION

A set of utilities related to prime numbers. These include multiple sieving methods, is_prime, prime_count, nth_prime, approximations and bounds for the prime_count and nth prime, next_prime and prev_prime, factoring utilities, and more.

All routines currently work in native integers (32-bit or 64-bit). Bignum support may be added later. If you need bignum support for these types of functions inside Perl now, I recommend Math::Pari.

The default sieving and factoring are intended to be (and currently are) the fastest on CPAN, including Math::Prime::XS, Math::Prime::FastSieve, and Math::Factor::XS. It seems to be faster than Math::Pari for everything except factoring certain 16-20 digit numbers.

The module is thread-safe and allows concurrency between Perl threads while still sharing a prime cache. It is not itself multithreaded. The one caveat is on Win32 where you must use precalc if the function will use primes (primes, prime_count greater than 900M, nth_prime greater than 45M).

FUNCTIONS

is_prime

  print "$n is prime" if is_prime($n);

Returns 2 if the number is prime, 0 if not. Also note there are probabilistic prime testing functions available.

primes

Returns all the primes between the lower and upper limits (inclusive), with a lower limit of 2 if none is given.

An array reference is returned (with large lists this is much faster and uses less memory than returning an array directly).

  my $aref1 = primes( 1_000_000 );
  my $aref2 = primes( 1_000_000_000_000, 1_000_000_001_000 );

  my @primes = @{ primes( 500 ) };

  print "$_\n" for (@{primes( 20, 100 )});

Sieving will be done if required. The algorithm used will depend on the range and whether a sieve result already exists. Possibilities include trial division (for ranges with only one expected prime), a Sieve of Eratosthenes using wheel factorization, or a segmented sieve.

next_prime

  $n = next_prime($n);

Returns the next prime greater than the input number. 0 is returned if the next prime is larger than a native integer type (the last representable primes being 4,294,967,291 in 32-bit Perl and 18,446,744,073,709,551,557 in 64-bit).

prev_prime

  $n = prev_prime($n);

Returns the prime smaller than the input number. 0 is returned if the input is 2 or lower.

prime_count

  my $primepi = prime_count( 1_000 );
  my $pirange = prime_count( 1_000, 10_000 );

Returns the Prime Count function Pi(n), also called primepi in some math packages. When given two arguments, it returns the inclusive count of primes between the ranges (e.g. (13,17) returns 2, 14,17 and 13,16 return 1, and 14,16 returns 0).

The current implementation relies on sieving to find the primes within the interval, so will take some time and memory. It uses a segmented sieve so is very memory efficient, and also allows fast results even with large base values. The complexity for prime_count(a, b) is approximately O(sqrt(a) + (b-a)), where the first term is typically negligible below ~ 10^11. Memory use is proportional only to sqrt(a), with total memory use under 1MB for any base under 10^14.

A later implementation may work on improving performance for values, both in reducing memory use (the current maximum is 140MB at 2^64) and improving speed. Possibilities include a hybrid table approach, using an explicit formula with li(x) or R(x), or one of the Meissel, Lehmer, or Lagarias-Miller-Odlyzko-Deleglise-Rivat methods.

prime_count_upper

prime_count_lower

  my $lower_limit = prime_count_lower($n);
  die unless prime_count($n) >= $lower_limit;

  my $upper_limit = prime_count_upper($n);
  die unless prime_count($n) <= $upper_limit;

Returns an upper or lower bound on the number of primes below the input number. These are analytical routines, so will take a fixed amount of time and no memory. The actual prime_count will always be on or between these numbers.

A common place these would be used is sizing an array to hold the first $n primes. It may be desirable to use a bit more memory than is necessary, to avoid calling prime_count.

These routines use hand-verified tight limits below a range at least 2^35, and fall back to the Dusart bounds of

    x/logx * (1 + 1/logx + 1.80/log^2x) <= Pi(x)

    x/logx * (1 + 1/logx + 2.51/log^2x) >= Pi(x)

above that range.

prime_count_approx

  print "there are about ",
        prime_count_approx( 10 ** 18 ),
        " primes below one quintillion.\n";

Returns an approximation to the prime_count function, without having to generate any primes. The current implementation uses the Riemann R function which is quite accurate: an error of less than 0.0005% is typical for input values over 2^32. A slightly faster (0.1ms vs. 1ms), but much less accurate, answer can be obtained by averaging the upper and lower bounds.

nth_prime

  say "The ten thousandth prime is ", nth_prime(10_000);

Returns the prime that lies in index n in the array of prime numbers. Put another way, this returns the smallest p such that Pi(p) >= n.

This relies on generating primes, so can require a lot of time and space for large inputs. A segmented sieve is used for large inputs, so it is memory efficient. On my machine it will return the 203,280,221st prime (the largest that fits in 32-bits) in 2.5 seconds. The 10^9th prime takes 15 seconds to find, while the 10^10th prime takes nearly four minutes.

nth_prime_upper

nth_prime_lower

  my $lower_limit = nth_prime_lower($n);
  die unless nth_prime($n) >= $lower_limit;

  my $upper_limit = nth_prime_upper($n);
  die unless nth_prime($n) <= $upper_limit;

Returns an analytical upper or lower bound on the Nth prime. This will be very fast. The lower limit uses the Dusart 1999 bounds for all n, while the upper bound uses one of the two Dusart 1999 bounds for n >= 27076, the Robin 1983 bound for n >= 7022, and the simple bound of n * (logn + loglogn) for n < 7022.

nth_prime_approx

  say "The one trillionth prime is ~ ", nth_prime_approx(10**12);

Returns an approximation to the nth_prime function, without having to generate any primes. Uses the Cipolla 1902 approximation with two polynomials, plus a correction term for small values to reduce the error.

miller_rabin

  my $maybe_prime = miller_rabin($n, 2);
  my $probably_prime = miller_rabin($n, 2, 3, 5, 7, 11, 13, 17);

Takes a positive number as input and one or more bases. The bases must be between 2 and n - 2. Returns 2 is n is definitely prime, 1 if n is probably prime, and 0 if n is definitely composite. Since this is just the Miller-Rabin test, a value of 2 is only returned for inputs of 2 and 3, which are shortcut. If 0 is returned, then the number really is a composite. If 1 is returned, there is a good chance the number is prime (depending on the input and the bases), but we cannot be sure.

This is usually used in combination with other tests to make either stronger tests (e.g. the strong BPSW test) or deterministic results for numbers less than some verified limit (such as the is_prob_prime function in this module).

is_prob_prime

  my $prob_prime = is_prob_prime($n);
  # Returns 0 (composite), 2 (prime), or 1 (probably prime)

Takes a positive number as input and returns back either 0 (composite), 2 (definitely prime), or 1 (probably prime).

This is done with a tuned set of Miller-Rabin tests such that the result will be deterministic for 64-bit input. Either 2, 3, 4, 5, or 7 Miller-Rabin tests are performed (no more than 3 for 32-bit input), and the result will then always be 0 (composite) or 2 (prime). A later implementation may switch to a BPSW test, depending on speed.

random_prime

  my $small_prime = random_prime(1000);      # random prime <= limit
  my $rand_prime = random_prime(100, 10000); # random prime within a range

Returns a psuedo-randomly selected prime that will be greater than or equal to the lower limit and less than or equal to the upper limit. If no lower limit is given, 2 is implied. Returns undef if no primes exist within the range. The rand function is called one or more times for selection.

This will return a uniform distribution of the primes in the range, meaning for each prime in the range, the chances are equally likely that it will be seen.

The current algorithm does a random index selection for small numbers, which is deterministic. For larger numbers, this can be very slow, so the obvious Monte Carlo method is used, where random numbers in the range are selected until one is prime. That also gets slow as the number of digits increases, but not something that impacts us in 64-bit.

If you want cryptographically secure primes, I suggest looking at Crypt::Primes or something similar. The current Math::Prime::Util module does not use strong randomness, and its primes are ridiculously small by cryptographic standards.

Perl's rand function is normally called, but if the sub main::rand exists, it will be used instead. When called with no arguments it should return a float value between 0 and 1-epsilon, with 32 bits of randomness. Examples:

  # Use Mersenne Twister
  use Math::Random::MT::Auto qw/rand/;

  # Use my custom random function
  sub rand { ... }

random_ndigit_prime

  say "My 4-digit prime number is: ", random_ndigit_prime(4);

Selects a random n-digit prime, where the input is an integer number of digits between 1 and the maximum native type (10 for 32-bit, 20 for 64-bit). One of the primes within that range (e.g. 1000 - 9999 for 4-digits) will be uniformly selected using the rand function.

UTILITY FUNCTIONS

prime_precalc

  prime_precalc( 1_000_000_000 );

Let the module prepare for fast operation up to a specific number. It is not necessary to call this, but it gives you more control over when memory is allocated and gives faster results for multiple calls in some cases. In the current implementation this will calculate a sieve for all numbers up to the specified number.

prime_memfree

  prime_memfree;

Frees any extra memory the module may have allocated. Like with prime_precalc, it is not necessary to call this, but if you're done making calls, or want things cleanup up, you can use this. The object method might be a better choice for complicated uses.

Math::Prime::Util::MemFree->new

  my $mf = Math::Prime::Util::MemFree->new;
  # perform operations.  When $mf goes out of scope, memory will be recovered.

This is a more robust way of making sure any cached memory is freed, as it will be handled by the last MemFree object leaving scope. This means if your routines were inside an eval that died, things will still get cleaned up. If you call another function that uses a MemFree object, the cache will stay in place because you still have an object.

FACTORING FUNCTIONS

factor

  my @factors = factor(3_369_738_766_071_892_021);
  # returns (204518747,16476429743)

Produces the prime factors of a positive number input. They may not be in numerical order. The special cases of n = 0 and n = 1 will return n, which guarantees multiplying the factors together will always result in the input value, though those are the only cases where the returned factors are not prime.

The current algorithm is to use trial division for small numbers, while large numbers go through a sequence of small trials, SQUFOF, Pollard's Rho, Hart's one line factorization, and finally trial division for any survivors. This process is repeated for each non-prime factor.

all_factors

  my @divisors = all_factors(30);   # returns (2, 3, 5, 6, 10, 15)

Produces all the divisors of a positive number input. 1 and the input number are excluded (which implies that an empty list is returned for any prime number input). The divisors are a power set of multiplications of the prime factors, returned as a uniqued sorted list.

trial_factor

  my @factors = trial_factor($n);

Produces the prime factors of a positive number input. The factors will be in numerical order. The special cases of n = 0 and n = 1 will return n, while with all other inputs the factors are guaranteed to be prime. For large inputs this will be very slow.

fermat_factor

  my @factors = fermat_factor($n);

Produces factors, not necessarily prime, of the positive number input. The particular algorithm is Knuth's algorithm C. For small inputs this will be very fast, but it slows down quite rapidly as the number of digits increases. It is very fast for inputs with a factor close to the midpoint (e.g. a semiprime p*q where p and q are the same number of digits).

holf_factor

  my @factors = holf_factor($n);

Produces factors, not necessarily prime, of the positive number input. An optional number of rounds can be given as a second parameter. It is possible the function will be unable to find a factor, in which case a single element, the input, is returned. This uses Hart's One Line Factorization with no premultiplier. It is an interesting alternative to Fermat's algorithm, and there are some inputs it can rapidly factor. In the long run it has the same advantages and disadvantages as Fermat's method.

squfof_factor

  my @factors = squfof_factor($n);

Produces factors, not necessarily prime, of the positive number input. An optional number of rounds can be given as a second parameter. It is possible the function will be unable to find a factor, in which case a single element, the input, is returned. This function typically runs very fast.

prho_factor

pbrent_factor

pminus1_factor

  my @factors = prho_factor($n);

  # Use a very small number of rounds
  my @factors = prho_factor($n, 1000);

Produces factors, not necessarily prime, of the positive number input. An optional number of rounds can be given as a second parameter. These attempt to find a single factor using one of the probabilistic algorigthms of Pollard Rho, Brent's modification of Pollard Rho, or Pollard's p - 1. These are more specialized algorithms usually used for pre-factoring very large inputs, or checking very large inputs for naive mistakes. If the input is prime or they run out of rounds, they will return the single input value. On some inputs they will take a very long time, while on others they succeed in a remarkably short time.

MATHEMATICAL FUNCTIONS

ExponentialIntegral

  my $Ei = ExponentialIntegral($x);

Given a non-zero floating point input x, this returns the real-valued exponential integral of x, defined as the integral of e^t/t dt from -infinity to x. Depending on the input, the integral is calculated using continued fractions (x < -1), rational Chebyshev approximation ( -1 < x < 0), a convergent series (small positive x), or an asymptotic divergent series (large positive x).

Accuracy should be at least 14 digits.

LogarithmicIntegral

  my $li = LogarithmicIntegral($x)

Given a positive floating point input, returns the floating point logarithmic integral of x, defined as the integral of dt/ln t from 0 to x. If given a negative input, the function will croak. The function returns 0 at x = 0, and -infinity at x = 1.

This is often known as li(x). A related function is the offset logarithmic integral, sometimes known as Li(x) which avoids the singularity at 1. It may be defined as Li(x) = li(x) - li(2).

This function is implemented as li(x) = Ei(ln x) after handling special values.

Accuracy should be at least 14 digits.

RiemannR

  my $r = RiemannR($x);

Given a positive non-zero floating point input, returns the floating point value of Riemann's R function. Riemann's R function gives a very close approximation to the prime counting function.

Accuracy should be at least 14 digits.

LIMITATIONS

I have not completed testing all the functions near the word size limit (e.g. 2^32 for 32-bit machines). Please report any problems you find.

The extra factoring algorithms are mildly interesting but really limited by not being big number aware. Assuming a desktop PC, every 32-bit number should be factored by the main routine in a few microseconds, and 64-bit numbers should be a few milliseconds at worst.

Perl versions earlier than 5.8.0 have issues with 64-bit that show up in the factoring tests. The test suite will try to determine if your Perl is broken. If you use later versions of Perl, or Perl 5.6.2 32-bit, or Perl 5.6.2 64-bit and keep numbers below ~ 2^52, then everything works. The best solution is to update to a more recent Perl.

The module is thread-safe and should allow good concurrency on all platforms that support Perl threads except Win32 (Cygwin works). With Win32, either don't use threads or make sure prime_precalc is called before using primes, prime_count, or nth_prime with large inputs.

PERFORMANCE

Counting the primes to 10^10 (10 billion), with time in seconds. Pi(10^10) = 455,052,511.

   External C programs in C / C++:

       1.9  primesieve 3.6 forced to use only a single thread
       2.2  yafu 1.31
       5.6  Tomás Oliveira e Silva's segmented sieve v2 (Sep 2010)
       6.6  primegen (optimized Sieve of Atkin)
      11.2  Tomás Oliveira e Silva's segmented sieve v1 (May 2003)
      17.0  Pari 2.3.5 (primepi)

   Small portable functions suitable for plugging into XS:

       5.3  My segmented SoE used in this module
      15.6  My Sieve of Eratosthenes using a mod-30 wheel
      17.2  A slightly modified verion of Terje Mathisen's mod-30 sieve
      35.5  Basic Sieve of Eratosthenes on odd numbers
      33.4  Sieve of Atkin, from Praxis (not correct)
      72.8  Sieve of Atkin, 10-minute fixup of basic algorithm
      91.6  Sieve of Atkin, Wikipedia-like

Perl modules, counting the primes to 800_000_000 (800 million), in seconds:

  Time (s)   Module                      Version  Notes
  ---------  --------------------------  -------  -----------
       0.36  Math::Prime::Util           0.09     segmented mod-30 sieve
       0.9   Math::Prime::Util           0.01     mod-30 sieve
       2.9   Math::Prime::FastSieve      0.12     decent odd-number sieve
      11.7   Math::Prime::XS             0.29     "" but needs a count API
      15.0   Bit::Vector                 7.2
      59.1   Math::Prime::Util::PP       0.09     Perl
     170.0   Faster Perl sieve (net)     2012-01  array of odds
     548.1   RosettaCode sieve (net)     2012-06  simplistic Perl
   >5000     Math::Primality             0.04     Perl + GMP

is_prime: my impressions:

   Module                    Small inputs   Large inputs (10-20dig)
   -----------------------   -------------  ----------------------
   Math::Prime::Util         Very fast      Pretty fast
   Math::Prime::XS           Very fast      Very, very slow if no small factors
   Math::Pari                Slow           OK
   Math::Prime::FastSieve    Very fast      N/A (too much memory)
   Math::Primality           Very slow      Very slow

The differences are in the implementations:

   - L<Math::Prime::FastSieve> only works in a sieved range, which is really
     fast if you can do it (M::P::U will do the same if you call
     C<prime_precalc>).  Larger inputs just need too much time and memory
     for the sieve.

   - L<Math::Primality> uses a GMP BPSW test which is overkill for our 64-bit
     range.  It's generally an order of magnitude or two slower than any
     of the others.  

   - L<Math::Pari> has some very effective code, but it has some overhead to get
     to it from Perl.  That means for small numbers it is relatively slow: an
     order of magnitude slower than M::P::XS and M::P::Util (though arguably
     this is only important for benchmarking since "slow" is ~2 microseconds).
     Large numbers transition over to smarter tests so don't slow down much.

   - L<Math::Prime::XS> does trial divisions, which is wonderful if the input
     has a small factor (or is small itself).  But it can take 1000x longer
     if given a large prime.

   - L<Math::Prime::Util> looks in the sieve for a fast bit lookup if that
     exists (default up to 30,000 but it can be expanded, e.g.
     C<prime_precalc>), uses trial division for numbers higher than this but
     not too large (0.1M on 64-bit machines, 100M on 32-bit machines), and a
     deterministic set of Miller-Rabin tests for large numbers.

Factoring performance depends on the input, and the algorithm choices used are still being tuned. Compared to Math::Factor::XS, it is a tiny bit faster for most input under 10M or so, and rapidly gets faster. For numbers larger than 32 bits it's 10-100x faster (depending on the number -- a power of two will be identical, while a semiprime with large factors will be on the extreme end). Pari's underlying algorithms and code are very sophisticated, and will always be more so than this module, and of course supports bignums which is a huge advantage. Small numbers factor much, much faster with Math::Prime::Util. Pari passes M::P::U in speed somewhere in the 16 digit range and rapidly increases its lead.

The presentation here: http://math.boisestate.edu/~liljanab/BOISECRYPTFall09/Jacobsen.pdf has a lot of data on 64-bit and GMP factoring performance I collected in 2009. Assuming you do not know anything about the inputs, trial division and optimized Fermat work very well for small numbers (<= 10 digits), while native SQUFOF is typically the method of choice for 11-18 digits (I've seen claims that a lightweight QS can be faster for 15+ digits). Some form of Quadratic Sieve is usually used for inputs in the 19-100 digit range, and beyond that is the Generalized Number Field Sieve. For serious factoring, I recommend looking info yafu, msieve, Pari, and GGNFS.

AUTHORS

Dana Jacobsen <dana@acm.org>

ACKNOWLEDGEMENTS

Eratosthenes of Cyrene provided the elegant and simple algorithm for finding the primes.

Terje Mathisen, A.R. Quesada, and B. Van Pelt all had useful ideas which I used in my wheel sieve.

Tomás Oliveira e Silva has released the source for a very fast segmented sieve. The current implementation does not use these ideas, but future versions likely will.

The SQUFOF implementation being used is my modifications to Ben Buhrow's modifications to Bob Silverman's code. I may experiment with some other implementations (Ben Buhrows and Jason Papadopoulos both have published excellent versions in the public domain).

COPYRIGHT

Copyright 2011-2012 by Dana Jacobsen <dana@acm.org>

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.