NAME

Mock::Populate - Mock data creation

VERSION

version 0.0901

SYNOPSIS

  use Mock::Populate;
  # * Call each function below as Mock::Populate::foo(...)
  $ids    = number_ranger(start => 1, end => 1001, prec => 0, random => 0, N => $n);
  $money  = number_ranger(start => 1000, end => 5000, prec => 2, random => 1, N => $n);
  $create = date_ranger(start => '1900-01-01', end => '2020-12-31', N => $n);
  $modify = date_modifier($offset, @$create);
  $times  = time_ranger(stamp => 1, start => '01:02:03', end => '23:59:59', N => $n);
  $people = name_ranger(gender => 'b', names => 2, country => 'us', N => $n);
  $email  = email_modifier(@$people);
  $shuff  = shuffler($n, qw(foo bar baz goo ber buz));
  $stats  = distributor(type => 'u', prec => 4, dof => 2, N => $n);
  $string = string_ranger(length => 32, type => 'base64', N => $n);
  $imgs   = image_ranger(size => 10, N => $n);  # *size is density, not pixel dimension
  $coll   = collate($ids, $people, $email, $create, $times, $modify, $times);

DESCRIPTION

This is a set of functions for mock data creation.

No functions are exported, so call each one by its fully qualified name, for example Mock::Populate::date_ranger().

Each function produces a list of elements that can be used as a database column. The handy collate() function takes these columns and returns a list of (arrayref) rows. The rows can then be processed into CSV, JSON, etc., or inserted directly into your favorite database with your favorite Perl ORM.

FUNCTIONS

date_ranger()

  $results = date_ranger(start => $start, end => $end, N => $n);

Return a list of N random dates within a range. The start and end dates and desired number of data-points arguments are all optional. The defaults are:

  start: 2000-01-01
  end: today (computed if not given)
  N: 10

The dates must be given as YYYY-MM-DD strings.
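
To illustrate the idea of picking random dates within a range, here is a minimal sketch that uses only core modules. The helper name random_dates() is hypothetical and this is not the module's own implementation; it assumes both bounds are valid YYYY-MM-DD strings with the start no later than the end.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Hypothetical helper, not part of Mock::Populate: return an arrayref of
# $n random YYYY-MM-DD dates between $start and $end (inclusive).
sub random_dates {
    my ($start, $end, $n) = @_;

    # Convert a YYYY-MM-DD string to epoch seconds at midnight UTC.
    my $to_epoch = sub {
        my ($y, $m, $d) = split /-/, shift;
        return timegm(0, 0, 0, $d, $m - 1, $y);
    };

    my ($lo, $hi) = ($to_epoch->($start), $to_epoch->($end));
    my $days = int(($hi - $lo) / 86_400);    # whole days in the range

    my @dates;
    for (1 .. $n) {
        my $pick = $lo + 86_400 * int(rand($days + 1));
        push @dates, strftime('%Y-%m-%d', gmtime($pick));
    }
    return \@dates;
}

my $dates = random_dates('2000-01-01', '2020-12-31', 5);
print "$_\n" for @$dates;
```

Working in whole days at midnight UTC avoids daylight-saving surprises when the picked value is formatted back into a date string.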

date_modifier()

  $modify = date_modifier($offset, @$dates);

Return a new list of random future dates, one for each given date. Each new date falls after its corresponding original date, by a random amount based on the offset.
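
As a rough sketch of this behaviour, the following core-only code shifts each date forward by a random number of days. The helper name later_dates() is hypothetical, and treating the offset as a maximum number of days is an assumption made for illustration; the module may interpret the offset differently.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Hypothetical helper, not part of Mock::Populate: move each YYYY-MM-DD
# date forward by a random 1 .. $offset days (assumed meaning of offset).
sub later_dates {
    my ($offset, @dates) = @_;

    my @later;
    for my $date (@dates) {
        my ($y, $m, $d) = split /-/, $date;
        my $epoch = timegm(0, 0, 0, $d, $m - 1, $y);
        my $shift = 1 + int(rand($offset));    # 1 .. $offset days
        push @later, strftime('%Y-%m-%d', gmtime($epoch + 86_400 * $shift));
    }
    return \@later;
}

my $modified = later_dates(30, '2014-01-15', '2014-06-30');
print "$_\n" for @$modified;
```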

time_ranger()

  $results = time_ranger(
    stamp => $stamp, start => $start, end => $end,
    N => $n);

Return a list of N random times within a range. The start and end times and desired number of data-points arguments are all optional. The defaults are:

  stamp: 1 (boolean)
  start: 00:00:00
  end: now (computed if not given)
  N: 10

The times must be given as HH:MM:SS strings.

number_ranger()

  $results = number_ranger(
    start => $start, end => $end,
    prec => $prec, random => $random,
    N => $n);

Return a list of N numbers within a range, either random or sequential. The start, end, precision, random flag and desired number of data-points arguments are all optional. The defaults are:

  start: 0
  end: 9
  precision: 2
  random: 1
  N: 10
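
The difference between random and sequential output, and the effect of the precision, can be sketched in plain Perl as follows. The helper name numbers() is hypothetical and the logic is only an assumption about the behaviour described above, not the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mock::Populate: return an arrayref of
# up to N numbers from start to end, formatted to prec decimal places.
sub numbers {
    my (%args) = @_;
    my ($start, $end, $prec, $random, $n)
        = @args{qw(start end prec random N)};

    my @values;
    if ($random) {
        # Uniform random values in [start, end).
        @values = map { $start + rand($end - $start) } 1 .. $n;
    }
    else {
        # Count up from start, stopping at end.
        @values = grep { $_ <= $end } map { $start + $_ } 0 .. $n - 1;
    }

    return [ map { sprintf "%.${prec}f", $_ } @values ];
}

my $ids   = numbers(start => 1,    end => 1001, prec => 0, random => 0, N => 5);
my $money = numbers(start => 1000, end => 5000, prec => 2, random => 1, N => 5);
print "@$ids\n";      # 1 2 3 4 5
print "@$money\n";
```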

name_ranger()

  $results = name_ranger(
    gender => $gender, names => $names, country => $country,
    N => $n);

Return a list of N random person names. The gender, number of names, country and desired number of data-points arguments are all optional. The defaults are:

  gender: b (options: both, female, male)
  names: 2 (first, last)
  country: us
  N: 10

email_modifier()

  $results = email_modifier(@people);
  # first.last@example.{com,net,org,edu}

Return a list of email addresses, one for each of the given names.
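
The first.last@example.{com,net,org,edu} pattern can be sketched like this. The helper name emails_for() is hypothetical; lowercasing the name and stripping punctuation are assumptions made for the sake of the example, not documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mock::Populate: build one address of
# the form first.last@example.TLD for each "First Last" name string.
sub emails_for {
    my @people = @_;
    my @tlds   = qw(com net org edu);

    my @emails;
    for my $name (@people) {
        my $local = lc join('.', split ' ', $name);   # "Ann Lee" -> "ann.lee"
        $local =~ s/[^a-z0-9.]//g;                    # drop apostrophes etc.
        push @emails, $local . '@example.' . $tlds[ rand @tlds ];
    }
    return \@emails;
}

my $email = emails_for('Ann Lee', "Bob O'Neil");
print "$_\n" for @$email;
```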

distributor()

  $results = distributor(type => $type, prec => $prec, dof => $dof, N => $n);

Return a list of N distribution values. The type, precision, degrees-of-freedom and desired number of data-points arguments are optional. The defaults are:

  type: u (normal)
  precision: 2
  degrees-of-freedom: 2
  N: 10

Types

This function uses single letter identifiers:

  u: Normal distribution (default)
  c: Chi-squared distribution
  s: Student's T distribution
  f: F distribution

Degrees of freedom

Given the type, this function accepts the following:

  c: A single integer
  s: A single integer
  f: A fraction string of the form 'N/D' (default 2/1)

shuffler()

  $results = shuffler($n, @items);

Return a shuffled list of $n items. The items and number of data-points arguments are optional. The defaults are:

  n: 10
  items: a b c d e f g h i j
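
One way to picture this is with shuffle() from List::Util, which appears under SEE ALSO. The helper name pick_shuffled() is hypothetical, and capping the result at the number of available items is an assumption; the module may behave differently when $n exceeds the item count.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper, not part of Mock::Populate: shuffle @items and
# return an arrayref holding at most $n of them.
sub pick_shuffled {
    my ($n, @items) = @_;
    $n //= 10;                              # default from the list above
    @items = ('a' .. 'j') unless @items;    # default items

    my @mixed = shuffle @items;
    $n = @mixed if $n > @mixed;             # cannot return more than we have
    return [ @mixed[ 0 .. $n - 1 ] ];
}

my $shuff = pick_shuffled(3, qw(foo bar baz goo ber buz));
print "@$shuff\n";
```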

string_ranger()

  $results = string_ranger(type => $type, length => $length, N => $n);

Return a list of N strings. The type, length and number of data-points arguments are optional. The defaults are:

  type: default
  length: 8
  N: 10

* This function is nearly identical to the Data::SimplePassword rndpassword program, but allows you to generate a finite number of results.

Types

  Types     Output sample     Character set
  ___________________________________________________
  default   0xaVbi3O2Lz8E69s  0..9 a..z A..Z
  ascii     n:.T<Gr!,e*[k=eu  visible ascii
  base64    PC2gb5/8+fBDuw+d  0..9 a..z A..Z / +
  path      PC2gb5/8.fBDuw.d  0..9 a..z A..Z / .
  simple    xek4imbjcmctsxd3  0..9 a..z
  hex       89504e470d0a1a0a  0..9 a..f
  alpha     femvifzscyvvlwvn  a..z
  pron      werbucedicaremoz  a..z but pronounceable!
  digit     7563919623282657  0..9
  binary    1001011110000101
  morse     -.--...-.--.-..-
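
For the types with a simple character set, the table above can be read as "draw each character at random from the listed set". The sketch below shows that reading for five of the types. The helper name random_strings() is hypothetical, it uses Perl's built-in rand() rather than Data::SimplePassword, and it is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mock::Populate: return an arrayref of
# N random strings of the given length, drawn from a named character set.
sub random_strings {
    my (%args) = @_;

    # Character sets copied from the table above (subset only).
    my %chars = (
        default => [ 0 .. 9, 'a' .. 'z', 'A' .. 'Z' ],
        simple  => [ 0 .. 9, 'a' .. 'z' ],
        hex     => [ 0 .. 9, 'a' .. 'f' ],
        alpha   => [ 'a' .. 'z' ],
        digit   => [ 0 .. 9 ],
    );

    my $type = $args{type}   // 'default';
    my $len  = $args{length} // 8;
    my $n    = $args{N}      // 10;
    my $set  = $chars{$type} or die "Unknown type: $type\n";

    my @strings;
    for (1 .. $n) {
        push @strings, join '', map { $set->[ rand @$set ] } 1 .. $len;
    }
    return \@strings;
}

my $strings = random_strings(type => 'hex', length => 16, N => 3);
print "$_\n" for @$strings;
```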

image_ranger()

  $results = image_ranger(size => $size, N => $n);

Return a list of N 1x1 pixel images of varying byte sizes (not image dimension). The byte size and number of data-points are both optional.

The defaults are:

  N: 10
  size: 8

collate()

  $rows = collate(@columns);

Return a list of lists representing a 2D table of rows. Each given list is treated as a column: the first row holds the first member of every column, the second row holds the second member of every column, and so on.
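
The column-to-row step amounts to a transposition, sketched here in plain Perl with a naive CSV dump. The helper name columns_to_rows() and the sample data are hypothetical, and the sketch assumes all columns have the same length; it is not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mock::Populate: transpose a list of
# equal-length column arrayrefs into an arrayref of row arrayrefs.
sub columns_to_rows {
    my @columns = @_;
    return [] unless @columns;

    my $last = $#{ $columns[0] };    # index of the final row
    my @rows;
    for my $i (0 .. $last) {
        push @rows, [ map { $_->[$i] } @columns ];
    }
    return \@rows;
}

my $ids    = [ 1, 2, 3 ];
my $people = [ 'Ann Lee', 'Bob Ray', 'Cy Dow' ];
my $rows   = columns_to_rows($ids, $people);

# Naive CSV output; use a real CSV module if fields may contain commas or quotes.
print join(',', @$_), "\n" for @$rows;
# 1,Ann Lee
# 2,Bob Ray
# 3,Cy Dow
```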

SEE ALSO

Data::SimplePassword

Date::Range

Date::Simple

Image::Dot

List::Util

Mock::Person

Statistics::Distributions

Text::Password::Pronounceable

Text::Unidecode

Time::Local

Data::Random does nearly the exact same thing. Whoops!

TO DO

Implement dirty-data randomizing.

  unexpected formats: iso-8859-1, utf-16, windows codepage,
  BOM (byte order marker),
  broken unicode,
  garbled binary,
  \r and \n variations,
  commas or $ in currencies ("format fuckups"),
  bad JSON,
  broken XML,
  bad ' and " in CSV,
  statistical outliers,
  time-series drops and spikes,
  duplicate data,
  missing data,
  truncated data,

AUTHOR

Gene Boggs <gene@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by Gene Boggs.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.