NAME

BioUtil::Util - Utilities for operating on data and files

Great modules such as BioPerl provide many robust solutions. However, they are not easy to install on some platforms, and for simple scripting tasks a lightweight module may be a better choice. So I reinvented some wheels and added some useful utilities to this module, hoping it will be helpful.

VERSION

Version 2015.0228

EXPORT

    getopt

    file_list_from_argv
    get_file_list

    delete_string_elements_by_indexes
    delete_array_elements_by_indexes

    extract_parameters_from_string
    get_parameters_from_file

    get_list_from_file
    get_column_data

    read_json_file
    write_json_file

    run
    run_time
    readable_second

    check_positive_integer
    mean_and_stdev
    
    filename_prefix
    check_all_files_exist
    check_in_out_dir 
    rm_and_mkdir

    get_paired_fq_gz_file_from_dir
    get_paired_fa_gz_file_from_dir

SYNOPSIS

  use BioUtil::Util;

SUBROUTINES/METHODS

getopt

A simple getopt-style option parser, written for the author's own needs.

Example, for the arguments

    -a b -c t tt -d bb -dbtype asdfafd -test

the parsed options are:

    -a: b
    -c: ARRAY(0xee25e8)
    -d: bb
    -dbtype: asdfafd
    -infmt: fasta
    -test: 1

file_list_from_argv

Get file list from @ARGV. You should use this after parsing options!

When no arguments are given, 'STDIN' is added to the list, which can then be used by, e.g., FastaReader.

get_file_list

Find files or directories using a custom filter. A maximum search depth can be specified.

Example (searching perl scripts)

    my $dir   = "~";
    my $depth = 2;

    my $list = get_file_list(
        $dir,
        sub {
            if ( -d or /^\./ ) {    # skip directories and hidden (dot) files
                return 0;
            }
            if ( /\.(?:pm|pl)$/i ) {    # keep Perl modules and scripts
                return 1;
            }
            return 0;
        },
        $depth
    );
    print "$_\n" for @$list;

delete_string_elements_by_indexes

Delete elements of a string by their indexes. It uses delete_array_elements_by_indexes internally.

delete_array_elements_by_indexes

Delete array elements by given indexes.

Example:

    my @list  = qw(a b c d e f);
    my @idx   = (1, 2, 4);
    my $list2 = delete_array_elements_by_indexes(\@list, \@idx);
    print "@$list2\n"; # prints: a d f
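
The behaviour shown in this example can be approximated in plain Perl. The following is only a sketch of the idea, not the module's actual implementation: the name delete_by_indexes_sketch is hypothetical, and it assumes 0-based indexes, as the example implies.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioUtil::Util: returns a new array ref
# holding the elements of @$list whose (0-based) index is not in @$idx.
sub delete_by_indexes_sketch {
    my ( $list, $idx ) = @_;
    my %drop = map { $_ => 1 } @$idx;    # indexes to remove
    return [ map { $list->[$_] } grep { !$drop{$_} } 0 .. $#$list ];
}

my @list  = qw(a b c d e f);
my $list2 = delete_by_indexes_sketch( \@list, [ 1, 2, 4 ] );
print "@$list2\n";    # prints: a d f
```

The sketch builds a new list and leaves the input array untouched. Whether the module itself modifies its argument is not stated here.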

extract_parameters_from_string

Extract parameters from string.

The regular expression is

    /([\w\d\_\-\.]+)\s*=\s*([^\=;]*)[\s;]*/

Example:

    # bad format, but could also be parsed
    # my $s = " s = b; a=test; b_c=12 3; a.b =; b
    # = asdf
    # sd; ads-f = 12313";

    # recommended
    my $s = "key1=abcde; key2=123; conf.a=file; conf.b=12; ";

    my $pa = extract_parameters_from_string($s);
    print "=$_:$$pa{$_}=\n" for sort keys %$pa;
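
To see what the regular expression quoted above captures, it can be applied globally to a string in a loop. This is a minimal sketch under assumptions, not the module's code: extract_parameters_sketch is a hypothetical name, and trimming trailing whitespace from values is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: collect key/value pairs with the regular expression
# from the documentation, matched repeatedly with /g.
sub extract_parameters_sketch {
    my ($s) = @_;
    my %pa;
    while ( $s =~ /([\w\d\_\-\.]+)\s*=\s*([^\=;]*)[\s;]*/g ) {
        my ( $key, $value ) = ( $1, $2 );
        $value =~ s/\s+$//;    # assumption: trailing whitespace is dropped
        $pa{$key} = $value;
    }
    return \%pa;
}

my $pa = extract_parameters_sketch("key1=abcde; key2=123; conf.a=file; conf.b=12; ");
print "$_: $pa->{$_}\n" for sort keys %$pa;
```

Each match takes a key made of word characters, hyphens and dots, then everything up to the next ';' or '=' as the value.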

get_parameters_from_file

Get parameters from a file. Comments starting with # are allowed in the file.

Example:

    my $pa = get_parameters_from_file("d.txt");
    print "$_: $$pa{$_}\n" for sort keys %$pa;

For a file with content:

    # cell phone 
    apple = 1 # note

    nokia = 2 #

output is:

    apple: 1
    nokia: 2
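
The parsing rules illustrated by this example (key = value lines, # comments, blank lines ignored) can be sketched as follows. This is an illustration under assumptions, not the module's implementation: parse_parameters_sketch is a hypothetical name, and it reads from a filehandle so that the example can use an in-memory file.

```perl
use strict;
use warnings;

# Hypothetical helper: read "key = value" lines from a filehandle,
# ignoring blank lines and anything after a '#'.
sub parse_parameters_sketch {
    my ($fh) = @_;
    my %pa;
    while ( my $line = <$fh> ) {
        $line =~ s/#.*//;    # strip comments
        next unless $line =~ /^\s*(\S+)\s*=\s*(.*?)\s*$/;
        $pa{$1} = $2;
    }
    return \%pa;
}

# Same content as the example file, held in a string.
my $text = "# cell phone\napple = 1 # note\n\nnokia = 2 #\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my $pa = parse_parameters_sketch($fh);
close $fh;
print "$_: $pa->{$_}\n" for sort keys %$pa;
```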

get_list_from_file

Get a list from a file. Comments starting with # are allowed in the file.

Example:

    my $list = get_list_from_file("d.txt");
    print "$_\n" for @$list;

For a file with content:

    # cell phone 
    apple # note

    nokia

output is:

    apple
    nokia

get_column_data

Get one column of a file.

Example:

    my $list = get_column_data("d.txt", 2);
    print "$_\n" for @$list;

read_json_file

Read a JSON file and decode it into a hash ref.

Example:

    my $hashref = read_json_file($file);

write_json_file

Encode a hash ref as JSON and write it to a file.

Example:

    my $hashref = { "a" => 1, "b" => 2 };
    write_json_file($hashref, $file);

run

Run a command. As the example shows, the return value is true if the command failed.

Example:

    my $fail = run($cmd);
    die "failed to run:$cmd\n" if $fail;

run_time

Run a subroutine with the given arguments N times, and return the mean and standard deviation of the run time.

Example:

    my $read_by_record = sub {
        my ($file) = @_;
        my $next_seq = FastaReader($file);    # FastaReader returns an iterator (code ref)
        while ( my $fa = $next_seq->() ) {
            my ( $header, $seq ) = @$fa;
            # print ">$header\n$seq\n";
        }
    };
    
    my ($mean, $stdev) = run_time( 8, $read_by_record, $file );
    printf STDERR "\n## Compute time: %0.03f ± %0.03f s\n\n", $mean, $stdev;

readable_second

Convert a number of seconds into a human-readable string.

Example:

    print readable_second(11312314),"\n"; # 130 day 22 hour 18 min 34 sec
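
The conversion behind this output is repeated integer division by 86400, 3600 and 60. Below is a sketch of that arithmetic, not the module's code: readable_second_sketch is a hypothetical name, and always printing all four units is an assumption about the format.

```perl
use strict;
use warnings;

# Hypothetical helper: break a number of seconds into days, hours,
# minutes and seconds by successive division and remainder.
sub readable_second_sketch {
    my ($s) = @_;
    my $day = int( $s / 86400 );
    $s %= 86400;
    my $hour = int( $s / 3600 );
    $s %= 3600;
    my $min = int( $s / 60 );
    $s %= 60;
    return "$day day $hour hour $min min $s sec";
}

print readable_second_sketch(11312314), "\n";    # 130 day 22 hour 18 min 34 sec
```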

check_positive_integer

Check whether the argument is a positive integer.

Example:

    check_positive_integer(1);

mean_and_stdev

Return the mean and standard deviation of a list.

Example:

    my @list = qw/1 2 3/;
    mean_and_stdev(\@list);
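
For reference, the two statistics can be computed as in the sketch below. It is not the module's implementation: mean_and_stdev_sketch is a hypothetical name, returning the values as a two-element list is an assumption, and so is the choice of the sample standard deviation (n - 1 denominator), since the documentation does not say which variant is used.

```perl
use strict;
use warnings;

# Hypothetical helper: mean and sample standard deviation (n - 1 denominator)
# of the numbers in an array ref.
sub mean_and_stdev_sketch {
    my ($list) = @_;
    my $n = @$list;
    return ( 0, 0 ) if $n == 0;
    my $sum = 0;
    $sum += $_ for @$list;
    my $mean = $sum / $n;
    return ( $mean, 0 ) if $n == 1;    # stdev is undefined for one value; 0 by convention here
    my $ss = 0;
    $ss += ( $_ - $mean )**2 for @$list;
    return ( $mean, sqrt( $ss / ( $n - 1 ) ) );
}

my ( $mean, $stdev ) = mean_and_stdev_sketch( [ 1, 2, 3 ] );
printf "%.3f %.3f\n", $mean, $stdev;    # prints: 2.000 1.000
```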

filename_prefix

Get filename prefix

Example:

    filename_prefix("test.fa"); # "test"
    filename_prefix("tmp");     # "tmp"

check_all_files_exist

Check whether all files exist.

check_in_out_dir

Check the input and output directories.

Example:

    check_in_out_dir("~/dir", "~/dir.out");

rm_and_mkdir

Create a directory, removing it first if it already exists.

Example:

    rm_and_mkdir("out");

get_paired_fq_gz_file_from_dir

Example:

    # .
    # ├── test_1.fq.gz
    # └── test_2.fq.gz
    for my $pe ( get_paired_fq_gz_file_from_dir($indir) ) {
        # test_1.fq.gz, test_2.fq.gz, test
        my ( $fqfile1, $fqfile2, $id ) = @$pe;

    }
