
NAME

geoCancerDiagnosticDatasetsRetriever - GEO Cancer Diagnostic Datasets Retriever is a bioinformatics tool for cancer diagnostic dataset retrieval from the GEO website.

SYNOPSIS

    Usage: geoCancerDiagnosticDatasetsRetriever -h -d "CANCER_TYPE" -p "PLATFORMS_CODES" -f "DIRECTORY_PATH" -k 

An example basic command using "myelodysplastic syndrome" as a query:

    $ geoCancerDiagnosticDatasetsRetriever -d "myelodysplastic syndrome"

When using the basic command, the input and output files of geoCancerDiagnosticDatasetsRetriever will be found in the `~/geoCancerDiagnosticDatasetsRetriever_files/data/` and `~/geoCancerDiagnosticDatasetsRetriever_files/results/` directories, respectively.

DESCRIPTION

Gene Expression Omnibus (GEO) Cancer Diagnostic Datasets Retriever is a bioinformatics tool for cancer diagnostic dataset retrieval from the GEO database. It requires a GEO DataSets input file listing all GSE dataset entries for a specific cancer (for example, myelodysplastic syndrome), obtained as a download from the GEO database. The tool works by applying keyword filters to each GSE dataset entry listed in that input file. The first diagnostic text filter flags entries whose title or abstract contains diagnostic keywords (for example, “diagnosis” or “health”) used by clinical science researchers. Next, a second diagnostic text filter examines each flagged dataset for diagnostic keywords in the "Overall design" section of its GSE entry. If keywords are found, the tool outputs the GSE code of the likely diagnostic dataset. If the second filter finds none, a more intensive filtering stage is performed: the tool runs an R script (`healthyControlsPresentInputParams.r`) that searches the dataset's .SOFT file for the desired keywords to determine whether it is a likely diagnostic dataset.
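
The filtering cascade described above can be sketched in shell as follows. This is a minimal illustration, not the tool's actual code: the function names `has_keyword` and `classify` are hypothetical, the keyword pattern is a guess built from the two keywords named in the text, and the handling of entries that the first filter does not flag (skipping them) is an assumption. The third stage, which the tool delegates to an R script, is represented only by a label.

```shell
# Hypothetical keyword pattern; the text above names only "diagnosis" and "health".
KEYWORDS='diagnos|health'

# has_keyword TEXT: succeeds when TEXT contains a keyword (case-insensitive).
has_keyword() {
    printf '%s\n' "$1" | grep -qiE "$KEYWORDS"
}

# classify TITLE_AND_ABSTRACT OVERALL_DESIGN
# Prints one of: skip, likely-diagnostic, needs-soft-check.
classify() {
    if ! has_keyword "$1"; then
        echo "skip"                # first filter: entry not flagged (assumed to be skipped)
    elif has_keyword "$2"; then
        echo "likely-diagnostic"   # second filter: keyword found in "Overall design"
    else
        echo "needs-soft-check"    # would be handed to the .SOFT file stage
    fi
}

classify "Diagnosis of myelodysplastic syndrome" "Patients and healthy controls were profiled"
```

The final line prints `likely-diagnostic`, because both the title text and the design text contain a keyword.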

DEPENDENCIES

strict
warnings
Term::ANSIColor
Getopt::Std
LWP::Simple
File::Basename
File::HomeDir
App::cpanminus
Net::SSLeay
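
One possible way to see which of the modules listed above are already available to your Perl installation is to try loading each one. The sketch below is only a convenience check and is not part of the distribution; `check_module` is a hypothetical helper name. The `strict` and `warnings` pragmas are omitted from the loop because they ship with Perl itself.

```shell
# check_module NAME: reports whether the Perl module NAME can be loaded.
check_module() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "ok      $1"
    else
        echo "missing $1"
    fi
}

for module in Term::ANSIColor Getopt::Std LWP::Simple File::Basename \
              File::HomeDir App::cpanminus Net::SSLeay; do
    check_module "$module"
done
```

Any module reported as `missing` can then be installed using the commands in the INSTALLATION section.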

INSTALLATION

geoCancerDiagnosticDatasetsRetriever can be used on any Linux, macOS, or Windows machine. On Windows, you will need to install the Windows Subsystem for Linux (WSL) compatibility layer (see the WSL installation page). Once WSL is launched, follow the geoCancerDiagnosticDatasetsRetriever installation instructions described below.

Perl is installed by default on Linux and macOS operating systems. Likewise, cURL is installed on all macOS versions. cURL may not be installed on Linux, in which case it must be installed manually through the distribution’s software centre; on Ubuntu, geoCancerDiagnosticDatasetsRetriever installs it automatically.
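
Before installing, it may be worth confirming that both programs are actually on your PATH. The following is a small sketch using the standard `command -v` shell builtin; `have_cmd` is a hypothetical helper name.

```shell
# have_cmd NAME: succeeds when NAME is found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

for cmd in perl curl; do
    if have_cmd "$cmd"; then
        echo "$cmd: found"
    else
        echo "$cmd: not found"
    fi
done
```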

Manual install:

    $  perl Makefile.PL
    $  make
    $  make install

On Linux Ubuntu, you might need to run the last command as a superuser (`sudo make install`) and you will need to manually install (if not already installed in your Perl 5 configuration) the following packages:

libfile-homedir-perl

    $  sudo apt-get install -y libfile-homedir-perl

cpanminus

    $  sudo apt -y install cpanminus

LWP::Simple

    $  perl -MCPAN -e 'install "LWP::Simple"'

libnet-ssleay-perl

    $  sudo apt-get install -y libnet-ssleay-perl

CPAN install:

    $  cpanm App::geoCancerDiagnosticDatasetsRetriever

To uninstall:

    $  cpanm --uninstall App::geoCancerDiagnosticDatasetsRetriever
    

DATA FILE

The required input file is a GEO DataSets file, which is obtained as a download from GEO DataSets when geoCancerDiagnosticDatasetsRetriever is queried for a particular cancer (for example, myelodysplastic syndrome).

EXECUTION INSTRUCTIONS

The basic usage for running geoCancerDiagnosticDatasetsRetriever is:

    $  geoCancerDiagnosticDatasetsRetriever -d "CANCER_TYPE"

An example basic usage command using "myelodysplastic syndrome" as a query:

    $  geoCancerDiagnosticDatasetsRetriever -d "myelodysplastic syndrome"

With the basic usage command, the mandatory -d (download) flag is used to download and then retrieve myelodysplastic syndrome diagnostic dataset(s) associated with the GPL570 platform code (default selection). When using this command, the input and output files of geoCancerDiagnosticDatasetsRetriever will be found in the `~/geoCancerDiagnosticDatasetsRetriever_files/data/` and `~/geoCancerDiagnosticDatasetsRetriever_files/results/` directories, respectively.

For specialized options, allowing more fine-grained user control, the following options are made available:

-p <list of GPL platform codes>

A list of GPL platform codes may be specified prior to execution to expand diagnostic dataset retrieval for a particular cancer (such as myelodysplastic syndrome). For example:

    $  geoCancerDiagnosticDatasetsRetriever -d "myelodysplastic syndrome" -p "GPL570 GPL97 GPL96"
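
To run the same platform list against several cancer types, the documented `-d` and `-p` options can be wrapped in a small loop. The sketch below is an illustration only: `run_batch` and the `RETRIEVER` variable are hypothetical names, and the second cancer type in the example comment is an arbitrary query chosen for illustration.

```shell
# Command to invoke; overridable so the loop can be dry-run with another command.
RETRIEVER="${RETRIEVER:-geoCancerDiagnosticDatasetsRetriever}"

# run_batch PLATFORMS CANCER_TYPE...
# Runs the retriever once per cancer type with the same platform list.
run_batch() {
    platforms="$1"
    shift
    for cancer in "$@"; do
        "$RETRIEVER" -d "$cancer" -p "$platforms" || echo "failed: $cancer" >&2
    done
}

# Example (not executed here):
#   run_batch "GPL570 GPL97 GPL96" "myelodysplastic syndrome" "acute myeloid leukemia"
```

Quoting `"$cancer"` and `"$platforms"` keeps multi-word queries and space-separated platform lists intact as single arguments.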

-f <user-specified absolute path to save results files>

A user-specified absolute path to save results files (overriding the default results directory) may be specified prior to execution. For example:

    $  geoCancerDiagnosticDatasetsRetriever -d "myelodysplastic syndrome" -p "GPL570 GPL97 GPL96" -f "/Myelodysplastic_syndrome_files/"

With this command, the input files will be found in the same directory as a basic usage run's input files (`~/geoCancerDiagnosticDatasetsRetriever_files/data/`). The output files will be found in the user-specified directory (for example, "/Myelodysplastic_syndrome_files/"), created in the user's home directory.

-k <option to keep temporary files>

This option allows a user to keep large temporary/output files, which are removed by default. For example:

    $  geoCancerDiagnosticDatasetsRetriever -d "myelodysplastic syndrome" -p "GPL570 GPL97 GPL96" -f "/Myelodysplastic_syndrome_files/" -k

HELP

Help information can be read by typing the following command:

    $ geoCancerDiagnosticDatasetsRetriever -h

This command will print the following instructions:

    Usage: geoCancerDiagnosticDatasetsRetriever -h

    Mandatory arguments:
    CANCER_TYPE           type of the cancer as query search term

    Optional arguments:
    -p                    list of GPL platform codes
    -f                    user-specified absolute path to save results files
    -k                    option to keep temporary files
    -h                    show help message and exit
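
The option set listed above could be expressed as the getopts-style specification `hd:p:f:k`. The tool itself is written in Perl and lists Getopt::Std among its dependencies; how it actually parses its arguments is not shown here, so the shell sketch below only illustrates how such flags are typically handled. `parse_options` is a hypothetical function, and the GPL570 default is taken from the EXECUTION INSTRUCTIONS section.

```shell
# parse_options ARGS...: prints the settings implied by the documented flags.
parse_options() {
    cancer=""; platforms="GPL570"; outdir=""; keep="no"
    OPTIND=1                       # reset so the function can be called repeatedly
    while getopts "hd:p:f:k" opt; do
        case "$opt" in
            h) echo "help"; return 0 ;;
            d) cancer="$OPTARG" ;;
            p) platforms="$OPTARG" ;;
            f) outdir="$OPTARG" ;;
            k) keep="yes" ;;
            *) return 2 ;;         # unknown flag or missing argument
        esac
    done
    if [ -z "$cancer" ]; then
        echo "error: -d CANCER_TYPE is mandatory" >&2
        return 2
    fi
    echo "cancer=$cancer platforms=$platforms outdir=${outdir:-default} keep=$keep"
}

parse_options -d "myelodysplastic syndrome" -p "GPL570 GPL97 GPL96" -k
```

The final line prints `cancer=myelodysplastic syndrome platforms=GPL570 GPL97 GPL96 outdir=default keep=yes`.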

AUTHORS

Abbas Alameer (Kuwait University) and Davide Chicco (University of Toronto)

For information, please contact Abbas Alameer at abbas.alameer(AT)ku.edu.kw or Davide Chicco at davidechicco(AT)davidechicco.it

COPYRIGHT AND LICENSE

Copyright 2021 by Abbas Alameer (Kuwait University) and Davide Chicco (University of Toronto)

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License, version 2 (GPLv2).