The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

App::Sandy::Command::Transcriptome - simulate command class. Simulate transcriptome sequencing

VERSION

version 0.20

SYNOPSIS

 sandy transcriptome [options] <fasta-file>

 Arguments:
  a fasta-file

 Options:
  -h, --help                     brief help message
  -u, --man                      full documentation
  -v, --verbose                  print log messages
  -p, --prefix                   prefix output [default:"out"]  
  -o, --output-dir               output directory [default:"."]
  -O, --output-format            bam, sam, fastq.gz, fastq [default:"fastq.gz"]
  -1, --join-paired-ends         merge R1 and R2 outputs in one file
  -x, --compression-level        speed compression: "1" - compress faster,
                                 "9" - compress better [default:"6"; Integer]
  -i, --append-id                append to the defined template id [Format]
  -I, --id                       overlap the default template id [Format]
  -j, --jobs                     number of jobs [default:"1"; Integer]
  -s, --seed                     set the seed of the base generator
                                 [default:"time()"; Integer]
  -n, --number-of-reads          set the number of reads
                                 [default:"1000000", Integer]
  -t, --sequencing-type          single-end or paired-end reads
                                 [default:"paired-end"]
  -q, --quality-profile          sequencing system profiles from quality
                                 database [default:"poisson"]
  -e, --sequencing-error         sequencing error rate for poisson
                                 [default:"0.001"; Number]
  -m, --read-mean                read mean size for poisson
                                 [default:"100"; Integer]
  -d, --read-stdd                read standard deviation size for poisson
                                 [default:"0"; Integer]
  -M, --fragment-mean            the fragment mean size for paired-end reads
                                 [default:"300"; Integer]
  -D, --fragment-stdd            the fragment standard deviation size for
                                 paired-end reads [default:"50"; Integer]
  -f, --expression-matrix        an expression-matrix entry from database

DESCRIPTION

Simulate transcriptome sequencing.

OPTIONS

--help

Print a brief help message and exits.

--man

Prints the manual page and exits.

--verbose

Prints log information to standard error

--prefix

Concatenates the prefix to the output-file name.

--output-dir

Creates output-file inside output-dir. If output-dir does not exist, it is created recursively

--output-format

Choose the output format. Available options are: bam, sam, fastq.gz, fastq. For bam option, --append-id is ignored, considering that the sequence identifier is splitted by blank character, so just the first field is included into the query name column (first column).

--join-paired-ends

By default, paired-end reads are put into two different files, prefix_R[12]_001.fastq(\.gz)?. If the user wants both outputs together, she can pass this option. If the --id does not have the escape character %R, it is automatically included right after the first field (blank separated values) as in id/%R - which resolves to id/1 or id/2. It is necessary to distinguish which read is R1/R2

--compression-level

Regulates the speed of compression using the specified digit (between 1 and 9), where "1" indicates the fastest compression method (less compression) and "9" indicates the slowest compression method (best compression). The default compression level is "6"

--append-id

Append string template to the defined template id. See Format

--id

Overlap the default defined template id: single-end %i.%U %U and paired-end %i.%U %U e.g. SR123.1 1 See Format

Format

A string Format is a combination of literal and escape characters similar to the way printf works. That way, the user has the freedom to customize the fastq sequence identifier to fit her needs. Valid escape characteres are:

Common escape characters

        ----------------------------------------------------------------------------
         Escape       Meaning
        ----------------------------------------------------------------------------
         %i           instrument id composed by SR + PID
         %I           job slot number
         %q           quality profile
         %e           sequencing error
         %x           sequencing error position
         %R           read 1, or 2 if it is the paired-end mate
         %U           read number
         %r           read size
         %m           read mean
         %d           read standard deviation
         %c           sequence id as chromossome, gene/transcript id
         %C           sequence id type (reference or alternate non reference allele) ***
         %s           read strand
         %t           read start position
         %n           read end position
         %a           read start position regarding reference genome ***
         %b           read end position regarding reference genome ***
         %v           structural variation position ***
        ----------------------------------------------------------------------------
        *** specific for structural variation (genome simulation only)

Paired-end specific escape characters

        ----------------------------------------------------------------------------
         Escape       Meaning
        ----------------------------------------------------------------------------
         %T           mate read start position
         %N           mate read end position
         %A           mate read start position regarding reference genome ***
         %B           mate read end position regarding reference genome ***
         %D           distance between the paired-reads
         %M           fragment mean
         %D           fragment standard deviation
         %f           fragment size
         %F           fragment strand
         %S           fragment start position
         %E           fragment end position
         %X           fragment start position regarding reference genome ***
         %Z           fragment end position regarding reference genome ***
        ----------------------------------------------------------------------------
        *** specific for structural variation (genome simulation only)
--jobs

Sets the number of child jobs to be created

--seed

Sets the seed of the base generator. The ability to set the seed is useful for those who want reproducible simulations. Pay attention to the number of jobs (--jobs) set, because each job receives a different seed calculated from the main seed. So, for reproducibility, the same seed set before needs the same number of jobs set before as well.

--read-mean

Sets the read mean if quality-profile is equal to 'poisson'. The quality-profile from database overrides the read-size

--read-stdd

Sets the read standard deviation if quality-profile is equal to 'poisson'. The quality-profile from database overrides the read-stdd

--number-of-reads

Sets the number of reads desired for each fragment end. That means, it will be the number of reads for each pair - 1 x N reads for single-end and 2 x N reads for paired-end. This is the default option for transcriptome sequencing simulation

--sequencing-type

Sets the sequencing type to single-end or paired-end

--fragment-mean

If the sequencing-type is set to paired-end, it sets the fragment mean

--fragment-stdd

If the sequencing-type is set to paired-end, it sets the fragment standard deviation

--sequencing-error

Sets the sequencing error rate if quality-profile is equal to 'poisson'. Valid values are between zero and one

--quality-profile

Sets the sequencing system profile for quality. The default value is a poisson distribution, but the user can choose among several profiles stored into the database or import his own data. See quality command for more details

--expression-matrix

By default, the gene/transcript is raffled using its length as weight. If you choose an expression-matrix, then the raffle will be made based on the gene/transcript expression. The expression-matrix entries are found into the database. See expression command for more details

AUTHORS

  • Thiago L. A. Miller <tmiller@mochsl.org.br>

  • J. Leonel Buzzo <lbuzzo@mochsl.org.br>

  • Felipe R. C. dos Santos <fsantos@mochsl.org.br>

  • Helena B. Conceição <hconceicao@mochsl.org.br>

  • Gabriela Guardia <gguardia@mochsl.org.br>

  • Fernanda Orpinelli <forpinelli@mochsl.org.br>

  • Pedro A. F. Galante <pgalante@mochsl.org.br>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2018 by Teaching and Research Institute from Sírio-Libanês Hospital.

This is free software, licensed under:

  The GNU General Public License, Version 3, June 2007