https://github.com/chrisarg/bio-seqalignment-applications-sequencingsimulators-rnaseq-polyester

Enhancement of the RNA simulator polyester to include polyA tails in the simulated sequences
bioinformatics bioinformatics-tool rnaseq sequence simulator

NAME
polyester_polyA.pl - A Perl application for enhancing the polyester RNA
sequencing simulation tool.

VERSION
version 0.02

SYNOPSIS
polyester_polyA.pl [options]

DESCRIPTION
The purpose of the application is to enhance the polyester RNA
sequencing simulation tool by including polyA tails in the reference RNA
being used to generate the simulated sequencing data. The application is
a wrapper around the R package polyester, which only accounts for the
processes of fragmentation, reverse complementation and sequencing when
generating data. Note that the Perl application does not (at this
moment) include the possibility of passing logspline R objects as
parameters to the R script and the polyester "simulate_experiment"
function.

The command line options are the same as the ones in the polyester R
package, with the exception of:

* The addition of the --taildist option, which is mandatory and
specifies the tail distribution to be used.

* The addition of the --distparams option, which is mandatory and
specifies the parameters of the distribution.

* The addition of the --maxseqs option, which is optional and specifies
whether to break the single fasta file generated by the application into
multiple files of a specified maximum number of sequences.

* The addition of the --modformat option, which is optional and
specifies the format for storing modifications (one of JSON, YAML, or
MessagePack).

* BONUS: an R script (polyester.R) is provided that can be used to
control the polyester simulation process from the command line.

All other parameters have the same interpretation and semantics as in
the polyester R package.
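
The idea of adding polyA tails to the reference sequences before simulation can be illustrated with a minimal Python sketch. It is not the code used by polyester_polyA.pl: the function name, the "shape" and "scale" parameters, the numeric values, and the layout of the recorded tail lengths are all assumptions made for this example, and they do not describe how --distparams is interpreted by the application.

```python
import json
import random


def add_polya_tails(records, shape, scale, seed=None):
    """Append a polyA tail of gamma-distributed length to each sequence.

    `records` maps sequence IDs to sequences. `shape` and `scale` are
    hypothetical parameter names for this sketch; they are not taken
    from polyester_polyA.pl.
    """
    rng = random.Random(seed)
    tailed, tail_lengths = {}, {}
    for seq_id, seq in records.items():
        # Draw a tail length and round it to a whole number of bases.
        length = round(rng.gammavariate(shape, scale))
        tailed[seq_id] = seq + "A" * length
        tail_lengths[seq_id] = length
    return tailed, tail_lengths


# Toy reference sequences and arbitrary distribution parameters.
reference = {"tx1": "ACGTACGTGG", "tx2": "TTGCAGCATC"}
tailed, tail_lengths = add_polya_tails(reference, shape=100.0, scale=1.5, seed=42)

# Keep a record of what was added (JSON here; the layout is illustrative).
print(json.dumps(tail_lengths, indent=2))
```

The tailed sequences would then serve as the reference handed to the simulator, so that fragments can include part of the tail.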

OPTIONS
--bias, -b [STRING]
Fragment selection bias (optional).

--distparams, -P [FLOAT1 FLOAT2 ...]
Distribution parameters (mandatory, list of numeric values).

--errormodel, -e [STRING]
Error model (optional).

--errorrate, -E [FLOAT]
Error probability (optional).

--fastafile, -f [PATH]
Fasta file path (mandatory).

--fcfile, -c [PATH]
Fold change file path (optional).

--fraglen, -F [INTEGER]
Fragment length (average) (optional).

--fragsd, -S [INTEGER]
Fragment length (standard deviation) (optional).

--gcbias, -g [INTEGER]
GC bias (optional).

--modformat, -m [STRING]
Case insensitive format for storing modifications (one of JSON,
YAML, or MessagePack) (optional).

--maxseqs, -m [INTEGER]
Maximum sequences per file (optional).

--numreps, -n [INTEGER1 INTEGER2 ...]
Number of replicates in each group (optional, list).

--outdir, -o [PATH]
Path to output directory (optional).

--paired, -p [TRUE|FALSE]
Paired reads (optional).

--readlen, -R [INTEGER]
Read length (optional).

--readsfile, -r [PATH]
Reads per transcript file path (optional).

--seed, -d [INTEGER]
Random seed (optional).

--strandspec, -s [TRUE|FALSE]
Strand specificity (optional).

--taildist, -t [STRING]
Tail distribution (mandatory).

--writeinfo, -w [INTEGER]
Save simulation info (optional).
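
The splitting behaviour described for --maxseqs can be pictured with the following Python sketch, which groups sequence records into chunks of a maximum size. It only illustrates the idea: the helper names and the output file names are hypothetical and are not taken from the application.

```python
from itertools import islice


def chunk_records(records, max_seqs):
    """Yield lists of at most `max_seqs` (header, sequence) pairs."""
    if max_seqs < 1:
        raise ValueError("max_seqs must be a positive integer")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, max_seqs))
        if not chunk:
            return
        yield chunk


def format_fasta(chunk):
    """Render one chunk as FASTA text."""
    return "".join(f">{header}\n{sequence}\n" for header, sequence in chunk)


# Five toy records split into files of at most two sequences each.
records = [(f"tx{i}", "ACGT" * 3) for i in range(5)]
chunks = list(chunk_records(records, max_seqs=2))
for index, chunk in enumerate(chunks, start=1):
    # Hypothetical file naming; the application's actual names may differ.
    print(f"part_{index}.fasta: {len(chunk)} sequences")
```

Because the chunks are produced lazily from an iterator, the same approach works when the records are streamed from a large fasta file rather than held in a list.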

EXAMPLES
polyester_polyA.pl --fastafile myseq.fasta --taildist gamma \
--distparams 125.0 1.0 0.0 250.0 --fraglen 100 --fragsd 10 \
--numreps 1 --strandspec TRUE --readlen 75 --paired FALSE \
--maxseqs 1000 --modformat YAML --outdir /path/to/output

TODO
* Add the possibility of passing logspline R objects as parameters to
the R script and the polyester "simulate_experiment" function.

* Add the possibility of adding UMI tags to sequences.

* Add the possibility of adding sequencing adapters to sequences.

SEE ALSO
* polyester

Polyester is an R package designed to simulate RNA sequencing
experiments with differential transcript expression. Given a set of
annotated transcripts, Polyester will simulate the steps of an
RNA-seq experiment (fragmentation, reverse-complementing, and
sequencing) and produce files containing simulated RNA-seq reads.
Simulated reads can be analyzed using your choice of downstream
analysis tools. Polyester has a built-in wrapper function to
simulate a case/control experiment with differential transcript
expression and biological replicates. Users are able to set the
levels of differential expression at transcripts of their choosing.
This means they know which transcripts are differentially expressed
in the simulated dataset, so accuracy of statistical methods for
differential expression detection can be analyzed.

Polyester offers several unique features:

* Built-in functionality to simulate differential expression at the
transcript level

* Ability to explicitly set differential expression signal strength

* Simulation of small datasets, since large RNA-seq datasets can
require lots of time and computing resources to analyze

* Generation of raw RNA-seq reads, as opposed to alignments or
transcript-level abundance estimates

* Transparency/open-source code

AUTHOR
Christos Argyropoulos

COPYRIGHT AND LICENSE
This software is copyright (c) 2024 by Christos Argyropoulos.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.