{"id":43423915,"url":"https://github.com/rajewsky-lab/mirdeep2","last_synced_at":"2026-02-02T18:57:54.898Z","repository":{"id":41243439,"uuid":"88153902","full_name":"rajewsky-lab/mirdeep2","owner":"rajewsky-lab","description":"Discovering known and novel miRNAs from small RNA sequencing data","archived":false,"fork":false,"pushed_at":"2024-08-27T09:41:50.000Z","size":7798,"stargazers_count":154,"open_issues_count":4,"forks_count":51,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-10-27T15:44:54.778Z","etag":null,"topics":["analysis","mapping","microrna","mirna","prediction","quantification","sequencing","smallrna"],"latest_commit_sha":null,"homepage":"","language":"Perl","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rajewsky-lab.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-04-13T10:34:09.000Z","updated_at":"2025-08-28T13:49:46.000Z","dependencies_parsed_at":"2024-08-23T13:28:13.769Z","dependency_job_id":"f50633c1-da3b-409d-acee-15534b2abde4","html_url":"https://github.com/rajewsky-lab/mirdeep2","commit_stats":{"total_commits":78,"total_committers":6,"mean_commits":13.0,"dds":"0.41025641025641024","last_synced_commit":"c6440e298795579ad62351bba7aff9cc43c50c68"},"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"purl":"pkg:github/rajewsky-lab/mirdeep2","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rajewsky-lab%2Fmirdeep2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rajewsky-lab%2Fmirdeep2/tags","rele
ases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rajewsky-lab%2Fmirdeep2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rajewsky-lab%2Fmirdeep2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rajewsky-lab","download_url":"https://codeload.github.com/rajewsky-lab/mirdeep2/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rajewsky-lab%2Fmirdeep2/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29017938,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-02T18:51:31.335Z","status":"ssl_error","status_checked_at":"2026-02-02T18:49:20.777Z","response_time":58,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analysis","mapping","microrna","mirna","prediction","quantification","sequencing","smallrna"],"created_at":"2026-02-02T18:57:51.769Z","updated_at":"2026-02-02T18:57:54.891Z","avatar_url":"https://github.com/rajewsky-lab.png","language":"Perl","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.org/rajewsky-lab/mirdeep2.svg?branch=master)](https://travis-ci.org/rajewsky-lab/mirdeep2)\n\n# miRDeep2 `README`\n\n## About\n\nAuthors: Sebastian Mackowiak \u0026 Marc Friedländer\n\nThis is miRDeep2 developed by 
Sebastian Mackowiak \u0026 Marc Friedländer.\nmiRDeep2 discovers active known or novel miRNAs from deep sequencing data\n(Solexa/Illumina, 454, ...).\n\n(minor edits to `README`, `TUTORIAL`, `CHANGELOG`, and `FAQ`, conversion to\nMarkdown, trailing whitespace removal \u0026 CI setup by Marcel Schilling)\n\n\n## Requirements\n\nLinux system, 2 GB RAM, and enough disk space for your deep sequencing data\n\n\n## Testing version\n\nMacOSX with Xcode and the gcc compiler installed. (Xcode can be obtained from\nthe App Store; if there are any issues with installing it, please look for help\nonline.)\n\nTo compile the Vienna package, it may be necessary to have GNU grep installed,\nsince the MacOSX grep is BSD-based and sometimes not accepted by the installer.\nTo get GNU grep, you could, for example, install Homebrew by typing\n\n```sh\n/bin/bash -c \"$(curl -fsSL \\\n  https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"\n```\n\n(the link could be out of date; in that case, look up online what to do)\n\nAfter that, typing\n\n```sh\nbrew install grep\n```\n\nwill install GNU grep as `ggrep` in `/usr/local/bin/` (or in\n`/opt/homebrew/bin/` on Apple Silicon Macs).\n\n\n## Installation\n\n### Option 1: with the provided `install.pl` script\n\nType\n\n```sh\nperl install.pl\n```\n\n### Option 2: without the `install.pl` script\n\nFollow the instructions given below.\n\n#### Dependencies\n\nFirst, download all the necessary packages listed here:\n\n1. [bowtie short read aligner][bowtie-source]\n2. [Vienna package with RNAfold][vienna-source]\n3. [SQUID library][squid-source] (go to Squid and download it)\n4. [randfold][randfold-source]\n5. 
[Perl package PDF::API2][pdf-api2-source]\n\n[bowtie-source]: http://bowtie-bio.sourceforge.net/index.shtml\n[vienna-source]: http://www.tbi.univie.ac.at/~ivo/RNA/\n[squid-source]: http://eddylab.org/software.html\n[randfold-source]: http://bioinformatics.psb.ugent.be/software/details/Randfold\n[pdf-api2-source]: http://search.cpan.org/search?query=PDF%3A%3AAPI2\u0026mode=all\n\n#### Manual installation\n\nOnce the packages are downloaded:\n\n1. add the miRDeep2 executable path to your `PATH` variable, *e.g.*\n\n```sh\necho 'export PATH=$PATH:your_path_to_mirdeep2/src' \u003e\u003e ~/.bashrc\n```\n\n2. `unzip bowtie-0.11.3-bin-linux-x86_64.zip`\n\n3. put the bowtie directory into your `PATH` variable, *e.g.*\n\n```sh\necho 'export PATH=$PATH:your_path_to_bowtie' \u003e\u003e ~/.bashrc\n```\n\n4. `tar xvvzf ViennaRNA-1.8.4.tar.gz`\n\n5. `cd` to the Vienna directory\n\n6. type\n\n```sh\n./configure --prefix=your_path_to_Vienna/install_dir\nmake\nmake install\n```\n\n7. add the Vienna binaries to your `PATH` variable, *e.g.*\n\n```sh\necho 'export PATH=$PATH:your_path_to_Vienna/install_dir/bin' \u003e\u003e ~/.bashrc\n```\n\n8. `tar xvvzf squid-1.9g.tar.gz`\n\n9. `tar xvvzf randfold-2.0.tar.gz`\n\n10. `cd randfold-2.0`\n\n11. edit the `Makefile`, *e.g.* `emacs Makefile`:\n\nchange the line with `INCLUDE=-I.` to\n`INCLUDE=-I. -I\u003cyour_path_to_squid-1.9g\u003e -L\u003cyour_path_to_squid-1.9g\u003e`,\n*e.g.* `INCLUDE=-I. -I/home/Pattern/squid-1.9g/ -L/home/Pattern/squid-1.9g/`\n\n12. `make`\n\n13. add randfold to your `PATH` variable, *e.g.*\n\n```sh\necho 'export PATH=$PATH:your_path_to_randfold' \u003e\u003e ~/.bashrc\n```\n\n14. `tar xvvzf PDF-API2-0.73.tar.gz`\n\n15. `cd` to your PDF-API2 directory\n\n16. then type\n\n```sh\nperl Makefile.PL INSTALL_BASE=your_path_to_miRDeep2 LIB=your_path_to_miRDeep2/lib\nmake\nmake test\nmake install\n```\n\n17. 
add your library to the `PERL5LIB` variable, *e.g.*\n\n```sh\necho \\\n  'export PERL5LIB=$PERL5LIB:your_path_to_miRDeep2/lib/perl5' \\\n  \u003e\u003e ~/.bashrc\n```\n\n18. `cd` to your mirdeep2 directory (the one containing `install.pl`)\n\n19. `touch install_successful`\n\n20. start a new shell session to apply the changes to environment variables\n\n#### Test installation\n\nTo test whether everything is installed properly, type\n\n1. `bowtie`\n2. `RNAfold -h`\n3. `randfold`\n4. `make_html.pl`\n\nYou should not get any error messages; otherwise, something is not installed\ncorrectly.\n\n\n### Install Paths\n\nEverything that is downloaded by the installer will be in a directory called\n`\u003cyour_path_to_mirdeep2\u003e/essentials`.\n\n\n## Script Reference\n\nmiRDeep2 analyses can be performed using the three scripts `miRDeep2.pl`,\n`mapper.pl` and `quantifier.pl`.\n\n\n### `miRDeep2.pl`\n\n#### Description\n\nWrapper script for the miRDeep2 program package. The script runs all the\nscripts of the miRDeep2 package that are necessary to detect microRNAs in deep\nsequencing data.\n\n#### Input\n\n* A FASTA file with deep sequencing reads,\n* a FASTA file of the corresponding genome,\n* a file of reads mapped to the genome in miRDeep2 ARF format,\n* an optional FASTA file with known miRNAs of the analysed species, and\n* an optional FASTA file of known miRNAs of related species.\n\n#### Output\n\n* A spreadsheet and\n* an HTML file\n\nwith an overview of all detected miRNAs in the deep sequencing input data.\n\n#### Options\n\n| option         | description                                                |\n|----------------|------------------------------------------------------------|\n| `‑a \u003cint\u003e`     | minimum read stack height that triggers analysis. Using this option disables automatic estimation of the optimal value. |\n| `‑b \u003cint\u003e`     | minimum score cut-off for predicted novel miRNAs to be displayed in the overview table. 
The default cut-off is 0. |\n| `‑c`           | disable randfold analysis                                  |\n| `‑t \u003cspecies\u003e` | species being analyzed; this is used to link to the appropriate UCSC browser |\n| `‑u`           | output list of UCSC browser species that are supported and exit |\n| `‑v`           | remove directory with temporary files                      |\n| `‑q \u003cfile\u003e`    | `miRBase.mrd` file from quantifier module to show miRBase miRNAs in data that were not scored by miRDeep2 |\n\n#### Examples\n\nThe miRDeep2 module identifies known and novel miRNAs in deep sequencing data.\nThe output of the mapper module can be directly plugged into the miRDeep2\nmodule.\n\n##### Example use 1\n\nThe user wishes to identify miRNAs in mouse deep sequencing data, using default\noptions.\nThe `miRBase_mmu_v14.fa` file contains all miRBase mature mouse miRNAs, while\nthe `miRBase_rno_v14.fa` file contains all the miRBase mature rat miRNAs.\nThe `2\u003e` redirects all progress output to the `report.log` file.\n\n```sh\nmiRDeep2.pl reads_collapsed.fa genome.fa reads_collapsed_vs_genome.arf \\\n  miRBase_mmu_v14.fa miRBase_rno_v14.fa precursors_ref_this_species.fa \\\n  -t Mouse 2\u003ereport.log\n```\n\nThis command will generate\n\n* a directory with PDFs showing the structures, read signatures and score\n  breakdowns of novel and known miRNAs in the data,\n* an HTML webpage that links to all results generated (`result.html`),\n* a copy of the novel and known miRNAs contained in the webpage but in text\n  format, which allows easy parsing (`result.csv`),\n* a copy of the performance survey contained in the webpage but in text format\n  (`survey.csv`), and\n* a copy of the miRNA read signatures contained in the PDFs but in text format\n  (`output.mrd`).\n\n##### Example use 2\n\nThe user wishes to identify miRNAs in deep sequencing data from an animal with\nno related species in miRBase:\n\n```sh\nmiRDeep2.pl reads_collapsed.fa 
genome.fa reads_collapsed_vs_genome.arf \\\n  none none none 2\u003ereport.log\n```\n\nThis command will generate the same type of files as example use 1 above.\nNote that, in practice, inputting miRNAs from some related species will always\nimprove miRDeep2 performance, even if that species is not closely related.\n\n---\n\n\n### `mapper.pl`\n\n#### Description\n\nProcesses reads and/or maps them to the reference genome.\n\n#### Input\n\nDefault input is\n\n* a file in FASTA, `seq.txt` or `qseq.txt` format.\n\nMore input can be given depending on the options used.\n\n#### Output\n\nThe output depends on the options used (see below).\n\nEither\n\n* a FASTA file with processed reads, or\n* an ARF file with mapped reads, or\n* both\n\nare output.\n\n#### Options\n\n##### Read input file\n\n| option | description                     |\n|--------|---------------------------------|\n| `‑a`   | input file is `seq.txt` format  |\n| `‑b`   | input file is `qseq.txt` format |\n| `‑c`   | input file is FASTA format      |\n\n##### Preprocessing/mapping\n\n| option        | description                                                 |\n|---------------|-------------------------------------------------------------|\n| `‑h`          | parse to FASTA format                                       |\n| `‑i`          | convert RNA to DNA alphabet (to map against genome)         |\n| `‑j`          | remove all entries that have a sequence that contains letters other than `a`, `c`, `g`, `t`, `u`, `n`, `A`, `C`, `G`, `T`, `U`, or `N`. |\n| `‑k \u003cseq\u003e`    | clip 3' adapter sequence                                    |\n| `‑l \u003cint\u003e`    | discard reads shorter than `\u003cint\u003e` nts                      |\n| `‑m`          | collapse reads                                              |\n| `‑p \u003cgenome\u003e` | map to genome (must be indexed by `bowtie-build`). The `genome` string must be the prefix of the bowtie index. 
For instance, if the first indexed file is called `h_sapiens_37_asm.1.ebwt`, then the prefix is `h_sapiens_37_asm`. |\n| `‑q`          | map with one mismatch in the seed (mapping takes longer)    |\n\n##### Output files\n\n| option      | description                        |\n|-------------|------------------------------------|\n| `‑s \u003cfile\u003e` | print processed reads to this file |\n| `‑t \u003cfile\u003e` | print read mappings to this file   |\n\n##### Other\n\n| option | description                                  |\n|--------|----------------------------------------------|\n| `‑u`   | do not remove directory with temporary files |\n| `‑v`   | output progress report                       |\n\n#### Examples\n\nThe mapper module is designed as a tool to process deep sequencing reads and/or\nmap them to the reference genome. The module works in sequence space and can\nprocess or map data that is in sequence FASTA format.\nA number of the functions of the mapper module are implemented specifically\nwith Solexa/Illumina data in mind. 
For an example of how to post-process mappings\nin color space, see example use 5.\n\n##### Example use 1\n\nThe user wishes to parse a file in `qseq.txt` format to FASTA format, convert\nfrom RNA to DNA alphabet, remove entries with non-canonical letters (letters\nother than `a`, `c`, `g`, `t`, `u`, `n`, `A`, `C`, `G`, `T`, `U`, or `N`), clip\nadapters, discard reads shorter than 18 nts, and collapse the reads:\n\n```sh\nmapper.pl reads_qseq.txt -b -h -i -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m \\\n  -s reads_collapsed.fa\n```\n\n##### Example use 2\n\nThe user wishes to map a FASTA file against the reference genome.\nThe genome has already been indexed by `bowtie-build`.\nThe first of the indexed files is named `genome.1.ebwt`:\n\n```sh\nmapper.pl reads_collapsed.fa -c -p genome -t reads_collapsed_vs_genome.arf\n```\n\n##### Example use 3\n\nThe user wishes to process the reads as in example use 1 and map the reads as\nin example use 2 in a single step, while observing the progress:\n\n```sh\nmapper.pl reads_qseq.txt -b -h -i -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m \\\n  -p genome -s reads_collapsed.fa -t reads_collapsed_vs_genome.arf -v\n```\n\n##### Example use 4\n\nThe user wishes to parse a GEO file to FASTA format and process it as in\nexample use 1.\nThe GEO file is in tabular format, with the first column showing the sequence\nand the second column showing the read counts:\n\n```sh\ngeo2fasta.pl GSM.txt \u003e reads.fa\n\nmapper.pl reads.fa -c -h -i -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m \\\n  -s reads_collapsed.fa\n```\n\n##### Example use 5\n\nThe user has already removed 3' adapters in color space and has mapped the\nreads against the genome using the BWA tool. The BWA output file is named\n`reads_vs_genome.sam`. Note that the BWA output contains extra fields that\nare not required by the SAM format. Our converter requires these fields and\nthus may not work with all types of SAM files. 
The user wishes to generate\n`reads_collapsed.fa` and `reads_vs_genome.arf` to input to miRDeep2:\n\n```sh\nbwa_sam_converter.pl reads_vs_genome.sam reads.fa reads_vs_genome.arf\n\nmapper.pl reads.fa -c -i -j -l 18 -m -s reads_collapsed.fa\n```\n\n---\n\n\n### `quantifier.pl`\n\n#### Description\n\nThe module maps the deep sequencing reads to predefined miRNA precursors and\nthereby determines the expression of the corresponding miRNAs.\nFirst, the predefined mature miRNA sequences are mapped to the predefined\nprecursors. Optionally, predefined star sequences can be mapped to the\nprecursors too. This determines the positions of the mature and star sequences\nin the precursors.\nSecond, the deep sequencing reads are mapped to the precursors. The number of\nreads falling into an interval 2 nt upstream and 5 nt downstream of the\nmature/star sequence is determined.\n\n#### Input\n\n* A FASTA file with precursor sequences,\n* a FASTA file with mature miRNA sequences,\n* a FASTA file with deep sequencing reads, and\n* optionally a FASTA file with star sequences and the 3-letter code of the\n  species of interest.\n\n#### Output\n\n* A 2-column table file called `miRNA_expressed.csv` with miRNA identifiers and\n  their read counts,\n* a file called `miRNA_not_expressed.csv` with all miRNAs having 0 read counts,\n* a signature file called `miRBase.mrd`,\n* a file called `expression.html` that gives an overview of all miRNAs in the\n  input data, and\n* a directory called `pdfs` that contains for each miRNA a PDF file showing its\n  signature and structure.\n\n#### Options\n\n| option           | description |\n|------------------|-------------|\n| `‑p \u003cfile.fa\u003e`   | miRNA precursor sequences (around 70 bp; one line per precursor sequence) |\n| `‑m \u003cfile.fa\u003e`   | mature miRNA sequences (around 22 nt) |\n| `‑r \u003cfile.fa\u003e`   | deep sequencing reads |\n| `‑P`             | specify this option if your mature miRNA file contains 5p and 3p IDs only |\n| `‑c \u003cfile\u003e`      | deprecated: `config.txt` file with different sample IDs, or just the one sample ID |\n| `‑s \u003cstar.fa\u003e`   | optional star sequences from miRBase |\n| `‑t \u003cspecies\u003e`   | species, *e.g.* Mouse or mmu. If not given, all species in your files are analyzed; otherwise, only this species is considered. |\n| `‑y \u003ctime\u003e`      | optional timestamp; if not given, a new one is generated |\n| `‑d`             | do not generate PDFs (by default, PDFs are generated) |\n| `‑o`             | do not sort reads by sample in the PDF files (by default, they are sorted) |\n| `‑k`             | also consider precursor-mature mappings that have different IDs, *e.g.* let7c would be allowed to map to pre-let7a |\n| `‑n`             | do not redo the file conversion |\n| `‑x`             | do not redo the mapping against the precursors |\n| `‑g \u003cint\u003e`       | number of allowed mismatches when mapping reads to precursors (default 1) |\n| `‑e \u003cint\u003e`       | number of nucleotides upstream of the mature sequence to consider (default 2) |\n| `‑f \u003cint\u003e`       | number of nucleotides downstream of the mature sequence to consider (default 5) |\n| `‑j`             | do not create an `output.mrd` file and PDFs |\n| `‑W`             | weigh read counts by their number of mappings, *e.g.* if a read maps twice, each position gets 0.5 added to its read profile |\n| `‑U`             | use only unique read mappings. Caveat: some miRNAs have multiple precursors; their expression will be underestimated since the multimappers are excluded. 
|\n| `‑u`             | list all values allowed for the species parameter that have an entry at UCSC |\n\n#### Example usage\n\n```sh\nquantifier.pl -p precursors.fa -m mature.fa -r reads.fa\n```\n\n---\n\n\n### `make_html.pl`\n\n#### Description\n\nCreates a file called `result.html` that gives an overview of the miRNAs\n(known and novel) detected by miRDeep2. For each detected miRNA, the HTML file\nlists, among other information, its miRDeep2 score, the reads mapped to its\nmature, loop and star sequences, and the mature, star and consensus precursor\nsequences themselves. It also provides links to BLAST, BLAT, miRBase (for\nmiRBase miRNAs) and to a PDF file that shows the signature and structure.\n\n#### Input\n\n* A miRDeep2 `output.mrd` file and\n* a miRDeep2 `survey.csv` file\n\n#### Output\n\n* A `result.html` file with an entry for each provisional miRNA that contains\n  information about its assigned ID, miRDeep2 score, estimated probability that\n  the miRNA candidate is a true positive, Rfam alert, total read count, mature\n  read count, loop read count, star read count, significant randfold p-value,\n  miRBase miRNA, example miRBase miRNA with the same seed, BLAT, BLAST,\n  consensus mature sequence, consensus star sequence and consensus precursor\n  sequence. 
Furthermore, the miRBase miRNAs present in the input data but not\n  scored by miRDeep2 are listed.\n* A directory called `pdfs` that contains for each provisional miRNA ID a PDF\n  with its signature and structure.\n* A file called `result.csv` (when option `-c` is used) that contains the same\n  entries as the HTML file.\n\n#### Options\n\n| option                | description                                         |\n|-----------------------|-----------------------------------------------------|\n| `‑v \u003cint\u003e`            | only output hairpins with score above `\u003cint\u003e`       |\n| `‑c`                  | also create overview in Excel format                |\n| `‑k \u003cfile\u003e`           | supply file with known miRNAs                       |\n| `‑s \u003cfile\u003e`           | supply survey file if a score cut-off is used, to get information about the confidence of the resulting reads |\n| `‑f \u003cfile\u003e`           | miRDeep2 output MRD file                            |\n| `‑e`                  | report complete survey file                         |\n| `‑g`                  | report survey for current score cut-off             |\n| `‑w \u003cproject_folder\u003e` | automatically used when running the web interface; otherwise, don't use it |\n| `‑r \u003cfile\u003e`           | Rfam file to check for already reported small RNA sequences |\n| `‑q \u003cfile\u003e`           | `miRBase.mrd` file produced by quantifier module    |\n| `‑x \u003cfile\u003e`           | `signature.arf` file with reads mapped to precursors |\n| `‑t \u003corg\u003e`            | specify the organism from which your sequencing data was obtained |\n| `‑u`                  | print all available UCSC input organisms            |\n| `‑d`                  | do not generate PDFs                                |\n| `‑y \u003ctimestamp\u003e`      | timestamp                                           |\n| `‑z`                  | switch is automatically used when script is called 
by `quantifier.pl` |\n| `‑o`                  | print reads in PDF signature sorted by the 3-letter code in front of their identifier |\n\n#### Example usage\n\n```sh\nmake_html.pl -f miRDeep_outfile -s survey.csv -c -e -y 123456789\n```\n\n---\n\n\n### `clip_adapters.pl`\n\n#### Description\n\nRemoves 3' end adapters from deep sequenced small RNAs. The script searches for\noccurrences of the first six nucleotides of the adapter in the read sequence,\nstarting after position 18 in the read sequence (so the shortest clipped read\nwill be 18 nts). If no matches to the first six nts of the adapter are\nidentified in a read, the 3' end of the read is searched for shorter matches to\nprefixes of the adapter of length five down to one nt.\n\n#### Input\n\n* A FASTA file with the deep sequencing reads and the adapter sequence (both in\n  RNA or DNA alphabet).\n\n#### Output\n\n* A FASTA file with the clipped reads.\n\nFASTA IDs are retained. If no matches to the adapter prefixes are identified in\na given read, the unclipped read is output.\n\n#### Example usage\n\n```sh\nclip_adapters.pl reads.fa TCGTATGCCGTCTTCTGCTTGT \u003e reads_clipped.fa\n```\n\n#### Notes\n\nIt is possible to clip adapters using more sophisticated methods.\nUsers are encouraged to test other methods with the miRDeep2 modules.\n\n---\n\n\n### `collapse_reads.pl`\n\n#### Description\n\nCollapses reads in the FASTA file to ensure that each sequence only occurs\nonce.\nTo indicate how many reads each sequence represents, a suffix is added to\neach FASTA identifier. 
*E.g.* a sequence that represents ten reads in the data\nwill have the `_x10` suffix added to the identifier.\n\n#### Input\n\n* A FASTA file, either in standard format or in the collapsed suffix format.\n\n#### Output\n\n* A FASTA file in the collapsed suffix format.\n\n#### Options\n\n| option | description      |\n|--------|------------------|\n| `‑a`   | output progress  |\n\n#### Example usage\n\n```sh\ncollapse_reads.pl reads.fa \u003e reads_collapsed.fa\n```\n\n#### Notes\n\nSince the script reads all FASTA entries into a hash using the sequence as key,\nit can potentially use more than 3 GB of memory when collapsing very big\ndatasets (\u003e50 million reads). In this case, the user can partition the reads\n(for instance based on the 5' nucleotide), collapse each partition separately\nand concatenate the results.\n\n---\n\n\n### `excise_precursors_iterative.pl`\n\n#### Description\n\nThis script is a wrapper for `excise_precursors.pl`, which it calls one or more\ntimes, incrementing the height of the read stack required for initiating\nexcision until the number of excised precursors falls below a given threshold.\n\n#### Input\n\n* The reference genome in FASTA format,\n* the mapped reads in `.arf` format,\n* a filename that the excised precursors will be written to, and\n* the maximal number of precursors that should be reported.\n\n#### Output\n\n* The excised precursors in FASTA format.\n\n#### Options\n\n| option | description                |\n|--------|----------------------------|\n| `‑a`   | Output progress to screen. 
|\n\n#### Example usage\n\n```sh\nexcise_precursors_iterative.pl genome.fa reads_vs_genome.arf \\\n  potential_precursors.fa 50000 -a\n```\n\n---\n\n\n### `excise_precursors.pl`\n\n#### Description\n\nExcises precursors from the genome using the mapped reads as guidelines.\n\n#### Input\n\n* The reference genome in FASTA format and\n* the mapped reads in `.arf` format.\n\n#### Output\n\n* The excised precursors in FASTA format.\n\n#### Options\n\n| option         | description                                                |\n|----------------|------------------------------------------------------------|\n| `‑a \u003cinteger\u003e` | Only excise if the highest local read stack is `\u003cinteger\u003e` reads high (default 2). |\n| `‑b`           | Output progress to screen.                                 |\n\n#### Example usage\n\n```sh\nexcise_precursors.pl genome.fa reads_vs_genome.arf -b\n```\n\n---\n\n\n### `fastaparse.pl`\n\n#### Description\n\nPerforms simple filtering of entries in a FASTA file.\n\n#### Input\n\n* A FASTA file.\n\n#### Output\n\n* A filtered FASTA file.\n\n#### Options\n\n| option     | description                                                    |\n|------------|----------------------------------------------------------------|\n| `‑a \u003cint\u003e` | only output entries where the sequence is at least `\u003cint\u003e` nts long |\n| `‑b`       | remove all entries that have a sequence that contains letters other than `a`, `c`, `g`, `t`, `u`, `n`, `A`, `C`, `G`, `T`, `U`, or `N`. 
|\n| `‑s`       | output progress                                                |\n\n#### Example usage\n\n```sh\nfastaparse.pl reads.fa -a 18 -s \u003e reads_no_short.fa\n```\n\n---\n\n\n### `fastaselect.pl`\n\n#### Description\n\nThis script prints out only the FASTA entries that match an ID in the ID file.\n\n#### Input\n\n* A FASTA file and a file with IDs, one ID per line.\n\n#### Output\n\n* A FASTA file containing the FASTA entries that match an ID.\n\n#### Options\n\n| option | description                                                        |\n|--------|--------------------------------------------------------------------|\n| `‑a`   | only print out entries that have an ID that is not present in the ID file. |\n\n#### Example usage\n\n```sh\nfastaselect.pl reads.fa reads_select.ids \u003e reads_select.fa\n```\n\n---\n\n\n### `find_read_count.pl`\n\n#### Description\n\nScans a file searching for the suffixes that are generated by\n`collapse_reads.pl` (*e.g.* `_x10`).\nIt sums up the integer values in the suffixes and outputs the sum. If a given\nID occurs multiple times in the file, its integer value is counted multiple\ntimes. 
It also counts only the first integer occurrence in a given line.\n\n#### Input\n\n* Any file containing the suffixes that are generated by `collapse_reads.pl`.\n\nThis will typically be a FASTA file or a list of IDs.\n\n#### Output\n\n* The sum of integer values (the total read count).\n\n#### Example usage\n\n```sh\nfind_read_count.pl reads_collapsed.fa\n```\n\n---\n\n\n### `geo2fasta.pl`\n\n#### Description\n\nParses GSM format files into FASTA format.\n\n#### Input\n\n* GSM files in tabular format.\n\nThe first column should contain the sequences and the second column the number\nof times each sequence occurs in the data.\n\n#### Output\n\n* A FASTA file, one sequence per line (the sequences are expanded).\n\n#### Example usage\n\n```sh\ngeo2fasta.pl GSM.txt \u003e reads.fa\n```\n\n---\n\n\n### `illumina_to_fasta.pl`\n\n#### Description\n\nParses `seq.txt` or `qseq.txt` output from the Solexa/Illumina platform to\nFASTA format.\n\n#### Input\n\n* A `seq.txt` or\n* `qseq.txt` file.\n\nThe default is `seq.txt`.\n\n#### Output\n\n* A FASTA file, one entry for each line of `seq.txt`.\n\nThe entries are named `seq` plus a running number that is incremented by one\nfor each entry. 
Any `.` characters in the `seq.txt` file are substituted with an\n`N`.\n\n#### Options\n\n| option | description          |\n|--------|----------------------|\n| `‑a`   | format is `qseq.txt` |\n\n#### Example usage\n\n```sh\nillumina_to_fasta.pl s_1.qseq.txt -a \u003e reads.fa\n```\n\n---\n\n\n### `miRDeep2_core_algorithm.pl`\n\n#### Description\n\nFor each potential miRNA precursor input, the miRDeep2 core algorithm either\ndiscards it or assigns it a log-odds score that reflects the probability that\nthe precursor is a genuine miRNA.\n\n#### Input\n\nDefault input is\n\n* an ARF file with the read signatures and\n* an RNAfold output file with the structures of the potential miRNA precursors.\n\n#### Output\n\n* An `.mrd` file with all potential miRNA precursors that are scored.\n\n#### Options\n\n| option      | description                                                        |\n|-------------|--------------------------------------------------------------------|\n| `‑h`        | print usage information                                            |\n| `‑s \u003cfile\u003e` | FASTA file with reference mature miRNAs from one or more related species |\n| `‑t`        | print filtered                                                     |\n| `‑u`        | limited output (only IDs)                                          |\n| `‑v`        | cut-off (default 1)                                                |\n| `‑x`        | sensitive option for Sanger sequences                              |\n| `‑y \u003cfile\u003e` | file with randfold p-values                                        |\n| `‑z`        | consider Drosha processing                                         |\n\n#### Example usage\n\n```sh\nmiRDeep2_core_algorithm.pl signature.arf potential_precursors.str \\\n  -s miRBase_related_species.fa -y potential_precursors.rand \u003e output.mrd\n```\n\n#### Notes\n\nThe `-z` option has not been thoroughly tested.\n\n---\n\n\n### `parse_mappings.pl`\n\n#### Description\n\nPerforms simple filtering of entries in an `.arf` file.\n\n#### 
Input\n\nDefault input is\n\n* an `.arf` file.\n\n#### Output\n\n* A filtered `.arf` file.\n\n#### Options\n\n| option      | description                                                   |\n|-------------|---------------------------------------------------------------|\n| `‑a \u003cint\u003e`  | Discard mappings of edit distance higher than this            |\n| `‑b \u003cint\u003e`  | Discard mappings of read queries shorter than this            |\n| `‑c \u003cint\u003e`  | Discard mappings of read queries longer than this             |\n| `‑d \u003cfile\u003e` | Discard read queries not in this file                         |\n| `‑e \u003cfile\u003e` | Discard read queries in this file                             |\n| `‑f \u003cfile\u003e` | Discard reference dbs not in this file                        |\n| `‑g \u003cfile\u003e` | Discard reference dbs in this file                            |\n| `‑h`        | Discard remaining suboptimal mappings                         |\n| `‑i \u003cint\u003e`  | Discard remaining suboptimal mappings and discard any reads that have more remaining mappings than this |\n| `‑j`        | Remove any unmatched nts at the very 3' end                   |\n| `‑k`        | Output progress to standard output                            |\n\n#### Example usage\n\n```sh\nparse_mappings.pl reads_vs_genome.arf -a 0 -b 18 -c 25 -i 5 \\\n  \u003e reads_vs_genome_parsed.arf\n```\n\n---\n\n\n### `perform_controls.pl`\n\n#### Description\n\nPerforms a designated number of rounds of permuted controls (for details, see\nFriedländer et al., Nature Biotechnology, 2008).\n\n#### Input\n\nThe permutation controls estimate the number of false positives produced by a\n`miRDeep2_core_algorithm.pl` run.\nThe input to `perform_controls.pl` should be\n\n* a file containing the exact command line used to initiate the\n  `miRDeep2_core_algorithm.pl` run,\n* the structure file input to `miRDeep2_core_algorithm.pl`, and\n* the desired number of control rounds.\n\n#### 
Output\n\n* A file in `.mrd` format.\n\nThe outputs of the individual control runs are separated by a line of the form\n`permutation \u003cinteger\u003e`.\nThe mean number of entries output by the control runs gives an estimate of the\nnumber of false positives produced. Apart from the number of entries, the\ncontents of the `.mrd` file output by `perform_controls.pl` are not biologically\nmeaningful.\n\n#### Options\n\n| option | description               |\n|--------|---------------------------|\n| `‑a`   | Output progress to screen |\n\n#### Example usage\n\n```sh\nperform_controls.pl line potential_precursors.str 100 \\\n  \u003e output_controls.mrd\n```\n\n---\n\n\n### `permute_structure.pl`\n\n#### Description\n\nIn a file output by RNAfold, each entry can be partitioned into an 'id' part\nand an 'other' part consisting of the dot-bracket structure, sequence, mfe,\netc. This script reads all 'id' parts into a hash and pairs them with 'other'\nparts from random entries. It is used by the `perform_controls.pl` script.\n\n#### Input\n\n* An RNAfold output file.\n\n#### Output\n\n* An RNAfold output file with IDs moved to random entries.\n\n#### Example usage\n\n```sh\npermute_structure.pl potential_precursors.str \\\n  \u003e potential_precursors_permuted.str\n```\n\n---\n\n\n### `prepare_signature.pl`\n\n#### Description\n\nPrepares the signature file to be input to the `miRDeep2_core_algorithm.pl`\nscript.\n\n#### Input\n\n* A FASTA file with deep sequencing reads and\n* a FASTA file with precursors.\n\n#### Output\n\n* A signature file in `.arf` format.\n\n#### Options\n\n| option      | description                                                   |\n|-------------|---------------------------------------------------------------|\n| `‑a \u003cfile\u003e` | FASTA file with the sequences of known mature miRNAs for the species. These sequences will not influence the miRDeep scoring, but will subsequently make it easy to estimate the sensitivity of the run. 
|\n| `‑b`        | Output progress to screen                                     |\n\n#### Example usage\n\n```sh\nprepare_signature.pl reads_collapsed.fa potential_precursors.fa \\\n  -a miRBase_this_species.fa \u003e signature.arf\n```\n\n---\n\n\n### `rna2dna.pl`\n\n#### Description\n\nReplaces `u`s and `U`s with `T`s.\nThis is useful since `bowtie` does not match `U`s to `T`s.\n\n#### Input\n\n* A FASTA file.\n\n#### Output\n\n* A FASTA file with the substituted sequences.\n\n#### Example usage\n\n```sh\nrna2dna.pl reads_RNA_alphabet.fa \u003e reads_DNA_alphabet.fa\n```\n\n---\n\n\n### `select_for_randfold.pl`\n\n#### Description\n\nIdentifies potential precursors whose structure is broadly\nconsistent with Dicer recognition.\nSince running randfold is time-consuming, it is practical to estimate\np-values only for those potential precursors that actually fold into hairpin\nstructures.\n\n#### Input\n\n* An ARF file with the read signatures and\n* an RNAfold output file with the structures of the potential miRNA precursors.\n\n#### Output\n\n* A list of IDs, separated by newlines.\n\n#### Example usage\n\n```sh\nselect_for_randfold.pl signature.arf potential_precursors.str \\\n  \u003e potential_precursors_for_randfold.ids\n```\n\n---\n\n\n### `survey.pl`\n\n#### Description\n\nSurveys miRDeep2 performance at score cut-offs from -10 to 10.\n\n#### Input\n\nDefault input is\n\n* an `.mrd` file output by the `miRDeep2_core_algorithm.pl` script.\n\n#### Output\n\n* A `.csv` file with performance statistics.\n\n#### Options\n\n| option      | description                                         |\n|-------------|-----------------------------------------------------|\n| `‑a \u003cfile\u003e` | file output by `perform_controls.pl`                |\n| `‑b \u003cfile\u003e` | mature miRNA FASTA reference file for the species   |\n| `‑c \u003cfile\u003e` | signature file                                      |\n| `‑d \u003cint\u003e`  | read stack height necessary for triggering 
excision |\n\n#### Example usage\n\n```sh\nsurvey.pl output.mrd -a output_controls.mrd -b miRBase_this_species.fa \\\n  -c signature.arf -d 2 \u003e survey.csv\n```\n\n---\n\n\n### `convert_bowtie_output.pl`\n\n#### Description\n\nConverts a `bowtie` `bwt` mapping file to a `mirdeep` `arf` file.\n\n#### Input\n\n* A file in `bwt` format.\n\n#### Output\n\n* A file in `mirdeep` `arf` format.\n\n---\n\n\n### `bwa_sam_converter.pl`\n\n#### Description\n\nConverts a `bwa` `sam` mapping file to a `mirdeep` `arf` file.\n\n#### Input\n\n* A `bwa`-created file in `sam` format.\n\n#### Output\n\n* A file in `mirdeep` `arf` format.\n\n---\n\n\n### `samFLAGinfo.pl`\n\n#### Description\n\nGives information about the FLAG field of a `bwa`-created mapping file in\n`sam` format.\n\n#### Input\n\n* A FLAG number created by `bwa`.\n\n#### Output\n\n* Information about the alignment created by `bwa`.\n\n---\n\n\n### `clip_adapters.pl`\n\n#### Description\n\nRemoves 3' end adapters from deep-sequenced small RNAs.\nThe script searches for occurrences of the first six nucleotides of the adapter\nin the read sequence, starting after position 18 (so the shortest clipped read\nwill be 18 nts). If no matches to the first six nts of the adapter are\nidentified in a read, the 3' end of the read is searched for shorter matches to\nthe first five down to the first one nts of the adapter.\n\n#### Input\n\n* A FASTA file with the deep sequencing reads and\n* the adapter sequence (each in either RNA or DNA alphabet).\n\n#### Output\n\n* A FASTA file with the clipped reads.\n\nFASTA IDs are retained. If no matches to the adapter prefixes are identified in\na given read, the unclipped read is output.\n\n#### Example usage\n\n```sh\nclip_adapters.pl reads.fa TCGTATGCCGTCTTCTGCTTGT \u003e reads_clipped.fa\n```\n\n#### Notes\n\nIt is possible to clip adapters using more sophisticated methods. 
Users are\nencouraged to test other methods with the miRDeep2 modules.\n\n---\n\n\n### `sanity_check_genome.pl`\n\n#### Description\n\nChecks the supplied genome FASTA file for correctness.\nIdentifier lines must not contain whitespace and must be unique.\nSequence lines must not contain characters other than\n`A`, `C`, `G`, `T`, `N`, `a`, `c`, `g`, `t`, or `n`.\n\n#### Input\n\n* A genome file in FASTA format.\n\n---\n\n\n### `sanity_check_mapping_file.pl`\n\n#### Description\n\nChecks the supplied mapping file for correctness.\nEach line in the file must be in ARF format.\n\n#### Input\n\n* A mapping file in ARF format.\n\n---\n\n\n### `sanity_check_mature_ref.pl`\n\n#### Description\n\nChecks the supplied `mature_miRNA` FASTA file for correctness.\nIdentifier lines must not contain whitespace and must be unique.\nSequence lines must not contain characters other than `A`, `C`, `G`,\n`T`, `N`, `a`, `c`, `g`, `t`, or `n`.\n\n#### Input\n\n* A mature miRNA file in FASTA format.\n\n---\n\n\n### `sanity_check_reads_ready.pl`\n\n#### Description\n\nChecks the supplied reads file for correctness.\nEach identifier line must have the format `\u003ename_uniqueNumber_xnumber`, e.g.\n`\u003exyz_1_x20`. 
See also the file `format_descriptions.txt` for more detailed\ninformation.\n\n#### Input\n\n* A reads file in FASTA format.\n\n---\n\n\n### `extract_miRNAs.pl`\n\n#### Description\n\nExtracts mature and precursor sequences from miRBase FASTA files for the\nspecies of interest.\n\n#### Input\n\n* A FASTA file from miRBase and\n* one or more three-letter species codes (comma-separated, e.g. `mmu,chi`).\n\n#### Output\n\n* A FASTA file in a format usable by `quantifier.pl` and `miRDeep2.pl`.\n\nMultiline sequences from the input file are put on a single line, and MacOS and\nWindows line breaks/carriage returns are removed.\n\n#### Example usage\n\n```sh\nextract_miRNAs.pl mature_miRBase.fa hsa \u003e mature_hsa.fa\nextract_miRNAs.pl hairpin_miRBase.fa hsa \u003e hairpin_hsa.fa\nextract_miRNAs.pl mature_miRBase.fa mmu,chi \u003e mature_other.fa\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frajewsky-lab%2Fmirdeep2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frajewsky-lab%2Fmirdeep2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frajewsky-lab%2Fmirdeep2/lists"}