https://github.com/veupathdb/repeat-masker-nextflow
nextflow workflow to modernize current repeatMasker workflow steps
- Host: GitHub
- URL: https://github.com/veupathdb/repeat-masker-nextflow
- Owner: VEuPathDB
- License: apache-2.0
- Created: 2022-04-13T16:46:53.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2026-02-13T20:44:58.000Z (4 months ago)
- Last Synced: 2026-02-14T03:36:45.893Z (4 months ago)
- Language: Perl
- Size: 2.67 MB
- Stars: 1
- Watchers: 15
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
THIS REPO IS 🚧 UNDER CONSTRUCTION 🚧 and NOT Used in ANY production CODE
# Nextflow Conversion of repeatMaskerTask.pm
***
RepeatMasker
***
```mermaid
flowchart TD
p0((Channel.fromPath))
p1([splitFasta])
p2[repeatMasker:runRepeatMasker]
p3[repeatMasker:cleanSequences]
p4([collectFile])
p5(( ))
p6([collectFile])
p7(( ))
p0 --> p1
p1 -->|seqs| p2
p2 --> p3
p3 --> p4
p3 --> p6
p4 --> p5
p6 --> p7
```
### Get Started
* Install Nextflow
`curl https://get.nextflow.io | bash`
* Run the script
`nextflow run VEuPathDB/repeat-masker-nextflow -with-trace -c <config_file> -r main`
### Description of nextflow configuration parameters:
| param | value type | description |
| ------------- | ------------- | ------------ |
| inputFilePath | string| Path to the input fasta file. |
| trimDangling | boolean | Whether to remove dangling sections of masked repeats from the ends of each sequence. |
| dangleMax | integer | Minimum number of unmasked nucleotides between sections of repeats required to stop the removal process. |
| outputFileName | string | Name of the output file. |
| outputDir | string | Directory where the output file will be stored. |
| rmParams | string | Additional arguments to be passed to RepeatMasker. |
| errorFileName | string | Name of the error file. |
| libraryPath | string | Path to the RepeatMasker libraries. It is used to set the environment variable LIBDIR, which RepeatMasker reads. This parameter exists because the rmParams option "-species" will not work: the Dfam files in the container do not contain the full library, which would be too large to include. If you are running this locally, or only want to use the curated versions of the databases, leave it as /opt/RepeatMasker/Libraries. If you want to use data that does not come standard, supply the path to your RepeatMasker libraries. |
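
As an illustration of how the parameters in the table fit together, the sketch below writes them to a JSON file. Every value is a placeholder chosen for this example (the input path, file names, `dangleMax` of 30, and the `-q` RepeatMasker flag are assumptions, not defaults of this repo); only the parameter names and the `/opt/RepeatMasker/Libraries` path come from the table.

```python
import json
from pathlib import Path

# Placeholder values for illustration only; adjust to your own data.
params = {
    "inputFilePath": "data/genome.fasta",          # hypothetical input FASTA
    "trimDangling": True,                          # boolean
    "dangleMax": 30,                               # integer, arbitrary example value
    "outputFileName": "masked.fasta",              # hypothetical name
    "outputDir": "results",                        # hypothetical directory
    "rmParams": "-q",                              # extra RepeatMasker arguments (example)
    "errorFileName": "masked.err",                 # hypothetical name
    "libraryPath": "/opt/RepeatMasker/Libraries",  # path given in the table above
}

params_json = json.dumps(params, indent=2)
Path("params.json").write_text(params_json + "\n")
```

Nextflow can load script parameters from such a file through its `-params-file` option. Whether this workflow expects its parameters that way or through a config file passed with `-c` is not stated here, so treat this as one possible setup.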
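- **trimDangling Explained**: The trimDangling process moves through a masked sequence twice, once in the forward direction and once in the reverse direction, looking for sections of 9 or more 'N's. Starting from the end of the sequence it is working on, it removes each short run of unmasked nucleotides together with the masked section next to it. dangleMax specifies the minimum number of unmasked nucleotides required to halt this process: the pass stops at the first unmasked run that is at least dangleMax long. Once the forward pass has finished, the reverse pass begins.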
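- **Example**:
If dangleMax was set to 9... (the numbers under each sequence are the lengths of its alternating runs of 'A's and 'N's)
AAAAAANNNNNNNNNAAANNNNNNNNNAAAAAAAAANNNNNNNNNNNNAAAAAAAAAAAAAAANNNNNNAAAAAAAAANNNNNNNNNAAAAAANNNNNNNNNAAAAAA
Run lengths: 6 9 3 9 9 12 15 6 9 9 6 9 6
After the forward pass, the leading runs 6 9 3 9 are removed, because the unmasked runs of 6 and 3 are shorter than dangleMax. The pass stops at the unmasked run of 9.
AAAAAAAAANNNNNNNNNNNNAAAAAAAAAAAAAAANNNNNNAAAAAAAAANNNNNNNNNAAAAAANNNNNNNNNAAAAAA
Run lengths: 9 12 15 6 9 9 6 9 6
After the reverse pass, the trailing runs 9 6 9 6 are removed for the same reason.
AAAAAAAAANNNNNNNNNNNNAAAAAAAAAAAAAAANNNNNNAAAAAAAAA
Run lengths: 9 12 15 6 9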
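
The trimming behaviour in the example can be sketched in a few lines of Python. This is a minimal illustration inferred from the description and the worked example, not the Perl code used by the workflow. The function name `trim_dangling`, the constant `MIN_MASKED_RUN`, and the exact rule (an unmasked run shorter than `dangleMax`, immediately followed by 9 or more 'N's, is removed together with those 'N's) are assumptions. It also assumes uppercase, hard-masked sequences.

```python
import re

MIN_MASKED_RUN = 9  # assumed threshold: "sections of 9 or more 'N's"


def trim_dangling(seq: str, dangle_max: int) -> str:
    """Strip short unmasked runs plus the adjacent masked run from both ends."""
    if dangle_max < 1:
        return seq  # no unmasked run can be shorter than dangle_max
    short_run = "[^N]{0,%d}" % (dangle_max - 1)  # unmasked run below dangle_max
    masked_run = "N{%d,}" % MIN_MASKED_RUN       # masked section of 9+ 'N's
    head = re.compile("^" + short_run + masked_run)
    tail = re.compile(masked_run + short_run + r"\Z")

    # Forward pass: drop dangling prefixes until a long unmasked run is reached.
    while (m := head.match(seq)):
        seq = seq[m.end():]
    # Reverse pass: the same rule applied from the end of the sequence.
    while (m := tail.search(seq)):
        seq = seq[:m.start()]
    return seq


# Rebuild the example sequence from its run lengths (A runs and N runs alternate).
runs = [6, 9, 3, 9, 9, 12, 15, 6, 9, 9, 6, 9, 6]
example = "".join(("N" if i % 2 else "A") * n for i, n in enumerate(runs))
trimmed = trim_dangling(example, 9)
print(trimmed)
```

With `dangle_max` set to 9, the result has run lengths 9 12 15 6 9, matching the sequence shown after the reverse pass.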