https://github.com/nylander/degap_fasta_alignment
Remove columns in large fasta formatted alignments containing all-missing data.
- Host: GitHub
- URL: https://github.com/nylander/degap_fasta_alignment
- Owner: nylander
- License: mit
- Created: 2025-02-25T16:28:10.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-31T15:48:12.000Z (about 1 year ago)
- Last Synced: 2025-03-31T17:01:06.562Z (about 1 year ago)
- Language: C
- Size: 15.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: ChangeLog
- License: LICENSE
- Authors: AUTHORS
README
# DFA - Degap Fasta Alignments
- Last modified: Mon Mar 31, 2025 05:46
- Sign: JN
## Description
Remove columns in large fasta-formatted alignments containing all-missing data.
Designed to have a minimal memory footprint.
## Installation
See [INSTALL](INSTALL)
## Usage

    $ dfa [options] infile(s)

## Options:
- `-h` show help
- `-V` print version
- `-v` be verbose
- `-m missing_chars` characters treated as missing (default: `Nn?Xx-`)
- `-w wrap_length` wrap sequences to length `wrap_length` (default: `60`)
`infile` should be in fasta format.
## Examples
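Input example ([in.fas](data/in.fas)):

    $ cat data/in.fas
    >Apa
    ANN-C
    >Bpa
    ANT-C

Remove columns in which every sequence has a missing character (default set `Nn?Xx-`):

    $ dfa data/in.fas
    >Apa
    ANC
    >Bpa
    ATC

Treat only `-` as missing:

    $ dfa -m - data/in.fas
    >Apa
    ANNC
    >Bpa
    ANTC
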
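The column rule shown in the examples above can be sketched in a few lines of Python. This illustrates the filtering rule only and is not the C implementation of `dfa`. The function name `degap` and the in-memory dict representation are assumptions made for this example.

```python
DEFAULT_MISSING = "Nn?Xx-"  # same default set as the -m option


def degap(records, missing=DEFAULT_MISSING):
    """Drop every column in which all sequences have a missing character.

    records: dict mapping sequence name -> aligned sequence (equal lengths).
    Hypothetical helper for illustration; it holds the whole alignment in
    memory, so it is not suitable for very large files.
    """
    seqs = list(records.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")
    missing_set = set(missing)
    # Keep a column if at least one sequence has a non-missing character there.
    keep = [i for i in range(length)
            if any(s[i] not in missing_set for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in records.items()}


aln = {"Apa": "ANN-C", "Bpa": "ANT-C"}
print(degap(aln))               # {'Apa': 'ANC', 'Bpa': 'ATC'}
print(degap(aln, missing="-"))  # {'Apa': 'ANNC', 'Bpa': 'ANTC'}
```

Column 2 (`N`/`N`) and column 4 (`-`/`-`) are all-missing under the default set, so both are dropped. With `missing="-"`, only column 4 is dropped.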
## Scripts
### [`remove_empty_alignment_columns.py`](scripts/remove_empty_alignment_columns.py)
Removes columns containing only missing data from a multiple sequence alignment.
Written in Python with a low memory footprint in mind.
See output from `remove_empty_alignment_columns.py -h` for usage.
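One way to keep memory use low is to read the file twice: a first pass records which columns contain at least one non-missing character, and a second pass writes each sequence with only those columns. The sketch below shows that idea. It is an assumption about a reasonable approach, not a description of how the script in this repository is implemented, and the names `read_fasta` and `degap_file` are hypothetical.

```python
import sys


def read_fasta(path):
    """Yield (header, sequence) pairs, holding one record in memory at a time."""
    header, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line, []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def degap_file(path, out=sys.stdout, missing="Nn?Xx-", wrap=60):
    """Two-pass removal of all-missing columns (wrap must be a positive int)."""
    missing_set = set(missing)
    keep = None  # one byte per column: 1 if any sequence has data there

    # Pass 1: mark columns that contain at least one non-missing character.
    for _, seq in read_fasta(path):
        if keep is None:
            keep = bytearray(len(seq))
        elif len(seq) != len(keep):
            raise ValueError("sequences must have equal length")
        for i, ch in enumerate(seq):
            if ch not in missing_set:
                keep[i] = 1
    if keep is None:
        return  # no records in the file

    # Pass 2: write each sequence with only the marked columns, wrapped.
    for header, seq in read_fasta(path):
        kept = "".join(ch for ch, k in zip(seq, keep) if k)
        out.write(header + "\n")
        for start in range(0, len(kept), wrap):
            out.write(kept[start:start + wrap] + "\n")
```

Under these assumptions, `degap_file("data/in.fas")` should print the same records as the first `dfa` example above. Peak memory is roughly one sequence plus one byte per alignment column, at the cost of reading the input twice.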
### [`degap_fasta_alignment.pl`](scripts/degap_fasta_alignment.pl)
Script written in Perl for removing gap (missing) characters in fasta
alignments. The original script is taken from the [fastagap
repository](https://github.com/nylander/fastagap/blob/main/degap_fasta_alignment.pl).
This script has some extra options for removing gaps (missing data). Note,
however, that it cannot handle very large alignments.
See output from `degap_fasta_alignment.pl -h` for usage.
## License and copyright
Copyright 2025 Johan Nylander
[MIT License](LICENSE)