https://github.com/abhijeetsingh1704/dupremover
Removes duplicate sequences in multifasta file
fasta fasta-format fasta-sequences unique
- Host: GitHub
- URL: https://github.com/abhijeetsingh1704/dupremover
- Owner: abhijeetsingh1704
- License: gpl-3.0
- Created: 2020-05-18T12:44:14.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-05-27T14:53:52.000Z (over 3 years ago)
- Last Synced: 2025-04-02T03:43:13.245Z (6 months ago)
- Topics: fasta, fasta-format, fasta-sequences, unique
- Language: Python
- Size: 49.8 KB
- Stars: 2
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# "DupRemover" - Duplicate remover
### version 1.0.3
Removes duplicate sequences in a multi-FASTA file.

DupRemover finds duplicate sequences in a nucleotide or amino acid multi-FASTA file, keeps a single copy of each unique sequence, and concatenates the FASTA headers of all its duplicates into the header of the retained record.
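The behaviour described above can be sketched in plain Python. This is a minimal illustration, not DupRemover's actual implementation: the helper names (`read_fasta`, `merge_duplicates`, `write_fasta`) are hypothetical, sequences are compared as exact strings (an assumption), and the ` =|= ` header separator is taken from the example output shown in this README.

```python
import io

SEPARATOR = " =|= "  # header separator as it appears in the README's example output


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


def merge_duplicates(records):
    """Map each distinct sequence to the headers that carry it, in input order."""
    merged = {}
    for header, sequence in records:
        # Assumption: two records are duplicates only if their sequences match exactly.
        merged.setdefault(sequence, []).append(header)
    return merged


def write_fasta(merged, handle):
    """Write one record per unique sequence with all of its headers joined."""
    for sequence, headers in merged.items():
        handle.write(">" + SEPARATOR.join(headers) + "\n" + sequence + "\n")


example = """>seq1 first copy
ACGT
>seq2 different
TTGA
>seq3 second copy
AC
GT
"""

merged = merge_duplicates(read_fasta(io.StringIO(example)))
out = io.StringIO()
write_fasta(merged, out)
print(out.getvalue(), end="")
```

In this sketch `seq1` and `seq3` carry the same sequence, so they collapse into a single record whose header lists both names, while `seq2` is written unchanged.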
## Dependencies
Biopython >=1.78

DupRemover can install the biopython>=1.78 package if Biopython is not installed.
Please upgrade to biopython>=1.78 if an older version is installed.
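To see in advance whether the requirement is met, something like the following check could be run. It is only a sketch under stated assumptions: `parse_version` and `biopython_status` are hypothetical helpers, the package is looked up under the distribution name `biopython`, and the check is independent of whatever DupRemover does internally when it installs the dependency.

```python
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (1, 78)  # minimum Biopython version stated in this README


def parse_version(text):
    """Turn a version string such as '1.78' into a tuple of ints, e.g. (1, 78)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def biopython_status(minimum=MINIMUM):
    """Return 'missing', 'outdated', or 'ok' for the installed Biopython."""
    try:
        installed = version("biopython")
    except PackageNotFoundError:
        return "missing"
    return "ok" if parse_version(installed) >= minimum else "outdated"


print(biopython_status())
```

If the result is `missing` or `outdated`, installing or upgrading with pip (for example `pip install --upgrade "biopython>=1.78"`) should satisfy the requirement.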
## Help
```
python3 DupRemover.py -h
```
```
usage: DupRemover.py [-h] -i INPUT [-o OUTPUT] [-v Y/y or N/n] [-V]

Removes duplicate sequences in multifasta file, and append fasta header to unique sequence
Citation: Singh, Abhijeet. 2020. DupRemover: A Simple Program to Remove Duplicate Sequences from Multi-Fasta File
GitHub: https://github.com/abhijeetsingh1704/DupRemover; DOI: 10.13140/RG.2.2.23842.86724.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        input fasta file
  -o OUTPUT, --output OUTPUT
                        output fasta file (default: Uniq_)
  -v Y/y or N/n, --verbose Y/y or N/n
                        print progress to the terminal (default: verbose)
  -V, --version         show program's version number and exit
```
## Usage
General form: `python3 DupRemover.py -i /path/to/input_file -o /path/to/output_file`
```
python3 DupRemover.py -i Mixed_sequences.fasta -o Unique_sequences.fasta
```
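When several files need deduplicating, the same command can be driven from a small wrapper. The sketch below is an assumption-laden illustration: `build_command` and `deduplicate_all` are hypothetical helpers, the `Unique_` output prefix and the `*.fasta` pattern are arbitrary choices, and `DupRemover.py` is assumed to sit in the current working directory. Only the `-i` and `-o` flags documented in the help text are used.

```python
import subprocess
import sys
from pathlib import Path


def build_command(input_path, output_path, script="DupRemover.py"):
    """Build the documented -i/-o invocation as an argument list."""
    return [sys.executable, script, "-i", str(input_path), "-o", str(output_path)]


def deduplicate_all(folder, script="DupRemover.py", prefix="Unique_"):
    """Run the script once for every *.fasta file in `folder`."""
    for fasta in sorted(Path(folder).glob("*.fasta")):
        if fasta.name.startswith(prefix):
            continue  # skip outputs left over from an earlier run
        output = fasta.with_name(prefix + fasta.name)
        # check=True raises CalledProcessError if the script exits with an error
        subprocess.run(build_command(fasta, output, script), check=True)


# Show the argument list for the single-file example (interpreter path omitted).
print(build_command("Mixed_sequences.fasta", "Unique_sequences.fasta")[1:])
```

Calling `deduplicate_all("/path/to/folder")` would then write one `Unique_`-prefixed file next to each input.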
Example output:
```
[Program] : DupRemover
[Date] : 2021-03-27 14:40:21
[Input file] : Mixed_sequences.fasta
[Output file] : Unique_sequences.fasta
-------------------------
AHI13756.1 FthFS, partial [uncultured Arthrobacter sp.] =|= AHI13756.1 FthFS, partial [uncultured Arthrobacter sp.] =|= AHI13756.1 FthFS, partial [uncultured Arthrobacter sp.]
LRNIVIGLGGPTEGVPREAGFEITVASEVMAVFCLATGLEDLRTRLGRMTIGYTYDKKPVTVDDLGAAGAMTTLLKDAIKPNLVQTIGGTPAFIHGGPFANIAHGCNSAIATNTARSLAEVVVTEAGFGADLGAEKFMDIKARYAGCDPSAVVIVATIRALKMHGGVAKDQLKGENVQAVRDGMVNLARHASNVRKFGIHPVIAVNKFATDTADELAVVTEWAAENNIECAVADVWGQGGAGAGDLAAAVLRAIEAPSDFAPLYELEKPVEEKILTVVKEIYGGTEVDYTPAAKRVLEQIHANGWDNLPV
AHI13755.1 FthFS, partial [uncultured bacterium]
LGIDPRRITFRRVMDMNDRSLRHIVVGLGGPGQGTVREDGFDITVASEIMAVFCLATDIEDLTARLARITVGYTWDRRPVTVADLKVEGALALLLKDALKPNLVQTIAGTPALVHGGPFANIAHGCNSVIATTLGRDLADVVVTEAGFGADLGAEKYMDITSRVADVAPDAVVVVATIRALKMHGGVPRERLDEPNLAGLEAGTANLQRHVRNLGKFGFSPVVAINRFTTDTAEEIEWLLHWCSEEGVDAAVADVWAQGGGGPGGDDLAAKVLAALKRNVEFKPLYPLQMGVAEKIRVVVREIYGADDVEFSVPALRRLEEIEANGWDSVPV
AHI13754.1 FthFS, partial [uncultured bacterium]
ITSSHNLLSALVDNHIHWGGEPKLDAVRTSWRRVMDMNDRSLRNIVSGLGGPGNGSPSETGFDITVASEVMAILCLATDAEDLEARLSRIIVGYTREKKAVTAADIKATGAMMALLRDAMLPNLVQTLENNPCLVHGGPFANIAHGCNSVIATRAALKMANYVVTEAGFGADLGAEKFLNIKCRQAGLA
-------------------------
[input seq] : 5
[Output seq] : 3
[Duplicates] : 2
```
#### Citation
If you use DupRemover, please cite as:

Singh A. DupRemover: a simple program to remove duplicate sequences from multi-fasta file. ResearchGate 2020. https://doi.org/10.13140/RG.2.2.23842.86724; Available at https://github.com/abhijeetsingh1704/Duplicate-remover
#### LICENSE
Duplicate-remover is licensed under the GNU General Public License v3.0.
Permissions of this strong copyleft license are conditioned on making available complete source code of licensed works and modifications, which include larger works using a licensed work, under the same license. Copyright and license notices must be preserved. Contributors provide an express grant of patent rights.