{"id":21845884,"url":"https://github.com/abhijeetsingh1704/dupremover","last_synced_at":"2025-08-17T10:39:43.635Z","repository":{"id":68618845,"uuid":"264936573","full_name":"abhijeetsingh1704/DupRemover","owner":"abhijeetsingh1704","description":"Removes duplicate sequences in multifasta file","archived":false,"fork":false,"pushed_at":"2022-05-27T14:53:52.000Z","size":51,"stargazers_count":2,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-02T03:43:13.245Z","etag":null,"topics":["fasta","fasta-format","fasta-sequences","unique"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/abhijeetsingh1704.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-18T12:44:14.000Z","updated_at":"2024-12-13T08:24:28.000Z","dependencies_parsed_at":"2023-09-13T17:35:01.731Z","dependency_job_id":null,"html_url":"https://github.com/abhijeetsingh1704/DupRemover","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/abhijeetsingh1704/DupRemover","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhijeetsingh1704%2FDupRemover","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhijeetsingh1704%2FDupRemover/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhijeetsingh1704%2FDupRemover/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhijeetsingh1704%2
FDupRemover/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/abhijeetsingh1704","download_url":"https://codeload.github.com/abhijeetsingh1704/DupRemover/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhijeetsingh1704%2FDupRemover/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270837408,"owners_count":24654378,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-17T02:00:09.016Z","response_time":129,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fasta","fasta-format","fasta-sequences","unique"],"created_at":"2024-11-27T23:11:44.502Z","updated_at":"2025-08-17T10:39:43.616Z","avatar_url":"https://github.com/abhijeetsingh1704.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# \"DupRemover\" - Duplicate remover\n### version 1.0.3\nRemoves duplicate sequences in a multifasta file.\n\n-------------------------\n\nDupRemover finds duplicate sequences in a nucleotide or amino acid multifasta file and keeps a single copy of each unique sequence, concatenating the fasta headers of all its duplicates.\n\n## Dependencies\nBiopython \u003e=1.78\n\nDupRemover can install the biopython\u003e=1.78 package if Biopython is not installed.\n\nPlease upgrade to biopython\u003e=1.78 if an older version is installed.\n\n## Help\n```\npython3 DupRemover.py -h\n```\n```\nusage: 
DupRemover.py [-h] -i INPUT [-o OUTPUT] [-v Y/y or N/n] [-V]\n\nRemoves duplicate sequences in multifasta file, and append fasta header to unique sequence\n\nCitation: Singh, Abhijeet. 2020. DupRemover: A Simple Program to Remove Duplicate Sequences from Multi-Fasta File\nGitHub: https://github.com/abhijeetsingh1704/DupRemover; DOI: 10.13140/RG.2.2.23842.86724.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -i INPUT, --input INPUT\n                        input fasta file\n  -o OUTPUT, --output OUTPUT\n                        output fasta file (default: Uniq_\u003cinput_fasta_file\u003e)\n  -v Y/y or N/n, --verbose Y/y or N/n\n                        print progress to the terminal (default: verbose)\n  -V, --version         show program's version number and exit\n```\n  \n\n## Usage\npython3 DupRemover.py -i /path/to/input_file -o /path/to/output_file\n  \n  ```\n  python3 DupRemover.py -i Mixed_sequences.fasta -o Unique_sequences.fasta\n  ```\nExample output:\n```\n[Program]       : DupRemover\n[Date]          : 2021-03-27 14:40:21\n[Input file]    : Mixed_sequences.fasta\n[Output file]   : Unique_sequences.fasta\n-------------------------\nAHI13756.1 FthFS, partial [uncultured Arthrobacter sp.] =|= AHI13756.1 FthFS, partial [uncultured Arthrobacter sp.] 
=|= AHI13756.1 FthFS, partial [uncultured Arthrobacter sp.]\nLRNIVIGLGGPTEGVPREAGFEITVASEVMAVFCLATGLEDLRTRLGRMTIGYTYDKKPVTVDDLGAAGAMTTLLKDAIKPNLVQTIGGTPAFIHGGPFANIAHGCNSAIATNTARSLAEVVVTEAGFGADLGAEKFMDIKARYAGCDPSAVVIVATIRALKMHGGVAKDQLKGENVQAVRDGMVNLARHASNVRKFGIHPVIAVNKFATDTADELAVVTEWAAENNIECAVADVWGQGGAGAGDLAAAVLRAIEAPSDFAPLYELEKPVEEKILTVVKEIYGGTEVDYTPAAKRVLEQIHANGWDNLPV\n\nAHI13755.1 FthFS, partial [uncultured bacterium]\nLGIDPRRITFRRVMDMNDRSLRHIVVGLGGPGQGTVREDGFDITVASEIMAVFCLATDIEDLTARLARITVGYTWDRRPVTVADLKVEGALALLLKDALKPNLVQTIAGTPALVHGGPFANIAHGCNSVIATTLGRDLADVVVTEAGFGADLGAEKYMDITSRVADVAPDAVVVVATIRALKMHGGVPRERLDEPNLAGLEAGTANLQRHVRNLGKFGFSPVVAINRFTTDTAEEIEWLLHWCSEEGVDAAVADVWAQGGGGPGGDDLAAKVLAALKRNVEFKPLYPLQMGVAEKIRVVVREIYGADDVEFSVPALRRLEEIEANGWDSVPV\n\nAHI13754.1 FthFS, partial [uncultured bacterium]\nITSSHNLLSALVDNHIHWGGEPKLDAVRTSWRRVMDMNDRSLRNIVSGLGGPGNGSPSETGFDITVASEVMAILCLATDAEDLEARLSRIIVGYTREKKAVTAADIKATGAMMALLRDAMLPNLVQTLENNPCLVHGGPFANIAHGCNSVIATRAALKMANYVVTEAGFGADLGAEKFLNIKCRQAGLA\n\n-------------------------\n[input seq]     : 5\n[Output seq]    : 3\n[Duplicates]    : 2\n```\n\n#### Citation\nIf you use DupRemover, please cite as:\n\nSingh A. DupRemover: a simple program to remove duplicate sequences from multi-fasta file. ResearchGate 2020. https://doi.org/10.13140/RG.2.2.23842.86724; Available at https://github.com/abhijeetsingh1704/Duplicate-remover\n\n\n#### LICENSE\nDuplicate-remover is licensed under the\nGNU General Public License v3.0\nPermissions of this strong copyleft license are conditioned on making available complete source code of licensed works and modifications, which include larger works using a licensed work, under the same license. Copyright and license notices must be preserved. 
Contributors provide an express grant of patent rights.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabhijeetsingh1704%2Fdupremover","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabhijeetsingh1704%2Fdupremover","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabhijeetsingh1704%2Fdupremover/lists"}