https://github.com/applicativesystem/pacbio-nanopore-polyatgc-remove
polyATGC pacbio/oxford nanopore estimator
bioinformatics genome-analysis genome-annotation pacbio-data pacbio-hifi-sequencing-reads pacbio-sequencing pacbiohifi
- Host: GitHub
- URL: https://github.com/applicativesystem/pacbio-nanopore-polyatgc-remove
- Owner: applicativesystem
- License: mit
- Created: 2024-08-20T19:09:34.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-08-20T19:47:45.000Z (10 months ago)
- Last Synced: 2024-11-10T20:18:16.930Z (8 months ago)
- Topics: bioinformatics, genome-analysis, genome-annotation, pacbio-data, pacbio-hifi-sequencing-reads, pacbio-sequencing, pacbiohifi
- Language: Python
- Homepage:
- Size: 1.95 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# pacbio-nanopore-polyATGC-trimmer
- A fast, regular-expression-based polyATGC trimmer for long reads or FASTQ reads. It returns the trimmed sequences as FASTA along with a dataframe for sequence classification, and currently treats 10 continuous bases as a stretch.
- A sample run is given below. If you are using the long reads for machine learning, it directly returns a dataframe ready for ingestion into machine learning.
```
longreadpolyATGCtrimmer("/Users/gauravsablok/Desktop/CodeCheck/fasta_sample_datasets/test_sample_short.fasta",
polyATGCstretch_type="G")
ids sequences stretch_count trimmed_sequences_new
0 >1 GCAGCGTACGTGGTTGGATCAATTAGTGGGGCACATTTGAATCCAG... [27, 30] GCAGCGTACGTGGTTGGATCAATTAGTGCACATTTGAATCCAGCTT...
1 >2 GCAGCGTACGTGGTTGGATCAATTAGTGGGGCACATTTGAATCCAG... [27, 30] GCAGCGTACGTGGTTGGATCAATTAGTGCACATTTGAATCCAGCTT...
2 >3 GCAGCGTACGTGGTTGGATCAATTAGTGGGGCACATTTGAATCCAG... [27, 30] GCAGCGTACGTGGTTGGATCAATTAGTGCACATTTGAATCCAGCTT...
3 >4 CGAAAATTACTTCGGTACAATGCTTGTATACATGGGCAAAGCACAC... [33, 36] CGAAAATTACTTCGGTACAATGCTTGTATACATCAAAGCACACGGT...
```
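
To illustrate the general idea of regex-based stretch trimming, here is a minimal sketch using only the Python standard library. It is not the repository's implementation: the names `trim_homopolymer` and `trim_records`, the `min_length` parameter, and the choice to remove only the first matching stretch are all assumptions made for this example.

```python
import re


def trim_homopolymer(sequence, base="G", min_length=10):
    """Remove the first run of `base` that is at least `min_length` long.

    Hypothetical helper for illustration. Returns (span, trimmed_sequence),
    where span is the (start, end) of the removed run with an exclusive end,
    or None when no run is found.
    """
    pattern = re.compile(f"{re.escape(base)}{{{min_length},}}")
    match = pattern.search(sequence)
    if match is None:
        return None, sequence
    start, end = match.span()
    return (start, end), sequence[:start] + sequence[end:]


def trim_records(records, base="G", min_length=10):
    """Apply trim_homopolymer to (id, sequence) pairs.

    Returns a list of dicts, one per read, which can be passed to
    pandas.DataFrame(...) if a dataframe is needed.
    """
    rows = []
    for seq_id, sequence in records:
        span, trimmed = trim_homopolymer(sequence, base, min_length)
        rows.append(
            {
                "ids": seq_id,
                "sequences": sequence,
                "stretch_span": span,
                "trimmed_sequences": trimmed,
            }
        )
    return rows


if __name__ == "__main__":
    reads = [(">1", "ACGT" + "G" * 12 + "TTCA"), (">2", "ACGTACGT")]
    for row in trim_records(reads, base="G", min_length=10):
        print(row)
```

Compiling the pattern from `base` and `min_length` keeps the stretch type and threshold configurable, so the same function covers poly-A, poly-T, poly-G, and poly-C stretches.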
Gaurav Sablok \
Academic Staff Member \
University of Potsdam \
Potsdam, Germany