https://github.com/opengene/cfdnapattern
Pattern Recognition for Cell-free DNA
- Host: GitHub
- URL: https://github.com/opengene/cfdnapattern
- Owner: OpenGene
- License: mit
- Created: 2016-07-28T03:09:46.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2018-08-03T03:40:52.000Z (about 7 years ago)
- Last Synced: 2025-03-24T08:42:24.942Z (7 months ago)
- Topics: bioinformatics, cfdna, ngs, pattern
- Language: Python
- Size: 1.15 MB
- Stars: 58
- Watchers: 14
- Forks: 21
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# CfdnaPattern
Pattern Recognition for Cell-free DNA

# Predict whether a fastq file is cfdna or not
```shell
# predict a single file
python predict.py <single_fastq_file>

# predict files
python predict.py <fastq_file1> <fastq_file2> ...

# predict files with wildcard
python predict.py *.fq
```

***warning: this tool doesn't work for trimmed fastq***
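Trimming usually leaves reads of varying length, so one rough way to screen inputs before prediction is to check whether the first reads all share one length. The sketch below is a heuristic under that assumption and is not part of CfdnaPattern; `read_lengths`, `looks_untrimmed`, `open_fastq` and the `max_reads` parameter are hypothetical names, and the check cannot detect trimming that removes a fixed number of bases from every read.

```python
import gzip
from itertools import islice


def read_lengths(lines, max_reads=1000):
    """Yield the sequence length of up to max_reads FASTQ records.

    Assumes the common 4-line record layout (header, sequence, '+', quality).
    """
    # The sequence is the 2nd line of each 4-line record: indices 1, 5, 9, ...
    for seq in islice(lines, 1, max_reads * 4, 4):
        yield len(seq.strip())


def looks_untrimmed(lines, max_reads=1000):
    """Heuristic: True if all sampled reads have the same length."""
    lengths = set(read_lengths(lines, max_reads))
    return len(lengths) == 1


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

A file handle returned by `open_fastq` can be passed directly to `looks_untrimmed`, since both helpers only iterate over lines.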
## prediction output
For each file given on the command line, this tool outputs one line of the form `<prediction>: <filename>`, for example:
```
cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-cfdna-001_S1_R1_001.fastq.gz
cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-cfdna-001_S1_R2_001.fastq.gz
not-cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-gdna-002_S2_R1_001.fastq.gz
not-cfdna: /fq/160220_NS500713_0040_AHVNG2BGXX/20160220-gdna-002_S2_R2_001.fastq.gz
```
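Because each output line pairs a prediction with a path, the results are easy to post-process. The following is a minimal sketch and not part of CfdnaPattern: `parse_prediction` and `name_disagrees` are hypothetical helpers, and the check assumes that a cfDNA file carries `cfdna` (in any letter case) somewhere in its filename.

```python
import os


def parse_prediction(line):
    """Split one output line into (prediction, filename)."""
    # Split on the first ": " only, so colons inside the path are preserved.
    prediction, filename = line.rstrip("\n").split(": ", 1)
    return prediction, filename


def name_disagrees(prediction, filename):
    """True if the label implied by the filename differs from the prediction."""
    # Assumption: only the base name is inspected, not the directory part.
    named_cfdna = "cfdna" in os.path.basename(filename).lower()
    return named_cfdna != (prediction == "cfdna")
```

Feeding each captured output line through `parse_prediction` and keeping the pairs for which `name_disagrees` returns `True` gives a list of files worth a second look.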
Add `-q` or `--quite` to enable quiet output mode, which prints only the files whose prediction disagrees with the filename:
* a file with `cfdna` in its name whose prediction is `not-cfdna`
* a file without `cfdna` in its name whose prediction is `cfdna`

# Train a model
This tool ships with a pre-trained model (`cfdna.model`) that can be used for prediction, but you can also train a model yourself:
* prepare or link all your fastq files in one folder
* for files from `cfdna`, include `cfdna` (case-insensitive) in the filename, like `20160220-cfdna-015_S15_R1_001.fq`
* for files from `genomic DNA`, include `gdna` (case-insensitive) in the filename, like `20160220-gdna-002_S2_R1_001.fq`
* for files from `FFPE DNA`, include `ffpe` (case-insensitive) in the filename, like `20160123-ffpe-040_S0_R1_001.fq`
* run:
```shell
python train.py /fastq_folder/*.fq
```

# Citation
If you use CfdnaPattern in a publication, please cite: https://doi.org/10.1109/TCBB.2017.2723388

Full options of the training script:
```shell
python train.py <fastq_files> [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -m MODEL_FILE, --model=MODEL_FILE
                        specify which file to store the built model.
  -a ALGORITHM, --algorithm=ALGORITHM
                        specify which algorithm to use for classification,
                        candidates are svm/knn/rbf/rf/gnb/benchmark, rbf means
                        svm using rbf kernel, rf means random forest, gnb
                        means Gaussian Naive Bayes, benchmark will try every
                        algorithm and plot the score figure, default is knn.
  -c CFDNA_FLAG, --cfdna_flag=CFDNA_FLAG
                        specify the filename flag of cfdna files, separated by
                        semicolon. default is: cfdna
  -o OTHER_FLAG, --other_flag=OTHER_FLAG
                        specify the filename flag of other files, separated by
                        semicolon. default is: gdna;ffpe
  -p PASSES, --passes=PASSES
                        specify how many passes to do training and validating,
                        default is 10.
  -n, --no_cache_check  if the cache file exists, use it without checking the
                        identity with input files
```
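
The filename convention and the `--cfdna_flag` / `--other_flag` defaults can be illustrated with a short sketch. This is not the code used by the training script: `label_from_filename` is a hypothetical helper, and it assumes a case-insensitive substring match against semicolon-separated flags, with files that match neither list left unlabeled.

```python
import os


def label_from_filename(path, cfdna_flag="cfdna", other_flag="gdna;ffpe"):
    """Return 'cfdna', 'other', or None based on flags in the filename.

    Flags are semicolon-separated and matched case-insensitively,
    mirroring the documented defaults (the matching rule is an assumption).
    """
    name = os.path.basename(path).lower()
    cfdna_flags = [f.lower() for f in cfdna_flag.split(";") if f]
    other_flags = [f.lower() for f in other_flag.split(";") if f]
    if any(flag in name for flag in cfdna_flags):
        return "cfdna"
    if any(flag in name for flag in other_flags):
        return "other"
    return None  # no flag matched; such a file cannot be labeled
```

Passing a custom value such as `cfdna_flag="cfdna;plasma"` shows how extra filename flags would extend the cfDNA class under this assumed rule.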