https://github.com/christopher-hakkaart/testdata

Test data for ont

Host: GitHub
URL: https://github.com/christopher-hakkaart/testdata
Owner: christopher-hakkaart
Created: 2021-12-08T08:20:18.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2022-12-14T01:07:36.000Z (over 2 years ago)
Last Synced: 2025-01-13T12:46:40.074Z (5 months ago)
Size: 41.7 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Test data origins

## Random gene of interest with known SV

```text
EDIL3
chr5:83940554-84384880 (-)
id = NM_005711.5
```

## Make bed file

``` bash
# Write one tab-separated, 6-column BED record for EDIL3 (no trailing blank line).
printf 'chr5\t83940554\t84384880\tEDIL3\t0\t-\n' > GRCh38_EDIL3.bed
```
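
A quick sanity check can catch malformed BED lines, such as a blank line or a wrong field count, before the file is passed to other tools. The following is a minimal sketch, assuming a six-column, tab-separated BED file; the file name `example_EDIL3.bed` is hypothetical and used only for illustration.

```bash
# Hypothetical example file, written the same way as the BED file above.
bed=example_EDIL3.bed
printf 'chr5\t83940554\t84384880\tEDIL3\t0\t-\n' > "$bed"

# Flag any line that does not have exactly 6 fields or whose start is not below its end.
awk -F'\t' 'NF != 6 || $2 >= $3 { bad = 1; print "bad line " NR ": " $0 }
            END { exit bad }' "$bed" && echo "BED OK"
```

A blank line has zero fields, so it is reported as a bad line and the check exits with a non-zero status.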

## Find reads mapped to EDIL3, convert to FASTQ, and gzip

```bash
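# Extract reads overlapping the EDIL3 region as BAM.
samtools view -b A04.bam "chr5:83940554-84384880" > EDIL3.bam
# Index the subset BAM.
samtools index EDIL3.bam
# Convert the reads to FASTQ and compress.
samtools fastq EDIL3.bam > NA12878_DNA.fastq
gzip NA12878_DNA.fastq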
```
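
To confirm that the compressed FASTQ is intact, the number of reads can be counted without samtools. This is a minimal sketch, assuming standard four-line FASTQ records; the file `example.fastq` and its two reads are made up for illustration and are not part of the test data.

```bash
# Hypothetical two-read FASTQ used only to demonstrate the count.
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' > example.fastq
gzip -f example.fastq

# Each FASTQ record spans 4 lines, so reads = lines / 4.
n_reads=$(gzip -dc example.fastq.gz | awk 'END { print NR / 4 }')
echo "reads: $n_reads"
```

The same pipeline can be pointed at `NA12878_DNA.fastq.gz` and compared with the read count of `EDIL3.bam`.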

## Make new reference genome

``` bash
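# Extract the EDIL3 region from the GRCh38 reference, naming the record from the BED name column.
bedtools getfasta -name -fi Homo_sapiens_assembly38.fasta -bed GRCh38_EDIL3.bed > GRCh38_EDIL3.fa
# Index the new single-region reference.
samtools faidx GRCh38_EDIL3.fa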
```
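
The development notes in this README mention a problem with the default chromosome naming from bedtools. One possible way to give the sequence a plain name is to rewrite the FASTA header with `sed`. This is only a sketch: the header shown is an assumed example of what `bedtools getfasta -name` may write (the exact format depends on the bedtools version), and the file names and the replacement name `EDIL3` are arbitrary choices.

```bash
# Hypothetical input: a one-record FASTA with an assumed bedtools-style header.
printf '>EDIL3::chr5:83940554-84384880\nACGTACGT\n' > example.fa

# Replace the header on line 1 with a simple name free of ':' and '-'.
sed '1s/^>.*/>EDIL3/' example.fa > example_renamed.fa

head -n 1 example_renamed.fa
```

The renamed file would need to be re-indexed with `samtools faidx` before use.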

## Test bench, truth and high confidence regions

Data copied from the [hap.py](https://github.com/Illumina/hap.py#happy) example:

- NA12878_chr21.vcf.gz
- NA12878_chr21.vcf.gz.tbi
- PG_Conf_chr21.bed.gz
- PG_Conf_chr21.bed.gz.tbi
- PG_NA12878_chr21.vcf.gz
- PG_NA12878_chr21.vcf.gz.tbi

## test_benchmark

The test_benchmark?.csv files in this folder are subject to change.
They are replicated dummy files used to test functionality.

## Notes for future development
TODO: Add a second chromosome and reads to use as a base for developing variant calling for each chromosome separately.

TODO: Fix the error that occurs with the default chromosome naming from bedtools. The temporary solution is to give the chromosome an arbitrary name that omits the characters causing the problem.

TODO: Add notes on how SV data were derived!
