{"id":19857009,"url":"https://github.com/griffithlab/htlv_integration_sites","last_synced_at":"2025-02-28T21:45:32.777Z","repository":{"id":258844080,"uuid":"875786721","full_name":"griffithlab/htlv_integration_sites","owner":"griffithlab","description":"HTLV-1 integration site analysis notes and scripts","archived":false,"fork":false,"pushed_at":"2025-02-18T19:28:21.000Z","size":701,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-18T20:32:03.524Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/griffithlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-20T20:27:39.000Z","updated_at":"2025-02-18T19:28:25.000Z","dependencies_parsed_at":"2025-01-28T17:31:14.787Z","dependency_job_id":"11b4e6d1-701f-4faf-b7de-3b0a1bc71779","html_url":"https://github.com/griffithlab/htlv_integration_sites","commit_stats":null,"previous_names":["griffithlab/htlv_integration_sites"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/griffithlab%2Fhtlv_integration_sites","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/griffithlab%2Fhtlv_integration_sites/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/griffithlab%2Fhtlv_integration_sites/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/griffithlab%2Fhtlv_integrati
on_sites/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/griffithlab","download_url":"https://codeload.github.com/griffithlab/htlv_integration_sites/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241246113,"owners_count":19933299,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T14:17:11.774Z","updated_at":"2025-02-28T21:45:32.770Z","avatar_url":"https://github.com/griffithlab.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"### Example of an HTLV-1 integration site analysis\n\nShorthand sample names: CTCF-1, CTCF-3, CTCF-7, CTCF-8, P12-10B, P12-14, P12-5, P12-8\n\n#### High level questions\n\nWhat are the experimental details here?  Humanized mice (humanized how?) are infected with different strains of HTLV-1?  Four different strains here?  And we are looking for genome integrations in mouse cells or human cells?  How were the cells obtained for genomic DNA isolation (is this just from blood)?  The goal here is to identify the viral integration sites and quantify them to assess clonality?  Is there an expectation for the degree of clonality we might observe?  Are we expecting to see many different unique integration sites in each sample?\n\nBrief answers:\nGenomic DNA was isolated from humanized mouse spleen that was infected with HTLV-1 p12 (wt control) or CTCF mutant virus. p12-10B, p12-14 and CTCF-7, CTCF-8 are mouse ID numbers. We want to quantify the viral integration sites in infected human T cells to assess clonality. 
I expect to see many unique integration sites but don’t know what kind of clonality will be observed.\n\nCD34+ cells were injected into the liver at day 1 of life. Mice were then infected with HTLV-1, using 2 strains: p12 and CTCF.  Analysis focused on human cell DNA. Samples were obtained from spleen. The goal here is to identify the viral integration sites and quantify them to assess clonality using the Gini index. \n\n#### Samples in each batch\n\n- Batch 1: CTCF-7, CTCF-8, P12-10B, P12-14\n- Batch 2: CTCF-1, CTCF-3, P12-5, P12-8\n\n#### Set ENVs\n\nIf needed, update the following in `envs.txt` so the variables can be sourced later\n```bash\nexport WORKING_DIR=/storage1/fs1/mgriffit/Active/griffithlab/adhoc/ratner_p01/htlv_integration_sites\n\nexport FASTQ_NAMES=(\"Ratner_CTCF-1_SIC_934_196_CGTATCTCAA_AATACTAATA_S156_\" \"Ratner_CTCF-3_SIC_935_196_GTCCTGCCGA_AATACTAATA_S157_\" \"Ratner_CTCF-7_SIC_934_SIC2_Ratner_196_CGTATCTCA_AATACTAATA_S2_\" \"Ratner_CTCF-8_SIC_935_SIC2_Ratner_196_GTCCTGCCG_AATACTAATA_S3_\" \"Ratner_P12-10B_SIC_936_SIC2_Ratner_196_CCGGGACAC_AATACTAATA_S4_\" \"Ratner_P12-14_SIC_937_SIC2_Ratner_196_GGCTGGGAT_AATACTAATA_S5_\" \"Ratner_P12-5_SIC_936_196_CCGGGACACA_AATACTAATA_S158_\" \"Ratner_P12-8_SIC_937_196_GGCTGGGATA_AATACTAATA_S159_\")\n\nexport PAIRS=(\"R1\" \"R2\")\n\nexport SEQS=(\"TTAGTACACA\" \"AATCATGTGT\" \"TGACAATGAC\" \"ACTGTTACTG\")\n```\n\n#### Download the data\n\n```bash\nsh $WORKING_DIR/git/htlv_integration_sites/scripts/download_raw_data.sh\n\n```\n \n#### Get the file base names:\n\n```bash\ncd $WORKING_DIR\nls -1 fastqs/* | perl -ne 'chomp; if ($_ =~ /(.*)\\_\\S+\\_\\S+\\.fastq\\.gz$/){print \"$1_\\n\"}' | sort | uniq -c \nls -1 fastqs/*| perl -ne 'chomp; if ($_ =~ /(.*)\\_\\S+\\_\\S+\\.fastq\\.gz$/){print \"$1_\\n\"}' | sort | uniq \n\n```\n\n#### Investigate the four supplied possible characteristic integration sequences:\nTTAGTACACA / AATCATGTGT\nTGACAATGAC / ACTGTTACTG\n\n```bash\n\ncd $WORKING_DIR\nfor FASTQ_NAME in \"${FASTQ_NAMES[@]}\"; do\n  for 
PAIR in \"${PAIRS[@]}\"; do\n    for SEQ in \"${SEQS[@]}\"; do\n      ANSWER=$(zcat fastqs/${FASTQ_NAME}${PAIR}_001.fastq.gz | awk 'NR % 4 == 2' | grep $SEQ | wc -l)\n      echo \"$FASTQ_NAME $PAIR $SEQ $ANSWER\"\n    done\n  done\ndone\n```\n\nBased on this analysis it seems that for these data in the RAW read sequences we only really see the \"TTAGTACACA\" sequence and only in Read 1 files\n\n#### Create unique read lists of these read identities and store them for later use\n\n```bash\ncd $WORKING_DIR\nfor FASTQ_NAME in \"${FASTQ_NAMES[@]}\"; do\n    echo -e \"\\nProcessing FASTQ: $FASTQ_NAME (R1 only)\"\n    SAMPLE=$(echo $FASTQ_NAME | awk -F_ '{print $2}')\n    echo \"Will name output using sample name: $SAMPLE\"\n    zcat fastqs/${FASTQ_NAME}R1_001.fastq.gz | awk 'NR % 4 == 1 {read_name = substr($1, 2)} NR % 4 == 2 {print read_name, $0}' | grep -P 'TTTAGTACACA' | cut -f 1 -d ' ' | sort | uniq \u003e readlists/${SAMPLE}_ltr_integration_seq_read_ids.txt\ndone\n```\n\n#### Details of reference sequences and alignments produced\n\nGRCh38 reference copied from: `/storage1/fs1/bga/Active/gmsroot/gc2560/core/GRC-human-build38_human_95_38_U2AF1_fix/all_sequences.fa`\nHTLV-1 reference obtained from: `https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/863/585/GCF_000863585.1_ViralProj15434/GCF_000863585.1_ViralProj15434_genomic.fna.gz`\n\nGRCh38 and HTLV-1 references catted together and BWA index and alignment done with BWA version 0.7.17-r1198-dirty (bryanfisk/bwa:latest)\n\n```bash\nisub -i 'bryanfisk/bwa:latest' -m 32 -n 8\nsource $WORKING_DIR/git/htlv_integration_sites/envs.txt\ncd $WORKING_DIR\n\necho \"${FASTQ_NAMES[@]}\"\n\nfor FASTQ_NAME in \"${FASTQ_NAMES[@]}\"; do\n    echo -e \"\\nProcessing FASTQ: $FASTQ_NAME (R1 and R2)\"\n    SAMPLE=$(echo $FASTQ_NAME | awk -F_ '{print $2}')\n    echo \"Will name output using sample name: $SAMPLE\"\n    /usr/local/bwa/bwa mem -K 20000000 -t 8 -Y $WORKING_DIR/references/GRCh38+HTLV-1.fa 
$WORKING_DIR/fastqs/${FASTQ_NAME}R1_001.fastq.gz $WORKING_DIR/fastqs/${FASTQ_NAME}R2_001.fastq.gz | samtools view -o $WORKING_DIR/bams/${SAMPLE}.bam -Shb /dev/stdin\ndone\n\nexit\n```\n\n#### Sorting and indexing BAMs\nAlignments converted to BAM, sorted, and indexed with samtools version 1.11\n\n```bash\nisub -i 'bryanfisk/bwa:latest' -m 32 -n 2\nsource $WORKING_DIR/git/htlv_integration_sites/envs.txt\ncd $WORKING_DIR\n\nfor FASTQ_NAME in \"${FASTQ_NAMES[@]}\"; do\n    SAMPLE=$(echo $FASTQ_NAME | awk -F_ '{print $2}')\n    echo \"Will name output using sample name: $SAMPLE\"\n    samtools sort -o $WORKING_DIR/bams/${SAMPLE}.sorted.bam -O BAM $WORKING_DIR/bams/${SAMPLE}.bam\n    samtools index $WORKING_DIR/bams/${SAMPLE}.sorted.bam\ndone\n\nexit\n```\n\n#### Duplicate marking step\nDuplicates marked with Picard version 2.22.8\n\n```bash\ncd $WORKING_DIR/tools\nwget https://github.com/broadinstitute/picard/releases/download/2.22.8/picard.jar\n\nisub -i 'bryanfisk/bwa:latest' -m 32 -n 2\nsource $WORKING_DIR/git/htlv_integration_sites/envs.txt\ncd $WORKING_DIR\n\nfor FASTQ_NAME in \"${FASTQ_NAMES[@]}\"; do\n    SAMPLE=$(echo $FASTQ_NAME | awk -F_ '{print $2}')\n    echo \"Will name output using sample name: $SAMPLE\"\n    java -Xmx16g -jar $WORKING_DIR/tools/picard.jar MarkDuplicates I=$WORKING_DIR/bams/${SAMPLE}.sorted.bam O=$WORKING_DIR/bams/${SAMPLE}.markedsorted.bam M=$WORKING_DIR/metrics_files/${SAMPLE}.markdup.metrics\n    samtools index $WORKING_DIR/bams/${SAMPLE}.markedsorted.bam\ndone\n\nexit\n```\n\n#### Flagstat step\nRun samtools flagstat on duplicate-marked BAMs\n\n```bash\nisub -m 32\nsource $WORKING_DIR/git/htlv_integration_sites/envs.txt\ncd $WORKING_DIR\n\nfor FASTQ_NAME in \"${FASTQ_NAMES[@]}\"; do\n    SAMPLE=$(echo $FASTQ_NAME | awk -F_ '{print $2}')\n    echo \"Will name output using sample name: $SAMPLE\"\n    samtools flagstat $WORKING_DIR/bams/${SAMPLE}.markedsorted.bam \u003e 
$WORKING_DIR/metrics_files/${SAMPLE}.flagstat.metrics\ndone\n\nexit\n```\n\n\n#### LTR integration site read filtering of BAM\n\nProduce a version of the duplicate-marked BAM that is limited to only those alignments involving reads that contained the characteristic integration site sequence (TTTAGTACACA|TGTGTACTAAA) identified above\n\n```bash\n\nisub -m 32\nsource $WORKING_DIR/git/htlv_integration_sites/envs.txt\n\ncd $WORKING_DIR\nrm -f tmp/*\n\nfor FASTQ_NAME in \"${FASTQ_NAMES[@]}\"; do\n    SAMPLE=$(echo $FASTQ_NAME | awk -F_ '{print $2}')\n\n    echo -e \"\\nProducing integration site read list filtered BAM for $SAMPLE\"\n    echo \"samtools view -H bams/${SAMPLE}.markedsorted.bam \u003e tmp/${SAMPLE}.markedsorted.bam.header\"\n    samtools view -H bams/${SAMPLE}.markedsorted.bam \u003e tmp/${SAMPLE}.markedsorted.bam.header\n\n    echo \"samtools view bams/${SAMPLE}.markedsorted.bam | grep -F -f readlists/${SAMPLE}_ltr_integration_seq_read_ids.txt \u003e tmp/${SAMPLE}.markedsorted_ltr_integration_seq_reads.sam\"\n    samtools view bams/${SAMPLE}.markedsorted.bam | grep -F -f readlists/${SAMPLE}_ltr_integration_seq_read_ids.txt \u003e tmp/${SAMPLE}.markedsorted_ltr_integration_seq_reads.sam\n\n    echo \"cat tmp/${SAMPLE}.markedsorted.bam.header tmp/${SAMPLE}.markedsorted_ltr_integration_seq_reads.sam | samtools view -Sb - \u003e tmp/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam\"\n    cat tmp/${SAMPLE}.markedsorted.bam.header tmp/${SAMPLE}.markedsorted_ltr_integration_seq_reads.sam | samtools view -Sb - \u003e tmp/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam\n\n    echo \"samtools sort tmp/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam -o bams/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam\"\n    samtools sort tmp/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam -o bams/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam\n\n    echo \"samtools index bams/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam\"\n    samtools index 
bams/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam\ndone\n\n```\n\n#### Identify reads involving a primary or supplementary alignment to the virus sequence (`NC_001436.1`)\n\n```bash\nisub -m 32\nsource $WORKING_DIR/git/htlv_integration_sites/envs.txt\n\ncd $WORKING_DIR\n\nfor FASTQ_NAME in \"${FASTQ_NAMES[@]}\"; do\n    SAMPLE=$(echo $FASTQ_NAME | awk -F_ '{print $2}')\n\n    echo -e \"\\nProcessing sample: $SAMPLE\"\n    echo \"Obtaining reads with supplementary alignments to the virus\"\n    echo \"samtools view -f 2048 bams/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam 'NC_001436.1' | cut -f 1 | sort | uniq \u003e readlists/${SAMPLE}_supplementary_virus_hit_read_ids.txt\"\n    samtools view -f 2048 bams/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam 'NC_001436.1' | cut -f 1 | sort | uniq \u003e readlists/${SAMPLE}_supplementary_virus_hit_read_ids.txt\n    wc -l readlists/${SAMPLE}_supplementary_virus_hit_read_ids.txt\n\n    echo -e \"\\nObtaining reads with primary alignments to the virus\"\n    echo \"samtools view -F 256 bams/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam 'NC_001436.1' | cut -f 1 | sort | uniq \u003e readlists/${SAMPLE}_primary_virus_hit_read_ids.txt\"\n    samtools view -F 256 bams/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam 'NC_001436.1' | cut -f 1 | sort | uniq \u003e readlists/${SAMPLE}_primary_virus_hit_read_ids.txt\n    wc -l readlists/${SAMPLE}_primary_virus_hit_read_ids.txt\n\n    echo -e \"\\nCreating a unique list of reads with either supplementary or primary alignments to the virus\"\n    echo \"cat readlists/${SAMPLE}_supplementary_virus_hit_read_ids.txt readlists/${SAMPLE}_primary_virus_hit_read_ids.txt | sort | uniq \u003e readlists/${SAMPLE}_virus_hit_read_ids.txt\"\n    cat readlists/${SAMPLE}_supplementary_virus_hit_read_ids.txt readlists/${SAMPLE}_primary_virus_hit_read_ids.txt | sort | uniq \u003e readlists/${SAMPLE}_virus_hit_read_ids.txt\n    wc -l 
readlists/${SAMPLE}_virus_hit_read_ids.txt\n    echo -e \"\\n\"\ndone\nexit\n```\n\n\n#### Use the viral alignment read list to produce filtered BAM files with only those reads that have such alignments\n\n```bash\nisub -m 32\nsource $WORKING_DIR/git/htlv_integration_sites/envs.txt\n\ncd $WORKING_DIR\n\nfor FASTQ_NAME in \"${FASTQ_NAMES[@]}\"; do\n    SAMPLE=$(echo $FASTQ_NAME | awk -F_ '{print $2}')\n    echo -e \"\\nProducing viral read list filtered BAM for $SAMPLE\"\n    echo \"samtools view -H bams/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam \u003e tmp/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam.header\"\n    samtools view -H bams/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam \u003e tmp/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam.header\n\n    echo \"samtools view bams/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam | grep -F -f readlists/${SAMPLE}_virus_hit_read_ids.txt \u003e tmp/${SAMPLE}.markedsorted.viralreads.sam\"\n    samtools view bams/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam | grep -F -f readlists/${SAMPLE}_virus_hit_read_ids.txt \u003e tmp/${SAMPLE}.markedsorted.viralreads.sam\n\n    echo \"cat tmp/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam.header tmp/${SAMPLE}.markedsorted.viralreads.sam | samtools view -Sb - \u003e bams/${SAMPLE}.markedsorted_with_hits_to_viral.bam\"\n    cat tmp/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam.header tmp/${SAMPLE}.markedsorted.viralreads.sam | samtools view -Sb - \u003e bams/${SAMPLE}.markedsorted_with_hits_to_viral.bam\ndone\n\nexit\n```\n\n**NOTE**: It is possible that the above approach of limiting to reads with a viral alignment is too strict.\nA read may correspond to an integration event and have the characteristic LTR sequence, but not produce an alignment to the virus genome.\nWe should try the analysis with and without this requirement and gauge the impact.\n\nUse bedtools (v2.25.0) to create bed representations of the BAM alignments to 
facilitate integration site counting.\nAt the same time apply filters to: require alignment, remove duplicates, prevent counting on the virus seq itself\n\nDo this two ways: (1) with the marked-duplicate BAM, (2) with the BAM created from marked-duplicate BAM that also limits to viral hit reads produced above\n\n##### (1) with the marked-duplicate BAM\n\n```bash\nisub -m 32\nsource $WORKING_DIR/git/htlv_integration_sites/envs.txt\n\ncd $WORKING_DIR\n\nrm -f tmp/*\nfor FASTQ_NAME in \"${FASTQ_NAMES[@]}\"; do\n    SAMPLE=$(echo $FASTQ_NAME | awk -F_ '{print $2}')\n    echo -e \"\\nProducing integration site counts $SAMPLE\"\n    echo \"samtools view -H bams/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam \u003e tmp/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam.header\"\n    samtools view -H bams/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam \u003e tmp/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam.header\n    echo \"samtools view -f 1 -F 1024 -q 20 bams/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam \u003e tmp/${SAMPLE}.markedsorted_filtered.sam\"\n    samtools view -f 1 -F 1024 -q 20 bams/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam \u003e tmp/${SAMPLE}.markedsorted_filtered.sam\n    echo \"cat tmp/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam.header tmp/${SAMPLE}.markedsorted_filtered.sam | samtools view -Sb - \u003e tmp/${SAMPLE}.markedsorted_filtered.bam\"\n    cat tmp/${SAMPLE}.markedsorted_ltr_integration_seq_reads.bam.header tmp/${SAMPLE}.markedsorted_filtered.sam | samtools view -Sb - \u003e tmp/${SAMPLE}.markedsorted_filtered.bam\n    echo \"samtools sort tmp/${SAMPLE}.markedsorted_filtered.bam -o bams/${SAMPLE}.markedsorted_filtered.bam\"\n    samtools sort tmp/${SAMPLE}.markedsorted_filtered.bam -o bams/${SAMPLE}.markedsorted_filtered.bam\n    echo \"samtools index bams/${SAMPLE}.markedsorted_filtered.bam\"\n    samtools index bams/${SAMPLE}.markedsorted_filtered.bam\n    echo \"bedtools bamtobed -i 
bams/${SAMPLE}.markedsorted_filtered.bam \u003e beds/${SAMPLE}.markedsorted_filtered.bed\"\n    bedtools bamtobed -i bams/${SAMPLE}.markedsorted_filtered.bam \u003e beds/${SAMPLE}.markedsorted_filtered.bed\n    echo \"bedtools merge -i beds/${SAMPLE}.markedsorted_filtered.bed -c 1 -o count | grep -v 'NC_001436.1' \u003e beds/${SAMPLE}.markedsorted_filtered_merged.bed\"\n    bedtools merge -i beds/${SAMPLE}.markedsorted_filtered.bed -c 1 -o count | grep -v 'NC_001436.1' \u003e beds/${SAMPLE}.markedsorted_filtered_merged.bed\n    echo \"cp beds/${SAMPLE}.markedsorted_filtered_merged.bed counts/${SAMPLE}.markedsorted_filtered_merged.bed.tsv\"\n    cp beds/${SAMPLE}.markedsorted_filtered_merged.bed counts/${SAMPLE}.markedsorted_filtered_merged.bed.tsv\ndone\n\nexit\n```\n\n##### (2) with the BAM created from marked-duplicate BAM that also limits to viral hit reads produced above\n\n```bash\nisub -m 32\nsource $WORKING_DIR/git/htlv_integration_sites/envs.txt\n\ncd $WORKING_DIR\n\nrm -f tmp/*\n\nfor FASTQ_NAME in \"${FASTQ_NAMES[@]}\"; do\n    SAMPLE=$(echo $FASTQ_NAME | awk -F_ '{print $2}')\n    echo -e \"\\nProducing integration site counts $SAMPLE\"\n    echo \"samtools view -H bams/${SAMPLE}.markedsorted_with_hits_to_viral.bam \u003e tmp/${SAMPLE}.markedsorted_with_hits_to_viral.bam.header\"\n    samtools view -H bams/${SAMPLE}.markedsorted_with_hits_to_viral.bam \u003e tmp/${SAMPLE}.markedsorted_with_hits_to_viral.bam.header\n    echo \"samtools view -f 1 -F 1024 -q 20 bams/${SAMPLE}.markedsorted_with_hits_to_viral.bam \u003e tmp/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.sam\"\n    samtools view -f 1 -F 1024 -q 20 bams/${SAMPLE}.markedsorted_with_hits_to_viral.bam \u003e tmp/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.sam\n    echo \"cat tmp/${SAMPLE}.markedsorted_with_hits_to_viral.bam.header tmp/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.sam | samtools view -Sb - \u003e tmp/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.bam\"\n    cat 
tmp/${SAMPLE}.markedsorted_with_hits_to_viral.bam.header tmp/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.sam | samtools view -Sb - \u003e tmp/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.bam\n    echo \"samtools sort tmp/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.bam -o bams/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.bam\"\n    samtools sort tmp/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.bam -o bams/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.bam\n    echo \"samtools index bams/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.bam\"\n    samtools index bams/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.bam\n    echo \"bedtools bamtobed -i bams/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.bam \u003e beds/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.bed\"\n    bedtools bamtobed -i bams/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.bam \u003e beds/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.bed\n    echo \"bedtools merge -i beds/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.bed -c 1 -o count | grep -v 'NC_001436.1' \u003e beds/${SAMPLE}.markedsorted_with_hits_to_viral_filtered_merged.bed\"\n    bedtools merge -i beds/${SAMPLE}.markedsorted_with_hits_to_viral_filtered.bed -c 1 -o count | grep -v 'NC_001436.1' \u003e beds/${SAMPLE}.markedsorted_with_hits_to_viral_filtered_merged.bed\n    echo \"cp beds/${SAMPLE}.markedsorted_with_hits_to_viral_filtered_merged.bed counts/${SAMPLE}.markedsorted_with_hits_to_viral_filtered_merged.bed.tsv\"\n    cp beds/${SAMPLE}.markedsorted_with_hits_to_viral_filtered_merged.bed counts/${SAMPLE}.markedsorted_with_hits_to_viral_filtered_merged.bed.tsv\ndone\n\nexit\n```\n\n#### Create single file with all counts for all samples to facilitate creation of visualizations\n\n```bash\nisub -m 32\nsource $WORKING_DIR/git/htlv_integration_sites/envs.txt\n\ncd $WORKING_DIR\n\nrm -f tmp/*\n\nfor FASTQ_NAME in \"${FASTQ_NAMES[@]}\"; do\n    SAMPLE=$(echo $FASTQ_NAME | awk 
-F_ '{print $2}')\n    awk -v sample=\"$SAMPLE\" '{print $0 \"\\t\" sample}' counts/${SAMPLE}.markedsorted_with_hits_to_viral_filtered_merged.bed.tsv \u003e tmp/${SAMPLE}.markedsorted_with_hits_to_viral_filtered_merged.bed.tsv\ndone\necho -e \"chromosome\\tstart_pos\\tend_pos\\tcount\\tsample\" \u003e tmp/header.tsv\ncat tmp/header.tsv tmp/*markedsorted_with_hits_to_viral_filtered_merged.bed.tsv \u003e counts/ALL.markedsorted_with_hits_to_viral_filtered_merged.bed.tsv\n\nexit\n```\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgriffithlab%2Fhtlv_integration_sites","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgriffithlab%2Fhtlv_integration_sites","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgriffithlab%2Fhtlv_integration_sites/lists"}