{"id":32611490,"url":"https://github.com/ctlab/quant3p","last_synced_at":"2025-10-30T13:59:18.932Z","repository":{"id":23795536,"uuid":"27171334","full_name":"ctlab/quant3p","owner":"ctlab","description":"A set of scripts for 3' RNA-seq quantification","archived":false,"fork":false,"pushed_at":"2022-06-03T09:15:08.000Z","size":352,"stargazers_count":1,"open_issues_count":3,"forks_count":2,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-05-19T00:08:55.641Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ctlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-11-26T10:19:06.000Z","updated_at":"2019-12-02T17:22:54.000Z","dependencies_parsed_at":"2022-08-20T01:31:22.305Z","dependency_job_id":null,"html_url":"https://github.com/ctlab/quant3p","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ctlab/quant3p","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ctlab%2Fquant3p","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ctlab%2Fquant3p/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ctlab%2Fquant3p/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ctlab%2Fquant3p/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ctlab","download_url":"https://codeload.github.com/ctlab/quant3p/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ctlab%2Fquant3p/sbom","scorecard":
null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281818072,"owners_count":26566859,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-30T02:00:06.501Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-30T13:58:24.385Z","updated_at":"2025-10-30T13:59:18.925Z","avatar_url":"https://github.com/ctlab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"quant3p\n=======\n\nA set of scripts for 3' RNA-seq quantification\n\n## Install\n\nTo install, run `python setup.py install`.\n\nTo install locally, without root privileges, run `python setup.py install --user`. Don't forget to add `.local/bin` to your `PATH`. 
You can do this by adding the line `export PATH=\"$HOME/.local/bin:$PATH\"` to the `~/.profile` file.\n\n## Quick start\n\nYou can quickly run the analysis on the example data.\n\n```bash\nquant3p -n example -g example/mm10.slice.gtf example/bam/*.bam \n```\n\nParameters are:\n* `-n NAME` sets the experiment name to `NAME`,\n* `-g GTF` uses `GTF` as the annotation,\n* `example/bam/*.bam` is the list of BAM files to process.\n\nThis command produces one file, `example.cnt`, which contains gene counts and looks like this:\n```\n                        2h_rep1  2h_rep2  4h_rep1  4h_rep2\n100039246               0        0        0        0\n11744                   112      118      72       86\n14673                   105      114      52       67\n211623                  0        0        0        0\n231842                  47       39       16       17\n232157                  1799     1947     2060     2075\n66039                   1        2        0        0\n78653                   147      148      90       131\nNM_001270503_dup1       0        0        0        0\nNM_001270503_dup2       0        0        0        0\nNM_207229_dup1          0        0        0        0\nNM_207229_dup2          0        0        0        0\n__no_feature            463      489      389      433\n__ambiguous             0        0        0        0\n__too_low_aQual         0        0        0        0\n__not_aligned           0        0        0        0\n__alignment_not_unique  289      190      231      194\n```\n\nIf you want to look at the intermediate files, use the `--keep-temp` parameter.\n\n## Step by step guide\n\n`quant3p` consists of four steps, each of which can be useful on its own:\n* Finding potential exons by calling peaks with MACS2,\n* Fixing annotation by associating the potential exons with annotated genes,\n* Fixing multimapper tags in the alignment files by keeping only the alignments that map to annotated exons,\n* Counting reads with HTSeq.\n\n### Finding 
potential exons\n\nTo find potential exons, we call peaks in the combined RNA-seq data. We do it separately for the positive and negative strands\nto exploit the strand specificity of our data. This is done by calling `macs2-stranded`.\n\n```bash\nmacs2-stranded -n example example/bam/*.bam\n```\n\nHere:\n* `-n example` sets the experiment name to `example`,\n* `example/bam/*.bam` is the list of BAM files to process.\n\nFor the example, it should produce 15 peaks (file `example_peaks.narrowPeak`):\n```\nchr14  25846959   25847377   example.pos_peak_1   372   +  13.33333  42.28070   37.23716   163\nchr14  25871156   25871413   example.pos_peak_2   114   +  9.52381   16.27811   11.43326   80\nchr14  25886465   25886948   example.pos_peak_3   1319  +  25.60976  138.30992  131.94069  273\nchr14  26026235   26026718   example.pos_peak_4   1324  +  25.81967  138.85394  132.45966  273\nchr14  26165849   26166332   example.pos_peak_5   1324  +  25.81967  138.85394  132.45966  273\nchr5   140752996  140753373  example.pos_peak_6   718   +  22.40260  77.17718   71.88202   214\nchr6   83341577   83341950   example.pos_peak_7   724   +  6.68317   77.78582   72.48792   197\nchr6   83342407   83342877   example.pos_peak_8   1071  +  8.43220   112.89129  107.17482  249\nchr6   83343311   83343810   example.pos_peak_9   1078  +  8.51155   113.64171  107.87556  272\nchr6   83351316   83351531   example.pos_peak_10  173   +  9.23567   22.26187   17.36432   79\nchr6   83353068   83353297   example.pos_peak_11  397   +  14.96815  44.79282   39.73446   80\nchr6   83358160   83358528   example.pos_peak_12  1767  +  34.81308  185.13048  176.73187  183\nchr5   140758332  140758782  example.neg_peak_1   1403  -  28.53982  148.70955  140.31094  182\nchr5   140806999  140807270  example.neg_peak_2   73    -  7.81250   13.03361   7.39012    75\nchr6   83346833   83347040   example.neg_peak_3   200   -  13.04348  25.95999   20.05134   37\n```\n\n### Fixing annotation\n\nNext, we fix the annotation by adding RNA-seq 
peaks as exons to a transcript when a peak overlaps the 5 kbp region downstream of that transcript's 3' end.\n\n```bash\ngtf-extend -g example/mm10.slice.gtf -p example_peaks.narrowPeak -o mm10.slice.fixed.gtf --extns-out mm10.slice.extensions.gtf\n```\nHere:\n* `-g example/mm10.slice.gtf` uses `example/mm10.slice.gtf` as the annotation,\n* `-p example_peaks.narrowPeak` uses `example_peaks.narrowPeak` as the peaks,\n* `-o mm10.slice.fixed.gtf` writes the fixed annotation to the `mm10.slice.fixed.gtf` file,\n* `--extns-out mm10.slice.extensions.gtf` writes the newly added exons to the `mm10.slice.extensions.gtf` file.\n\nIn the example, 4 new exons are added (file `mm10.slice.extensions.gtf`):\n```\nchr6  extension  exon  83341578   83341950   724   +  .  transcript_id  \"NM_145571\";  gene_id  \"232157\"\nchr6  extension  exon  83342408   83342877   1071  +  .  transcript_id  \"NM_145571\";  gene_id  \"232157\"\nchr6  extension  exon  83343312   83343810   1078  +  .  transcript_id  \"NM_145571\";  gene_id  \"232157\"\nchr5  extension  exon  140758333  140758782  1403  -  .  transcript_id  \"NM_010302\";  gene_id  \"14673\"\n```\n\n### Fixing multimappers\n\nThe main idea here is that we expect RNA-seq reads to align to genes.\nSo we discard alignments of multimappers to unannotated regions and \nfix the multimapping annotation (`NH` SAM field) for such reads. 
Some of them\ncan then be uniquely mapped to one gene.\n\n```bash\nfix-mm -g example/mm10.slice.gtf example/bam/2h_rep1.bam -o 2h_rep1.fixed.bam\n```\nHere:\n* `-g example/mm10.slice.gtf` uses `example/mm10.slice.gtf` as the annotation,\n* `example/bam/2h_rep1.bam` is the BAM file to fix,\n* `-o 2h_rep1.fixed.bam` writes the fixed BAM file to `2h_rep1.fixed.bam`.\n\nIn the `2h_rep1.bam` file, all 301 multimappers can be uniquely mapped to a single position covered by an exon.\n\n### Counting with HTSeq\n\nCounting with HTSeq is rather straightforward:\n```bash\nsamtools view 2h_rep1.fixed.bam | htseq-count -s yes -t exon - mm10.slice.fixed.gtf\n```\n\nHere we use the fixed BAM file (`2h_rep1.fixed.bam`) and the fixed annotation (`mm10.slice.fixed.gtf`).\n\nThe output should look like this:\n```\n100039246               0\n11744                   112\n14673                   105\n211623                  0\n231842                  47\n232157                  1799\n66039                   1\n78653                   147\nNM_001270503_dup1       0\nNM_001270503_dup2       0\nNM_207229_dup1          0\nNM_207229_dup2          0\n__no_feature            463\n__ambiguous             0\n__too_low_aQual         0\n__not_aligned           0\n__alignment_not_unique  289\n```\n\nIf we run `htseq-count` on the original data:\n```bash\nsamtools view -h example/bam/2h_rep1.bam | htseq-count -s yes -t exon - example/mm10.slice.gtf\n```\n\nA lot of reads will not be counted:\n```\n100039246               0\n11744                   0\n14673                   0\n211623                  0\n231842                  44\n232157                  17\n66039                   0\n78653                   56\nNM_001270503_dup1       0\nNM_001270503_dup2       0\nNM_207229_dup1          0\nNM_207229_dup2          0\n__no_feature            2020\n__ambiguous             0\n__too_low_aQual         0\n__not_aligned           0\n__alignment_not_unique  826\n```\n\n## 
Dependencies\n\n`macs2-stranded` depends on the `samtools` and `macs2` executables. The `macs2` version should be \u003e= 2.0.10.\n\n`gtf-extend` and `fix-mm` need the `pybedtools` and `pysam` Python packages, with `bedtools` and `samtools` installed.\n\n\nAll the dependencies can be installed using conda: `conda install -c bioconda macs2 pybedtools pysam HTseq`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fctlab%2Fquant3p","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fctlab%2Fquant3p","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fctlab%2Fquant3p/lists"}