{"id":20196395,"url":"https://github.com/dn070017/quanteval","last_synced_at":"2025-10-11T08:03:55.071Z","repository":{"id":130414783,"uuid":"104848970","full_name":"dn070017/QuantEval","owner":"dn070017","description":" QuantEval is an analysis pipeline which evaluate the reliability of quantification tools.","archived":false,"fork":false,"pushed_at":"2020-07-28T03:03:06.000Z","size":5158,"stargazers_count":8,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-10T10:59:23.009Z","etag":null,"topics":["connected-components","quantification-evaluation-methods","transcriptome-assembly"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dn070017.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-09-26T07:09:32.000Z","updated_at":"2022-10-10T13:49:11.000Z","dependencies_parsed_at":null,"dependency_job_id":"834ded0a-e1a2-4c7f-a40d-68f91fea82e0","html_url":"https://github.com/dn070017/QuantEval","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dn070017/QuantEval","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dn070017%2FQuantEval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dn070017%2FQuantEval/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dn070017%2FQuantEval/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dn070017%2FQuantEval/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dn0
70017","download_url":"https://codeload.github.com/dn070017/QuantEval/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dn070017%2FQuantEval/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279006746,"owners_count":26084148,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["connected-components","quantification-evaluation-methods","transcriptome-assembly"],"created_at":"2024-11-14T04:23:50.316Z","updated_at":"2025-10-11T08:03:55.027Z","avatar_url":"https://github.com/dn070017.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":" # QuantEval\n ## About\n QuantEval was released for two purposes: (1) users can run the scripts in QuantEval to reproduce all the analyses in the following study (Hsieh et al.); (2) users can follow the example in QuantEval to conduct the same analyses on their own data. For the first purpose, the QuantEval main program has three modes: (1) \u003cb\u003eReference Mode\u003c/b\u003e, (2) \u003cb\u003eContig Mode\u003c/b\u003e and (3) \u003cb\u003eMatch Mode\u003c/b\u003e. The first two modes read the quantification results and build ambiguity clusters based on connected components for the reference transcripts and the contig sequences, respectively. 
The match mode builds relations between contigs and reference transcripts. For the second purpose, users are encouraged to follow the provided example to conduct new analyses on their own data.\n ## Reference\n \u003e Ping-Han Hsieh, Yen-Jen Oyang and Chien-Yu Chen. Effect of de novo transcriptome assembly on transcript quantification. Scientific Reports volume 9, Article number: 8304 (2019).\n ## Requirement\n - QuantEval main program:\n    - Python3 (3.5.2)\n    - Python packages: pandas (0.20.3), numpy (1.12.1)\n - Generate figures and table:\n    - R (3.3.0)\n    - R packages: gridExtra, grid, stats, tidyverse, plyr, ggplot2, reshape2\n - Utilities:\n    - Bowtie2 (2.3.0), BLASTn (2.5.0), Flux Simulator (1.2.1), RSEM (1.2.31), Kallisto (0.43.0), rnaSPAdes (3.11.1), Salmon (0.8.2), Trans-ABySS (1.5.5), TransRate (1.0.3), Trinity (2.4.0)\n\n ## Manual\n - Run QuantEval individually:\n ```shell\n python3 ./scripts/QuantEval.py --reference --contig --match --input input.json\n ```\n The first three parameters (\u003cb\u003e--reference, --contig, --match\u003c/b\u003e) indicate which modes to run, and the \u003cb\u003einput.json\u003c/b\u003e file specifies the input parameters for the QuantEval main program. The three modes can be run independently, but both reference mode and contig mode must be run \u003cb\u003ebefore\u003c/b\u003e match mode. It is recommended to run the three modes in sequence. Because the QuantEval main program \u003cb\u003edoes not\u003c/b\u003e include a wrapper for quantification, sequence alignment or contig evaluation, all of which are essential inputs to the main program, one needs to run the quantification algorithms (i.e. RSEM/Kallisto/Salmon), sequence alignment (BLASTn) and contig evaluation (TransRate) separately in order to obtain results comparable to those in the referenced study. \n ___\n\n Below, we use an example dataset to explain how to use QuantEval. 
This example contains the following files:\n - ref.fasta: the reference transcripts (in real applications, this file is not available for species without reference transcripts)\n - contig.fasta: the contigs assembled from short reads, e.g. read_1.fastq and read_2.fastq\n - read_1.fastq, read_2.fastq: the paired-end short reads\n\n Before running QuantEval,\n - Run pairwise BLASTn for reference/contig mode:\n ```shell\n # reference mode\n blastn -db ref.fasta -query ref.fasta -outfmt 6 -evalue 1e-5 -perc_identity 95 -out ./blastn/ref.self.tsv \n \n # contig mode\n blastn -db contig.fasta -query contig.fasta -outfmt 6 -evalue 1e-5 -perc_identity 95 -out ./blastn/contig.self.tsv\n ```\n - Run BLASTn to map the contig sequences to the reference transcripts (match mode):\n ```shell\n blastn -db ref.fasta -query contig.fasta -outfmt 6 -out ./blastn/contig_to_ref.tsv \n ```\n - Run quantification for reference/contig mode with default parameters:\n ```shell\n # RSEM\n rsem-prepare-reference --bowtie2 ref.fasta ./rsem/rsem.index\n rsem-calculate-expression --paired-end --strandedness none --bowtie2 --time ref_read/read_1.fastq ref_read/read_2.fastq ./rsem/rsem.index ./rsem/rsem \n\n # Kallisto\n kallisto index -i ./kallisto/kallisto.index -k 31 ref.fasta\n kallisto quant -i ./kallisto/kallisto.index -o ./kallisto ref_read/read_1.fastq ref_read/read_2.fastq\n\n # Salmon\n salmon index -i ./salmon/salmon.index -t ref.fasta --type quasi -k 31\n salmon quant -i ./salmon/salmon.index -l A -1 ref_read/read_1.fastq -2 ref_read/read_2.fastq -o ./salmon \n ```\n - Run TransRate for reference/contig mode: \n ```shell\n # reference mode\n transrate --assembly ref.fasta --output ./transrate/ref --left ref_read/read_1.fastq --right ref_read/read_2.fastq\n\n # contig mode\n transrate --assembly contig.fasta --output ./transrate/contig --left contig_read/read_1.fastq --right contig_read/read_2.fastq\n ```\n - Example of input.json:\n ```json\n{\n    \"ref_fasta\": \"ref.fasta\",\n    \"ref_blastn\": \"./blastn/ref.self.tsv\",\n   
 \"ref_gtf\": \"ref.gtf\",\n    \"ref_xprs_file\": [\"./answer/answer_xprs.tsv\",\n                      \"./kallisto/ref/abundance.tsv\",\n                      \"./rsem/ref/rsem.isoforms.results\",\n                      \"./salmon/ref/quant.sf\"],\n    \"ref_xprs_label\": [\"answer\", \"kallisto\", \"rsem\", \"salmon\"],\n    \"ref_xprs_header\": [true, true, true, true],\n    \"ref_xprs_name_col\": [1, 1, 1, 1], \n    \"ref_xprs_tpm_col\": [2, 5, 6, 4], \n    \"ref_xprs_count_col\": [3, 4, 5, 5],\n    \"ref_transrate\": \"./transrate/ref/contigs.csv\",\n    \"contig_fasta\": \"contig.fasta\",\n    \"contig_blastn\": \"./blastn/contig.self.tsv\",\n    \"contig_xprs_file\": [\"./kallisto/contig/abundance.tsv\",\n                         \"./rsem/contig/rsem.isoforms.results\",\n                         \"./salmon/contig/quant.sf\"],\n    \"contig_xprs_label\": [\"kallisto\", \"rsem\", \"salmon\"],\n    \"contig_xprs_header\": [true, true, true],\n    \"contig_xprs_name_col\": [1, 1, 1], \n    \"contig_xprs_tpm_col\": [5, 6, 4], \n    \"contig_xprs_count_col\": [4, 5, 5],\n    \"contig_transrate\": \"./transrate/contig/contigs.csv\",\n    \"match_blastn\": \"./blastn/contig_to_ref.tsv\",\n    \"output_dir\": \"./QuantEval/\"\n}\n```\n___\n \n- Run example:\n```shell\ncd example\npython3 ../scripts/QuantEval.py --reference --contig --match --input ./example.json\n```\n___\n\nOne can also import the functions in utilities.py to build a custom analysis pipeline.\n- Construct connected components for the reference only:\n```python\nfrom utilities import construct_sequence, filter_blastn, intersect_match, construct_graph\n# read_expression is called below; it is assumed here to be defined in utilities.py as well\nfrom utilities import read_expression\nimport copy\n\n# the keys are assumed to mirror the ref_* keys of input.json\ninput_file = dict()\ninput_file[\"ref_xprs_file\"] = [\"./answer/answer_xprs.tsv\",\n                               \"./kallisto/abundance.tsv\",\n                               \"./rsem/rsem.isoforms.results\",\n                               \"./salmon/quant.sf\"]\ninput_file[\"ref_xprs_label\"] = [\"answer\", \"kallisto\", \"rsem\", 
\"salmon\"]\ninput_file[\"ref_xprs_header\"] = [True, True, True, True]\ninput_file[\"ref_xprs_name_col\"] = [1, 1, 1, 1]\ninput_file[\"ref_xprs_tpm_col\"] = [2, 5, 6, 4]\ninput_file[\"ref_xprs_count_col\"] = [3, 4, 5, 5]\n\n# load the reference sequences and their self-alignments\nref_seq_dict = construct_sequence('ref.fasta')\nref_self_blastn = filter_blastn('./blastn/ref.self.tsv')\n# attach the expression values to the reference sequences\nread_expression(input_file, ref_seq_dict, 'ref')\n# build the connected components from the reference self-matches\nref_self_match_dict = intersect_match(ref_self_blastn, ref_seq_dict, copy.deepcopy(ref_seq_dict))\nref_uf, ref_component_dict = construct_graph(ref_seq_dict, ref_self_match_dict)\n\nprint(ref_uf.component_label)\nprint(ref_uf.parent)\n```\n___\n \n- Run all the analyses in the study (\u003cb\u003etime consuming\u003c/b\u003e):\n```shell\n./pipelines/run_analysis.sh\n```\n___\n \n- Output format\n\n| column | description |\n|--------|-------------|\n| match_name | alignment (ref.contig.strand) |\n| contig_name | target contig name |\n| ref_name | target ref name |\n| accuracy | accuracy of the alignment |\n| recovery | recovery of the alignment |\n| ***contig/ref***\\_length | length of ***contig/ref*** |\n| ***contig/ref***\\_tr\\_***transrate_score*** | ***transrate score*** of ***contig/ref*** |\n| ***contig/ref***\\_xprs\\_***tpm/count***_***quantifier*** | quantification result of ***contig/ref*** |\n| ***contig/ref***\\_component | label of connected component of ***contig/ref*** |\n| ***contig/ref***\\_component_size | number of sequences in the connected component of ***contig/ref*** |\n| ***contig/ref***\\_component_contribute_xprs_***tpm/count***\\_***quantifier*** | proportion of ***TPM/read count (RPEA)*** in the connected component of ***contig/ref*** |\n| ***contig/ref***\\_component_relative_xprs_***tpm/count***\\_***quantifier*** | ***TPM/count*** of ***contig/ref*** / highest ***TPM/count*** in the same connected component |\n| ***contig/ref***\\_component_max_xprs_***tpm/count***\\_***quantifier*** | highest ***TPM/count*** of ***contig/ref*** in the same connected 
component |\n| ***contig/ref***\\_component_avg_xprs_***tpm/count***\\_***quantifier*** | average ***TPM/count*** of ***contig/ref*** in the same connected component |\n| ***contig/ref***\\_component_tot_xprs_***tpm/count***\\_***quantifier*** | total ***TPM/count*** of ***contig/ref*** in the same connected component |\n| ref_gene_contribute_xprs_***tpm/count***\\_***quantifier*** | proportion of ***TPM/read count*** in the gene of ref |\n| ref_gene_relative_xprs_***tpm/count***\\_***quantifier*** | ***TPM/count*** of ref / highest ***TPM/count*** in the same gene |\n| ref_gene_max_xprs_***tpm/count***\\_***quantifier*** | highest ***TPM/count*** of ref in the same gene |\n| ref_gene_avg_xprs_***tpm/count***\\_***quantifier*** | average ***TPM/count*** of ref in the same gene |\n| ref_gene_tot_xprs_***tpm/count***\\_***quantifier*** | total ***TPM/count*** of ref in the same gene |\n| length_difference | the difference in length between contig and reference |\n| xprs_***tpm/count***\\_error_***quantifier*** | quantification error for the estimated abundance of contig | \n\u003e Note that this table lists the superset of the output fields (those produced by match mode). The actual output depends on whether reference, contig or match mode is run (e.g. contig mode only outputs the columns starting with ***contig***), but every column is described in the table above.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdn070017%2Fquanteval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdn070017%2Fquanteval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdn070017%2Fquanteval/lists"}