{"id":15046534,"url":"https://github.com/nalomran/pyreqtl","last_synced_at":"2025-10-27T14:31:19.413Z","repository":{"id":62583145,"uuid":"298248512","full_name":"nalomran/PyReQTL","owner":"nalomran","description":"A collection of Python modules equivalent to R ReQTL Toolkit aims to identify the association between expressed SNVs with their gene expression using RNA-sequencing data.","archived":false,"fork":false,"pushed_at":"2022-01-20T21:02:50.000Z","size":3249,"stargazers_count":10,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-04-23T01:20:21.107Z","etag":null,"topics":["bioinformatics","bioinformatics-analysis","bioinformatics-tool","data-science","gene-expression","matrixeqtl","numpy","pandas","python","python3","r","rna-seq","rna-seq-analysis","rpy2","scipy","snvs"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nalomran.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-09-24T10:45:18.000Z","updated_at":"2023-10-18T06:11:52.000Z","dependencies_parsed_at":"2022-11-03T21:47:08.854Z","dependency_job_id":null,"html_url":"https://github.com/nalomran/PyReQTL","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nalomran%2FPyReQTL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nalomran%2FPyReQTL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nalomran%2FPyReQTL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nalomran%2FPyReQTL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nalomran","download_url":"https://codeload.github.com/nalomran/PyReQTL/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":219861084,"owners_count":16556007,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","bioinformatics-analysis","bioinformatics-tool","data-science","gene-expression","matrixeqtl","numpy","pandas","python","python3","r","rna-seq","rna-seq-analysis","rpy2","scipy","snvs"],"created_at":"2024-09-24T20:53:12.934Z","updated_at":"2025-10-27T14:31:18.172Z","avatar_url":"https://github.com/nalomran.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PyReQTL\n\n![GitHub](https://img.shields.io/github/license/nalomran/PyReQTL)\n[![Maintainability](https://api.codeclimate.com/v1/badges/10f81663bd87cbf2178a/maintainability)](https://codeclimate.com/github/nalomran/PyReQTL/maintainability)\n[![Total alerts](https://img.shields.io/lgtm/alerts/g/nalomran/PyReQTL.svg?logo=lgtm\u0026logoWidth=18)](https://lgtm.com/projects/g/nalomran/PyReQTL/alerts/)\n[![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/nalomran/PyReQTL.svg?logo=lgtm\u0026logoWidth=18)](https://lgtm.com/projects/g/nalomran/PyReQTL/context:python)\n![install_with: pip](https://img.shields.io/badge/install%20with-pip-green)\n![pypi_package: passing](https://img.shields.io/badge/Pypi%20Package-passing-brightgreen)\n![GitHub last commit](https://img.shields.io/github/last-commit/nalomran/PyReQTL)\n![GitHub commit activity](https://img.shields.io/github/commit-activity/w/nalomran/PyReQTL)\n![GitHub Release Date](https://img.shields.io/github/release-date/nalomran/PyReQTL)\n\n**Index: [Introduction](#introduction) | [Features](#features) | [Asking for Help](#asking-for-help) | \n[Getting Started](#getting-started) | [Operating System Requirement](#operating-system-requirement) | \n[Python Version](#python-version) | [Prerequisites](#prerequisites) | [Installation](#installation) | [Quick example](#quick-example) |\n[Running the Modules](#running-the-modules) | \n[Authors \u0026 Acknowledgment for the ReQTL R Toolkit](#authors--Acknowledgment-for-the-ReQTL-R-toolkit]) | [Paper](#paper) |\n[References](#references)**\n\n## Introduction\nPyReQTL is a python package/library that consist of collection of python modules developed for individuals or research \ngroup who prefer to perform ReQTL analysis using python instead of R programming language. Hence, this library are \nreplica of the R toolkit **\"ReQTL\"** but implemented in Python (please see the section References for direct link of the\nReQTL R toolkit GitHub repo page). The original R toolkit requires a number of R packages such as \nMatrixEQTL and among others Python lacks equivalent libraries. Furthermore, despite that many of the code-base in this \nparticular project were smoothly translated into python, there were needs to interface with R and to its \nenvironment which includes importing R packages to utilize their functions from within Python and thus \nthe Python library \"rpy2\" and R are requirements to these modules.\n\nBriefly, PyReQTL does as for the ReQTL identify the correlation between expressed SNVs and their gene expression using \nRNA-sequencing data and each module transforms the sequencing files into ReQTL input files which will eventually be used \nas inputs for the R package MatrixEQTL to find the significant variation-expression relationships. \n\n\n## Features\n  - Faster than the R ReQTL version \n  - For efficiency and performance, it uses both Pandas DataFrames and numpy arrays\n  - rpy2 is more efficient and better integrated than using subprocess to interface with R \n   \n\n## Asking for Help\n\nIf you have an issue (error or bug) when executing the library or any of the modules or have any other enquires please \nfeel free to send me an [email](mailto:nawafalomran@hotmail.com)\n\n\n## Getting Started\n\nPlease follow the instructions in the sections that follow. You need to have a terminal to be able to execute the \nmodules via the command line interface (cli). In case you are getting an error or bug please refer to the section \n**[Asking for Help](#asking-for-help)**.\n\n\n## Operating System Requirement\n\nPyReQTL modules were developed on *macOS Catalina v.10.15.2* and should work also on *Linux OS*.\n\n\n## Python Version \nBoth Python version *3.7* and *3.8* were successfully tested on PyReQTL. You need Python version \u003e=3.5.\n\n**Note:**\n\n- I would recommend checking your default python version which it may be python v2.7  so running ```python```\nmay start version *2.6* or *2.7* and therefore you need to run ```python3``` to start python 3 version \nor you may set your python v3 as your default python version.\n- Note that PyReQTL were not tested on Windows and it may or may not work.\n\n## Prerequisites\n\n* The following are the core Python libraries (dependencies) required for PyReQTL modules:\n\n  ```\n  pandas\n  numpy\n  scipy\n  rpy2\n  ```\n  \n* Optional Python Library:\n  ```\n  mypy\n  ```\n\n* You will also need to install the latest version of *R (4.0.2)* from the the official language site \nat: https://cloud.r-project.org\n  \n* #### **Important notes:** \n  1. Python will install the required R packages for you but you may be asked to choose a secure mirror for installing \n  those R packages.\n  \n  2. You may also be asked when executing the modules that require R packages to upgrade some R dependencies, you may \n  ignore it by typing```no```.\n  \n  3. Some bioconductor packages will also be required for the bioinformatics related analysis.\n\n\n## Installation \n\n1.  Install the package with pip: \n\n    ```bash\n    pip install pyreqtl\n    ```   \n\n2.  You could also clone the project with the command git clone:\n      ```bash\n      git clone https://github.com/nalomran/PyReQTL.git\n      ```    \n    \n    -  Once cloned, you may now try to install the dependencies by running the command:\n          ```bash\n            sudo pip install -r requirements.txt\n          ```   \n   \n    - Then, issue the  ```ls ``` command to list the files that end with .py (i.e. the PyReQTL modules) (OPTIONAL)\n       ```bash\n        ls -1 *.py    \n       ```\n   \n## Quick example\n\n```python\nfrom PyReQTL import gene_matrix\ngene_exp, gene_loc = gene_matrix.build_gene_matrix(gene_dir='data', prefx_out='ReQTL_test')\n\n# file is saved for gene expression in output/ReQTL_test_gene-exp_matrix.txt\n# file is saved for gene location in output/ReQTL_test_gene-exp-loc_matrix.txt\n\ngene_exp.head(3)\n# sample          gene_id  sample_1  sample_10  sample_11  sample_12  sample_13  sample_14  sample_15  ...  sample_25  sample_3  sample_4  sample_5  sample_6  sample_7  sample_8  sample_9\n# 0       ENSG00000000457  0.502402  -0.502402  -0.395725  -1.020076  -0.736316  -1.768825  -1.426077  ...   1.020076  1.198380 -0.194028  1.426077 -1.198380 -0.293381  1.768825  0.096559\n# 1       ENSG00000000460  0.502402   0.395725   0.000000  -1.426077  -0.502402  -0.736316  -0.869424  ...   1.020076  1.198380 -1.198380  1.426077 -1.020076 -1.768825  1.768825  0.096559\n# 2       ENSG00000000938 -0.096559   1.426077  -0.293381   0.293381  -0.736316  -1.198380  -0.395725  ...   1.020076 -0.194028  1.198380  0.736316  0.194028 -1.768825  0.502402 -1.426077\n# [3 rows x 26 columns]\n\ngene_loc.head(3)\n# GeneID Reference  Start     End\n# 0  ENSG00000237613      chr1  34554   36081\n# 1  ENSG00000238009      chr1  89295  133723\n# 2  ENSG00000239945      chr1  89551   91105\n# [3 rows x 4 columns])\n```\n```bash  \npython -m PyReQTL.gene_matrix -i data -o ReQTL_test -c True \n```\n\n## Running the Modules\n\nIf you choose to clone the repo from Github, then you are given two options to run the modules:\n 1. Running the prepared shell script *\"run_all.sh\"* found at the parent's directory of *PyReQTL*. This script is \n prepared to automatically run all modules sequentially with just a single command.\n \n 2. Running each individual python module separately.\n\n\n### - Option One (run all the modules sequentially)\n   \nRun the whole analysis at once:\n\n1. type ```cd ..``` to go back to the main directory in which the shell script lives\n\n    - You can modify the script based on your needs\n \n2. run the following command:\n\n```bash\nsh run_all.sh\n```\n\n### - Option Two (run each module separately)\n\n\n#### 1. build_gen_exp_matrix module\n\n##### **Note**: \n\n1. please refer to the link in the **[References](#references)** section for the description of each module\n2. make sure you are in the ```PyReQTL``` directory in order to run each individual python module\n\n- Run the following command to go the PyReQTL directory:\n ```bash\ncd PyReQTL/\n```\n    \n##### Inputs + Options\n\n  + -i: a directory contain all the raw gene expression files from Stringtie\n\n  + -o: the prefix for the gene expression and gene location files\n\n##### Output\n  * the gene expression values to be fed into MatrixEQTL\n\n  * the gene location for MatrixEQTL\n\n#### Command Example \n```bash\npython build_gen_exp_matrix.py -i ../data -o ReQTL_test\n```\n\n***\n\n#### 2. build_VAF_matrix module\n\n##### Inputs + Options\n\n  + -r: a directory containing the .csv files from the output of readCounts\n    module\n\n  + -o: the prefix for the SNV matrix and SNV location files\n\n##### Output\n\n  + the SNV locations for MatrixEQTL\n\n  + the SNV variant allele fraction matrix for MatrixEQTL\n\n##### Command Example\n```bash\npython build_VAF_matrix.py -r ../data -o ReQTL_test\n```\n\n***\n\n\n#### 3. harmonize_matrices module\n\n##### Input\n\n + -v: the path to the variant matrix generated from the build_VAF_matrix.py\n\n + -g: the path to the gene expression matrix generated from the build_gen_exp_matrix.py\n\n + -c: the path to the covariate matrix located in the data folder\n\n##### Output\n *  three output files consists of the three input files that have mutual samples\n \n\n##### Command Example\n```bash\npython harmonize_matrices.py \\\n    -v output/ReQTL_test_VAF_matrix.txt \\\n    -g output/ReQTL_test_gene-exp_matrix.txt \\\n    -c ../data/covariates_matrix.txt\n```\n##### Note: \n1. the covariates matrix file is supplied in the \"data\" folder\n2. this is an optional module but it is recommended to avoid possible error or bug\n\n***\n\n#### 4. run_matrix_ReQTL module\n\nThis module requires the R package MatrixEQTL. Please refer to the section References for more \ninformation about this package \n\n##### Input\n\n  + -s: the SNV or variant matrix file created from harmonize_matrices\n\n  + -sl: the SNV location matrix file created from build_VAF_matrix\n\n  + -ge: the gene expression file matrix created from build_gene-exp_matrix\n\n  + -gl: the gene locations file created from build_gene-exp_matrix\n\n  + -c: the covariates matrix file created from \"harmonize_matrices.\n    [OPTIONAL]! you can access the file under data directory\n\n  + -o: the prefix for the path to the output files\n\n  + -ct: logical (T or F) specifying whether to split the output into cis\n    or trans\n\n  + -pcis: p-value thresholds for the cis output files\n\n  + -ptran: p-value thresholds for the trans output files\n\n  + -p: p-value thresholds for the unified output file\n\n\n##### Output\n\n  + one output could be cis ReQTLs and trans ReQTLs or one with all of the\n    unified ReQTLs. This depends on what you choose for the value of parameter\n    \"-ct\"\n\n  + QQ plot of p-values\n\n##### Commands Example\n\n* splitting *cis* and *trans*\n```bash\npython run_matrix_ReQTL.py \\\n      -s output/ReQTL_test_VAF_matrix_harmonized.txt \\\n      -sl output/ReQTL_test_VAF_loc_matrix.txt \\\n      -ge output/ReQTL_test_gene-exp_matrix_harmonized.txt \\\n      -gl output/ReQTL_test_gene-exp-loc_matrix.txt \\\n      -c output/covariates_matrix_harmonized.txt \\\n      -ct T \\\n      -o \"ReQTL_test\" \\\n      -pcis 0.001 \\\n      -ptra 0.00001\n```\n\n* unified *cis* and *trans*\n```bash\npython run_matrix_ReQTL.py \\\n       -s output/ReQTL_test_VAF_matrix_harmonized.txt \\\n       -sl output/ReQTL_test_VAF_loc_matrix.txt \\\n       -ge output/ReQTL_test_gene-exp_matrix_harmonized.txt \\\n       -gl output/ReQTL_test_gene-exp-loc_matrix.txt \\\n       -c output/covariates_matrix_harmonized.txt \\\n       -ct F \\\n       -o \"ReQTL_test\" \\\n       -p 0.001\n```\n***\n\n#### 5. annotate_cis_trans module\n\n#### Input\n\n  + -r: the path to the ReQTL analysis result file\n\n  + -ga: the path to the file gene location annotations\n\n  + -o: the prefix for the output annotated result\n\n\n\n#### Output\n\n  * the ReQTLs annotated as cis or trans\n\n##### Commands Example\n```bash\npython annotate_cis_trans.py \\\n    -r output/ReQTL_test_all_ReQTLs.txt \\\n    -ga ../data/gene_locations_hg38.txt \\\n    -o ReQTL_test\n```\n***\n\n## Authors \u0026 Acknowledgment for the ReQTL R Toolkit\n\n\n- Liam Spurr, **Nawaf Alomran**, Pavlos Bousounis, Dacian Reece-Stremtan, Prashant N M, Hongyu Liu, Piotr Słowiński, Muzi Li, Qianqian Zhang, Justin Sein, Gabriel Asher, Keith A. Crandall, Krasimira Tsaneva-Atanasova, and Anelia Horvath \n\n- We would like to thank the Matrix EQTL team (Shabalin, et al. 2012) for their sample code and R package \nupon which *run\\_matrix_ReQTL.R* is based.\n\n\n## Paper\n- https://academic.oup.com/bioinformatics/article/36/5/1351/5582649\n\n## References\n\n1. The ReQTL R toolkit github repository: https://github.com/HorvathLab/ReQTL\n\n2. MatrixEQTL R package: http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL/\n\n3. readCounts module: https://github.com/HorvathLab/NGS/tree/master/readCounts","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnalomran%2Fpyreqtl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnalomran%2Fpyreqtl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnalomran%2Fpyreqtl/lists"}