{"id":13710287,"url":"https://gitlab.com/piroonj/eligos2","last_synced_at":"2025-05-06T18:35:08.385Z","repository":{"id":50317397,"uuid":"17222059","full_name":"piroonj/eligos2","owner":"piroonj","description":"Epitranscriptional/(Epigenomical) Landscape Inferring from Glitches of ONT Signals (version 2)","archived":false,"fork":false,"pushed_at":null,"size":null,"stargazers_count":5,"open_issues_count":2,"forks_count":1,"subscribers_count":null,"default_branch":"master","last_synced_at":"2024-08-03T23:18:22.742Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":null,"metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-01T21:02:55.007Z","updated_at":"2024-07-24T12:39:27.071Z","dependencies_parsed_at":"2022-09-23T08:10:48.551Z","dependency_job_id":null,"html_url":"https://gitlab.com/piroonj/eligos2","commit_stats":null,"previous_names":[],"tags_count":2,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/gitlab.com/repositories/piroonj%2Feligos2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/gitlab.com/repositories/piroonj%2Feligos2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/gitlab.com/repositories/piroonj%2Feligos2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/gitlab.com/repositories/piroonj%2Feligos2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/gitlab.com/owners/piroonj","download_url":"https://gitlab.com/piroonj/eligos2/-/archive/master/eligos2-master.zip","host":{"name":"gitlab.com","url":"https://gitlab.com","kind":"gitlab","repositories_count":4515822,"ow
ners_count":6528,"icon_url":"https://github.com/gitlab.png","version":null,"created_at":"2022-05-30T11:31:42.605Z","updated_at":"2024-07-18T11:24:13.055Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/gitlab.com","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/gitlab.com/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/gitlab.com/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/gitlab.com/owners"}},"keywords":[],"created_at":"2024-08-02T23:00:54.065Z","updated_at":"2024-11-13T20:31:43.133Z","avatar_url":null,"language":null,"funding_links":[],"categories":["Software packages"],"sub_categories":["RNA modification analysis"],"readme":"# ELIGOS2\n![eligos](images/eligos_logo_web.png)\n## **E**pitranscriptional/(Epigenomical) **L**andscape **I**nferring from **G**litches of **O**NT **S**ignals (**version 2**)\n\n## **SUMMARY**\n\nOxford Nanopore Technologies (ONT) offers a sequencing platform that sequences DNA and RNA in their native form, without amplification. As a result, any modifications on the native molecules are preserved and affect the recorded ionic signal. Where the signal deviates from the canonical base-calling model, the base caller misinterprets the modified bases as sequencing errors. We use these errors to identify the positions of modifications on RNA transcripts or DNA sequences.\n\n**ELIGOS** identifies the positions of modifications on native RNA sequences from the difference in error at specific bases (ESB) between the native RNA sequences and a reference. A standard statistical test, Fisher's exact test, is used to evaluate this difference. The reference can be unmodified RNA sequences derived from in vitro transcription, cDNA sequences, or our background error model (rBEM), which mimics the systematic errors of unmodified RNA sequences. 
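For intuition, each reference position reduces to a 2×2 contingency table of error and correct base counts in the test sample versus the reference (rBEM or a control sample). The Python sketch below uses only the standard library to compute %ESB, a two-sided Fisher's exact p-value and the sample odds ratio for one such table. It is a hypothetical illustration, not ELIGOS code. The function names and the table layout, which maps onto the test_err_1, test_cor_1, model_err_1 and model_cor_1 result columns, are assumptions. ELIGOS may compute oddR, adjusted P-values or thresholds differently.

```python
from math import comb, inf, nan


def esb_percent(err: int, cor: int) -> float:
    # Percent error at a specific base (ESB): share of reads with an error at this position.
    total = err + cor
    return 100.0 * err / total if total else 0.0


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    # Hypothetical layout mirroring the result columns:
    #            error              correct
    #   test     a (test_err_1)     b (test_cor_1)
    #   control  c (model_err_1)    d (model_cor_1)
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x test-row errors with all margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # Two-sided p-value: sum every table no more probable than the observed one
    # (a small relative tolerance absorbs floating-point ties).
    pval = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    # Sample odds ratio: inf when b*c is zero but a*d is not, nan when both are zero.
    if b * c:
        odds = (a * d) / (b * c)
    else:
        odds = inf if a * d else nan
    return odds, min(pval, 1.0)


if __name__ == '__main__':
    # Made-up counts: 40 of 100 test reads vs 5 of 100 control reads carry an error.
    odds, pval = fisher_exact_2x2(40, 60, 5, 95)
    print(f'ESB_test={esb_percent(40, 60):.1f}%  ESB_ctrl={esb_percent(5, 95):.1f}%')
    print(f'oddR={odds:.2f}  pval={pval:.3g}')
```

For these made-up counts the script reports ESB values of 40.0% and 5.0% and an odds ratio of 12.67. In the ELIGOS commands shown later, thresholds such as --pval, --oddR and --esb are presumably applied to values of this kind.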
\n\n**ELIGOS can currently be applied to:**\n\n**1.** Differential epitranscriptome analysis between two conditions (DNA, RNA) (see example 1)\n\n**2.** Epitranscriptome profiling (RNA) (see example 2)\n\n**3.** Identification of DNA modifications such as DNA adducts (see example 3)\n\n**Please cite**: \n1. Piroon Jenjaroenpun, Thidathip Wongsurawat, Taylor D Wadley, Trudy M Wassenaar, Jun Liu, Qing Dai, Visanu Wanchai, Nisreen S Akel, Azemat Jamshidi-Parsian, Aime T Franco, Gunnar Boysen, Michael L Jennings, David W Ussery, Chuan He, Intawat Nookaew, Decoding the epitranscriptional landscape from native RNA sequences. Nucleic Acids Research, 2020, gkaa620, https://doi.org/10.1093/nar/gkaa620\n2. Intawat Nookaew, Piroon Jenjaroenpun, Hua Du, Pengcheng Wang, Jun Wu, Thidathip Wongsurawat, Sun Hee Moon, En Huang, Yinsheng Wang, Gunnar Boysen, Detection and discrimination of DNA adducts differing in size, regiochemistry and functional group by nanopore sequencing. Chemical Research in Toxicology, 2020, https://doi.org/10.1021/acs.chemrestox.0c00202\n\n**SRA Fast5**:\u003cbr/\u003e\nhttps://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP166020\n\n**Journal cover**: \n\n![cover](images/two_cover.png)\n\n---\n## **INSTALLATION**\n### **ELIGOS installation on Linux/Mac/Windows (WSL)**\n1. **Miniconda3 installation**\n[Click link](https://docs.conda.io/en/latest/miniconda.html \"Miniconda3 Installation\")\n\n2. **Download ELIGOS via GitLab**\n    ```bash\n    ## Download ELIGOS from GitLab\n    git clone https://gitlab.com/piroonj/eligos2.git\n\n    ## Go to ELIGOS folder\n    cd eligos2\n    ```\n3. 
**Creating a conda environment for ELIGOS**\n* Creating the environment with individual commands:\n    ```bash\n    ## Install environment\n    # The fully pinned command below may run into package conflicts:\n    # conda create -n eligos2 -c bioconda -c conda-forge -c anaconda python=3.6 pysam=0.13 pandas=0.23.4 pybedtools=0.8.0 bedtools=2.25 rpy2=2.8.5 r-base=3.4.1 tqdm=4.40.2 numpy=1.11.3\n\n    # Omitting the version pins has worked in recent tests:\n    conda install mamba\n    conda update mamba\n    mamba create -n eligos2 -c conda-forge -c bioconda -c r python=3.10 pysam pandas pybedtools bedtools rpy2 r-base tqdm numpy\n    \n    ## Activate ELIGOS environment\n    conda activate eligos2\n\n    ## Install the samplesizeCMH R package\n    Rscript -e 'install.packages(\"samplesizeCMH\", repos=\"https://cloud.r-project.org\")'\n\n    ## Add ELIGOS to PATH\n    export PATH=$PWD:$PWD/Scripts:$PATH\n\n    ## Run ELIGOS\n    eligos2 -h\n    ```\n* Creating the environment from an environment file:\n    ```bash\n    ## Install environment\n    conda env create -f eligos2.linux.yml\n\n    ## Activate ELIGOS environment\n    conda activate eligos2\n\n    ## Install the samplesizeCMH R package\n    Rscript -e 'install.packages(\"samplesizeCMH\", repos=\"https://cloud.r-project.org\")'\n\n    ## Add ELIGOS to PATH\n    export PATH=$PWD:$PWD/Scripts:$PATH\n\n    ## Run ELIGOS\n    eligos2 -h\n    ```\n### ELIGOS installation on Docker or Singularity containers\n1. Install Docker or Singularity on your computer: [Docker](https://docs.docker.com/install/ \"Docker Installation\") or [Singularity](https://sylabs.io/singularity/ \"Singularity Installation\") \n2. 
Install ELIGOS\n* Install ELIGOS from DockerHub\n    ```bash\n    ## Install ELIGOS from DockerHub\n    docker pull piroonj/eligos2\n\n    ## Run ELIGOS\n    docker run --rm -v $PWD:$PWD -w $PWD piroonj/eligos2 eligos2 -h\n    ```\n* Install ELIGOS from Dockerfile\n    ```bash\n    ## Download ELIGOS from GitLab\n    git clone https://gitlab.com/piroonj/eligos2.git\n\n    ## Go to DockerFiles folder\n    cd eligos2/DockerFiles\n\n    ## Build the ELIGOS Docker image\n    docker build -t piroonj/eligos2:latest .\n\n    ## Run ELIGOS\n    docker run --rm -v $PWD:$PWD -w $PWD piroonj/eligos2 eligos2 -h\n    ```\n\n* Install ELIGOS via a Singularity image\n    ```bash\n    ## Build a Singularity image of ELIGOS from DockerHub\n    singularity build eligos2.sif docker://piroonj/eligos2:latest\n\n    ## Run ELIGOS\n    singularity exec eligos2.sif eligos2 -h\n    ```\n\n\n---\n\n## **Usage**\n**Main functions**\n```\neligos2 -h\n\nusage: eligos2 [-h] [-v]\n              {map_preprocess,build_genedb,rna_mod,pair_diff_mod,bedgraph,filter}\n              ...\n\nELIGOS is a package of tools for the identification of modified nucleotides\nbased on nanopore sequencing data.\n\nELIGOS command groups:\n\tmap_preprocess           Preprocess mapped reads\n\tbuild_genedb             Build bam files from gene database\n\trna_mod                  Identify RNA modification against rBEM5+2 model\n\tpair_diff_mod            Identify RNA modification against control condition\n\tbedgraph                 Filtering and creating BedGraph for IGV plot\n\tfilter                   Filtering for Eligos result\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -v, --version         show ELIGOS version and exit.\n```\n\n**Index BAM files and the reference sequence**\n\nBefore running eligos2, the BAM files and the reference sequence must be indexed.\n```bash\n## index BAM\nsamtools index file.bam\n\n## index reference sequence\nsamtools faidx file.fasta\n```\n\n---\n## **Examples**\n### 1. 
**Differential ESB analysis of the yeast meiosis transcriptome to identify m6A and the enriched RRACH motif**\n\nThis example uses native RNA sequences of the yeast transcriptome from the m6A methyltransferase knockout (∆ime4) strain and the wild type grown under meiotic conditions, from Liu et al. (https://doi.org/10.1038/s41467-019-11713-9).\nA subset of 531 transcripts, pre-identified as containing adenine sites with differential %ESB, is used.\n\n#### Run ELIGOS to compare native RNA of the wild type with native RNA of the knockout\n**1.1 Download data**: The example archive contains the mapped reads (BAM files) of ∆ime4 and the wild type, gene locations (BED file), and the reference genome (FASTA file).\n\n1.dRNA_m6A_yeast_knockout.tar.gz [Download](https://drive.google.com/file/d/1J4QEbHeenT5OQkpboOaFRg2WsaFM_Fx6/view?usp=sharing \"Data 1\") \n```\n1.dRNA_m6A_yeast_knockout/\n├── sc_KO.selected.bam\n├── sc_KO.selected.bam.bai\n├── sc_WT.selected.bam\n├── sc_WT.selected.bam.bai\n├── yeast.S288C.genes.selected_set.bed\n├── yeast.S288C.genome.fa.gz\n├── yeast.S288C.genome.fa.gz.fai\n└── yeast.S288C.genome.fa.gz.gzi\n```\n\n**1.2 Run ELIGOS commands**\n```bash\n## Index reference sequence\nsamtools faidx yeast.S288C.genome.fa.gz\n\n## Compare the wild type (-tbam, test) against the knockout (-cbam, control)\neligos2 pair_diff_mod -tbam sc_WT.selected.bam -cbam sc_KO.selected.bam -reg yeast.S288C.genes.selected_set.bed -ref yeast.S288C.genome.fa.gz -t 34 --pval 0.001 --oddR 1.2 --esb 0 -o results\n\n## Extract potentially modified A bases with eligos2 filter, excluding homopolymer sequences\neligos2 filter -i results/sc_WT.selected_vs_sc_KO.selected_on_yeast.S288C.genes.selected_set_baseExt0.txt -sb A --homopolymer --esb 0 --oddR 1.2 --pval 0.001\n\n## Show output file\nhead results/sc_WT.selected_vs_sc_KO.selected_on_yeast.S288C.genes.selected_set_baseExt0.A.filtered.txt\n```\n**1.3 Extract sequences around potentially modified A sites, extended by 6 bp upstream and downstream**\n```bash\n## Extract fasta 
from compressed fasta file\ngunzip -c yeast.S288C.genome.fa.gz \u003e yeast.S288C.genome.fa\nsamtools faidx yeast.S288C.genome.fa\n\n## Convert the ELIGOS output to a FASTA file, extending each site by 6 bp upstream and downstream\ntable2fa_eligos.mergeextend.sh results/sc_WT.selected_vs_sc_KO.selected_on_yeast.S288C.genes.selected_set_baseExt0.A.filtered.txt yeast.S288C.genome.fa\n\n## Check sequence output\nhead sc_WT.selected_vs_sc_KO.selected_on_yeast.S288C.genes.selected_set_baseExt0.A.filtered.fa\n```\n\n**1.4 De novo motif discovery with** [**BaMM motif**](https://bammmotif.soedinglab.org/) \n\nThe sequences around the differentially modified adenines (above) can be used to identify a consensus motif with BaMM motif discovery (https://doi.org/10.1093/nar/gky431) through its web service (https://bammmotif.soedinglab.org/job/denovo/).\n\nBaMM motif website\n\n\u003cimg src=\"images/bamm.web.png\" width=\"500\"\u003e\n\nSelect de novo motif discovery\n\n\u003cimg src=\"images/motif_discov.png\" width=\"300\"\u003e\n\nUpload the FASTA file generated above and run BaMM! The consensus RRACH motif is obtained, as shown in the figure below.\n\n\u003cimg src=\"images/sc.Motif1.png\" width=\"300\"\u003e\n\n---\n### 2. **Detection of RNA modifications in MYC and JUNB transcripts using the rBEM5+2 model**\n\nThe native RNA sequences of the human transcriptome from Workman et al. 
(https://doi.org/10.1038/s41592-019-0617-2) are used for this example.\nTranscripts of the two oncogenes MYC and JUNB are analyzed.\n\n**2.1 Download data**: The example archive contains mapped reads (BAM file), gene locations (BED file), cDNA (BCF file), and chromosomes (FASTA file).\n\n2.dRNA_m6A_MYC_JUNB.tar.gz [Download](https://drive.google.com/file/d/1WBHmUfIlRTF1MwDLRp4vEHZWquzLtDIP/view?usp=sharing \"Data 2\") \n\n```\n2.dRNA_m6A_MYC_JUNB\n├── chr8_chr19.fa.gz\n├── chr8_chr19.fa.gz.fai\n├── chr8_chr19.fa.gz.gzi\n├── myc_junb.bed\n├── rna_consortium.myc_junb.bam\n├── rna_consortium.myc_junb.bam.bai\n├── rna_consortium.myc_junb.bcf\n└── rna_consortium.myc_junb.bcf.csi\n```\n\n**2.2 Run ELIGOS to compare native RNA against the rBEM5+2 model**\n```bash\n## Run ELIGOS\neligos2 rna_mod -i rna_consortium.myc_junb.bam -bcf rna_consortium.myc_junb.bcf -reg myc_junb.bed -ref chr8_chr19.fa.gz -o results_myc_junb --pval 1e-5 --oddR 5 --esb 0.2\n```\n**2.3 Create BedGraph files with filtering options**\n```bash\n## Create BedGraph files of the ESB signal for A bases, excluding homopolymer sequences\neligos2 bedgraph -i results_myc_junb/rna_consortium.myc_junb_vs_model_on_myc_junb_baseExt0.txt --select_base A --signal ESB --homopolymer\n\n## Check output of ESB frequency in native RNA reads (test)\nhead results_myc_junb/rna_consortium.myc_junb_vs_model_on_myc_junb_baseExt0.A.ESB_test.bdg\n\n## Check output of ESB frequency in the rBEM5+2 model (ctrl)\nhead results_myc_junb/rna_consortium.myc_junb_vs_model_on_myc_junb_baseExt0.A.ESB_ctrl.bdg\n```\n\nThe generated BedGraph files can be used to visualize the positions of differential adenine ESB in IGV (http://software.broadinstitute.org/software/igv/).\n\n![oddR](images/junb_myc.png)\n\nPS: dRNA ESB frequency is shown in \u003cspan style=\"color: #00CCCC;\"\u003ecyan\u003c/span\u003e color \u003cbr/\u003e rBEM5+2 ESB frequency is shown in \u003cspan style=\"color: #FF00FF;\"\u003emagenta\u003c/span\u003e 
 color\n\n---\n### 3. **Differential ESB analysis of a synthetic plasmid to identify a DNA adduct**\n\nThis example uses native DNA sequences of a synthetic plasmid containing a single N2-ethyl-deoxyguanine (N2Et) adduct.\n\n**3.1 Download data**: The example archive contains mapped reads (BAM files) of the N2Et and WT plasmids, the plasmid region (BED file), and the plasmid sequence (FASTA file).\n\n3.DNA_N2Et_adduct.tar.gz [Download](https://drive.google.com/file/d/14HltQGe2_JuEk9kkeCRjHXZ-m1g4AX4E/view?usp=sharing \"Data 3\") \n\n```\n3.DNA_N2Et_adduct\n├── N2Et.DNA.bam\n├── N2Et.DNA.bam.bai\n├── WT.DNA.bam\n├── WT.DNA.bam.bai\n├── WT.plasmid.bed\n├── WT.plasmid.fa.gz\n├── WT.plasmid.fa.gz.fai\n└── WT.plasmid.fa.gz.gzi\n```\n\n**3.2 Run ELIGOS to compare N2Et-modified DNA with unmodified WT DNA**\n```bash\n## Filter out mapped reads shorter than 200 bases\neligos2 map_preprocess -aln 200 -i N2Et.DNA.bam\neligos2 map_preprocess -aln 200 -i WT.DNA.bam\n\n## Run ELIGOS with all thresholds relaxed so that every position is reported\neligos2 pair_diff_mod -tbam N2Et.DNA.preprocess.bam -cbam WT.DNA.preprocess.bam -reg WT.plasmid.bed -ref WT.plasmid.fa.gz -o results --oddR 0 --esb 0 --pval 1 --adjPval 1\n```\n**3.3 Create BedGraph files with filtering options**\n```bash\n## Create a BedGraph file of the odds ratio signal, excluding homopolymer sequences\neligos2 bedgraph -i results/N2Et.DNA.preprocess_vs_WT.DNA.preprocess_on_WT.plasmid_baseExt0.txt --signal oddR --homopolymer\n\n## Check output\nhead results/N2Et.DNA.preprocess_vs_WT.DNA.preprocess_on_WT.plasmid_baseExt0.oddR.bdg\n```\n\nOdds ratios of individual nucleotides from the generated BedGraph file are plotted below; they correctly identify the position of the N2-ethyl-deoxyguanine on the synthetic plasmid.\n\n![oddR](images/result_odd.png)\n\n---\n### 4. 
**Identification of RNA modifications from replicates using the Cochran-Mantel-Haenszel (CMH) Test**\n\nThis example compares two conditions with biological replicates using the CMH test.\n\n**4.1 Download data**: The example archive contains three rna_mod result files (TSV) from direct RNA sequencing under the ethanol condition and three under the glucose condition, all generated in a previous step by eligos2 rna_mod.\n\n4.example_YPL061W_baseExt0.tar.gz [Download](https://drive.google.com/file/d/1Pgvi6qFTE0tQOP0eCn17-A36C6LXdhUd/view?usp=sharing \"Data 4\") \n\n```\ntar -xvzf 4.example_YPL061W_baseExt0.tar.gz\n\n4.example_YPL061W_baseExt0\n├── drna_ethanol_11.mn200_vs_model_on_YPL061W_baseExt0.txt\n├── drna_ethanol_21.mn200_vs_model_on_YPL061W_baseExt0.txt\n├── drna_ethanol_31.mn200_vs_model_on_YPL061W_baseExt0.txt\n├── drna_glucose_12.mn200_vs_model_on_YPL061W_baseExt0.txt\n├── drna_glucose_22.mn200_vs_model_on_YPL061W_baseExt0.txt\n└── drna_glucose_32.mn200_vs_model_on_YPL061W_baseExt0.txt\n```\n\n**4.2 Run multi-sample testing**\n```bash\n## Run multi_samples_test\neligos2 multi_samples_test --test_mods 4.example_YPL061W_baseExt0/drna_ethanol_*.txt --ctrl_mods 4.example_YPL061W_baseExt0/drna_glucose_*.txt --prefix ethanol_vs_glucose\n\n## Count lines in the CMH test output\nwc -l ethanol_vs_glucose.CMH_testing.txt\n```\n\n---\n## Result description\n\n| column | Description |\n| ------ | ------ |\n| chrom | reference name |\n| start_loc | mapped start position |\n| end_loc | mapped end position |\n| strand | mapped direction |\n| name | region/gene name from BED file |\n| ref | reference sequence | \n| homo_seq | overlapping with homopolymer sequence |\n| kmer5 | 5-mer context, with the reference base in the third position |\n| majorAllel | The base with the highest frequency (major allele) |\n| majorAllelFreq | Frequency of the major allele | \n| kmer7 | 7-mer context, with the reference base in the third position |\n| test_err_1 | Count of erroneous bases in the test sample |\n| model_err_1 | Count of erroneous bases in the rBEM model error profile 
|\n| test_cor_1 | Count of correct bases in the test sample |\n| model_cor_1 | Count of correct bases in the rBEM model error profile |\n| oddR | odds ratio |\n| pval | P-value |\n| adjPval | adjusted P-value |\n| baseExt | Base expansion (0, 1, or 2 bases) |\n| total_reads | Total read count |\n| ESB_test | Percent error at the specific base (ESB) in the test sample |\n| ESB_ctrl | Percent error at the specific base (ESB) in the rBEM model (control) |\n\n---\n\n## License \u0026 copyright\n\nLicensed under the [MIT License](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/gitlab.com%2Fpiroonj%2Feligos2","html_url":"https://awesome.ecosyste.ms/projects/gitlab.com%2Fpiroonj%2Feligos2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/gitlab.com%2Fpiroonj%2Feligos2/lists"}