{"id":23028845,"url":"https://github.com/nchenche/orfmap","last_synced_at":"2025-06-19T01:38:44.618Z","repository":{"id":201754784,"uuid":"290749181","full_name":"nchenche/orfmap","owner":"nchenche","description":null,"archived":false,"fork":false,"pushed_at":"2023-11-17T10:28:54.000Z","size":178754,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-02T20:25:30.684Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nchenche.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-08-27T10:44:46.000Z","updated_at":"2023-11-16T15:53:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"e9f62035-3df9-4d3f-9515-0801b736509c","html_url":"https://github.com/nchenche/orfmap","commit_stats":null,"previous_names":["nchenche/orfmap"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/nchenche/orfmap","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nchenche%2Forfmap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nchenche%2Forfmap/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nchenche%2Forfmap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nchenche%2Forfmap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nchenche","download_url":"https://codeload.github.com/nchenche/orfmap/tar.gz/refs/heads/
master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nchenche%2Forfmap/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260665121,"owners_count":23044265,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-15T14:12:59.971Z","updated_at":"2025-06-19T01:38:39.578Z","avatar_url":"https://github.com/nchenche.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ORFMap\nORFMap - A tool for scanning a genome for stop-codon-delimited sequences (ORFs) and annotating them.\n\n## Summary\n* \u003cp\u003e\u003ca href=\"#descr\"\u003eDescription\u003c/a\u003e\u003c/p\u003e\n* \u003cp\u003e\u003ca href=\"#installation\"\u003eInstallation\u003c/a\u003e\u003c/p\u003e\n* \u003cp\u003e\u003ca href=\"#usage_descr\"\u003eUsage description\u003c/a\u003e\u003c/p\u003e\n* \u003cp\u003e\u003ca href=\"#usage_ex\"\u003eSome usage examples\u003c/a\u003e\u003c/p\u003e\n\n\u003ch2\u003e\u003ca name=\"descr\"\u003eDescription\u003c/a\u003e\u003c/h2\u003e\n\nFrom a genomic FASTA file and its associated GFF file, the program first scans the genome to retrieve all sequences\ndelimited by stop codons. 
Only sequences at least 60 nucleotides long are kept by default.\n\nThese so-called ORF sequences are then annotated according to the GFF element type(s) used as references.\nThe CDS element type is always used as a reference, but others can be added.\n\nBy default, an ORF sequence has 5 possible annotations:\n\n| ORF annotation | Condition |\n| --- | --- |\n| c_CDS | if the ORF overlaps with a CDS in the same phase |\n| nc_5-CDS | if the 5' extremity of the c_CDS is at least 60 nucleotides long |\n| nc_3-CDS | if the 3' extremity of the c_CDS is at least 60 nucleotides long |\n| nc_ovp-CDS | if the ORF overlaps with a CDS in a different phase |\n| nc_intergenic | if the ORF does not overlap with anything |\n \n**Note:** \nIf an ORF sequence is tagged as 'c_CDS', it is further processed: its 5' and 3' extremities that do not overlap with the CDS are cut off. If such a subsequence is at least 60 nucleotides long, it can be assigned as nc_5-CDS and/or nc_3-CDS.\n \u003cbr\u003e\u003c/br\u003e\n \u003cbr\u003e\u003c/br\u003e\n \nThe user can also specify which GFF element type(s) to use as reference(s) for annotating ORF sequences, in addition to the CDS type. For instance, if the user adds the tRNA element type, ORF sequences can then be assigned as nc_ovp-tRNA if they overlap with a tRNA. 
Thus 6 annotations are now possible for an ORF sequence:\n\n| ORF annotation | Condition |\n| --- | --- |\n| c_CDS | if the ORF overlaps with a CDS in the same phase |\n| nc_5-CDS | if the 5' extremity of the c_CDS is at least 60 nucleotides long |\n| nc_3-CDS | if the 3' extremity of the c_CDS is at least 60 nucleotides long |\n| nc_ovp-CDS | if the ORF overlaps with a CDS in a different phase |\n| nc_ovp-tRNA | if the ORF overlaps with a tRNA |\n| nc_intergenic | if the ORF does not overlap with anything |\n\n**Note on default parameters**:\n* CDS is the only element type used as a reference to annotate ORF sequences.\n* The minimum length required to consider a sequence as an ORF is 60 nucleotides.\n* An ORF sequence is considered to overlap with an element (e.g. a CDS) if at least 70 % of its sequence overlaps with the element, or if the element is entirely contained within the ORF sequence.\n\n\n\u003ch2\u003e\u003ca name=\"installation\"\u003eInstallation\u003c/a\u003e\u003c/h2\u003e\n\n### 1. Download and uncompress the latest release archive\n\n#### Download the latest release\nLatest release: \n[ ![](./documentation/images/download-flat/16x16.png \"Click to download the latest release\")](https://github.com/nchenche/orfmap/releases/latest/)\n\n#### Uncompress the archive\nIf you downloaded:\n* the *.zip* file: ```unzip orfmap-x.x.x.zip```\n* the *.tar.gz* file: ```tar xzvf orfmap-x.x.x.tar.gz```\n\n\n### 2. 
Create an isolated environment\nAlthough not strictly necessary, this step is highly recommended (it will allow you to work on different projects without having\nany conflicting library versions).\n \n#### Install virtualenv\n```bash\npython3 -m pip install virtualenv\n```\n\n#### Create a virtual python3 environment\n```bash\nvirtualenv -p python3 my_env\n```\n\n#### Activate the created environment\n```bash\nsource my_env/bin/activate\n```\n\nOnce the environment is activated, any Python library you install with pip will be installed only in this isolated environment.\nEvery time you need to work with libraries installed in this environment (i.e. work on your project), you will have\nto activate it. \n\nOnce you're done working on your project, simply type `deactivate` to exit the environment.\n\n\n### 3. Install ORFMap in your isolated environment\n\nMake sure your virtual environment is activated, then follow the procedure described below.\n\n#### Go to the ORFMap directory\n \n```bash\ncd orfmap-x.x.x/\n```\n\n#### Install \n\n```bash\npython setup.py install\n```\n\nor \n```bash\npip install .\n```\n\n\n\u003ch2\u003e\u003ca name=\"usage_descr\"\u003eUsage description\u003c/a\u003e\u003c/h2\u003e\n\nTo see all available options:\n\n```\nrun_orfmap -h\n```\n\nThis command will show:\n\n\u003cpre\u003eusage: run_orfmap [-h] -fna [FNA] -gff [GFF] [-chr [CHR]] [-types_only TYPES_ONLY [TYPES_ONLY ...]]\n                  [-types_except TYPES_EXCEPT [TYPES_EXCEPT ...]] [-o_include O_INCLUDE [O_INCLUDE ...]] [-o_exclude O_EXCLUDE [O_EXCLUDE ...]]\n                  [-orf_len [ORF_LEN]] [-co_ovp [CO_OVP]] [-out [OUT]] [--show-types] [--show-chrs]\n\nGenomic mapping of pseudo-ORF\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -fna [FNA]            Genomic fasta file (.fna)\n  -gff [GFF]            GFF annotation file (.gff)\n  -chr [CHR]            Chromosome name\n  -types_only TYPES_ONLY [TYPES_ONLY ...]\n                        
Type feature(s) to use as reference(s) (\u0026apos;CDS\u0026apos; in included by default).\n  -types_except TYPES_EXCEPT [TYPES_EXCEPT ...]\n                        Type feature(s) to not consider as reference(s) (None by default).\n  -o_include O_INCLUDE [O_INCLUDE ...]\n                        Type feature(s) and/or Status attribute(s) desired to be written in the output (all by default).\n  -o_exclude O_EXCLUDE [O_EXCLUDE ...]\n                        Type feature(s) and/or Status attribute(s) desired to be excluded (None by default).\n  -orf_len [ORF_LEN]    Minimum number of nucleotides required to define a sequence between two consecutive stop codons as an ORF sequence (60\n                        nucleotides by default).\n  -co_ovp [CO_OVP]      Cutoff defining the minimum CDS overlapping ORF fraction required to label on ORF as a CDS. By default, an ORF sequence\n                        will be tagged as a CDS if at least 70 per cent of its sequence overlap with the CDS sequence.\n  -out [OUT]            Output directory\n  --show-types          Print all type features\n  --show-chrs           Print all chromosome names\n\u003c/pre\u003e\n\nExcept for -fna and -gff, which are mandatory, all arguments are optional.\n\n\n### Basic run\n\nORFMap requires two input files: \n* a genomic FASTA file (-fna)\n* its associated GFF file (-gff).\n\n\nThe most basic run can be executed by typing:\n\n```\nrun_orfmap -fna mygenome.fna -gff mygenome.gff\n```\n\nAll ORF sequences are then annotated relative to the CDS element type only. 
Thus 5 annotations are possible:\n\n| ORF annotation | Condition |\n| --- | --- |\n| c_CDS | if the ORF overlaps with a CDS in the same phase |\n| nc_5-CDS | if the 5' extremity of the c_CDS is at least 60 nucleotides long |\n| nc_3-CDS | if the 3' extremity of the c_CDS is at least 60 nucleotides long |\n| nc_ovp-CDS | if the ORF overlaps with a CDS in a different phase |\n| nc_intergenic | if the ORF does not overlap with anything |\n\n\nThe output consists of two separate files with the prefix \"mapping_orf_\":\n* mapping_orf_mygenome.fa: a protein FASTA file of all the ORF sequences found\n* mapping_orf_mygenome.gff: a GFF file describing all the ORF sequences found\n  \nBy default, the two output files contain all 5 possible annotations mentioned above.\n\n\n\u003ch2\u003e\u003ca name=\"usage_ex\"\u003eSome usage examples\u003c/a\u003e\u003c/h2\u003e\n\nBy default, all element types (except 'region' and 'chromosome') found in the GFF file are used as references\nto annotate ORF sequences. If an ORF sequence overlaps with two or more elements, it is assigned\naccording to the element with which it overlaps the most. For instance, if an ORF sequence overlaps at 85% with\na tRNA and at 90% with an sRNA, the ORF will be assigned as nc_ovp-sRNA.\nNote that the CDS element type always has priority over any other element type. Therefore, if an ORF \nsequence overlaps at 72% with a CDS and at 95% with another element that is not a CDS, the ORF will be assigned as\nc_CDS. When an ORF sequence entirely overlaps with multiple elements, the choice of its assignment is somewhat\narbitrary: the ORF is assigned according to the first element encountered in the GFF. This case can arise for \nintrinsically related elements such as gene, exon and mRNA. For example, suppose an ORF sequence overlaps equally with\nan exon and a gene region (but does not overlap with the CDS part). 
Since the gene normally appears first in the GFF \nfile, the ORF will be assigned as nc_ovp-gene. To avoid these special cases, the -types_except option allows the user to specify \nelement types that should not be considered as references for the ORF assignment. \n\n\n##### Use tRNA and snRNA elements as references to annotate ORF sequences:\n```\nrun_orfmap -fna mygenome.fna -gff mygenome.gff -types_only tRNA snRNA -out myResults\n```\n\n##### Write to the output files only ORF sequences mapped as nc_ovp-tRNA and nc_ovp-snRNA:\n```\nrun_orfmap -fna mygenome.fna -gff mygenome.gff -types_only tRNA snRNA -o_include nc_ovp-tRNA nc_ovp-snRNA -out myResults\n```\n\n##### Write to the output files all ORF sequences except those mapped as c_CDS:\n```\nrun_orfmap -fna mygenome.fna -gff mygenome.gff -types_only tRNA snRNA -o_exclude c_CDS -out myResults\n```\n\n##### or:\n```\nrun_orfmap -fna mygenome.fna -gff mygenome.gff -types_only tRNA snRNA -o_exclude coding -out myResults\n```\n\n\u003cem\u003eNote\u003c/em\u003e:\n\u003cp\u003e\n-o_include and -o_exclude take either feature types or a status attribute as arguments.\nFeature types must be among the possible annotations for ORF sequences (e.g. 
c_CDS, nc_5-CDS, nc_intergenic...),\n while the status attribute is either 'coding' or 'non-coding' ('coding' refers to c_CDS and 'non-coding' refers to all other annotations).\n \u003c/p\u003e\n\n\n##### Keep as ORF sequences those whose stop-to-stop length is at least 50 nucleotides:\n```\nrun_orfmap -fna mygenome.fna -gff mygenome.gff -orf_len 50\n```\n\n##### Consider an ORF sequence as overlapping with an element if at least 60 % of its sequence overlaps with the element:\n```\nrun_orfmap -fna mygenome.fna -gff mygenome.gff -co_ovp 0.6\n```\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnchenche%2Forfmap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnchenche%2Forfmap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnchenche%2Forfmap/lists"}