{"id":26572692,"url":"https://github.com/linsalrob/genetic_codes","last_synced_at":"2025-03-23T00:35:32.105Z","repository":{"id":213445452,"uuid":"734133332","full_name":"linsalrob/genetic_codes","owner":"linsalrob","description":"Python code for translating sequences using different NCBI translation tables and genetic codes.","archived":false,"fork":false,"pushed_at":"2024-04-26T06:21:28.000Z","size":1535,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-04-26T07:31:32.641Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/linsalrob.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-12-21T00:29:37.000Z","updated_at":"2024-04-26T07:31:33.602Z","dependencies_parsed_at":"2023-12-21T04:41:25.477Z","dependency_job_id":"746b5fd4-db81-47f1-a38d-936526bc865c","html_url":"https://github.com/linsalrob/genetic_codes","commit_stats":null,"previous_names":["linsalrob/genetic_codes"],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linsalrob%2Fgenetic_codes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linsalrob%2Fgenetic_codes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linsalrob%2Fgenetic_codes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linsalrob%2Fgenetic_codes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/ow
ners/linsalrob","download_url":"https://codeload.github.com/linsalrob/genetic_codes/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245040215,"owners_count":20551297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-23T00:34:38.562Z","updated_at":"2025-03-23T00:35:32.081Z","avatar_url":"https://github.com/linsalrob.png","language":"C","funding_links":[],"categories":["Sequence Analysis"],"sub_categories":["Sequence Translation"],"readme":"[![Edwards Lab](https://img.shields.io/badge/Bioinformatics-EdwardsLab-03A9F4)](https://edwards.flinders.edu.au)\n[![DOI](https://www.zenodo.org/badge/60999054.svg)](https://www.zenodo.org/badge/latestdoi/60999054)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n![GitHub language count](https://img.shields.io/github/languages/count/linsalrob/genetic_codes)\n[![PyPi](https://img.shields.io/pypi/pyversions/pygenetic-code?label=PyPi%20Versions)](https://pypi.org/project/pygenetic-code/)\n[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/pygenetic_code/README.html)\n\n# Genetic Codes\n\nA Python and C library with no external dependencies for translating DNA sequences into protein sequences using different translation tables (aka genetic codes).\n\nThe [NCBI Genetic Codes](https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG1) are central to working with alternate genetic codes. 
This Python toolkit includes a library that exposes the genetic codes so you can query a codon and get its variants, or query a code and get its table. We also provide fast mechanisms to translate DNA sequences into protein sequences using the translation table of your choice.\n\n# Current genetic codes:\n1. The Standard Code (transl_table=1). By default, transl_table in GenBank flatfiles is equal to id 1, and in that case it is not shown. When transl_table is not equal to id 1, it is shown as a qualifier on the CDS feature.\n2. The Vertebrate Mitochondrial Code (transl_table=2)\n3. The Yeast Mitochondrial Code (transl_table=3)\n4. The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code (transl_table=4)\n5. The Invertebrate Mitochondrial Code (transl_table=5)\n6. The Ciliate, Dasycladacean and Hexamita Nuclear Code (transl_table=6)\n9. The Echinoderm and Flatworm Mitochondrial Code (transl_table=9)\n10. The Euplotid Nuclear Code (transl_table=10)\n11. The Bacterial, Archaeal and Plant Plastid Code (transl_table=11)\n12. The Alternative Yeast Nuclear Code (transl_table=12)\n13. The Ascidian Mitochondrial Code (transl_table=13)\n14. The Alternative Flatworm Mitochondrial Code (transl_table=14)\n15. Blepharisma Nuclear Code (transl_table=15)\n16. Chlorophycean Mitochondrial Code (transl_table=16)\n21. Trematode Mitochondrial Code (transl_table=21)\n22. Scenedesmus obliquus Mitochondrial Code (transl_table=22)\n23. Thraustochytrium Mitochondrial Code (transl_table=23). It is similar to the bacterial code (transl_table=11), but it contains an additional stop codon (TTA) and has a different set of start codons.\n24. Rhabdopleuridae Mitochondrial Code (transl_table=24)\n25. Candidate Division SR1 and Gracilibacteria Code (transl_table=25)\n26. Pachysolen tannophilus Nuclear Code (transl_table=26)\n27. Karyorelict Nuclear Code (transl_table=27)\n28. Condylostoma Nuclear Code (transl_table=28)\n29. 
Mesodinium Nuclear Code (transl_table=29)\n30. Peritrich Nuclear Code (transl_table=30)\n31. Blastocrithidia Nuclear Code (transl_table=31)\n33. Cephalodiscidae Mitochondrial UAA-Tyr Code (transl_table=33)\n\n# Installation\n\nWe recommend installing `pygenetic_code` with bioconda:\n\n```bash\nmamba create -n pygenetic_code -c bioconda pygenetic_code\nmamba activate pygenetic_code\npygenetic_code --version\n```\n\nAlternatively, you can install `pygenetic_code` with pip:\n\n```bash\npip install pygenetic_code\npygenetic_code --version\n```\n\n# Usage\n\nThere is a command line application, Python example code, and a library that you can use. The command line application and examples show you how to use the library.\n\n## Example code\n\nThese examples show you how to incorporate `pygenetic_code` into your own Python code.\n\nWe have a very simple translate function that you can use if you want to translate one (or more) ORFs. The signature is\n\n```python\ntranslate(dna_sequence, translation_table)\n```\n\nand we have a simple example that translates a sequence:\n\n```bash\npython examples/translate_a_sequence.py\n```\n\nWe can also translate DNA sequences in all six reading frames. Here is an example that reads a fasta file and translates all six frames using the bacterial genetic code (translation table 11):\n\n```bash\npython examples/translate_sequence_in_all_frames.py -f tests/JQ995537.fna -t 11\n```\n\nor an alternate genetic code (translation table 15):\n\n```bash\npython examples/translate_sequence_in_all_frames.py -f tests/JQ995537.fna -t 15\n```\n\nOr you can translate the _E. coli_ K-12 sequence to identify all the ORFs in that genome:\n\n```bash\npython examples/translate_sequence_in_all_frames.py -f tests/U00096.3.fna.gz -t 11\n```\n\n(yes, you can use gzip files without decompressing them). 
\n\nThe translation itself takes about 0.1 seconds, but starting Python and the other overheads bring the total run time to almost 0.75 seconds.\n\nYou can also look at the effect of translation tables on the same sequences by running:\n\n```bash\npython examples/average_translation_length.py -f tests/JQ995537.fna # for crassphage\npython examples/average_translation_length.py -f tests/U00096.3.fna.gz # for E. coli K-12\n```\n\nWe recommend using our easy Python wrappers to access the translate functions:\n\n```python\nfrom pygenetic_code import translate, six_frame_translation\n```\n\nBut you can also access our C library directly, using the `PyGeneticCode` module (see below).\n\n\n## Command line applications\n\n`pygenetic_code` translates DNA sequences either in one reading frame or in all six reading frames using the translation table of your choice.\n\nTo translate a sequence in the current reading frame, you can use\n\n```bash\npygenetic_code --translate\n```\n\n\nFirst, make sure you have a DNA sequence. We provide a few in [tests/](tests/) including [a very short sequence](tests/seq.fasta), [crAssphage](tests/JQ995537.fna), and [E. coli](tests/U00096.3.fna.gz). \n\n## Library\n\n### Using the C library directly in Python\n\nYou can import the C library by importing `PyGeneticCode`. 
\n\nThere are two main methods that you can call:\n\nThe first method just returns the translation of your DNA sequence read in the 5' to 3' direction, so this is the method you might use to translate an ORF, for example.\n\n```python\nPyGeneticCode.translate(DNA_sequence, translation_table)\n```\n\n(See [examples/translate_a_sequence.py](examples/translate_a_sequence.py) for an example.)\n\nThe second method returns the translations of all six reading frames.\n\n```python\nPyGeneticCode.translate_six_frames(DNA_sequence, translation_table, verbose)\n```\n\n(See [examples/translate_sequence_in_all_frames.py](examples/translate_sequence_in_all_frames.py) for an example invocation.)\n\n`DNA_sequence` is the DNA sequence you want to translate. The translation table must be one of the valid translation tables (see [pygenetic_code/genetic_code.translation_tables](pygenetic_code/genetic_code.translation_tables) for the valid tables).\n\n## Translate a codon\n\nAnother way to access the code in your Python application is through the `translate_codon()` function, which has this signature:\n\n```python\namino_acid = translate_codon(codon, translation_table=1, one_letter=False)\n```\n\nThe `codon` is the codon that you want to translate, as either an RNA (e.g. `AUG`) or DNA (e.g. `ATG`) sequence. The `translation_table` is your required translation table (see the [NCBI website](https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG1) for valid tables), and `one_letter` controls whether to return a three letter amino acid code (e.g. `Met` or `Ter`) or a one letter amino acid code (e.g. `M` or `*`).\n\nThe library provides other ways to access the genetic codes, and those are exemplified in the `pytest` files in [tests/](tests).\n\n\n## Viewing translation tables\n\nYou can print the translation tables using the `pygenetic_code` command. 
There are currently a few options:\n\n   - `json` prints the table in machine-readable JSON format.\n   - `difference` prints a `.tsv` file with the difference from the standard (translation table 1) code.\n   - `maxdifference` prints a `.tsv` file with the difference from the most common amino acid. The main difference is that `TGA` is more frequently tryptophan than a stop.\n\n# Citing\n\nPlease cite this repository as:\n\nEdwards, Robert A. 2023. pygenetic_code. https://github.com/linsalrob/genetic_codes. DOI: 10.5281/zenodo.10453453\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinsalrob%2Fgenetic_codes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinsalrob%2Fgenetic_codes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinsalrob%2Fgenetic_codes/lists"}