{"id":24446411,"url":"https://github.com/mvinyard/gtfast","last_synced_at":"2026-05-21T09:06:05.361Z","repository":{"id":40385134,"uuid":"444487955","full_name":"mvinyard/GTFast","owner":"mvinyard","description":"Lift annotations from a GTF file to an adata object.","archived":false,"fork":false,"pushed_at":"2022-05-12T15:19:13.000Z","size":50,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-25T16:15:07.541Z","etag":null,"topics":["anndata","gencode","genomics","gff","gtf","python","single-cell"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mvinyard.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-01-04T16:25:34.000Z","updated_at":"2022-08-03T08:12:22.000Z","dependencies_parsed_at":"2022-08-09T19:00:29.769Z","dependency_job_id":null,"html_url":"https://github.com/mvinyard/GTFast","commit_stats":null,"previous_names":["mvinyard/anngtf"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mvinyard%2FGTFast","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mvinyard%2FGTFast/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mvinyard%2FGTFast/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mvinyard%2FGTFast/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mvinyard","download_url":"https://codeload.github.com/mvinyard/GTFast/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https:
//github.com","kind":"github","repositories_count":243505898,"owners_count":20301619,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anndata","gencode","genomics","gff","gtf","python","single-cell"],"created_at":"2025-01-21T00:00:28.883Z","updated_at":"2025-12-28T10:07:54.671Z","avatar_url":"https://github.com/mvinyard.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ![GTFast-logo](/docs/img/GTFast.logo.svg)\n\n[![PyPI pyversions](https://img.shields.io/pypi/pyversions/gtfast.svg)](https://pypi.python.org/pypi/gtfast/)\n[![PyPI version](https://badge.fury.io/py/gtfast.svg)](https://badge.fury.io/py/gtfast)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n\n### Installation\n\nTo install via [pip](https://pypi.org/project/gtfast):\n```BASH\npip install gtfast\n```\n\nTo install the development version: \n```BASH\ngit clone https://github.com/mvinyard/gtfast.git\n\ncd gtfast; pip install -e .\n```\n\n## Example usage\n\n### Parsing a `.gtf` file\n```python\nimport gtfast\n\ngtf_filepath = \"/path/to/ref/hg38/refdata-cellranger-arc-GRCh38-2020-A-2.0.0/genes/genes.gtf\"\n\n```\n\nIf this is your first time using `gtfast`, run:\n```python\ngtf = gtfast.parse(path=gtf_filepath, genes=False, force=False, return_gtf=True)\n```\nRunning this function will create two `.csv` files from the given `.gtf` files - one containing all feature types and one containing only genes. 
Both of these files are smaller than a `.gtf` and can be loaded into memory much faster using `pandas.read_csv()` (shortcut implemented in the next function). Additionally, this function leaves a paper trail for `gtfast` to find the newly-created `.csv` files again in the future such that one does not need to pass a path to the gtf. \n\nIn the scenario in which you've already run the above function, run:\n```python\ngtf = gtfast.load() # no path necessary! \n```\n\n### Interfacing with [AnnData](https://anndata.readthedocs.io/en/stable/) and updating an `adata.var` table. \n\nIf you're working with single-cell data, you can easily lift annotations from a **[`gtf`](https://en.wikipedia.org/wiki/Gene_transfer_format)** to your **[`adata`](https://anndata.readthedocs.io/en/stable/)** object. \n\n```python\nfrom anndata import read_h5ad\nimport gtfast\n\nadata = read_h5ad(\"/path/to/singlecell/data/adata.h5ad\")\ngtf = gtfast.load(genes=True)\n\ngtfast.add(adata, gtf)\n```\n\nSince the `gtfast` distribution already knows where the `.csv / .gtf` files are, we could directly annotate `adata` without first specifying `gtf` as a DataFrame, saving a step, but I think it's more user-friendly to see what each one looks like first. \n\n\n### Working advantage\n\nLet's take a look at the time difference of loading a `.gtf` into memory as a `pandas.DataFrame`: \n\n```python\nimport gtfast\nimport gtfparse\nimport time\n\nstart = time.time()\ngtf = gtfparse.read_gtf(\"/home/mvinyard/ref/hg38/refdata-cellranger-arc-GRCh38-2020-A-2.0.0/genes/genes.gtf\")\nstop = time.time()\n\nprint(\"baseline loading time: {:.2f}s\".format(stop - start), end='\\n\\n')\n\nstart = time.time()\ngtf = gtfast.load()\nstop = time.time()\n\nprint(\"GTFast loading time: {:.2f}s\".format(stop - start))\n```\n```\nbaseline loading time: 87.54s\n\nGTFast loading time: 12.46s\n```\n~ 7x speed improvement. 
\n\n* **Note**: This is not meant to criticize or comment on anything related to [`gtfparse`](https://github.com/openvax/gtfparse) - in fact, this library relies solely on `gtfparse` for the actual parsing of a `.gtf` file into memory as `pandas.DataFrame` and it's an amazing tool for python developers!\n\n### Contact\n\nIf you have suggestions, questions, or comments, please reach out to Michael Vinyard via [email](mailto:mvinyard@broadinstitute.org)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmvinyard%2Fgtfast","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmvinyard%2Fgtfast","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmvinyard%2Fgtfast/lists"}