{"id":15693241,"url":"https://github.com/lucacappelletti94/ucsc_genomes_downloader","last_synced_at":"2025-05-07T23:45:32.910Z","repository":{"id":62586073,"uuid":"188782276","full_name":"LucaCappelletti94/ucsc_genomes_downloader","owner":"LucaCappelletti94","description":"Python package to quickly download genomes from the UCSC.","archived":false,"fork":false,"pushed_at":"2022-04-29T08:14:07.000Z","size":604,"stargazers_count":7,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-07T23:45:26.558Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LucaCappelletti94.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":"LucaCappelletti94"}},"created_at":"2019-05-27T06:16:41.000Z","updated_at":"2024-02-24T13:56:05.000Z","dependencies_parsed_at":"2022-11-03T22:08:02.955Z","dependency_job_id":null,"html_url":"https://github.com/LucaCappelletti94/ucsc_genomes_downloader","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LucaCappelletti94%2Fucsc_genomes_downloader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LucaCappelletti94%2Fucsc_genomes_downloader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LucaCappelletti94%2Fucsc_genomes_downloader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LucaCappelletti94%2Fucsc_genomes_downloader/manifests","owner_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LucaCappelletti94","download_url":"https://codeload.github.com/LucaCappelletti94/ucsc_genomes_downloader/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252973618,"owners_count":21834105,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-03T18:42:22.563Z","updated_at":"2025-05-07T23:45:32.858Z","avatar_url":"https://github.com/LucaCappelletti94.png","language":"Python","funding_links":["https://github.com/sponsors/LucaCappelletti94"],"categories":[],"sub_categories":[],"readme":"UCSC Genomes Downloader\n=========================================================================================\n|python_version| |pip| |downloads|\n\nPython package to quickly download and process genomes from the UCSC website.\n\nHow do I install this package?\n----------------------------------------------\nInstall it from PyPI using pip:\n\n.. code:: shell\n\n    pip install ucsc_genomes_downloader\n\nGetting the SARS-CoV-2 genome\n----------------------------------------------\nTo download the SARS-CoV-2 (COVID-19) genome, UCSC assembly wuhCor1, run:\n\n.. code:: python\n\n    from ucsc_genomes_downloader import Genome\n    covid = Genome(\"wuhCor1\")\n\n    # wuhCor1 contains a single sequence, NC_045512v2\n    genome = covid[\"NC_045512v2\"]\n\n\nUsage examples\n--------------\n\nInstantiating a new genome\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nTo download the chromosomes of a given genomic assembly and load them\ninto memory, use the following snippet:\n\n.. 
code:: python\n\n    from ucsc_genomes_downloader import Genome\n    hg19 = Genome(assembly=\"hg19\")\n\nDownloading selected chromosomes\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nTo download only a subset of the chromosomes, pass their names\nthrough the ``chromosomes`` argument:\n\n.. code:: python\n\n    from ucsc_genomes_downloader import Genome\n    hg19 = Genome(\"hg19\", chromosomes=[\"chr1\", \"chr2\"])\n\nGetting gap regions\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nThe ``gaps`` method returns a DataFrame in BED-like format\ncontaining the regions where only n or N nucleotides\nare present.\n\n.. code:: python\n\n    # Gaps (regions formed of Ns) for all chromosomes\n    all_gaps = hg19.gaps()\n    # Gaps for chromosome chrM only\n    chrM_gaps = hg19.gaps(chromosomes=[\"chrM\"])\n\nGetting filled regions\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nThe ``filled`` method returns a DataFrame in BED-like format\ncontaining the regions where no unknown nucleotides\nare present, that is, the complement\nof the regions returned by ``gaps``.\n\n.. code:: python\n\n    # Filled regions for all chromosomes\n    all_filled = hg19.filled()\n    # Filled regions for chromosome chrM only\n    chrM_filled = hg19.filled(chromosomes=[\"chrM\"])\n\nRemoving a genome's cache\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nTo delete the cached files of a genome, including its chromosomes\nand metadata, use the ``delete`` method:\n\n.. code:: python\n\n    hg19.delete()\n\nGenome object representation\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nWhen printed, a Genome object has a human-readable representation,\nso lists of Genome objects can be printed as follows:\n\n.. 
code:: python\n\n    # Assumes Genome was imported and hg19 instantiated as above\n    hg38 = Genome(\"hg38\")\n    mm10 = Genome(\"mm10\")\n\n    print([\n        hg19,\n        hg38,\n        mm10\n    ])\n\n    # \u003e\u003e\u003e [\n    #    Human, Homo sapiens, hg19, 2009-02-28, 25 chromosomes,\n    #    Human, Homo sapiens, hg38, 2013-12-29, 25 chromosomes,\n    #    Mouse, Mus musculus, mm10, 2011-12-29, 22 chromosomes\n    # ]\n\nObtaining the sequences of a BED file\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nGiven a pandas DataFrame in BED-like format, you can obtain\nthe corresponding genomic sequences of the loaded assembly\nusing the ``bed_to_sequence`` method:\n\n.. code:: python\n\n    import pandas as pd\n\n    my_bed = pd.read_csv(\"path/to/my/file.bed\", sep=\"\\t\")\n    sequences = hg19.bed_to_sequence(my_bed)\n\nProperties\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nA Genome object has the following properties:\n\n.. code:: python\n\n    hg19.assembly # Returns \"hg19\"\n    hg19.date # Returns the date 2009-02-28 as a datetime object\n    hg19.organism # Returns \"Human\"\n    hg19.scientific_name # Returns \"Homo sapiens\"\n    hg19.description # Returns the brief description provided by UCSC\n    hg19.path # Returns the path where the genome is cached\n\n\nUtilities\n-------------------------------\n\nRetrieving a list of the available genomes\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nYou can get the complete list of the genomes available\nfrom the UCSC website with the ``get_available_genomes`` function:\n\n.. code:: python\n\n    from ucsc_genomes_downloader.utils import get_available_genomes\n    all_genomes = get_available_genomes()\n\n\nTessellating BED files\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nTessellate the regions of a BED-like pandas DataFrame into windows of a given size.\n\nThe available alignments are ``left``, ``right``, and ``center``.\n\n.. 
code:: python\n\n    from ucsc_genomes_downloader.utils import tessellate_bed\n    import pandas as pd\n\n    my_bed = pd.read_csv(\"path/to/my/file.bed\", sep=\"\\t\")\n    tessellated = tessellate_bed(\n        my_bed,\n        window_size=200,\n        alignment=\"left\"\n    )\n\nExpanding BED file regions\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nExpand the regions of a BED-like pandas DataFrame to a given window size using the selected alignment.\n\nThe available alignments are ``left``, ``right``, and ``center``.\n\n.. code:: python\n\n    from ucsc_genomes_downloader.utils import expand_bed_regions\n    import pandas as pd\n\n    my_bed = pd.read_csv(\"path/to/my/file.bed\", sep=\"\\t\")\n    expanded = expand_bed_regions(\n        my_bed,\n        window_size=1000,\n        alignment=\"left\"\n    )\n\nWiggling BED file regions\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nGenerate new BED regions from a given BED file by randomly shifting (wiggling)\nthe initial regions.\n\n.. code:: python\n\n    from ucsc_genomes_downloader.utils import wiggle_bed_regions\n    import pandas as pd\n\n    my_bed = pd.read_csv(\"path/to/my/file.bed\", sep=\"\\t\")\n    wiggled = wiggle_bed_regions(\n        my_bed,\n        max_wiggle_size=100, # Maximum amount by which a region is shifted\n        wiggles=10, # Number of wiggled samples to generate\n        seed=42 # Random seed for reproducibility\n    )\n\n.. _hg19: https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13/\n\n.. |pip| image:: https://badge.fury.io/py/ucsc-genomes-downloader.svg\n    :target: https://badge.fury.io/py/ucsc-genomes-downloader\n    :alt: PyPI project\n\n.. |downloads| image:: https://pepy.tech/badge/ucsc-genomes-downloader\n    :target: https://pepy.tech/badge/ucsc-genomes-downloader\n    :alt: PyPI total project downloads\n\n.. 
|python_version| image:: https://img.shields.io/badge/python-3.x-blue\n    :alt: Supported Python Versions","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucacappelletti94%2Fucsc_genomes_downloader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucacappelletti94%2Fucsc_genomes_downloader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucacappelletti94%2Fucsc_genomes_downloader/lists"}