{"id":21962522,"url":"https://github.com/CellArr/GenomicArrays","last_synced_at":"2025-07-22T13:32:09.766Z","repository":{"id":260220824,"uuid":"879935104","full_name":"BiocPy/GenomicArrays","owner":"BiocPy","description":null,"archived":false,"fork":false,"pushed_at":"2024-11-18T16:29:46.000Z","size":278,"stargazers_count":1,"open_issues_count":4,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-18T17:52:43.165Z","etag":null,"topics":["ml","tiledb"],"latest_commit_sha":null,"homepage":"https://biocpy.github.io/GenomicArrays/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BiocPy.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.md","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-28T20:11:30.000Z","updated_at":"2024-11-11T19:14:50.000Z","dependencies_parsed_at":"2024-11-18T17:33:12.324Z","dependency_job_id":"b8d31c6f-1714-4d8f-81bf-8d31065f2976","html_url":"https://github.com/BiocPy/GenomicArrays","commit_stats":null,"previous_names":["biocpy/genomicarrays"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BiocPy%2FGenomicArrays","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BiocPy%2FGenomicArrays/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BiocPy%2FGenomicArrays/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BiocPy%2FGenomicArrays/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/owners/BiocPy","download_url":"https://codeload.github.com/BiocPy/GenomicArrays/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":227101617,"owners_count":17731166,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ml","tiledb"],"created_at":"2024-11-29T10:42:51.434Z","updated_at":"2025-07-22T13:32:09.760Z","avatar_url":"https://github.com/BiocPy.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- These are examples of badges you might want to add to your README:\n     please update the URLs accordingly\n\n[![Built Status](https://api.cirrus-ci.com/github/\u003cUSER\u003e/GenomicArrays.svg?branch=main)](https://cirrus-ci.com/github/\u003cUSER\u003e/GenomicArrays)\n[![ReadTheDocs](https://readthedocs.org/projects/GenomicArrays/badge/?version=latest)](https://GenomicArrays.readthedocs.io/en/stable/)\n[![Coveralls](https://img.shields.io/coveralls/github/\u003cUSER\u003e/GenomicArrays/main.svg)](https://coveralls.io/r/\u003cUSER\u003e/GenomicArrays)\n[![PyPI-Server](https://img.shields.io/pypi/v/GenomicArrays.svg)](https://pypi.org/project/GenomicArrays/)\n[![Conda-Forge](https://img.shields.io/conda/vn/conda-forge/GenomicArrays.svg)](https://anaconda.org/conda-forge/GenomicArrays)\n[![Monthly 
Downloads](https://pepy.tech/badge/GenomicArrays/month)](https://pepy.tech/project/GenomicArrays)\n[![Twitter](https://img.shields.io/twitter/url/http/shields.io.svg?style=social\u0026label=Twitter)](https://twitter.com/GenomicArrays)\n--\u003e\n\n[![PyPI-Server](https://img.shields.io/pypi/v/GenomicArrays.svg)](https://pypi.org/project/GenomicArrays/)\n![Unit tests](https://github.com/CellArr/GenomicArrays/actions/workflows/run-tests.yml/badge.svg)\n\n# Genomic Arrays based on TileDB\n\nGenomicArrays is a Python package for converting genomic data from BigWig format to TileDB arrays.\n\n## Installation\n\nInstall the package from [PyPI](https://pypi.org/project/genomicarrays/):\n\n```sh\npip install genomicarrays\n```\n\n## Quick Start\n\n### Build a `GenomicArray`\n\nBuilding a `GenomicArray` generates 3 TileDB files in the specified output directory:\n\n- `feature_annotation`: A TileDB file containing input feature intervals.\n- `sample_metadata`: A TileDB file containing sample metadata; each BigWig file is considered a sample.\n- A matrix TileDB file named by the `layer_matrix_name` parameter. This allows the package\nto store multiple different matrices, e.g. 'coverage', 'some_computed_statistic', for the same intervals\nand sample metadata attributes.\n\nThe organization is inspired by the [SummarizedExperiment](https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html) data structure. The TileDB matrix file is stored in a **features X samples** orientation.\n\n![`GenomicArray` structure](./assets/genarr.png \"GenomicArray\")\n\nTo build a `GenomicArray` from a collection of `BigWig` files:\n\n```python\nimport os\nimport tempfile\n\nimport numpy as np\nimport pandas as pd\n\nimport genomicarrays as garr\n\n# Create a temporary directory, this is where the\n# output files are created. 
Pick your location here.\ntempdir = tempfile.mkdtemp()\n\n# List BigWig paths\nbw_dir = \"your/bigwig/dir\"\nfiles = os.listdir(bw_dir)\nbw_files = [f\"{bw_dir}/{f}\" for f in files]\n\nfeatures = pd.DataFrame({\n    \"seqnames\": [\"chr1\", \"chr1\"],\n    \"starts\": [1000, 2000],\n    \"ends\": [1500, 2500]\n})\n\n# Build GenomicArray\ndataset = garr.build_genomicarray(\n    files=bw_files,\n    output_path=tempdir,\n    features=features,\n    # Specify a fasta file to extract sequences\n    # for each region in features\n    genome_fasta=\"path/to/genome.fasta\",\n    # Aggregate function to summarize multiple values\n    # from a bigwig file within an input feature interval.\n    feature_annotation_options=garr.FeatureAnnotationOptions(\n        aggregate_function=np.nanmean\n    ),\n    # Number of threads for processing multiple bigwig files in parallel\n    num_threads=4\n)\n```\n\n\u003e [!NOTE]\n\u003e - The aggregate function is expected to return either a scalar value or a 1-dimensional NumPy ndarray. If the latter, users need to specify the expected length of the returned array, e.g.\n\u003e   ```python\n\u003e   feature_annotation_options=garr.FeatureAnnotationOptions(\n\u003e       aggregate_function=my_custom_func,\n\u003e       expected_agg_function_length=10,\n\u003e   ),\n\u003e   ```\n\u003e - The build process stores missing intervals from a bigwig file as `np.nan`. 
The default is to choose an aggregate function that works with `np.nan`.\n\n\n\n### Query a `GenomicArrayDataset`\n\nUsers can either reuse the `dataset` object returned when building the arrays or create a `GenomicArrayDataset` object from the path where the files were created.\n\n```python\n# Create a GenomicArrayDataset object from the existing dataset\n# (assumes `GenomicArrayDataset` has already been imported from the package)\ndataset = GenomicArrayDataset(dataset_path=tempdir)\n\n# Query data for the first 10 regions across all samples\ncoverage_data = dataset[0:10, :]\n\nprint(coverage_data.matrix)\nprint(coverage_data.feature_annotation)\n```\n\n     ## output 1\n     array([[1. , 0.5],\n          [1. , 0.5],\n          [1. , 0.5],\n          [1. , 0.5],\n          [1. , 0.5],\n          [1. , 0.5],\n          [1. , 0.5],\n          [1. , 0.5],\n          [1. , 0.5],\n          [1. , 0.5],\n          [1. , nan]], dtype=float32)\n\n     ## output 2\n     seqnames  starts  ends  genarr_feature_index\n     0      chr1     300   315                     0\n     1      chr1     320   335                     1\n     2      chr1     340   355                     2\n     3      chr1     360   375                     3\n     4      chr1     380   395                     4\n     5      chr1     400   415                     5\n     6      chr1     420   435                     6\n     7      chr1     440   455                     7\n     8      chr1     460   475                     8\n     9      chr1     480   495                     9\n     10     chr1     500   515                    10\n\n\n\u003c!-- pyscaffold-notes --\u003e\n\n## Note\n\nThis project has been set up using PyScaffold 4.6. 
For details and usage\ninformation on PyScaffold see https://pyscaffold.org/.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCellArr%2FGenomicArrays","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FCellArr%2FGenomicArrays","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCellArr%2FGenomicArrays/lists"}