{"id":17686975,"url":"https://github.com/benfulcher/allensdk","last_synced_at":"2025-05-13T00:37:17.220Z","repository":{"id":41990975,"uuid":"104984017","full_name":"benfulcher/AllenSDK","owner":"benfulcher","description":"Workflow for retrieving spatial gene-expression data from the Allen Institute's Mouse Brain Atlas","archived":false,"fork":false,"pushed_at":"2022-04-20T04:18:59.000Z","size":46,"stargazers_count":6,"open_issues_count":1,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-01T05:11:14.497Z","etag":null,"topics":["bioinformatics","geneexpression","transcriptomics"],"latest_commit_sha":null,"homepage":"","language":"MATLAB","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/benfulcher.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-09-27T07:03:51.000Z","updated_at":"2024-09-04T03:01:59.000Z","dependencies_parsed_at":"2022-08-12T01:40:25.796Z","dependency_job_id":null,"html_url":"https://github.com/benfulcher/AllenSDK","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benfulcher%2FAllenSDK","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benfulcher%2FAllenSDK/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benfulcher%2FAllenSDK/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benfulcher%2FAllenSDK/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/benfulcher","download_url":"https://codeload.github.com/benfulcher/AllenSDK/tar
.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253850767,"owners_count":21973666,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","geneexpression","transcriptomics"],"created_at":"2024-10-24T10:46:33.442Z","updated_at":"2025-05-13T00:37:17.200Z","avatar_url":"https://github.com/benfulcher.png","language":"MATLAB","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AllenSDK\n\n[![DOI](https://zenodo.org/badge/104984017.svg)](https://zenodo.org/badge/latestdoi/104984017)\n[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/bendfulcher.svg?style=social\u0026label=Follow%20%40bendfulcher)](https://twitter.com/bendfulcher)\n\nThis repository contains code for:\n1. Retrieving gene-expression data from the AllenSDK; and\n2. 
Processing it into convenient data structures for further analysis in Matlab.\n\nRequires Matlab and Python.\nThe [AllenSDK package](http://alleninstitute.github.io/AllenSDK/install.html) for Python must be installed.\n\nIf anything is unclear or needs improvement, please send questions by [raising an Issue](https://docs.github.com/en/github/managing-your-work-on-github/creating-an-issue) or [sending me an email](mailto:ben.d.fulcher@gmail.com).\n\nThis pipeline is based on code developed for [Fulcher and Fornito, _PNAS_ (2016)](https://doi.org/10.1073/pnas.1513302113), and used for [Fulcher et al., _PNAS_ (2019)](https://doi.org/10.1073/pnas.1814144116).\nIf you find this code useful, please consider citing these papers where relevant to your work, or cite this code directly using its [DOI](https://doi.org/10.5281/zenodo.3951756).\n\n## Constructing a brain region x gene matrix\n\n### Retrieve full gene information\nFirst, get a full list of genes by running `AllGenes.py`.\n\nThis outputs generic information about the genes:\n* `sectionDatasetInfo.csv` (all section data)\n* `geneInfo.csv` (gene information: acronym, entrez_id, gene_id, name)\n* `geneEntrezID.csv` (just the list of Entrez IDs)\n\n### Preparing inputs for a specific region x gene matrix\n\n#### 1. Retrieve IDs for all brain regions, `structIDs` and `structInfo`\n\nRetrieve the structure IDs of interest directly by adapting `WriteStructureInfo.py` to a custom set of structures.\n\nIf you already have structure IDs in Matlab, you can alternatively do this step using `WriteStructureIDs.m`, which outputs `structIDs_Oh.csv` and `structInfo_Oh.csv`.\n\n#### 2. Retrieve gene Entrez IDs\n\nSave a list of Entrez IDs for the genes you're interested in.\nFor all genes, you can use the `geneEntrezID.csv` file produced by `AllGenes.py` above.\nFor a subset of genes, you can adapt a script such as `subsetGenes.py`.\n\n#### 3. 
Retrieve the expression data from the Allen API\n\nNow that you've defined the structures and genes you're interested in, you can run the queries to retrieve expression data for all combinations of brain regions and genes.\nThis is done using `RetrieveGene.py`.\n\nSeveral variables need to be set in `RetrieveGene.py`.\n\nFirst, the input files need to match the IDs saved in Steps 1 and 2 above.\n\n___Input files___\n* `structIDSource`: name of the `.csv` file of Allen structure IDs\n* `entrezSource`: name of the `.csv` file of gene Entrez IDs to retrieve\n\n___Output filenames___\n\n__To set:__\n* `structInfoFilename`: file to which retrieved information about the specified structure IDs is saved\n* `allDataFilename`: file to which detailed expression information is saved\n\n__Generated:__\n* `expression_energy_AxB`: expression energy values for the A structures and B section datasets\n* `expression_density_AxB`: expression density values for the A structures and B section datasets\n* `dataSetIDs_Columns.csv`: dataset IDs representing each column in the above matrices\n\n## Importing data into Matlab\n\nYou can then import the resulting data into Matlab as:\n```matlab\n[GeneExpData,sectionDatasetInfo,geneInfo,structInfo] = ImportAllenToMatlab();\n```\n\nIn this function, you must specify the filenames to read in:\n* `fileNames.struct`: the structure info file specified above (`structInfoFilename`)\n* `fileNames.sectionDatasets`: full information about all datasets retrieved (`allDataFilename`)\n* `fileNames.geneInfo`: the gene information file (`geneInfo.csv`)\n* `fileNames.energy`: the expression energy matrix (`expression_energy_AxB`)\n* `fileNames.density`: the expression density matrix (`expression_density_AxB`)\n* `fileNames.columns`: the dataset IDs for each matrix column (`dataSetIDs_Columns.csv`)\n\nThis outputs a processed `.mat` file, `AllenGeneDataset_X.mat`, containing information about X unique genes.\n\n## Computing a structure mask\n\nExample pipeline:\nFirst, generate `.csv` files of structure IDs and their matching structure information (for interpretation).\nFor example, for the Oh et al. 
213-region parcellation:\n```matlab\nWriteStructureIDs\n```\nThis generates `structIDs_Oh.csv` and `structInfo_Oh.csv`.\nThese files are listed as inputs in the Python script `MakeCCFMasks.py`, so running\n```bash\npython MakeCCFMasks.py\n```\ngenerates a mask for these structures and saves it as `mask_Oh.h5`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenfulcher%2Fallensdk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbenfulcher%2Fallensdk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenfulcher%2Fallensdk/lists"}