{"id":18077489,"url":"https://github.com/stephenturner/kgp","last_synced_at":"2025-04-12T09:31:52.461Z","repository":{"id":59227242,"uuid":"534582959","full_name":"stephenturner/kgp","owner":"stephenturner","description":"1000 Genomes Project Metadata R Package","archived":false,"fork":false,"pushed_at":"2022-12-21T11:20:07.000Z","size":1025,"stargazers_count":18,"open_issues_count":5,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-04-29T20:05:34.788Z","etag":null,"topics":["1000genomes","bioinformatics","genetics","genomics","metadata","population-genetics","sequencing"],"latest_commit_sha":null,"homepage":"https://stephenturner.github.io/kgp/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stephenturner.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null}},"created_at":"2022-09-09T09:38:46.000Z","updated_at":"2024-04-22T16:15:55.000Z","dependencies_parsed_at":"2023-01-30T03:31:01.573Z","dependency_job_id":null,"html_url":"https://github.com/stephenturner/kgp","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stephenturner%2Fkgp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stephenturner%2Fkgp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stephenturner%2Fkgp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stephenturner%2Fkgp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stephenturner","download_url":"https://codeload.github
.com/stephenturner/kgp/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223510342,"owners_count":17157306,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["1000genomes","bioinformatics","genetics","genomics","metadata","population-genetics","sequencing"],"created_at":"2024-10-31T11:44:32.165Z","updated_at":"2024-11-07T12:03:52.965Z","avatar_url":"https://github.com/stephenturner.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# kgp\n\n\u003c!-- badges: start --\u003e\n[![CRAN status](https://www.r-pkg.org/badges/version/kgp)](https://CRAN.R-project.org/package=kgp)\n[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)\n[![arXiv](https://img.shields.io/badge/arXiv-2210.00539-b31b1b.svg)](https://arxiv.org/abs/2210.00539)\n\u003c!-- badges: end --\u003e\n\nThis kgp data package provides metadata about populations and data about samples from the 1000 Genomes Project, including the 2,504 samples sequenced for the Phase 3 release and the expanded collection of 3,202 samples with 602 additional trios.\n\n## Installation\n\nYou can install the released version of kgp from [CRAN](https://CRAN.R-project.org/package=kgp) with:\n\n```r\ninstall.packages(\"kgp\")\n```\n\nYou can install the development version of kgp from [GitHub](https://github.com/stephenturner/kgp) with:\n\n```r\n# install.packages(\"devtools\")\ndevtools::install_github(\"stephenturner/kgp\")\n```\n\n## About the data\n\nThe 1000 Genomes Project Phase 3 data contains 2,504 samples with sequence data available, and was later expanded to 3,202 high-coverage samples, adding 602 trios. Data is available through the [1000 Genomes FTP site](http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/) and [GitHub](https://github.com/igsr/1000Genomes_data_indexes/). 
\n\n- Pilot publication: [A map of human genome variation from population scale sequencing](https://www.nature.com/articles/nature09534)\n- Phase 1 publication: [An integrated map of genetic variation from 1,092 human genomes](https://www.nature.com/articles/nature11632)\n- Phase 3 publication: [A global reference for human genetic variation](https://www.nature.com/articles/nature15393)\n- Expanded high-coverage publication: [High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios](https://pubmed.ncbi.nlm.nih.gov/36055201/)\n\nThere are three data sets available in the kgp package.\n\n```{r example}\nlibrary(kgp)\ndata(kgp)\n```\n\nThe `kgp3` data contains pedigree and population information for the 2,504 samples included in the Phase 3 release of the 1000 Genomes Project data.\n\n```{r}\nkgp3\n```\n\nThe `kgpe` data contains pedigree and population information for all 3,202 samples included in the expanded 1000 Genomes Project data, which includes 602 trios.\n\n```{r}\nkgpe\n```\n\nThe `kgpmeta` data contains population metadata for the 26 populations across five continental regions.\n\n```{r}\nkgpmeta\n```\n\n## Examples\n\n```{r, message=FALSE, warning=FALSE}\nlibrary(dplyr)\nlibrary(ggplot2)\nlibrary(kgp)\ndata(kgp)\n```\n\nCount the number of samples in each region, or in each population: \n\n```{r}\nkgp3 %\u003e% \n  count(region) %\u003e% \n  knitr::kable()\n```\n\n```{r}\nkgp3 %\u003e% \n  count(region, population) %\u003e% \n  knitr::kable()\n```\n\n```{r kgp3barplot, fig.width=9, fig.height=12}\nkgp3 %\u003e% \n  count(region, population) %\u003e% \n  arrange(region, n) %\u003e% \n  mutate(population=forcats::fct_inorder(population)) %\u003e% \n  ggplot(aes(population, n)) + \n  geom_col(aes(fill=region)) + \n  labs(fill=NULL, x=NULL, y=\"N\") + \n  coord_flip() + \n  theme_bw() + \n  theme(legend.position=\"bottom\")\n```\n\nThe latitude and longitude coordinates in `kgpmeta` can be used to plot a map of the locations 
of the 1000 Genomes populations. There is also a column for region color, which provides a hexadecimal color code to enable reproduction of the population data map as shown on the IGSR population data page. The figure below shows a static map produced using ggplot2, but interactive maps such as that shown on the IGSR population data portal can be created with the leaflet package.\n\n```{r kgpmap, fig.cap=\"Map showing locations of the 1000 Genomes Phase 3 populations.\", fig.width=8, fig.height=6}\npal \u003c- kgpmeta %\u003e% distinct(reg, regcolor) %\u003e% tibble::deframe()\nggplot() + \n  geom_polygon(data=map_data(\"world\"), \n               aes(long, lat, group=group), \n               col=\"gray30\", fill=\"gray95\", lwd=.2, alpha=.5) + \n  geom_point(data=kgpmeta, aes(lng, lat, col=reg), size=4) + \n  scale_colour_manual(values=pal) +\n  theme_minimal() + \n  theme(axis.ticks = element_blank(), \n        axis.text = element_blank(), \n        axis.title = element_blank(), \n        legend.title = element_blank(),\n        panel.grid = element_blank(),\n        legend.position = \"bottom\")\n```\n\nThe table below shows a selection of samples from `kgpe` showing pedigree information for each sample. This pedigree information could be used in downstream analysis to filter out related individuals, select only trios, or to visualize family structure.\n\n```{r kgpe}\nkgpe %\u003e% \n  filter(pid!=\"0\" \u0026 mid!=\"0\") %\u003e% \n  group_by(pop) %\u003e% \n  slice(1) %\u003e% \n  head(12) %\u003e% \n  arrange(reg, pop) %\u003e% \n  select(fid:reg) %\u003e% \n  select(-sexf) %\u003e% \n  knitr::kable()\n```\n\nThe figure below shows an example of a pedigree plot made by parsing the pedigree information using [skater](https://cran.r-project.org/package=skater) and plotting using [kinship2](https://cran.r-project.org/package=kinship2). 
The skater package provides documentation, examples, and a vignette demonstrating how to iteratively plot all pedigrees in a given data set.\n\n```{r pedplot, fig.height=5, fig.width=8, fig.cap=\"Trios in 1000 Genomes Project family 13291.\"}\nkgpe %\u003e% \n  filter(fid==\"13291\") %\u003e% \n  transmute(fid, id, dadid=pid, momid=mid, sex, affected=1) %\u003e% \n  skater::fam2ped() %\u003e% \n  pull(ped) %\u003e% \n  purrr::pluck(1) %\u003e% \n  kinship2::plot.pedigree(mar=c(4,2,4,2), cex=.8)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstephenturner%2Fkgp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstephenturner%2Fkgp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstephenturner%2Fkgp/lists"}