{"id":24847372,"url":"https://github.com/quantgen/symdmatrix","last_synced_at":"2025-03-26T10:10:02.933Z","repository":{"id":56936145,"uuid":"45495066","full_name":"QuantGen/symDMatrix","owner":"QuantGen","description":"Partitioned Symmetric Matrices","archived":false,"fork":false,"pushed_at":"2020-08-02T12:34:43.000Z","size":435,"stargazers_count":2,"open_issues_count":5,"forks_count":0,"subscribers_count":5,"default_branch":"master","last_synced_at":"2023-11-20T16:06:57.009Z","etag":null,"topics":["cran","genetics","genomics","r","r-pkg"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/QuantGen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-11-03T20:58:08.000Z","updated_at":"2020-08-02T12:34:46.000Z","dependencies_parsed_at":"2022-08-21T06:20:38.088Z","dependency_job_id":null,"html_url":"https://github.com/QuantGen/symDMatrix","commit_stats":null,"previous_names":[],"tags_count":5,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QuantGen%2FsymDMatrix","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QuantGen%2FsymDMatrix/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QuantGen%2FsymDMatrix/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QuantGen%2FsymDMatrix/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/QuantGen","download_url":"https://codeload.github.com/QuantGen/symDMatrix/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","r
epositories_count":245632423,"owners_count":20647193,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cran","genetics","genomics","r","r-pkg"],"created_at":"2025-01-31T11:20:04.952Z","updated_at":"2025-03-26T10:10:02.901Z","avatar_url":"https://github.com/QuantGen.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"symDMatrix\n==========\n\n[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/symDMatrix)](https://CRAN.R-project.org/package=symDMatrix)\n[![Rdoc](http://www.rdocumentation.org/badges/version/symDMatrix)](http://www.rdocumentation.org/packages/symDMatrix)\n[![Travis-CI Build Status](https://travis-ci.org/QuantGen/symDMatrix.svg?branch=master)](https://travis-ci.org/QuantGen/symDMatrix)\n[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/7cei04chrkvxke27?svg=true)](https://ci.appveyor.com/project/agrueneberg/symdmatrix)\n[![Coverage status](https://codecov.io/gh/QuantGen/symDMatrix/branch/master/graph/badge.svg)](https://codecov.io/github/QuantGen/symDMatrix?branch=master)\n\nsymDMatrix is an R package that provides symmetric matrices partitioned into file-backed blocks.\n\nA symmetric matrix `G` is partitioned into blocks as follows:\n\n```\n+ --- + --- + --- +\n| G11 | G12 | G13 |\n+ --- + --- + --- +\n| G21 | G22 | G23 |\n+ --- + --- + --- +\n| G31 | G32 | G33 |\n+ --- + --- + --- +\n```\n\nBecause the matrix is assumed to be symmetric (i.e., `Gij` equals the transpose of `Gji`), only the diagonal and upper-triangular blocks are stored; the lower-triangular blocks are virtual transposes of the corresponding upper-triangular blocks. 
Each block is a file-backed matrix of type `ff_matrix` of the [ff](https://CRAN.R-project.org/package=ff) package.\n\nThe package defines the `symDMatrix` class and methods that allow this file-backed matrix to be treated like a standard matrix in RAM.\n\n\nTutorial\n--------\n\n### (0) Creating a symmetric matrix in RAM\n\nBefore we start, let's create a symmetric matrix in RAM.\n\n```R\nlibrary(BGLR)\n\n# Load genotypes from a mice data set\ndata(mice)\nX \u003c- mice.X\nrownames(X) \u003c- paste0(\"ID_\", 1:nrow(X))\n\n# Compute a symmetric genetic relationship matrix (G matrix) in RAM\nG1 \u003c- tcrossprod(scale(X))\nG1 \u003c- G1 / mean(diag(G1))\n```\n\n### (1) Converting a RAM matrix into a symDMatrix\n\nIn practice, if we can hold a matrix in RAM, there is not much point in converting it to a `symDMatrix` object; however, doing so will help us get started.\n\n```R\nlibrary(symDMatrix)\n\nG2 \u003c- as.symDMatrix(G1, blockSize = 400, vmode = \"double\", folderOut = \"mice\")\n```\n\n### (2) Exploring operators\n\nNow that we have a `symDMatrix` object, let's illustrate some operators.\n\n```R\n# Basic operators applied to a matrix in RAM and to a symDMatrix\n\n# Dimension operators\nall.equal(dim(G1), dim(G2))\nnrow(G1) == nrow(G2)\nncol(G1) == ncol(G2)\nall.equal(diag(G1), diag(G2))\n\n# Names operators\nall.equal(dimnames(G1), dimnames(G2))\nall(rownames(G1) == rownames(G2))\nall(colnames(G1) == colnames(G2))\n\n# Block operators\nnBlocks(G2)\nblockSize(G2)\n\n# Indexing (can use booleans, integers or labels)\nG2[1:2, 1:2]\nG2[c(\"ID_1\", \"ID_2\"), c(\"ID_1\", \"ID_2\")]\ntmp \u003c- c(TRUE, TRUE, rep(FALSE, nrow(G2) - 2))\nG2[tmp, tmp]\nhead(G2[tmp, ])\n\n# Randomized check of indexing against the matrix in RAM\nfor (i in 1:100) {\n    n1 \u003c- sample(1:50, size = 1)\n    n2 \u003c- sample(1:50, size = 1)\n    i1 \u003c- sample(1:nrow(X), size = n1)\n    i2 \u003c- sample(1:nrow(X), size = n2)\n    TMP1 \u003c- G1[i1, i2, drop = FALSE]\n    TMP2 \u003c- G2[i1, i2, drop = FALSE]\n    
stopifnot(all.equal(TMP1, TMP2))\n}\n```\n\n### (3) Creating a symDMatrix from genotypes\n\nThe function `getG_symDMatrix` of the [BGData](https://CRAN.R-project.org/package=BGData) package computes G=XX' (with options for centering and scaling) without ever loading G into RAM. It creates the `symDMatrix` object directly, block by block. In this example, `X` is a matrix in RAM. For large genotype data sets, `X` could be a file-backed matrix, e.g., a `BEDMatrix` or `ff` object.\n\n```R\nlibrary(BGData)\n\nG3 \u003c- getG_symDMatrix(X, blockSize = 400, vmode = \"double\", folderOut = \"mice2\")\nclass(G3)\nall.equal(diag(G1), diag(G3))\n\n# Randomized check of indexing against the matrix in RAM\nfor (i in 1:10) {\n    n1 \u003c- sample(1:25, size = 1)\n    i1 \u003c- sample(1:25, size = n1)\n    for (j in 1:10) {\n        n2 \u003c- sample(1:nrow(G1), size = 1)\n        i2 \u003c- sample(1:nrow(G1), size = n2)\n        tmp1 \u003c- G1[i1, i2]\n        tmp2 \u003c- G3[i1, i2]\n        stopifnot(all.equal(tmp1, tmp2))\n    }\n}\n```\n\n### (4) Creating a symDMatrix from `ff` files containing the blocks\n\nThe function `as.symDMatrix` allows creating a `symDMatrix` object from a list of `.RData` files containing `ff_matrix` objects. The list is assumed to provide, in order, files for `G11, G12, ..., G1q, G22, G23, ..., G2q, ..., Gqq`. This approach is useful for very large G matrices. If `n` (the number of rows of `X`) is large, it may make sense to compute the blocks of the `symDMatrix` object in parallel jobs (e.g., on an HPC cluster). 
The function `getG` of the [BGData](https://CRAN.R-project.org/package=BGData) package is similar to `getG_symDMatrix` but accepts the arguments `i` and `i2`, which select the rows of `X` that define the rows and columns of a block of G.\n\n```R\nlibrary(BGLR)\nlibrary(BGData)\nlibrary(ff)\n\n# Load genotypes from a wheat data set\ndata(wheat)\nX \u003c- wheat.X\nrownames(X) \u003c- paste0(\"ID_\", 1:nrow(X))\n\n# Compute G matrix in RAM\ncenters \u003c- colMeans(X)\nscales \u003c- apply(X = X, MARGIN = 2, FUN = sd)\nG1 \u003c- tcrossprod(scale(X, center = centers, scale = scales))\nG1 \u003c- G1 / mean(diag(G1))\n\n# Compute G matrix block by block (each block computation can be distributed)\nnBlocks \u003c- 3\nblockSize \u003c- ceiling(nrow(X) / nBlocks)\ni \u003c- 1:nrow(X)\nblockIndices \u003c- split(i, ceiling(i / blockSize))\nfor (r in 1:nBlocks) {\n    for (s in r:nBlocks) {\n        blockName \u003c- paste0(\"wheat_\", r, \"_\", s - r + 1)\n        block \u003c- getG(X, center = centers, scale = scales, scaleG = TRUE,\n                      i = blockIndices[[r]], i2 = blockIndices[[s]])\n        block \u003c- ff::as.ff(block, filename = paste0(blockName, \".bin\"), vmode = \"double\")\n        save(block, file = paste0(blockName, \".RData\"))\n    }\n}\n\n# Assemble the symDMatrix from the block files (listed in alphabetical order)\nG2 \u003c- as.symDMatrix(list.files(pattern = \"^wheat.*RData$\"))\nattr(G2, \"centers\") \u003c- centers\nattr(G2, \"scales\") \u003c- scales\n\nall.equal(diag(G1), diag(G2)) # small numerical differences are expected\n```\n\n\nInstallation\n------------\n\nInstall the stable version from CRAN:\n\n```R\ninstall.packages(\"symDMatrix\")\n```\n\nAlternatively, install the development version from GitHub:\n\n```R\n# install.packages(\"remotes\")\nremotes::install_github(\"QuantGen/symDMatrix\")\n```\n\n\nDocumentation\n-------------\n\nFurther documentation can be found on [RDocumentation](http://www.rdocumentation.org/packages/symDMatrix).\n\n\nContributing\n------------\n\n- Issue Tracker: https://github.com/QuantGen/symDMatrix/issues\n- 
Source Code: https://github.com/QuantGen/symDMatrix\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantgen%2Fsymdmatrix","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquantgen%2Fsymdmatrix","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantgen%2Fsymdmatrix/lists"}