{"id":22299386,"url":"https://github.com/aritra90/cogg","last_synced_at":"2026-02-02T12:16:54.971Z","repository":{"id":166168291,"uuid":"70845894","full_name":"aritra90/COGG","owner":"aritra90","description":"This repository contains instructions to run the method, COGG or Correlation Optimization of Genetics and Geodemographics.","archived":false,"fork":false,"pushed_at":"2018-10-17T21:10:27.000Z","size":27,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-25T23:18:58.834Z","etag":null,"topics":["correlation-optimization","genetics","principal-components","statistics"],"latest_commit_sha":null,"homepage":"","language":"Matlab","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aritra90.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-10-13T20:34:16.000Z","updated_at":"2019-04-24T00:04:57.000Z","dependencies_parsed_at":null,"dependency_job_id":"b8c20f5e-3dc9-4d5a-904b-08bd2b5d10c7","html_url":"https://github.com/aritra90/COGG","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aritra90/COGG","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aritra90%2FCOGG","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aritra90%2FCOGG/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aritra90%2FCOGG/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aritra90%2FCOGG/manifests","owne
r_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aritra90","download_url":"https://codeload.github.com/aritra90/COGG/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aritra90%2FCOGG/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261964829,"owners_count":23237492,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["correlation-optimization","genetics","principal-components","statistics"],"created_at":"2024-12-03T18:05:55.861Z","updated_at":"2026-02-02T12:16:49.935Z","avatar_url":"https://github.com/aritra90.png","language":"Matlab","funding_links":[],"categories":[],"sub_categories":[],"readme":"# COGG\nCOGG, or Correlation Optimization of Genetics and Geodemographics, is a novel method in which we find the \nmaximum correlation between the principal components obtained from genotype data and the Geodemographic matrix,\nwhich consists of geographical coordinates and external information that influences the genetics of a particular \nregion. In other words, it asks whether external factors such as society, languages, occupations, or natural phenomena\nhave created genetic stratification in the population. \nIn our case, we have chosen data from the Indian subcontinent. India is a land of diversity where the genetics \nof the country has been largely influenced by the prevalence of the caste system and the presence of different language \nfamilies. 
\nCOGG maximizes the squared correlation between a principal component vector and the Geodemographic matrix.\n\n\n## List of Inputs: \n ````\n--\u003e File containing Principal Components (we use EIGENSTRAT output)\n--\u003e File containing external information along with geographical\n    coordinates. In our case we use Caste and Language information as\n    external information\n--\u003e value of p, the number of top principal components to be considered. \n```` \n\n## COGG-CCA\nThe code also computes a Canonical Correlation Analysis (CCA), which we name COGG-CCA. COGG is a one-sided CCA; \nCOGG-CCA extends it to both sides. We now consider the top \"p\" principal components, instead of a single \nprincipal component vector. We want to maximize the squared correlation between the Principal Component matrix and \nthe Geodemographic matrix. Canonical Correlation Analysis finds linear combinations of the variables \nin the Genetics and Geodemographic matrices that are maximally correlated with each other.\n\n## Running COGG: \nClone or download the code and run COGG_Wrapper.m. \n\n### Formatting PCs\nEdit COGG_Wrapper.m to include the paths to your files containing the Principal Components, in the following format:\n\n````SampleID PC1 PC2 PC3 PC4 PC5 .. PCp Population````\n\nHere, SampleID is the ID of each sample under study, and the last column contains the population to which the \nsample belongs. PC1 through PCp are the p principal components under consideration. \n\n### Formatting Geodemographic information\nAlso include the path to the file containing the geographical coordinates and the external \ninformation that influences the genetics of those populations. Our method needs a one-to-one relationship\nbetween the external factors and the samples. That is, each sample can belong to only one language group and one caste \ngroup in our case. 
\nOur file is formatted as follows:\n\n```` Population Name #ofSamples State/Province Region Language Caste Latitude Longitude Dataset ```` \n\nChange line **44** in ````COGG_Wrapper.m```` to match the format of your file. For example, in our format the geography, caste, and language information is in columns 5, 6, 7, and 8. Change the columns accordingly, and note that the first column should always be the population group name/annotation. \n\nAlso set the value of **p** to the desired number of principal components. \n\nRun it as COGG_Wrapper from the command line or the MATLAB IDE. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faritra90%2Fcogg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faritra90%2Fcogg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faritra90%2Fcogg/lists"}