{"id":20116765,"url":"https://github.com/brandonleekramer/tidyorgs","last_synced_at":"2025-07-15T11:45:35.852Z","repository":{"id":110543798,"uuid":"389747216","full_name":"brandonleekramer/tidyorgs","owner":"brandonleekramer","description":"A tidy package that detects and standardizes organizations in unstructured text data ","archived":false,"fork":false,"pushed_at":"2021-12-13T22:24:40.000Z","size":50555,"stargazers_count":7,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-25T14:52:33.876Z","etag":null,"topics":["academic","business","geography","government","nonprofit","organizations","r","standardization","text-classification","text-mining","tidyverse"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/brandonleekramer.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-07-26T19:32:27.000Z","updated_at":"2025-03-20T14:59:26.000Z","dependencies_parsed_at":null,"dependency_job_id":"9e755e33-8d81-4904-88c1-1644faa6a07b","html_url":"https://github.com/brandonleekramer/tidyorgs","commit_stats":{"total_commits":31,"total_committers":1,"mean_commits":31.0,"dds":0.0,"last_synced_commit":"bc328877835ca067a5feb62447da21b17264f2f6"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/brandonleekramer/tidyorgs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brandonleekramer%2Ftidyorgs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brandonleekramer%2Ftidyorgs/tags",
"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brandonleekramer%2Ftidyorgs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brandonleekramer%2Ftidyorgs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/brandonleekramer","download_url":"https://codeload.github.com/brandonleekramer/tidyorgs/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brandonleekramer%2Ftidyorgs/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265433248,"owners_count":23764249,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["academic","business","geography","government","nonprofit","organizations","r","standardization","text-classification","text-mining","tidyverse"],"created_at":"2024-11-13T18:43:03.016Z","updated_at":"2025-07-15T11:45:35.811Z","avatar_url":"https://github.com/brandonleekramer.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n```{r setup, include=FALSE}\nknitr::opts_chunk$set(echo = TRUE)\n```\n\n# tidyorgs: A tidy package that standardizes text data for organizational and sector analysis \u003cimg src=\"man/figures/tidyorgs_logo.png\" align=\"right\" height=\"250\" /\u003e\n\n**Authors:** [Brandon Kramer](https://www.brandonleekramer.com/) with contributions from members of the [University of Virginia's Biocomplexity Institute](https://biocomplexity.virginia.edu/institute/divisions/social-and-decision-analytics), the [National Center for Science and Engineering 
Statistics](https://www.nsf.gov/statistics/), and the [2020](https://dspg-young-scholars-program.github.io/dspg20oss/team/?dspg) and [2021](https://dspgtools.shinyapps.io/dspg21oss/) [UVA Data Science for the Public Good Open Source Software Teams](https://biocomplexity.virginia.edu/institute/divisions/social-and-decision-analytics/dspg)\u003cbr/\u003e\n**License:** [MIT](https://opensource.org/licenses/MIT)\u003cbr/\u003e\n\n### Installation\n\nYou can install this package from GitHub using the `devtools` package:\n\n```{r, eval=FALSE, warning=FALSE, message=FALSE}\ninstall.packages(\"devtools\")\ndevtools::install_github(\"brandonleekramer/tidyorgs\")\n```\n\nThe `tidyorgs` package provides several functions that help standardize messy text data for organizational analysis. Its two core sets of functions, `detect_{sector}()` and `email_to_orgs()`, standardize organizations from the academic, business, government, and nonprofit sectors based on unstructured text and email domains. The package is intended to support linkage across multiple datasets, bibliometric analysis, and sector classification for social, economic, and policy analysis.\n\n### Matching organizations with the `detect_orgs()` function\n\nThe `detect_{sector}()` functions detect patterns in messy text data and then standardize the matches into organization names using a curated dictionary. For example, messy profile information scraped from GitHub, such as users' `company` and `email` fields, can be coded so that academic users can be identified and analyzed statistically.
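\n\nThe dictionary step can be pictured with a minimal base-R sketch. It is not the `tidyorgs` implementation: the `org_dictionary` table, its entries, and the `match_orgs()` helper are hypothetical, and the sketch uses plain lowercase substring matching rather than whatever rules the package's curated dictionary applies.\n\n```r\n# Hypothetical dictionary: lowercase patterns mapped to standardized names\n# (illustrative entries only, not the tidyorgs dictionary)\norg_dictionary \u003c- data.frame(\n  pattern = c('university of virginia', 'biocomplexity institute', 'stanford'),\n  organization = c('University of Virginia', 'University of Virginia', 'Stanford University'),\n  stringsAsFactors = FALSE\n)\n\n# Hypothetical helper: return the standardized name of the first pattern\n# found in each string, or NA when nothing in the dictionary matches\nmatch_orgs \u003c- function(text, dictionary) {\n  cleaned \u003c- tolower(trimws(text))\n  vapply(cleaned, function(x) {\n    if (is.na(x)) return(NA_character_)\n    hits \u003c- vapply(dictionary$pattern, function(p) grepl(p, x, fixed = TRUE), logical(1))\n    if (any(hits)) dictionary$organization[which(hits)[1]] else NA_character_\n  }, character(1), USE.NAMES = FALSE)\n}\n\nmessy \u003c- c('UVA Biocomplexity Institute', '@stanford', 'Acme Corp', NA)\norgs \u003c- match_orgs(messy, org_dictionary)\n# a 0/1 sector flag follows directly from whether a match was found\nacademic \u003c- as.integer(!is.na(orgs))\n```\n\nHere `orgs` is `'University of Virginia'`, `'Stanford University'`, `NA`, `NA` and `academic` is `1, 1, 0, 0`. Plain substring matching is the weak point of this sketch: a short pattern can fire inside an unrelated word, so a real dictionary needs word boundaries and many spelling variants per organization.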
\n\n#### `detect_academic()`\n\n```{r, warning=FALSE, message=FALSE}\nlibrary(tidyverse)\nlibrary(tidyorgs)\ndata(github_users)\n\n# classify users from their company and email fields, then keep academic matches\nclassified_academic \u003c- github_users %\u003e%\n  detect_academic(login, company, organization, email) %\u003e%\n  filter(academic == 1) %\u003e%\n  select(login, organization, company)\n\nclassified_academic\n```\n\n#### `detect_business()`\n\n```{r}\nclassified_businesses \u003c- github_users %\u003e%\n  detect_business(login, company, organization, email) %\u003e%\n  filter(business == 1) %\u003e%\n  select(login, organization, company)\nclassified_businesses\n```\n\n#### `detect_government()`\n\n```{r}\nclassified_government \u003c- github_users %\u003e%\n  detect_government(login, company, organization, email) %\u003e%\n  filter(government == 1) %\u003e%\n  select(login, organization, company)\nclassified_government\n```\n\n#### `detect_nonprofit()`\n\n```{r}\nclassified_nonprofit \u003c- github_users %\u003e%\n  detect_nonprofit(login, company, organization, email) %\u003e%\n  filter(nonprofit == 1) %\u003e%\n  select(login, organization, company, email)\nclassified_nonprofit\n```\n\n### Matching users to organizations by emails using `email_to_orgs()`\n\nFor users who only have email information, the `email_to_orgs()` function matches them to organizations based on our curated list of email domains.\n\n```{r, warning=FALSE, message=FALSE}\n# match users' email domains against the academic domain list\nuser_emails_to_orgs \u003c- github_users %\u003e%\n  email_to_orgs(login, email, country_name, \"academic\")\n\n# join the matches back onto the user data and keep matched rows\ngithub_users %\u003e%\n  left_join(user_emails_to_orgs, by = \"login\") %\u003e%\n  drop_na(country_name) %\u003e%\n  select(email, country_name)\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrandonleekramer%2Ftidyorgs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrandonleekramer%2Ftidyorgs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrandonleekramer%2Ftidyorgs/lists"}