{"id":20116613,"url":"https://github.com/brandonleekramer/diversity","last_synced_at":"2026-05-05T01:37:23.870Z","repository":{"id":110543801,"uuid":"202599624","full_name":"brandonleekramer/diversity","owner":"brandonleekramer","description":"In this project, my colleague Catherine Lee (Rutgers) and I employ computational text analysis to examine quantitative trends in the use of diversity terms, OMB/Census terms, and other population labels in a sample of 2.6+ million biomedical abstracts spanning the last 30 years.","archived":false,"fork":false,"pushed_at":"2023-01-02T19:56:58.000Z","size":101422,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-02T19:28:54.717Z","etag":null,"topics":["diversity","pubmed","python","r","sql","text-mining","word-embeddings"],"latest_commit_sha":null,"homepage":"https://riseofdiversity.netlify.app/","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/brandonleekramer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-08-15T19:26:59.000Z","updated_at":"2024-09-26T16:50:58.000Z","dependencies_parsed_at":null,"dependency_job_id":"e2ffab0c-3a20-4f88-9981-4d3051fcaf94","html_url":"https://github.com/brandonleekramer/diversity","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/brandonleekramer/diversity","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brandonleekramer%2Fdiversity","tags_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/brandonleekramer%2Fdiversity/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brandonleekramer%2Fdiversity/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brandonleekramer%2Fdiversity/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/brandonleekramer","download_url":"https://codeload.github.com/brandonleekramer/diversity/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brandonleekramer%2Fdiversity/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32632288,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-04T10:08:07.713Z","status":"ssl_error","status_checked_at":"2026-05-04T10:08:02.005Z","response_time":58,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diversity","pubmed","python","r","sql","text-mining","word-embeddings"],"created_at":"2024-11-13T18:42:36.081Z","updated_at":"2026-05-05T01:37:23.854Z","avatar_url":"https://github.com/brandonleekramer.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n#### The Rise of Diversity and Population Terminology in Biomedical Research\n\nAs of: 05-17-2021\n\nThis repository provides the source code for the Brandon Kramer and Catherine Lee's \"The Rise of Diversity and 
Population Terminology in Biomedical Research.\" After uploading the PubMed/MEDLINE database with `PubMedPortable` in `Python`, we used `R`'s `tidytext` package to examine trends in the use of diversity terms in more than 2.5 million scientific abstracts from 1990-2020. Overall, our analyses demonstrate that the use of various types of \"diversity\" and other population terminology, including race and ethnicity, is rising over time. While we provide some preliminary results and a full appendix on our [project website](https://riseofdiversity.netlify.app/), the source code, database, and outputs are detailed below. This project is still in progress, but is updated often. \n\n#### Code structure \n\n    ├── content (website)\n        ├── overview.Rmd\n        ├── methods.Rmd\n        ├── analyses\n            ├── hypothesis1.Rmd\n            ├── hypothesis2.Rmd\n            ├── hypothesis3.Rmd\n    ├── data\n        ├── dictionaries\n            ├── preprocessing\n                ├── compoundR.csv\n                ├── polysemeR.csv\n                ├── humanizeR.csv\n            ├── h1_dictionary.csv\n            ├── h2_dictionary.csv\n            ├── h3_dictionary.csv\n            ├── tree_data.csv\n        ├── journal_rankings\n        ├── regression_analyses\n        ├── sensitivity_checks\n        ├── text_results \n            ├── h1_results\n            ├── h2_results\n            ├── h3_results \n        ├── word_embeddings\n    ├── src\n        ├── 01_pubmed_db\n            ├── 01_download_medline.sh\n            ├── 02_pubmed_parser.ipynb\n            ├── 03_clean_db.sql\n            ├── 04_pubmed_abstract_db.sql\n            ├── 05_filtered_publications.R\n            ├── 06_articles_per_journal.sql\n            ├── 07_articles_per_year.sql\n            ├── 08_biomedical_abstracts.sql\n            ├── 09_check_abstracts_tbl.sql\n        ├── 02_text_trends\n            ├── 01_hypothesis1.R\n            ├── 02_hypothesis2.R\n            ├── 03_hypothesis3.R\n            ├── 
04_all_hypotheses.slurm\n            ├── 05_pub_figures.Rmd\n            ├── supplementary_analyses\n                ├── 06_aggregate_ids.R\n                ├── 07_diversity_abstracts.sql\n                ├── 08_diversity_abstracts.R\n                ├── 09_soc_diversity_eda.R\n                ├── 10_human_abstracts.R\n        ├── 03_word_embeddings\n            ├── 01_w2v_train.ipynb\n            ├── 02_w2v_results.ipynb\n        ├── 04_text_relations\n            ├── unfinished_analyses\n        ├── 05_collaborations\n            ├── unfinished_analyses\n\n#### Database structure \n\n    ├── pubmed_2021\n        ├── abstract_data\n        ├── articles_per_journal\n        ├── articles_per_year\n        ├── biomedical_abstracts \n        ├── filtered_publications \n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrandonleekramer%2Fdiversity","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrandonleekramer%2Fdiversity","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrandonleekramer%2Fdiversity/lists"}