{"id":13859154,"url":"https://github.com/leeper/references","last_synced_at":"2025-09-07T05:37:39.314Z","repository":{"id":70770022,"uuid":"72351670","full_name":"leeper/references","owner":"leeper","description":"All of my bibliographic references","archived":false,"fork":false,"pushed_at":"2020-06-21T11:09:03.000Z","size":9048,"stargazers_count":16,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-14T08:25:31.806Z","etag":null,"topics":["bibliographic-references","bibtex","citations","references"],"latest_commit_sha":null,"homepage":null,"language":"TeX","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/leeper.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-10-30T13:38:51.000Z","updated_at":"2025-01-04T11:21:38.000Z","dependencies_parsed_at":"2023-04-24T13:02:59.809Z","dependency_job_id":null,"html_url":"https://github.com/leeper/references","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/leeper/references","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leeper%2Freferences","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leeper%2Freferences/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leeper%2Freferences/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leeper%2Freferences/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/leeper","download_url":"https://co
deload.github.com/leeper/references/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leeper%2Freferences/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273997824,"owners_count":25204632,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-07T02:00:09.463Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bibliographic-references","bibtex","citations","references"],"created_at":"2024-08-05T03:02:34.567Z","updated_at":"2025-09-07T05:37:39.291Z","avatar_url":"https://github.com/leeper.png","language":"TeX","funding_links":[],"categories":["TeX"],"sub_categories":[],"readme":"---\ntitle: \"My BibTeX database\"\nauthor: \"Thomas J. 
Leeper\"\noutput:\n  md_document:\n    variant: markdown_github\n---\n\n```{r setup, results=\"hide\"}\n# knitr\nlibrary(\"knitr\")\nopts_chunk$set(fig.width=8, fig.height=5, cache=TRUE)\n\n# ggplot\nlibrary(\"ggplot2\")\ntheme_set(theme_minimal())\nupdate_geom_defaults(\"bar\", list(fill = \"black\"))\nupdate_geom_defaults(\"line\", list(colour = \"red\"))\nupdate_geom_defaults(\"line\", list(fill = \"black\", colour = \"black\"))\n\n# other packages\nrequireNamespace(\"bib2df\", quietly = TRUE)\nrequireNamespace(\"igraph\", quietly = TRUE)\nrequireNamespace(\"gender\", quietly = TRUE)\nrequireNamespace(\"ggraph\", quietly = TRUE)\n```\n\nLicense: Public Domain (CC-0)\n\nThis is the BibTeX (.bib) file containing all of my bibliographic references. Figured I'd share it publicly.\n\nThis README was last updated on `r Sys.Date()`.\n\n```{r data}\ndat \u003c- suppressWarnings(bib2df::bib2df(\"references.bib\"))\nsuppressWarnings(dat[[\"YEAR\"]] \u003c- as.numeric(dat[[\"YEAR\"]]))\n```\n\nThe database contains `r nrow(dat)` references. 
What follows are some basic statistics on its contents.\n\n\n## Citation Types\n\nReference types in the database:\n\n```{r bibtype, dependson=c(\"data\")}\ndat$CATEGORY \u003c- factor(dat$CATEGORY, levels = names(sort(table(dat$CATEGORY))))\nggplot(dat[!is.na(dat$CATEGORY),], aes(x = CATEGORY)) + \n  geom_bar() + \n  xlab(\"Citation Type\") + \n  ylab(\"Count\") + \n  coord_flip()\n```\n\n## Journals\n\nThe 50 most common journals:\n\n```{r journal, dependson=c(\"data\"), fig.height = 8}\n# fall back to the biblatex JOURNALTITLE field when JOURNAL is missing\ndat$JOURNAL[is.na(dat$JOURNAL)] \u003c- dat$JOURNALTITLE[is.na(dat$JOURNAL)]\ntopjournals \u003c- aggregate(CATEGORY ~ JOURNAL, data = dat, FUN = length)\ntopjournals \u003c- head(topjournals[order(topjournals$CATEGORY, decreasing = TRUE), ], 50)\ntopjournals$JOURNAL \u003c- factor(topjournals$JOURNAL, levels = rev(topjournals$JOURNAL))\nggplot(topjournals, aes(x = JOURNAL, y = CATEGORY)) + \n  geom_bar(stat = \"identity\") + \n  ylab(\"Count\") + \n  xlab(\"Journal\") + \n  coord_flip()\n```\n\n## Book Publishers\n\nThe 25 most common book publishers:\n\n```{r publisher, dependson=c(\"data\"), fig.height = 6}\ntoppublishers \u003c- aggregate(CATEGORY ~ PUBLISHER, data = dat[dat$CATEGORY == \"BOOK\",], FUN = length)\ntoppublishers \u003c- head(toppublishers[order(toppublishers$CATEGORY, decreasing = TRUE), ], 25)\ntoppublishers$PUBLISHER \u003c- factor(toppublishers$PUBLISHER, levels = rev(toppublishers$PUBLISHER))\nggplot(toppublishers, aes(x = PUBLISHER, y = CATEGORY)) + \n  geom_bar(stat = \"identity\") + \n  ylab(\"Count\") + \n  xlab(\"Publisher\") + \n  coord_flip()\n```\n\n\n## Authors\n\nNumber of authors per publication (excluding entries dated 1900 or earlier and extreme outliers with 40 or more authors):\n\n```{r nauthors, dependson=c(\"data\")}\ndat$nauthors \u003c- lengths(dat$AUTHOR)\nggplot(dat[!is.na(dat$YEAR) \u0026 dat$YEAR \u003e 1900 \u0026 dat$nauthors \u003c 40, ], aes(x = YEAR, y = nauthors)) + \n  geom_point(alpha=0.1, fill=\"black\", colour=\"black\") + \n  geom_smooth(method = \"gam\", colour = \"red\") + \n  
xlab(\"Publication Year\") + \n  ylab(\"Authors per Publication\")\n```\n\nThe 50 most common authors:\n\n```{r authors, dependson=c(\"data\"), fig.height = 8}\naut \u003c- unlist(dat$AUTHOR)\ntopaut \u003c- as.data.frame(head(sort(table(aut), decreasing = TRUE), 150))\ntopaut$aut \u003c- factor(topaut$aut, levels = rev(topaut$aut))\nggplot(topaut[1:50, ], aes(x = aut, y = Freq)) + \n  geom_bar(stat = \"identity\") + \n  ylab(\"Count\") + \n  xlab(\"Author Name\") + \n  coord_flip()\n```\n\n## Gender of authors\n\nThe overall breakdown of author genders (counting each author only once) is as follows:\n\n```{r authorgender, dependson=c(\"data\"), fig.height=2}\npull_first_names \u003c- function(x) {\n    x \u003c- as.character(x)\n    # first given name in \"Last, First M.\" strings; the lookahead has no capture group, so regexec returns only the name\n    x \u003c- unlist(regmatches(x, regexec(\"(?\u003c=, )[A-Za-z]+(?=[., ]|$)\", x, perl = TRUE)))\n    x[x != \"\"]\n}\nfirst_names \u003c- pull_first_names(unique(as.character(unlist(dat$AUTHOR))))\nauthor_genders \u003c- gender::gender(first_names)\n\nggplot(author_genders[, \"gender\", drop = FALSE], aes(x = \"\", fill = gender)) +\n  geom_bar(aes(y = after_stat(count / sum(count))), width = 1, position = \"dodge\") + \n  scale_fill_manual(limits = c(\"male\", \"female\"), values = c(\"gray\", \"black\")) +\n  scale_y_continuous(breaks = seq(0,1,by=0.1), labels = scales::percent) +\n  coord_flip() +\n  xlab(\"\") +\n  ylab(\"\") +\n  theme(legend.position = \"bottom\")\n```\n\nClassification of each publication by the gender composition of its author team (publications with any unclassified first name are counted as ambiguous):\n\n```{r teamgender, dependson=c(\"data\", \"authorgender\"), fig.height=2}\nteam_genders \u003c- unlist(lapply(dat$AUTHOR, function(x) {\n    firsts \u003c- pull_first_names(as.character(x))\n    u \u003c- author_genders$gender[match(firsts[firsts != \"\"], author_genders$name)]\n    if (!length(u) || anyNA(u)) {\n        \"Ambiguous\"\n    } else if (length(u) == 1 \u0026\u0026 u == \"male\") {\n        \"Male Solo\"\n    } else if (length(u) == 1 \u0026\u0026 u == \"female\") {\n        \"Female Solo\"\n    } else if (all(u %in% \"male\")) {\n        \"Male 
Team\"\n    } else if (all(u %in% \"female\")) {\n        \"Female Team\"\n    } else {\n        \"Male-Female Team\"\n    }\n}))\nteam_genders_df \u003c- table(factor(team_genders, rev(c(\"Male-Female Team\", \"Male Solo\", \"Female Solo\", \"Male Team\", \"Female Team\", \"Ambiguous\"))))\nggplot(data.frame(team_genders_df), aes(x = Var1, y = Freq)) +\n  geom_bar(width = 1, position = \"dodge\", stat = \"identity\") + \n  coord_flip() +\n  xlab(\"\") +\n  ylab(\"\")\n```\n\nCaveat: The above is based upon [the gender package](https://cran.r-project.org/package=gender), which infers gender from first names using historical name-frequency data. This inference is not necessarily accurate and is restricted to a binary classification. It also pools the package's historical data across years and relies only on United States data, so it may be inaccurate for any given individual in the dataset.\n\n## Coauthorship\n\nCoauthorship network among the 150 most common authors:\n\n```{r authornetwork, dependson=c(\"data\"), fig.height=10}\n# get all coauthor pairs\ncolist \u003c- lapply(dat$AUTHOR, function(x) if (length(x) \u003e= 2) combn(x, m = 2) else NA_character_)\n# stack the pairs into a two-column data frame (X1, X2), dropping single-author entries\ncodat \u003c- na.omit(data.frame(t(do.call(\"cbind\", colist))))\ncodat$N \u003c- 1L\n# count joint publications for each pair of top authors and build an undirected igraph object\ntopco \u003c- aggregate(N ~ X1 + X2, data = codat[codat$X1 %in% topaut$aut \u0026 codat$X2 %in% topaut$aut, ], FUN = sum)\ncograph \u003c- igraph::graph_from_data_frame(topco, directed = FALSE)\n\n## ggraph\nggraph::ggraph(cograph, \"igraph\", algorithm = \"nicely\") + \n  ggraph::geom_edge_link(aes(edge_width = log(N)), colour = \"gray\") + \n  ggraph::geom_node_text(aes(label = name), fontface = 1, size = 2) + \n  theme_void()\n```\n\nA more interactive alternative, using the networkD3 package, might be:\n\n```{r eval=FALSE, dependson=c(\"data\", \"authornetwork\")}\nnetworkD3::simpleNetwork(topco)\nd3 \u003c- networkD3::igraph_to_networkD3(cograph)\nd3$nodes$group 
\u003c- 1L\nnetworkD3::forceNetwork(Links = d3$links, Nodes = d3$nodes, NodeID = \"name\", Group = \"group\")\n```\n\nThe 30 authors with the highest betweenness centrality in the coauthorship network:\n\n```{r between, dependson=c(\"data\", \"authornetwork\")}\nbetween \u003c- igraph::betweenness(cograph)\ntopcoaut \u003c- na.omit(data.frame(betweenness = head(sort(between, decreasing = TRUE), 30)))\ntopcoaut$aut \u003c- factor(rownames(topcoaut), levels = rev(rownames(topcoaut)))\nggplot(topcoaut, aes(x = aut, y = betweenness)) + \n  geom_bar(stat = \"identity\") + \n  ylab(\"Network Betweenness\") + \n  xlab(\"Author Name\") + \n  coord_flip()\n```\n\n## Publication Years\n\nYears of publication (post-1950):\n\n```{r year, dependson=c(\"data\"), fig.height=4}\nggplot(dat[!is.na(dat$YEAR) \u0026 dat$YEAR \u003e 1950, ], aes(x = YEAR)) + \n  geom_bar() +\n  xlab(\"Publication Year\") + \n  ylab(\"Count\")\n```\n\n## Data missingness\n\nProportion of missing values, by field, for articles:\n\n```{r missingness_articles, dependson=c(\"data\"), fig.height=3}\n# which() drops entries with a missing CATEGORY instead of turning them into all-NA rows\narticles \u003c- dat[which(dat$CATEGORY == \"ARTICLE\"), c(\"AUTHOR\", \"TITLE\", \"JOURNAL\", \"YEAR\", \"VOLUME\", \"NUMBER\", \"PAGES\", \"ABSTRACT\", \"DOI\")]\narticles \u003c- cbind.data.frame(FIELD = names(articles), MISSINGNESS = unlist(lapply(articles, function(x) mean(is.na(x)))))\nggplot(articles, aes(x = FIELD, y = MISSINGNESS)) +\n  geom_bar(stat = \"identity\", fill = \"darkgray\") + \n  ylab(\"Proportion Missing\") + \n  ylim(c(0,1)) +\n  xlab(\"\") + \n  coord_flip()\n```\n\nProportion of missing values, by field, for books:\n\n```{r missingness_books, dependson=c(\"data\"), fig.height=3}\nbooks \u003c- dat[which(dat$CATEGORY == \"BOOK\"), c(\"AUTHOR\", \"EDITOR\", \"TITLE\", \"PUBLISHER\", \"YEAR\", \"ADDRESS\", \"ISBN\")]\nbooks \u003c- cbind.data.frame(FIELD = names(books), MISSINGNESS = unlist(lapply(books, function(x) mean(is.na(x)))))\nggplot(books, aes(x = FIELD, y = MISSINGNESS)) +\n  geom_bar(stat = \"identity\", fill 
= \"darkgray\") + \n  ylab(\"Proportion Missing\") + \n  ylim(c(0,1)) +\n  xlab(\"\") + \n  coord_flip()\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleeper%2Freferences","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fleeper%2Freferences","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleeper%2Freferences/lists"}