{"id":17790439,"url":"https://github.com/briatte/epsaconf","last_synced_at":"2026-01-21T02:03:16.153Z","repository":{"id":144777941,"uuid":"387525767","full_name":"briatte/epsaconf","owner":"briatte","description":"Data from EPSA conferences, 2019-2023","archived":false,"fork":false,"pushed_at":"2023-07-18T15:02:09.000Z","size":16009,"stargazers_count":2,"open_issues_count":5,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-07T12:50:11.246Z","etag":null,"topics":["coauthors","conferences","network"],"latest_commit_sha":null,"homepage":"https://netconf-geoscimo.univ-tlse2.fr/project/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/briatte.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-07-19T16:18:57.000Z","updated_at":"2024-01-22T11:05:33.000Z","dependencies_parsed_at":null,"dependency_job_id":"3956d58c-2d33-4e7e-ba51-9bf7c150b136","html_url":"https://github.com/briatte/epsaconf","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/briatte/epsaconf","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/briatte%2Fepsaconf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/briatte%2Fepsaconf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/briatte%2Fepsaconf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/briatte%2Fepsaconf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/briatte","download_url":"https://codeload.github.com/briatte/epsaconf/tar.gz/refs
/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/briatte%2Fepsaconf/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28622472,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-20T23:49:58.628Z","status":"online","status_checked_at":"2026-01-21T02:00:08.227Z","response_time":86,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["coauthors","conferences","network"],"created_at":"2024-10-27T10:43:43.494Z","updated_at":"2026-01-21T02:03:16.139Z","avatar_url":"https://github.com/briatte.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data from EPSA conferences, 2019-2023\n\n[![DOI](https://zenodo.org/badge/387525767.svg)](https://zenodo.org/badge/latestdoi/387525767)\n\nThis repository contains R code to collect and assemble the full programmes of recent [EPSA](https://epsanet.org/) conferences:\n\n| Conference year                  | GitHub      | Online programme        |\n|:---------------------------------|:-----------:|:------------------------|\n| [EPSA 2019][y19]                 | [repo][r19] | [Oxford Abstracts][p19] |\n| [EPSA 2020][y20] (virtual event) | [repo][r20] | [COMS.events][p20]      |\n| [EPSA 2021][y21] (virtual event) | [repo][r21] | [COMS.events][p21]      |\n| [EPSA 2022][y22]                 | [repo][r22] | [COMS.events][p22]      |\n| [EPSA 2023][y23]                 | [repo][r23] | 
[Oxford Abstracts][p23] |\n\n[y19]: https://epsanet.org/epsa2019/\n[y20]: https://epsanet.org/epsa2020/\n[y21]: https://epsanet.org/epsa2021/\n[y22]: https://epsanet.org/epsa2022/\n[y23]: https://epsanet.org/epsa-2023-programme-committee/\n\n[r19]: https://github.com/briatte/epsa2019\n[r20]: https://github.com/briatte/epsa2020\n[r21]: https://github.com/briatte/epsa2021\n[r22]: https://github.com/briatte/epsa2022\n[r23]: https://github.com/briatte/epsa2023\n\n[p19]: https://virtual.oxfordabstracts.com/#/event/public/772/program\n[p20]: https://coms.events/EPSA-2020/en/\n[p21]: https://coms.events/epsa2021/en/\n[p22]: https://coms.events/epsa-2022/en/\n[p23]: https://virtual.oxfordabstracts.com/#/event/3738/information\n\nThe master dataset [`data/epsa-program.tsv`][prgm] contains all 5 conference years. Details on variables appear in the notes below.\n\nThe code starts by importing the conference programme located in each of the repositories listed above. It then applies some corrections to academic affiliations, guesses genders, performs a few more cleaning routines, updates participant hashes, and creates the master dataset. The single-year programmes, with uncorrected academic affiliations, are preserved for reference.\n\nThis is __work in progress__. See the [issues](issues) for a list of things that still need fixing. 
In the unlikely event that you need to run the code on your side (the TSV master dataset should be usable without doing so), please feel free to ask for help if something does not work as expected.\n\n[prgm]: https://github.com/briatte/epsaconf/blob/main/data/epsa-program.tsv\n\n# Data\n\nFor each conference year, we collected information on the conference panels, the papers that they hosted, and the individuals involved in either organizing the panels (chairs and discussants) or presenting the papers (authors):\n\n|                  | 2019 | 2020 | 2021 | 2022 | 2023 |\n|:-----------------|:----:|:----:|:----:|:----:|:----:|\n| Participants (1) | 1318 |  298 |  792 | 1415 | 1863 |\n| Affiliations (2) |  328 |  130 |  241 |  348 |  392 |\n| Panels           |  186 |   32 |  131 |  228 |  258 |\n| Abstracts        |  802 |  136 |  517 |  933 | 1127 |\n| Edges (3)        | 1964 |  319 | 1262 | 2285 | 2912 |\n\n1. The names of the participants have not been harmonised across datasets. The data contain 32-character hashes to identify unique participants _in a single conference year_, based on each participant's name, affiliation and conference year. You will need to generate new hashes to identify e.g. participants with identical names across _all conference years_.\n2. Academic affiliations (which are not always academic) have been cleaned and identified with their [ROR][ror] IDs. A few participants have affiliations with no ROR record, and independent researchers have been assigned the special value `\"(independent)\"` as their affiliation.\n3. Defined as the presence of a participant `i` in a conference panel `j` as either chair (`c`), discussant (`d`) or presenter (`p`). This is only one of the (one-mode or two-mode, at least) networks that can be built from the data. 
See the section on networks for further notes.\n\n[ror]: https://ror.org/\n\n```r\nlibrary(tidyverse)\n\n# participants, panels, abstracts and edges\nfs::dir_ls(\"data\", regexp = \"epsa\\\\d{4}\") %\u003e%\n  map(read_tsv, col_types = cols(.default = \"c\")) %\u003e%\n  map_int(nrow)\n\n# unique affiliations\nfs::dir_ls(\"data\", regexp = \"epsa\\\\d{4}-participants\") %\u003e%\n  map(read_tsv, col_types = cols(.default = \"c\")) %\u003e%\n  map_int(~ n_distinct(.x$affiliation_ror))\n```\n\nThe `data/` folder also contains two external resources used to fix affiliations: [this spreadsheet of manual checks and corrections to ROR guesses][ror-corrections], and a [ROR data dump](https://ror.readme.io/docs/data-dump) from March 2023.\n\n[ror-corrections]: https://docs.google.com/spreadsheets/d/1DHR7NQCNUOslXO5CA2e9hTla6YWZLPs7Uwqmp-wLATE/edit?usp=sharing\n\n## Variables\n\nContents of [`data/epsa-program.tsv`][prgm]:\n\n|                     | 2019  | 2020   | 2021  | 2022  | 2023  |\n|:--------------------|:-----:|:------:|:-----:|:-----:|:-----:|\n| panel id (file)     | x     | x      | x     | x     | x     |\n| panel ref           | x     | NA     | NA    | NA    | x (1) |\n| panel title         | x     | x      | x     | x     | x     |\n| panel track         | x     | x (1)  | NA    | NA    | x     |\n| panel type          | x (1) | x      | x     | x     | x (1) |\n| panel chairs        | x     | x      | x     | x     | x     |\n| panel discussants   | x     | NA (2) | x     | x     | x     |\n| abstract id (file)  | x     | x      | x     | x     | x     |\n| abstract ref        | x     | x      | x     | x     | x     |\n| abstract title      | x     | x      | x     | x     | x     |\n| abstract text       | x     | x      | x     | x     | x     |\n| abstract topic      | NA    | NA     | x (3) | x (3) | NA    |\n| abstract authors    | x     | x      | x     | x     | x     |\n| abstract presenters | x     | x      | x     | x     | x     |\n| 
affiliations        | x (4) | x (4)  | x (4) | x (4) | x (4) |\n| genders             | x (5) | x (5)  | x (5) | x (5) | x (5) |\n\n1. Contains some missing values (`NA`).\n2. There were no discussants that year, only chairs, called 'moderators' in the data.\n3. Uses the same values as panel tracks in other years, but varies within each panel.\n4. Affiliations are available for chairs, discussants and authors. They have been manually checked and, when possible, matched to [ROR][ror] identifiers (the first affiliation was used when there was more than one). Raw affiliations are available in the single-year programmes.\n5. Genders were guessed by [genderize.io](https://genderize.io/), with a few `\"unknown\"` results, based on the first part of the full names of the participants.\n\nFull-text variables (like titles and abstracts) have been only minimally cleaned to avoid having line breaks and double quotes in the (TSV) data. All other text, punctuation and special characters have been preserved.\n\n## Format\n\nOverview of the [`data/epsa-program.tsv`][prgm] dataset:\n\n```r\nlibrary(tidyverse)\nglimpse(read_tsv(\"data/epsa-program.tsv\"))\n```\n```\nRows: 8,742\nColumns: 20\n$ year              \u003cchr\u003e \"2019\", \"2019\", \"2019\", \"2019\", \"2019\", \"2019\", …\n$ session_id        \u003cchr\u003e \"4823\", \"4823\", \"4823\", \"4555\", \"4555\", \"4555\", …\n$ session_ref       \u003cchr\u003e \"PS1 Roundtable\", \"PS1 Roundtable\", \"PS1 Roundta…\n$ session_track     \u003cchr\u003e \"Political Science as a Discipline\", \"Political …\n$ session_type      \u003cchr\u003e \"Roundtable\", \"Roundtable\", \"Roundtable\", \"Panel…\n$ session_title     \u003cchr\u003e \"Journal Publishing: Finding the Right Outlet fo…\n$ pid               \u003cchr\u003e \"e04cbb06a9c309fa40dc2d8bc65251d4\", \"db22ada49c8…\n$ full_name         \u003cchr\u003e \"Brandon Prins\", \"Scott Gates\", \"Debbie Lisle\", …\n$ gender            \u003cchr\u003e \"male\", 
\"female\", \"male\", \"male\", \"male\"…\n$ affiliation_ror   \u003cchr\u003e \"University of Tennessee at Knoxville\", \"Peace R…\n$ role              \u003cchr\u003e \"c\", \"d\", \"p\", \"c\", \"d\", \"p\", \"p\", \"p\", \"p\", \"p\"…\n$ presenter         \u003cchr\u003e NA, NA, NA, NA, NA, \"y\", \"y\", \"y\", \"n\", \"y\", \"n\"…\n$ abstract_id       \u003cchr\u003e NA, NA, \"133452\", NA, NA, \"86598\", \"78993\", \"857…\n$ abstract_ref      \u003cchr\u003e NA, NA, \"1281\", NA, NA, \"1157\", \"80\", \"549\", \"54…\n$ abstract_title    \u003cchr\u003e NA, NA, \"Navigating an R\u0026R Decision\", NA, NA, \"B…\n$ abstract_text     \u003cchr\u003e NA, NA, \"My contribution to this roundtable on J…\n$ abstract_topic    \u003cchr\u003e NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …\n$ affiliation_url   \u003cchr\u003e \"https://ror.org/020f3ap87\", \"https://ror.org/04…\n$ affiliation_ccode \u003cchr\u003e \"US\", \"NO\", \"GB\", \"GB\", \"GB\", \"PT\", \"CH\", \"US\", …\n$ affiliation_cname \u003cchr\u003e \"United States\", \"Norway\", \"United Kingdom\", \"Un…\n```\n\nSee [stage/issues/38](https://github.com/briatte/stage/issues/38) and [the related wiki page](https://github.com/briatte/stage/wiki/Format-des-donn%C3%A9es) for details (the links point to a private repository, sorry).\n\n## Unique identifiers (UIDs)\n\nParticipants (`pid`):\n\n- 2019: `dc5d...7ff5` (32-character hashes)\n- 2020: `2b94...d199` (32-character hashes)\n- 2021: `bd0f...e19e` (32-character hashes)\n- 2022: `1a89...5098` (32-character hashes)\n- 2023: `6f5e...4b74` (32-character hashes)\n\nHashes are based on names, affiliations and conference year, and so are unique at that level. 
Names might contain homonyms, and affiliations are not stable from one conference year to the next.\n\nPanels (`session_id`):\n\n- 2019: `4555` (fixed-length, 4 digits)\n- 2020: `20`, `212` (variable-length, 2-3 digits)\n- 2021: `3`, `84`, `129` (variable-length, 1-3 digits)\n- 2022: `9`, `11`, `109` (variable-length, 1-3 digits)\n- 2023: `74640` (fixed-length, 5 digits)\n\nPanel UIDs are based on their Web page identifiers rather than on their conference identifiers (`session_ref`).\n\nAbstracts (`abstract_id`):\n\n- 2019: `133452`, `87064` (variable-length, 5-6 digits)\n- 2020: `0008`, `0009` (fixed-length, sequential, left-padded)\n- 2021: `0069`, `0075` (fixed-length, sequential, left-padded)\n- 2022: `0303`, `0304` (fixed-length, sequential, left-padded)\n- 2023: `1`, `97`, `104`, `1043` (variable-length, 1-4 digits)\n\nAbstract UIDs are based on their Web page identifiers rather than on their conference identifiers (`abstract_ref`).\n\n## Participant roles\n\n- 2019: `c`, `d`, `p`\n- 2020: `c`, `p` (no discussants that year, only chairs/moderators)\n- 2021: `c`, `d`, `p`\n- 2022: `c`, `d`, `p`\n- 2023: `c`, `d`, `p`\n\nAlmost all panels have a single chair `c` and a single discussant `d`, but there are other combinations, ranging from 0 to 2 chairs and from 0 to 2 discussants:\n\n```r\nread_tsv(\"data/epsa-program.tsv\") %\u003e% \n  group_by(year, session_id) %\u003e% \n  summarise(n_chairs = n_distinct(pid[ role == \"c\" ]), \n            n_discus = n_distinct(pid[ role == \"d\" ])) %\u003e%\n  count(n_chairs, n_discus) %\u003e% \n  print(n = Inf)\n```\n\nThe number of authors/presenters `p` per panel is unbounded. 
In most cases, they correspond to the authors/presenters of 4 to 6 papers per panel:\n\n```r\n# number of authors/presenters\nread_tsv(\"data/epsa-program.tsv\") %\u003e% \n  group_by(year, session_id) %\u003e% \n  summarise(na = n_distinct(pid[ role == \"p\" ]) %\u003e% \n              cut(c(0:99, Inf), right = FALSE)) %\u003e%\n  count(na) %\u003e% \n  print(n = Inf)\n\n# number of papers per panel\nread_tsv(\"data/epsa-program.tsv\") %\u003e% \n    group_by(year, session_id) %\u003e% \n    summarise(n_papers = n_distinct(abstract_id)) %\u003e% \n    ungroup() %\u003e% \n    count(n_papers)\n```\n\nThe additional `presenter` variable indicates whether the author/presenter of an abstract was formally listed as a presenter in the programme (`y` for yes, `n` for no, `NA` for chairs and discussants).\n\n# All years\n\n```r\nlibrary(tidyverse)\n\n# 5 conference years\nd \u003c- read_tsv(\"data/epsa-program.tsv\", col_types = cols(.default = \"c\"))\n\n# ... 3515 conference papers\nnrow(drop_na(distinct(d, year, abstract_id), abstract_id))\n\n# ... 835 conference panels\nnrow(drop_na(distinct(d, year, session_id), session_id))\n\n# ... 8742 conference participations as chair, discussant or author/presenter\nnrow(d)\n\n# ... 
3892 unique participant names\nn_distinct(d$full_name)\n```\n\n# Network constructors\n\n```r\nlibrary(igraph)\nlibrary(tidyverse)\n\n# two-mode (participant-panel), unweighted\nfs::dir_ls(\"data\", regexp = \"epsa\\\\d{4}-edges\") %\u003e% \n  map(read_tsv, col_types = cols(.default = \"c\")) %\u003e% \n  map(select, -year) %\u003e% \n  map(~ add_count(group_by(.x, j))) %\u003e% # number of participants per panel\n  map(igraph::graph_from_data_frame)\n\n# one-mode (participant-to-participant), weighted by shared panel appearances\nfs::dir_ls(\"data\", regexp = \"epsa\\\\d{4}-edges\") %\u003e%\n  map(read_tsv, col_types = cols(.default = \"c\")) %\u003e%\n  map(select, i, j) %\u003e% \n  # treating all participations in a panel (c, d, p) as a single tie\n  map(distinct) %\u003e% \n  # link participants i.x to participants i.y over panels j\n  map2(., ., full_join, by = \"j\") %\u003e% \n  # remove self-ties and de-duplicate i.x -\u003e i.y and i.y -\u003e i.x\n  map(filter, i.x \u003c i.y) %\u003e% \n  map(select, -j, i = i.x, j = i.y) %\u003e% \n  # edge weights n = number of shared panel appearances (1 to 3)\n  map(count, i, j, sort = TRUE) %\u003e% \n  map(igraph::graph_from_data_frame, directed = FALSE)\n```\n\nFeel free to open an issue to discuss additional constructors.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbriatte%2Fepsaconf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbriatte%2Fepsaconf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbriatte%2Fepsaconf/lists"}