https://github.com/nicolasdz/UNGDC
The UN General Debate Corpus (UNGDC) is a dataset of all speeches given at the high-level UN forum usually held in September of each year.
- Host: GitHub
- URL: https://github.com/nicolasdz/UNGDC
- Owner: nicolasdz
- Created: 2021-01-14T21:07:24.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2021-03-07T23:41:06.000Z (over 3 years ago)
- Last Synced: 2024-08-01T12:18:31.095Z (3 months ago)
- Topics: diplomacy, international-relations, nlp, united-nations
- Language: HTML
- Size: 12.2 MB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# UNGDC
The UN General Debate (UNGD) is the annual high-level event at which each UN member state can address all the others. The UNGD Corpus (UNGDC) provides the English-language text of speeches from 200 countries between 1970 and 2018: 8,093 speeches in total.
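The corpus is distributed as one plain-text file per speech. As a minimal sketch, assuming the files follow a `<country code>_<session>_<year>.txt` naming pattern (for example `USA_73_2018.txt`; this pattern is an assumption, so check it against the release you download), each speech's metadata can be recovered from its filename. The names `FILENAME_RE`, `SpeechId` and `parse_speech_filename` are hypothetical.

```python
import re
from typing import NamedTuple

# ASSUMPTION: files are named <ISO3 country code>_<session>_<year>.txt.
FILENAME_RE = re.compile(r"^([A-Z]{3})_(\d+)_(\d{4})\.txt$")


class SpeechId(NamedTuple):
    country: str  # three-letter country code, e.g. "USA"
    session: int  # General Assembly session number
    year: int     # calendar year of the debate


def parse_speech_filename(name: str) -> SpeechId:
    """Split a filename such as 'USA_73_2018.txt' into country, session and year."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    country, session, year = match.groups()
    return SpeechId(country, int(session), int(year))


print(parse_speech_filename("USA_73_2018.txt"))
# SpeechId(country='USA', session=73, year=2018)
```

Collecting one `SpeechId` per file gives a country-by-year index of the corpus, which is the shape that per-country and per-year comparisons need.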
The UNGDC dataset was created by Slava Jankin Mikhaylov, Alexander Baturo, and Niheer Dasandi in 2017. See their [GitHub repository](https://github.com/sjankin/UnitedNations) for the latest version. All of their replication materials are available on [Harvard Dataverse](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/0TJX8Y).
Their 2017 article in the journal *Research & Politics*, which draws on the dataset, is available [here](https://journals.sagepub.com/doi/full/10.1177/2053168017712821) (paywalled).

In this repository, I provide a simple Jupyter notebook demonstrating how the dataset can be used to analyze and visualize trends in global diplomacy. It uses the pandas and spaCy packages to perform some simple NLP on the dataset, with three main applications:

- tracking topic mentions across countries;
- tracking topic mentions over time;
- performing semantic similarity analysis.

At a later date, I hope to add a demonstration of how named entity recognition can be used to visualize the UNGDC as a network.
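Tracking topic mentions across countries and over time both come down to the same operation: count how often a topic's keywords appear in each speech, then total those counts by country or by year. The notebook does this with pandas and spaCy; the sketch below uses only the Python standard library so it runs without extra dependencies. The sample records, the keyword set and the function name `topic_mentions` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical sample records: (country, year, speech text).
SPEECHES = [
    ("USA", 2017, "Climate change and terrorism threaten us all."),
    ("FRA", 2017, "The climate accord must be honoured."),
    ("USA", 2018, "We reject global governance."),
    ("FRA", 2018, "Climate action cannot wait; climate finance matters."),
]


def topic_mentions(records: Iterable[tuple[str, int, str]],
                   keywords: set[str], by: str) -> Counter:
    """Total keyword hits per country (by='country') or per year (by='year')."""
    key_index = {"country": 0, "year": 1}[by]
    totals: Counter = Counter()
    for record in records:
        # Lowercase and keep only letter runs, so 'Climate' and 'climate;' both match.
        tokens = re.findall(r"[a-z]+", record[2].lower())
        totals[record[key_index]] += sum(token in keywords for token in tokens)
    return totals


climate = {"climate"}
print(topic_mentions(SPEECHES, climate, by="country"))  # across countries
print(topic_mentions(SPEECHES, climate, by="year"))     # over time
```

This matches single lowercase tokens only, so inflected forms are missed unless listed; matching on spaCy's `token.lemma_` instead of the raw token is one way to catch them. Raw counts also favour longer speeches, so dividing by speech length may give fairer comparisons.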
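For semantic similarity, spaCy's `Doc.similarity` returns, by default, the cosine similarity of two documents' averaged word vectors, which requires a model that ships with vectors (such as `en_core_web_md`). As a dependency-free stand-in, and explicitly not the notebook's method, the sketch below computes cosine similarity over plain word counts. It measures shared vocabulary rather than meaning, so "war" and "conflict" count as unrelated. The function names and sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts for one speech."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between two word-count vectors (0.0 = no shared words)."""
    va, vb = bag_of_words(a), bag_of_words(b)
    # Counter returns 0 for missing words, so unshared words add nothing to the dot product.
    dot = sum(count * vb[word] for word, count in va.items())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


speech_a = "peace and security in the region"
speech_b = "security and peace for all nations"
speech_c = "trade tariffs and economic growth"
print(round(cosine_similarity(speech_a, speech_b), 3))  # 0.5
print(round(cosine_similarity(speech_a, speech_c), 3))  # 0.183
```

Function words such as "and" inflate these scores; dropping tokens flagged by spaCy's `token.is_stop` before counting is a simple refinement.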