# PLOVER
![plover_icon](https://github.com/openeventdata/PLOVER/blob/master/media/plover_icon175.png "PLOVER logo")

PLOVER (Political Language Ontology for Verifiable Event Records) is a
next-generation political event coding specification under development by the
Open Event Data Alliance (http://openeventdata.org/). It is intended
to replace the earlier
[CAMEO](http://eventdata.parusanalytics.com/data.dir/cameo.html)
system.

The full PLOVER codebook is available in the repo
[above](https://github.com/openeventdata/PLOVER/blob/master/PLOVER_Manual.pdf),
and a short introduction to PLOVER and event data is below.

There is currently a near-real-time PLOVER-coded global event data set on Dataverse at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/AJGVIT. Extensive details are available in these two open-access papers:

Halterman, Andrew, Philip A. Schrodt, Andreas Beger, Benjamin E. Bagozzi and Grace I. Scarborough. 2023. “Creating Custom Event Data Without Dictionaries: A Bag-of-Tricks.” Working paper presented at the International Studies Association, Montreal, March 2023. [arXiv link](https://arxiv.org/abs/2304.01331)

Halterman, Andrew, Benjamin E. Bagozzi, Andreas Beger, Philip A. Schrodt, and Grace I. Scarborough. 2023. “PLOVER and POLECAT: A New Political Event Ontology and Dataset.” Working paper presented at the International Studies Association, Montreal, March 2023. [SocArXiv link](https://osf.io/preprints/socarxiv/rm5dw/)

## Event Data and Ontologies
Event data in political science is a structured way of recording interactions
between political actors described in text. For instance, a researcher
encountering the sentence

> A town in western Sudan's South Kordofan state has been recaptured by Sudanese
> government forces from the rebel Sudan People's Liberation Army (SPLA).
> (AFP_ENG_19970408.0772)

might want to represent it as ACTOR = "Sudanese military", EVENT = "capture
territory", TARGET = "SPLA". This process can be applied over hundreds of
thousands of reports to create datasets of thousands of such events for
answering research questions.

The software to extract and categorize these events exists (see OEDA's
[Petrarch2](https://github.com/openeventdata/petrarch2), for instance), but all
event data systems require an ontology defining what actors and events will be
recorded and how they will be defined.

Ontologies face difficult tradeoffs: broader groupings of event types and
actors are easier to define and implement, are easier to work with, and provide
data at a useful level of aggregation, especially when analyzed globally. More
specialized and specific ontologies will sometimes be necessary for answering
certain research questions and allow more granular and subnational study, but
are more difficult to implement and may provide distinctions that are not
useful to most researchers. In the example above, should the event be
categorized as a general "seize" event? Or a more specific "capture territory"?
Should the target be the group the land was taken from, or should we think
about the territory itself as the target? If the group it was taken from,
should the target here be coded as a general "Sudanese rebel" or the very
specific "SPLA"?

The best level of detail will depend on the question and resources of the
researcher. PLOVER is a new event data ontology that chooses to be more
general, easier to implement (including across languages), and at the level of
detail demanded by most existing users of event data. At the same time, PLOVER
defines and provides guidance on making "PLOVER-compliant extensions" that will
fit into the ecosystem of tools for creating and analyzing PLOVER event data.

PLOVER event categories
====================

PLOVER defines 18 event types, many of which are aggregations of older CAMEO
codes:

CAMEO code | CAMEO text | PLOVER category |
--- | --- | --- |
01 | MAKE PUBLIC STATEMENT | dropped |
02 | APPEAL | dropped |
03 | EXPRESS INTENT TO COOPERATE | AGREE |
04 | CONSULT | CONSULT |
05 | ENGAGE IN DIPLOMATIC COOPERATION | SUPPORT |
06 | ENGAGE IN MATERIAL COOPERATION | COOPERATE |
07 | PROVIDE AID | AID |
08 | YIELD (081 to 083) | CONCEDE |
08 | YIELD (084 to 087) | RETREAT |
10 | DEMAND | DEMAND |
11 | DISAPPROVE | DISAPPROVE |
12 | REJECT | REJECT |
13 | THREATEN | THREATEN |
14 | PROTEST | PROTEST |
15 | EXHIBIT FORCE POSTURE | MOBILIZE |
16 | REDUCE RELATIONS | SANCTION |
17 | COERCE | COERCE |
18,19,20 | ASSAULT, FIGHT | ASSAULT |

PLOVER quad categories
----------------------

Many users of event data aggregate events into four categories in a 2x2 of
cooperation--conflict and verbal--material. Those categories are defined in
terms of their constituent categories here:

Quad category | PLOVER categories |
--- | --- |
Verbal cooperation | AGREE, CONSULT, SUPPORT, CONCEDE |
Material cooperation | COOPERATE, AID, RETREAT, INVESTIGATE |
Verbal conflict | DEMAND, DISAPPROVE, REJECT, THREATEN, SANCTION |
Material conflict | PROTEST, CRIME, MOBILIZE, COERCE, ASSAULT |

Migrating From CAMEO
---------------------

The existing standard ontology for event data is
[CAMEO](http://eventdata.parusanalytics.com/data.dir/cameo.html). Users who are
familiar with CAMEO may be interested in the differences between CAMEO and
PLOVER.

- A set of standardized names ("fields") for JSON
records are specified for both the core event
data fields and for extended information such as geolocation and
extracted texts. Most of these fields are optional but standardized
field names will allow for the development of common utilities, which
cannot be done with the current proliferation of incompatible CSV and
tab-delimited formats.
- Only the 2-digit event 'cue categories' have been retained from
CAMEO: our hope is that these are sufficiently broad and distinct
that it will be possible to achieve a reasonably high level of human
inter-coder agreement (hence "verifiable") on the coding categories,
and that they can be consistently implemented, across all categories, in
automated systems. These are defined in much greater detail than they
were in WEIS and CAMEO.
- Much of the detail previously incorporated in the 3- and 4-digit CAMEO categories is
now reflected in category-specific "mode" fields and a general "context"
field: in effect, this "category-mode-context" scheme covers the "what-how-why" of
the event. We anticipate these will be much easier to systematically code
than the 250 or so hierarchically-arranged numerical codes of CAMEO. The "context" field also handles
issues such as refugees, disease, natural disaster, elections, parliamentary
processes and cyber-security.
- The CAMEO 01 and 02 categories dealing with comments have been
eliminated.
- A new category has been added for criminal behavior.
- The WEIS/CAMEO YIELD category has been split into verbal (CONCEDE) and material (RETREAT) components.
- The complexity of substate actor codes has been limited, and the
allowable substate modifiers have been substantially simplified.
- The "target" is optional in some categories.
- Both the source and target fields can have compound actors, rather than
dealing with compounds by duplicating events.
- 'dead', 'injured' and 'size' fields are available for recording information
on the magnitude of acts of violence and protests.
- In the near future, we are hoping to make available a large corpus of
"gold standard records" for validation purposes: these will include
Spanish and Arabic cases as well as English. The current release
has a file of English-language gold standard records derived
from the CAMEO manual.

Why "PLOVER"?
=============

[Plovers](http://www.rspb.org.uk/discoverandenjoynature/discoverandlearn/birdguide/name/r/ringedplover/)
(Charadriidae) are a globally-distributed family of short-billed
gregarious wading birds who spend their lives frantically poking through
endless stretches of sand and muck trying to find something of interest.
It is difficult to imagine a better analogy to the process of coding
event data.

Description of files in repository
==================================

* **PLOVER_Manual.pdf/.tex**
Version 0.8 draft, which is being implemented by the Political Instability Task Force
* **PLOVER.bib**
BibTeX entries for the manual
* **plover_reference.html**
Basic reference for the PLOVER event categories in HTML format
* **gold_standard_records/PLOVER_GSR_CAMEO.txt**
This is a JSON file of the example sentences from the CAMEO 1.1b3 manual classified by PLOVER categories. The CAMEO LaTeX markup indicating source, target and event texts has been converted to a simple in-line markup; some locations have been added manually. The details of the coding are found in the files *PLOVER_GSR_CAMEO_readme.tex/pdf* and the version of PLOVER being coded is in the older file *plover-manual_draft.0.6b1.pdf*.
* **CAMEO-PLOV.txt**
Translation table for CAMEO to PLOVER: this is still something of a draft but will get you started

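
As a rough illustration of how a CAMEO-to-PLOVER translation could be applied in code, the following Python sketch hard-codes the two tables from earlier in this README (CAMEO code to PLOVER category, and PLOVER category to quad category). It does not read `CAMEO-PLOV.txt`, whose format is not described here. The names `CAMEO_ROOT_TO_PLOVER`, `QUAD_CATEGORIES`, `cameo_to_plover` and `quad_category` are hypothetical and not part of any PLOVER tooling. Returning `None` for dropped or unlisted codes (including an 08 code without a 081 to 087 subcode) is an assumption of this sketch.

```python
from typing import Optional

# CAMEO 2-digit root code -> PLOVER category, copied from the table in this
# README. None marks the rows listed as "dropped". Root 08 (YIELD) is split
# by subcode and handled in cameo_to_plover(). Roots that do not appear in
# the table (for example 09) are deliberately left unmapped.
CAMEO_ROOT_TO_PLOVER = {
    "01": None,
    "02": None,
    "03": "AGREE",
    "04": "CONSULT",
    "05": "SUPPORT",
    "06": "COOPERATE",
    "07": "AID",
    "10": "DEMAND",
    "11": "DISAPPROVE",
    "12": "REJECT",
    "13": "THREATEN",
    "14": "PROTEST",
    "15": "MOBILIZE",
    "16": "SANCTION",
    "17": "COERCE",
    "18": "ASSAULT",
    "19": "ASSAULT",
    "20": "ASSAULT",
}

# Quad category -> constituent PLOVER categories, from the quad table.
QUAD_CATEGORIES = {
    "Verbal cooperation": {"AGREE", "CONSULT", "SUPPORT", "CONCEDE"},
    "Material cooperation": {"COOPERATE", "AID", "RETREAT", "INVESTIGATE"},
    "Verbal conflict": {"DEMAND", "DISAPPROVE", "REJECT", "THREATEN", "SANCTION"},
    "Material conflict": {"PROTEST", "CRIME", "MOBILIZE", "COERCE", "ASSAULT"},
}


def cameo_to_plover(cameo_code: str) -> Optional[str]:
    """Map a CAMEO code string (2 to 4 digits) to a PLOVER category.

    Codes are strings because CAMEO codes carry leading zeros.
    Returns None for dropped or unlisted codes (assumption of this sketch).
    """
    code = cameo_code.strip()
    if len(code) < 2 or not code.isdigit():
        raise ValueError(f"not a CAMEO code: {cameo_code!r}")
    root = code[:2]
    if root == "08":
        # YIELD is split: 081 to 083 are verbal, 084 to 087 are material.
        subcode = code[:3]
        if subcode in {"081", "082", "083"}:
            return "CONCEDE"
        if subcode in {"084", "085", "086", "087"}:
            return "RETREAT"
        return None
    return CAMEO_ROOT_TO_PLOVER.get(root)


def quad_category(plover_category: str) -> Optional[str]:
    """Return the quad category containing a PLOVER category, if any."""
    for quad, members in QUAD_CATEGORIES.items():
        if plover_category in members:
            return quad
    return None
```

For example, `cameo_to_plover("0841")` gives `"RETREAT"`, which `quad_category` places under "Material cooperation". Keeping the codes as strings preserves the leading zeros that distinguish, say, `"03"` from `"3"`.
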
Acknowledgments
===============
This program was developed as part of research funded by a U.S. National Science Foundation "Resource
Implementations for Data Intensive Research in the Social Behavioral and Economic Sciences (RIDIR)"
project: Modernizing Political Event Data for Big Data Social Science Research (Award 1539302).

Any opinions, findings, conclusions or recommendations in this document are [only *probably* still] those of [at least one of] the authors and do not necessarily reflect the views of the National Science Foundation, or any company or government agency employing or funding the authors or otherwise contributing to the document.

This work was partially supported by the Political Instability Task Force (PITF). The PITF is funded by the Central Intelligence Agency. The views expressed in this codebook are the authors' alone and do not represent the views of the US Government.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.