{"id":24111868,"url":"https://github.com/andybega/icews","last_synced_at":"2025-07-09T18:13:20.670Z","repository":{"id":49437851,"uuid":"152747904","full_name":"andybega/icews","owner":"andybega","description":"Get the ICEWS event data","archived":false,"fork":false,"pushed_at":"2023-06-06T10:34:30.000Z","size":1929,"stargazers_count":18,"open_issues_count":16,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2023-06-06T11:42:57.117Z","etag":null,"topics":["cameo","cameo-codes","dataverse","dvn","event-data","icews","r","sqlite-database"],"latest_commit_sha":null,"homepage":"https://www.andybeger.com/icews/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andybega.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-10-12T12:34:33.000Z","updated_at":"2023-01-09T09:21:47.000Z","dependencies_parsed_at":"2022-09-15T07:10:20.158Z","dependency_job_id":null,"html_url":"https://github.com/andybega/icews","commit_stats":null,"previous_names":[],"tags_count":null,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andybega%2Ficews","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andybega%2Ficews/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andybega%2Ficews/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andybega%2Ficews/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/andybega","download_url":"https://codeload.github.com/andybega/icews/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"ht
tps://github.com","kind":"github","repositories_count":233433048,"owners_count":18675524,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cameo","cameo-codes","dataverse","dvn","event-data","icews","r","sqlite-database"],"created_at":"2025-01-11T02:52:08.542Z","updated_at":"2025-01-11T02:52:10.967Z","avatar_url":"https://github.com/andybega.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput:\n  md_document:\n    variant: gfm\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, echo = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"README-\"\n)\n```\n\n# icews\n\n[![CRAN status](https://www.r-pkg.org/badges/version/icews)](https://cran.r-project.org/package=icews)\n[![R-CMD-check](https://github.com/andybega/icews/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/andybega/icews/actions/workflows/R-CMD-check.yaml)\n[![Codecov test coverage](https://codecov.io/gh/andybega/icews/branch/master/graph/badge.svg)](https://codecov.io/gh/andybega/icews?branch=master)\n\n_Note: the ICEWS data were discontinued on 11 April 2023. 
You can still use this package to download the data from Dataverse, however._\n\nGet the ICEWS event data from the Harvard Dataverse repos at [https://doi.org/10.7910/DVN/28075](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/28075) (historic data) and [https://doi.org/10.7910/DVN/QI2T9A](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/QI2T9A) (weekly updates).\n\nThe icews package provides these major features:\n\n- get the ICEWS event data without having to deal with Dataverse\n- use raw data files (tab-separated values, .tsv) or a database (SQLite3) or both as the storage backend\n- set options so that in future R sessions icews knows where your data lives\n- icews keeps the local data in sync with the latest versions on Dataverse\n\n## Installation\n\nNot on CRAN, so you will have to install via remotes or devtools:\n\n```{r github-install, eval = FALSE}\nlibrary(\"remotes\")\nremotes::install_github(\"andybega/icews\")\n```\n\nThe **icews** package relies on the R [**dataverse** client](https://github.com/IQSS/dataverse-client-r). Note that this package requires a Dataverse API token to work correctly. See the package README and https://guides.dataverse.org/en/latest/user/account.html#api-token. \n\n## Usage\n\n**tl;dr**: get a SQLite database with the current events on Dataverse with this code; otherwise read below for more details. 
\n\n```{r, eval = FALSE}\nSys.setenv(DATAVERSE_KEY = \"{api key}\")  # see ?dataverse_api_token\nSys.setenv(DATAVERSE_SERVER = \"dataverse.harvard.edu\")\nlibrary(\"icews\")\nlibrary(\"DBI\")\nlibrary(\"dplyr\")\n\nsetup_icews(data_dir = \"/where/should/data/be\", use_db = TRUE, keep_files = TRUE,\n            r_profile = TRUE)\n# this will give instructions for what to add to .Rprofile so that settings\n# persist between R sessions\n\nupdate_icews(dryrun = TRUE)\n# Should list proposed downloads, ingests, etc.\nupdate_icews(dryrun = FALSE)\n# Wait until all is done; like 45 minutes or more the first time around\n\n# The events will be in a table called \"events\". \n# To get the data:\nquery_icews(\"SELECT count(*) AS n FROM events;\")\n# or\ncon \u003c- connect()\nDBI::dbGetQuery(con, \"SELECT count(*) AS n FROM events;\")\n# or\ncon \u003c- connect()\ndplyr::tbl(con, \"events\") %\u003e% summarize(n = n())\n# or, \n# read all 17+ million rows into memory\nevents \u003c- read_icews()\n```\n\n\n### Files only\n\nIn the simplest use case, you can use the package to download the ICEWS event data, which comes in several tab-separated value (TSV) files, without having to deal with Dataverse. \n\n```{r, eval = FALSE}\nSys.setenv(DATAVERSE_SERVER = \"dataverse.harvard.edu\")\nlibrary(\"icews\")\n\ndir.create(\"~/Downloads/icews\")\ndownload_data(\"~/Downloads/icews\")\n```\n\nConveniently, this will also re-use and update any files already in the same directory. Zipped files will be unzipped. For example, the yearly data files come in a zipped file with the pattern \"events.2018.yyyymmddhhmmss.tab\", where the \"yyyymmddhhmmss\" part might change if the data have been updated. The downloader can identify this and will replace the old file with the new one. 
\n\nJust in case, you can do a dry run that will not actually make any changes:\n\n```{r, eval = FALSE}\ndownload_data(\"~/Downloads/icews\", dryrun = TRUE)\n```\n\n```bash\nFound 25 local data file(s)\nDownloading 2 new file(s)\nRemoving 1 old local file(s)\n\nPlan:\nDownload 'events.1995.20150313082510.tab.zip'\nDownload 'events.1996.20150313082528.tab.zip'\nRemove   'events.1996.20140313082528.tab'\n```\n\nThe events come in (zipped) tab-separated files. To load all of these into memory in a big combined data frame with about 16 million rows (~2.5Gb):\n\n```{r, eval = FALSE}\nevents \u003c- read_icews(\"~/Downloads/icews\")\n```\n\nBeyond this basic usage, the goal is to abstract as many little pains away as possible. To that end: \n\n### Persist the data directory location\n\nThe package can keep track of the data location via variables stored in the package options. The easiest way is to add these to an \".Rprofile\" file so that they are available each time R starts up.\n\n```{r, eval = FALSE}\nsetup_icews(data_dir = \"/where/should/data/be\", use_db = TRUE, \n            keep_files = TRUE, r_profile = TRUE)\n```\n\nThis will open the \".Rprofile\" file and tell you what to add to it (requires [usethis](https://cran.r-project.org/package=usethis) to be installed). From now on the package knows where your data lives, and most of the functions can be called without specifying any directory or path arguments.  
\n\nUnder the hood, this will set three R options that the package uses in the downloader functions:\n\n```{r, eval = FALSE}\n# ICEWS data location and options\noptions(icews.data_dir   = \"~/path/to/icews_data\")\noptions(icews.use_db     = TRUE)\noptions(icews.keep_files = TRUE)\n```\n\n### Use a SQLite database that keeps in sync with Dataverse\n\nTo set up and populate a database with the current version on Dataverse, use this command:\n\n```{r, eval = FALSE}\n# assumes setup_icews with use_db = TRUE has already been called\nupdate_icews(dryrun = FALSE)\n```\n\nThis will download any data files needed from Dataverse, and create and populate a SQLite database with them. The events will be in a table called \"events\". To connect, use `connect()`; this returns an RSQLite database connection. From then on, it can be used like this:\n\n```{r, eval = FALSE}\nlibrary(\"DBI\")\nlibrary(\"dplyr\")\n\ncon \u003c- connect()\n\ndbGetQuery(con, \"SELECT count(*) FROM events;\")\n# or\ntbl(con, \"events\") %\u003e% summarize(n = n())\n```\n\nWhen done, it is good etiquette to close the database connection:\n\n```{r, eval = FALSE}\nDBI::dbDisconnect(con)\n```\n\n### Bonus: CAMEO codes\n\nAlso included is a dictionary of the CAMEO codes for event types. This includes quad and penta category mappings as well. \n\n```{r}\ndata(\"cameo_codes\")\nstr(cameo_codes)\n```\n\nThere is also a dictionary mapping Goldstein scores to CAMEO codes. \n\n```{r}\ndata(\"goldstein_mappings\")\nstr(goldstein_mappings)\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandybega%2Ficews","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandybega%2Ficews","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandybega%2Ficews/lists"}