{"id":13858381,"url":"https://github.com/hrbrmstr/newsflash","last_synced_at":"2025-07-19T04:34:01.338Z","repository":{"id":141238585,"uuid":"80084177","full_name":"hrbrmstr/newsflash","owner":"hrbrmstr","description":"Tools to Work with the Internet Archive and GDELT Television Explorer in R","archived":false,"fork":false,"pushed_at":"2022-12-01T14:40:11.000Z","size":3768,"stargazers_count":88,"open_issues_count":8,"forks_count":9,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-06-04T08:38:50.209Z","etag":null,"topics":["gdelt-television-explorer","internet-archive","r","r-cyber","rstats"],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hrbrmstr.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2017-01-26T03:57:28.000Z","updated_at":"2025-05-11T05:39:31.000Z","dependencies_parsed_at":"2024-02-09T02:10:14.657Z","dependency_job_id":"da413261-00ce-4c4d-953f-4317f8f07783","html_url":"https://github.com/hrbrmstr/newsflash","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/hrbrmstr/newsflash","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fnewsflash","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fnewsflash/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fnewsflash/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fnewsflash/manifests","owner_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/owners/hrbrmstr","download_url":"https://codeload.github.com/hrbrmstr/newsflash/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hrbrmstr%2Fnewsflash/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265888951,"owners_count":23844537,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gdelt-television-explorer","internet-archive","r","r-cyber","rstats"],"created_at":"2024-08-05T03:02:06.692Z","updated_at":"2025-07-19T04:34:01.315Z","avatar_url":"https://github.com/hrbrmstr.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"---\noutput: rmarkdown::github_document\neditor_options: \n  chunk_output_type: console\n---\n\n*** BREAKING CHANGES ***\n\n# newsflash\n\nTools to Work with the Internet Archive and GDELT Television Explorer\n\n## Description\n\nRef: \n\n- \u003chttp://television.gdeltproject.org/cgi-bin/iatv_ftxtsearch/iatv_ftxtsearch\u003e\n- \u003chttps://archive.org/details/third-eye\u003e\n\nTV Explorer:\n\u003e_\"In collaboration with the Internet Archive's Television News Archive, GDELT's Television Explorer allows you to keyword search the closed captioning streams of the Archive's 6 years of American television news and explore macro-level trends in how America's television news is shaping the conversation around key societal issues. 
Unlike the Archive's primary Television News interface, which returns results at the level of an hour or half-hour \"show,\" the interface here reaches inside of those six years of programming and breaks the more than one million shows into individual sentences and counts how many of those sentences contain your keyword of interest. Instead of reporting that CNN had 24 hour-long shows yesterday that mentioned Donald Trump, the interface here will count how many sentences uttered on CNN yesterday mentioned his name - a vastly more accurate metric for assessing media attention.\"_\n\nThird Eye:\n\u003e_\"The TV News Archive's Third Eye project captures the chyrons–or narrative text–that appear on the lower third of TV news screens and turns them into downloadable data and a Twitter feed for research, journalism, online tools, and other projects. At project launch (September 2017) we are collecting chyrons from BBC News, CNN, Fox News, and MSNBC–more than four million collected over just two weeks.\"_\n\nAn advantage of using this over the TV Explorer interactive selector \u0026 downloader or Third Eye API is that you get tidy tibbles with this package, ready to use in R.\n\nNOTE: While I don't claim that this alpha-package is anywhere near perfect, the IA/GDELT TV API hiccups every so often, so when there are critical errors, run the same query in their web interface before submitting an issue. I kept getting errors when searching all affiliate markets for the \"mexican president\" query, which also generates errors on the web site when JSON is selected as output (it's fine on the web site if the choice is interactive browser visualizations). 
Submit those errors to them, not here.\n\n## What's Inside The Tin\n\nThe following functions are implemented:\n\n- `list_chyrons`:\tRetrieve Third Eye chyron index\n- `list_networks`:\tHelper function to identify station/network keyword and corpus date range for said market\n- `newsflash`:\tTools to Work with the Internet Archive and GDELT Television Explorer\n- `query_tv`:\tIssue a query to the TV Explorer\n- `read_chyrons`:\tRetrieve TV News Archive chyrons from the Internet Archive's Third Eye project\n- `gd_top_trending`:\tTop Trending (GDELT)\n- `iatv_top_trending`:\tTop Trending Topics (Internet Archive TV Archive)\n- `word_cloud`:\tRetrieve top words that appear most frequently in clips matching your search\n\n## Installation\n\n```{r eval=FALSE}\ndevtools::install_github(\"hrbrmstr/newsflash\")\n```\n\n```{r message=FALSE, warning=FALSE, error=FALSE}\noptions(width=120)\n```\n\n## Usage\n\n```{r message=FALSE, warning=FALSE, error=FALSE}\nlibrary(newsflash)\nlibrary(ggalt)\nlibrary(hrbrthemes)\nlibrary(tidyverse)\n\n# current version\npackageVersion(\"newsflash\")\n```\n\n### \"Third Eye\" Chyrons are simpler so we'll start with them first:\n\n```{r fig.width=8, fig.height=5, cache=TRUE}\nlist_chyrons()\n\nch \u003c- read_chyrons(\"2018-04-13\")\n\nmutate(\n  ch, \n  hour = lubridate::hour(ts),\n  text = tolower(text),\n  mention = grepl(\"comey\", text)\n) %\u003e% \n  filter(mention) %\u003e% \n  count(hour, channel) %\u003e% \n  ggplot(aes(hour, n)) +\n  geom_segment(aes(xend=hour, yend=0), color = \"lightslategray\", size=1) +\n  scale_x_continuous(name=\"Hour (GMT)\", breaks=seq(0, 23, 6),\n                     labels=sprintf(\"%02d:00\", seq(0, 23, 6))) +\n  scale_y_continuous(name=\"# Chyrons\", limits=c(0,20)) +\n  facet_wrap(~channel, scales=\"free\") +\n  labs(title=\"Chyrons mentioning 'Comey' per hour per channel\",\n       caption=\"Source: Internet Archive Third Eye project \u0026 \u003cgithub.com/hrbrmstr/newsflash\u003e\") +\n  
theme_ipsum_rc(grid=\"Y\")\n```\n\n## Now for the TV Explorer:\n\n### See what networks \u0026 associated corpus date ranges are available:\n\n```{r}\nlist_networks(widget=FALSE)\n```\n\n### Basic search:\n\n```{r fig.width=8, fig.height=7, cache=TRUE}\ncomey \u003c- query_tv('comey', start_date = \"2018-04-01\")\n\ncomey\n\nquery_tv('comey', start_date = \"2018-04-01\") %\u003e% \n  arrange(date) %\u003e% \n  ggplot(aes(date, value, group=network)) +\n  ggalt::geom_xspline(aes(color=network)) +\n  ggthemes::scale_color_tableau(name=NULL) +\n  labs(x=NULL, y=\"Volume Metric\", title=\"'Comey' Trends Across National Networks\") +\n  facet_wrap(~network) +\n  theme_ipsum_rc(grid=\"XY\") +\n  theme(legend.position=\"none\")\n```\n\n```{r cache=TRUE}\nquery_tv(\"comey Network:CNN\", mode = \"TimelineVol\", start_date = \"2018-01-01\") %\u003e% \n  arrange(date) %\u003e% \n  ggplot(aes(date, value, group=network)) +\n  ggalt::geom_xspline(color=\"lightslategray\") +\n  ggthemes::scale_color_tableau(name=NULL) +\n  labs(x=NULL, y=\"Volume Metric\", title=\"'Comey' Trend on CNN\") +\n  theme_ipsum_rc(grid=\"XY\")\n```\n\n### Relative Network Attention To Syria since January 1, 2018\n\n```{r cache=TRUE}\nquery_tv('syria Market:\"National\"', mode = \"StationChart\", start_date = \"2018-01-01\") %\u003e% \n  arrange(desc(count)) %\u003e% \n  knitr::kable(\"markdown\")\n```\n\n### Video Clips\n\n```{r cache=TRUE}\nclips \u003c- query_tv('comey Market:\"National\"', mode = \"ClipGallery\", start_date = \"2018-01-01\")\n\nclips\n```\n\n`r clips$show_date[1]` | `r clips$station[1]` | `r clips$show[1]`\n\n\u003ca href=\"`r clips$preview_url[1]`\"\u003e\u003cimg src=\"`r clips$preview_thumb[1]`\"\u003e\u003c/a\u003e\n\n`r clips$snippet[1]`\n\n### \"Word Cloud\" (top associated words to the query)\n\n```{r fig.height=8, fig.width=8, cache=TRUE}\nwc \u003c- query_tv('hannity Market:\"National\"', mode = \"WordCloud\", start_date = \"2018-04-13\")\n\nggplot(wc, aes(x=1, y=1)) +\n  
ggrepel::geom_label_repel(aes(label=label, size=count), segment.colour=\"#00000000\", segment.size=0) +\n  scale_size_continuous(trans=\"sqrt\") +\n  labs(x=NULL, y=NULL) +\n  theme_ipsum_rc(grid=\"\") +\n  theme(axis.text=element_blank()) +\n  theme(legend.position=\"none\") \n```\n\n### Last 15 Minutes Top Trending\n\n```{r}\ngd_top_trending()\n```\n\n### Top Overall Trending from the Internet Archive TV Archive (2017 and earlier)\n\n```{r}\niatv_top_trending(\"2017-12-01 18:00\", \"2017-12-02 06:00\")\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhrbrmstr%2Fnewsflash","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhrbrmstr%2Fnewsflash","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhrbrmstr%2Fnewsflash/lists"}