{"id":16274006,"url":"https://github.com/trinker/textcorpus","last_synced_at":"2025-08-03T12:09:05.547Z","repository":{"id":146664272,"uuid":"83863390","full_name":"trinker/textcorpus","owner":"trinker","description":null,"archived":false,"fork":false,"pushed_at":"2017-04-16T14:23:19.000Z","size":8358,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-08T16:13:59.289Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/trinker.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-03-04T03:55:48.000Z","updated_at":"2019-08-24T11:22:23.000Z","dependencies_parsed_at":"2023-04-14T16:20:12.652Z","dependency_job_id":null,"html_url":"https://github.com/trinker/textcorpus","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/trinker/textcorpus","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trinker%2Ftextcorpus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trinker%2Ftextcorpus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trinker%2Ftextcorpus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trinker%2Ftextcorpus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/trinker","download_url":"https://codeload.github.com/trinker/textcorpus/tar.gz/refs/heads/master","sbom_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trinker%2Ftextcorpus/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268541888,"owners_count":24266795,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-03T02:00:12.545Z","response_time":2577,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-10T18:26:42.335Z","updated_at":"2025-08-03T12:09:05.522Z","avatar_url":"https://github.com/trinker.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\ntitle: \"textcorpus\"\ndate: \"`r format(Sys.time(), '%d %B, %Y')`\"\noutput:\n  md_document:\n    toc: true      \n---\n\n```{r, echo=FALSE}\ndesc \u003c- suppressWarnings(readLines(\"DESCRIPTION\"))\nregex \u003c- \"(^Version:\\\\s+)(\\\\d+\\\\.\\\\d+\\\\.\\\\d+)\"\nloc \u003c- grep(regex, desc)\nver \u003c- gsub(regex, \"\\\\2\", desc[loc])\nverbadge \u003c- sprintf('\u003ca href=\"https://img.shields.io/badge/Version-%s-orange.svg\"\u003e\u003cimg src=\"https://img.shields.io/badge/Version-%s-orange.svg\" alt=\"Version\"/\u003e\u003c/a\u003e\u003c/p\u003e', ver, ver)\n````\n\n```{r, echo=FALSE, message=FALSE, warning=FALSE}\nlibrary(knitr)\nknit_hooks$set(htmlcap = function(before, options, envir) {\n  if(!before) {\n    paste('\u003cp 
class=\"caption\"\u003e\u003cb\u003e\u003cem\u003e',options$htmlcap,\"\u003c/em\u003e\u003c/b\u003e\u003c/p\u003e\",sep=\"\")\n    }\n    })\nknitr::opts_knit$set(self.contained = TRUE, cache = FALSE)\nknitr::opts_chunk$set(fig.path = \"tools/figure/\")\n```\n\n[![Build Status](https://travis-ci.org/trinker/textcorpus.svg?branch=master)](https://travis-ci.org/trinker/textcorpus)\n[![Coverage Status](https://coveralls.io/repos/trinker/textcorpus/badge.svg?branch=master)](https://coveralls.io/r/trinker/textcorpus?branch=master)\n`r verbadge`\n\n**textcorpus** is a collection of text corpus datasets.  The package also contains tools to enable easy community contributions to the package.  The underlying premise is that the speech-level data is stored with meta data as a list of two tibble data frames with a common key column.\n\n\n\n\n# Installation\n\nTo download the development version of **textcorpus**:\n\nDownload the [zip ball](https://github.com/trinker/textcorpus/zipball/master) or [tar ball](https://github.com/trinker/textcorpus/tarball/master), decompress and run `R CMD INSTALL` on it, or use the **pacman** package to install the development version:\n\n```r\nif (!require(\"pacman\")) install.packages(\"pacman\")\npacman::p_load_gh(\"trinker/textcorpus\")\n```\n\n# Data\n\n```{r, echo=FALSE}\npacman::p_load(pander)\npander(description[-4], style = \"grid\", split.table = Inf, justify = c(rep('left', 4), 'right'))\n```\n\n# Demonstration\n\n## Joining Corpus and Meta Data\n\n**dplyr** makes joining the corpus and meta data easy.\n\n```{r}\npacman::p_load(tidyverse, sentimentr, formality, readability)\npacman::p_load_current_gh('trinker/textcorpus')\n\nnixon_tapes\n\n\ndat \u003c- nixon_tapes$corpus %\u003e%\n    dplyr::left_join(nixon_tapes$meta, by = 'id')\n\ndat\n```\n\n## Text Scores\n\nHere we calculate formality, sentiment, and readability measures.  
An additional call to **dplyr**'s `left_join` inside a `Reduce` makes it easy to merge the various score frames into one frame.\n\n```{r}\nn_formality \u003c- dat %\u003e%\n    filter(author == \"Nixon\") %\u003e%\n    with(formality(text, list(author, id, date)))\n\nn_sentiment \u003c- dat %\u003e%\n    filter(author == \"Nixon\") %\u003e%\n    with(sentiment_by(text, list(author, id, date)))\n\nn_readability \u003c- dat %\u003e%\n    filter(author == \"Nixon\") %\u003e%\n    with(readability(text, list(author, id, date)))\n\nstats_dat \u003c- list(n_formality, n_sentiment, n_readability) %\u003e%\n    Reduce(function(x, y) left_join(x, y, by=c(\"author\", \"id\", \"date\")), .)\n```\n\n## Plotting the Text Scores Across Time\n\n```{r}\nstats_dat %\u003e%\n    select(date, F, ave_sentiment, Average_Grade_Level) %\u003e%\n    rename(Formality = F, Sentiment = ave_sentiment, Readability = Average_Grade_Level) %\u003e%\n    gather(Measure, Score, -date) %\u003e%\n    mutate(Date = as.factor(date), Date2 = as.numeric(Date)) %\u003e%\n    ggplot(aes(x = Date2, y = Score)) +\n        geom_point() +\n        geom_smooth(span = 0.4, fill = NA) +\n        facet_wrap(~Measure, ncol = 1, scales = 'free_y') \n\n```\n\n# Contact\n\nYou are welcome to:    \n- submit suggestions and bug-reports at: \u003chttps://github.com/trinker/textcorpus/issues\u003e    \n- send a pull request on: \u003chttps://github.com/trinker/textcorpus/\u003e    \n- compose a friendly e-mail to: \u003ctyler.rinker@gmail.com\u003e    ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrinker%2Ftextcorpus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftrinker%2Ftextcorpus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrinker%2Ftextcorpus/lists"}