https://github.com/trinker/syllable
A Small Collection of Syllable Counting Functions
count-syllables r readability syllable-counts text-mining
- Host: GitHub
- URL: https://github.com/trinker/syllable
- Owner: trinker
- Created: 2015-08-02T02:10:21.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2019-02-17T18:34:43.000Z (about 6 years ago)
- Last Synced: 2025-02-28T21:57:18.123Z (3 months ago)
- Topics: count-syllables, r, readability, syllable-counts, text-mining
- Language: R
- Size: 1.63 MB
- Stars: 11
- Watchers: 4
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.Rmd
---
title: "syllable"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
  md_document:
    toc: true
---

```{r, echo=FALSE}
desc <- suppressWarnings(readLines("DESCRIPTION"))
regex <- "(^Version:\\s+)(\\d+\\.\\d+\\.\\d+)"
loc <- grep(regex, desc)
ver <- gsub(regex, "\\2", desc[loc])
verbadge <- sprintf('', ver, ver)
pacman::p_load(syllable, knitr)
```

```{r, echo=FALSE}
knit_hooks$set(htmlcap = function(before, options, envir) {
    if(!before) {
        paste('',options$htmlcap,"
",sep="")
    }
})
knitr::opts_knit$set(self.contained = TRUE, cache = FALSE)
knitr::opts_chunk$set(fig.path = "tools/figure/")
```

[](https://www.repostatus.org/#inactive)
[](https://travis-ci.org/trinker/syllable)
[](https://coveralls.io/r/trinker/syllable?branch=master)
[](https://zenodo.org/badge/latestdoi/5398/trinker/syllable)
[](https://cran.r-project.org/package=syllable)
`r verbadge`
**syllable** is a small collection of tools for counting syllables and polysyllables. The tools rely
primarily on [**data.table**](https://CRAN.R-project.org/package=data.table) hash table lookups, resulting in fast syllable counting.

# Main Functions
The main functions follow the format of `action_object`.
## Actions
The following table outlines the actions. The Example Output column corresponds to this string: `"I like chicken sandwiches."`.
| Action | Description | Returns | Example Output |
|--------------|----------------------------|-----------------------|-----------------------------|
| `count` | One integer per word | A vector per string | 1, 1, 2, 3 |
| `sum` | Sum of syllable counts | An integer per string | 7 |
| `tally`\*    | Sum of syllable attributes | An integer per string | polysyllable tallies = 1    |

\* The addition of `_mono`, `_di`, `_poly`, `_short` (monosyllabic + disyllabic), or `_both` (short & polysyllabic) to `tally` allows the user to specify which syllable attribute is tallied.
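The tally attributes can be illustrated with a short base-R sketch applied to the per-word counts from the table above. It is not the package's implementation: `toy_tally` is a hypothetical helper, and the cutoff of three or more syllables for "polysyllabic" is an assumption inferred from `_short` covering one- and two-syllable words.

```r
# Per-word syllable counts for "I like chicken sandwiches.",
# copied from the Example Output column of the Actions table.
counts <- c(1L, 1L, 2L, 3L)

# Hypothetical helper (not part of syllable): classify each count and
# tally the classes. Assumes polysyllabic means three or more syllables.
toy_tally <- function(counts) {
    c(
        mono  = sum(counts == 1L),  # one-syllable words
        di    = sum(counts == 2L),  # two-syllable words
        poly  = sum(counts >= 3L),  # three or more syllables
        short = sum(counts <= 2L)   # mono + di
    )
}

toy_tally(counts)
```

Under these assumptions the `poly` entry is 1, which agrees with the polysyllable tally shown in the table.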
## Objects
The following table outlines the objects acted upon:
| Object | Description | Example |
|--------------|---------------------------------|--------------------------------|
| `string` | A character string | `"I like chicken sandwiches."` |
| `vector`\*   | A vector of character strings   | `c("I like it.", "Look out!")` |

\* The addition of `_by` to `vector` allows the user to aggregate by one or more vectors of grouping variables.
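As a rough picture of what aggregating by grouping variables means, the sketch below sums made-up per-string totals by one and by two grouping columns using base R's `aggregate`. The data frame `toy` and all of its values are hypothetical, and the package itself relies on **data.table** rather than this approach.

```r
# Hypothetical per-string syllable totals with two grouping variables.
# The numbers are invented for illustration only.
toy <- data.frame(
    total  = c(4L, 9L, 2L, 5L),
    person = c("A", "B", "A", "B"),
    time   = c("t1", "t1", "t2", "t1")
)

# Aggregate by a single grouping variable
one_way <- aggregate(total ~ person, data = toy, FUN = sum)

# Aggregate by two grouping variables at once
two_way <- aggregate(total ~ person + time, data = toy, FUN = sum)

one_way
two_way
```

Each row of the result is one observed combination of the grouping variables, which mirrors the shape of grouped output described for the `_by` functions.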
## Putting It Together
The function `count_string` returns a vector of integer counts, one for each word in a string. For this reason `count_vector`, which takes a vector of strings, returns a `list` of integer vectors, one per string.
```{r}
count_vector(c("I like it.", "Look out!"))
```

Each of the main functions is optimized to do its task efficiently. While one could use `sapply(count_vector(x), sum)` and achieve the same results as `sum_vector(x)`, it would be less efficient.
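To make the relationship between the count-level and sum-level results concrete, here is a minimal base-R sketch. It uses an environment as a toy hash table, loosely echoing the lookup idea from the introduction. The dictionary entries and the `toy_*` names are hypothetical, and this is not how the package is implemented.

```r
# Toy syllable dictionary held in an environment, which R backs with a
# hash table. Entries are illustrative, not the package's data.
toy_dict <- new.env(hash = TRUE)
toy_words <- c(i = 1L, like = 1L, it = 1L, look = 1L, out = 1L,
               chicken = 2L, sandwiches = 3L)
for (w in names(toy_words)) assign(w, toy_words[[w]], envir = toy_dict)

# Hypothetical per-string counter: lowercase, strip non-letters, split on
# whitespace, then look each word up (NA when the word is not in the toy
# dictionary).
toy_count_string <- function(x) {
    words <- strsplit(gsub("[^a-z ]", "", tolower(x)), "\\s+")[[1]]
    words <- words[nzchar(words)]
    vapply(words, function(w) {
        if (exists(w, envir = toy_dict, inherits = FALSE)) {
            get(w, envir = toy_dict)
        } else {
            NA_integer_
        }
    }, integer(1), USE.NAMES = FALSE)
}

# Hypothetical vector version: one integer vector per input string
toy_count_vector <- function(x) lapply(x, toy_count_string)

counts <- toy_count_vector(c("I like it.", "Look out!"))

# Summing each list element gives one integer per string
per_string <- vapply(counts, sum, integer(1))
```

The two-step route builds the full list of per-word counts before collapsing it, which is the extra work a dedicated sum function can avoid.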
The available syllable functions that follow the format of `action_object` are:
```{r, results='asis', echo=FALSE, comment=NA, warning=FALSE, htmlcap="Available Variable Functions"}
pacman::p_load(pander, xtable, dplyr)

avaible_syllable_funs() %>%
    xtable() %>%
    print(type = 'html', include.colnames = FALSE, include.rownames = FALSE,
        html.table.attributes = '')

#matrix(c(sprintf("`%s`", vect), blanks), ncol=4) %>%
#    pandoc.table(format = "markdown", caption = "Available variable functions.")
```

# Installation
To download the development version of **syllable**:
Download the [zip ball](https://github.com/trinker/syllable/zipball/master) or [tar ball](https://github.com/trinker/syllable/tarball/master), decompress and run `R CMD INSTALL` on it, or use the **pacman** package to install the development version:
```r
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh(
    'trinker/lexicon',
    'trinker/textclean',
    'trinker/textshape',
    'trinker/syllable'
)
```

# Contact
You are welcome to:
* submit suggestions and bug-reports at:
* send a pull request on:
* compose a friendly e-mail to:

# Examples
The following examples demonstrate the functionality of a select sample of **syllable** functions.
## Count Syllables In a String
Counts the number of syllables for each word in a string.
```{r}
count_string("I like chicken and eggs for breakfast")
```

## Count Syllables In a Vector of Strings
```{r}
sents <- c("I like chicken.", "I want eggs benedict for breakfast.")
count_vector(sents)

# Label each count with the word it belongs to
Map(function(x, y) setNames(x, y),
    count_vector(sents),
    strsplit(gsub("[^a-z ]", "", tolower(sents)), "\\s+")
)
```

## Sum the Syllables In a Vector of Strings by Grouping Variable(s)
```{r}
dat <- data.frame(
    text = c("I like chicken.", "I want eggs benedict for breakfast.", "Really?"),
    group = c("A", "B", "A")
)
sum_vector_by(dat$text, dat$group)
```

## Tally the Short/Poly-Syllabic Words by Group(s)
```{r}
dat <- data.frame(
    text = c("I like excellent chicken.", "I want eggs benedict now.", "Really?"),
    group = c("A", "B", "A")
)
tally_both_vector_by(dat$text, dat$group)

with(presidential_debates_2012, tally_both_vector_by(dialogue, person))
```

## Readability Word Statistics by Grouping Variable(s)
```{r}
with(presidential_debates_2012, readability_word_stats_by(dialogue, list(person, time)))
```

## Visualize Poly Syllable Distributions
```{r}
if (!require("pacman")) install.packages("pacman")
pacman::p_load(dplyr, ggplot2, scales)

tally_both_vector(presidential_debates_2012$dialogue) %>%
    mutate(Duration = seq_along(poly)) %>%
    rowwise() %>%
    # keep only sentences with more than four counted words
    filter((short + poly) > 4) %>%
    mutate(
        short = short/(short+poly),
        poly = 1 - short,
        size = poly > .3
    ) %>%
    ggplot(aes(Duration, poly)) +
        geom_text(aes(label = Duration, size = size, color = size)) +
        coord_flip() +
        # guide = "none" replaces the deprecated guide = FALSE
        scale_size_manual(values = c(1.5, 2.5), guide = "none") +
        scale_color_manual(values = c("grey75", "black"), guide = "none") +
        scale_x_reverse() +
        scale_y_continuous(labels = scales::percent) +
        ylab("Poly-syllabic") +
        xlab("Duration (sentences)") +
        theme_bw()
```

## Visualize Poly Syllable Distributions by Group
```{r}
if (!require("pacman")) install.packages("pacman")
pacman::p_load(dplyr, ggplot2, tidyr, scales)

with(presidential_debates_2012, tally_both_vector_by(dialogue, list(person, time))) %>%
    mutate(
        person_time = paste(person, time, sep = "-"),
        short = short/(short+poly),
        poly = 1 - short
    ) %>%
    # order the bars by the proportion of poly-syllabic words
    arrange(poly) %>%
    mutate(person_time = factor(person_time, levels = person_time)) %>%
    gather(type, prop, c(short, poly)) %>%
    ggplot(aes(person_time, weight = prop, fill = type)) +
        geom_bar() +
        coord_flip() +
        scale_y_continuous(labels = scales::percent) +
        scale_fill_discrete(name="Syllable\nType") +
        xlab("Person & Time") +
        ylab("Usage") +
        theme_bw()
```