An open API service indexing awesome lists of open source software.

https://github.com/trinker/syllable

A Small Collection of Syllable Counting Functions
https://github.com/trinker/syllable

count-syllables r readability syllable-counts text-mining

Last synced: 2 months ago
JSON representation

A Small Collection of Syllable Counting Functions

Awesome Lists containing this project

README

        

---
title: "syllable"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
md_document:
toc: true
---

```{r, echo=FALSE}
desc <- suppressWarnings(readLines("DESCRIPTION"))
regex <- "(^Version:\\s+)(\\d+\\.\\d+\\.\\d+)"
loc <- grep(regex, desc)
ver <- gsub(regex, "\\2", desc[loc])
verbadge <- sprintf('Version', ver, ver)
pacman::p_load(syllable, knitr)
```

```{r, echo=FALSE}
knit_hooks$set(htmlcap = function(before, options, envir) {
if(!before) {
paste('

',options$htmlcap,"

",sep="")
}
})
knitr::opts_knit$set(self.contained = TRUE, cache = FALSE)
knitr::opts_chunk$set(fig.path = "tools/figure/")
```

[![Project Status: Inactive – The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.](https://www.repostatus.org/badges/latest/inactive.svg)](https://www.repostatus.org/#inactive)
[![Build Status](https://travis-ci.org/trinker/syllable.svg?branch=master)](https://travis-ci.org/trinker/syllable)
[![Coverage Status](https://coveralls.io/repos/trinker/syllable/badge.svg?branch=master)](https://coveralls.io/r/trinker/syllable?branch=master)
[![DOI](https://zenodo.org/badge/5398/trinker/syllable.svg)](https://zenodo.org/badge/latestdoi/5398/trinker/syllable)
[![](http://cranlogs.r-pkg.org/badges/syllable)](https://cran.r-project.org/package=syllable)
`r verbadge`

![](tools/syllable_logo/r_syllable.png)

**syllable** is a small collection of tools for counting syllables and polysyllables. The tools rely
primarily on [**data.table**](https://CRAN.R-project.org/package=data.table) hash table lookups, resulting in fast syllable counting.

# Main Functions

The main functions follow the format of `action_object`.

## Actions

The following table outlines the actions. Example Output correspond to this string: `"I like chicken sandwiches."`.

| Action | Description | Returns | Example Output |
|--------------|----------------------------|-----------------------|-----------------------------|
| `count` | One integer per word | A vector per string | 1, 1, 2, 3 |
| `sum` | Sum of syllable counts | An integer per string | 7 |
| `tally`\* | Sum of syllable attributes | An integer per string | pollysyllable tallies = 1 |

\* The addition of `_mono`, `_di`, `_poly` `_short` (monosyllabic + disyllabic), or `_both` (short & pollysyllabic) to `tally` allows the user specify what syllable attribute is being tallied.

## Objects

The following table outlines the objects acted upon:

| Object | Description | Example |
|--------------|---------------------------------|--------------------------------|
| `string` | A character string | `"I like chicken sandwiches."` |
| `vector`\* | A vector of character strings | `c("I like it.", "Look out!")` |

\* The addition of `_by` to `vector` allows the user to aggregate by one or more vectors of grouping variables.

## Putting It Together

The function `count_vector` will provide a vector of integer counts for each word in a string. For this reason `count_vector` will return a `list` of integer vector counts.

```{r}
count_vector(c("I like it.", "Look out!"))
```

Each of the main functions is optimized to do its task efficiently. While one could use `sum(count_vector(x))` and achieve the same results as `sum_vector(x)` it would be less efficient.

The available syllable functions that follow the format of `action_object` are:

```{r, results='asis', echo=FALSE, comment=NA, warning=FALSE, htmlcap="Available Variable Functions"}
p_load(pander, xtable, dplyr)

avaible_syllable_funs() %>%
xtable() %>%
print(type = 'html', include.colnames = FALSE, include.rownames = FALSE,
html.table.attributes = '')

#matrix(c(sprintf("`%s`", vect), blanks), ncol=4) %>%
# pandoc.table(format = "markdown", caption = "Available variable functions.")
```

# Installation

To download the development version of **syllable**:

Download the [zip ball](https://github.com/trinker/syllable/zipball/master) or [tar ball](https://github.com/trinker/syllable/tarball/master), decompress and run `R CMD INSTALL` on it, or use the **pacman** package to install the development version:

```r
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh(
'trinker/lexicon',
'trinker/textclean',
'trinker/textshape',
'trinker/syllable'
)
```

# Contact

You are welcome to:
* submit suggestions and bug-reports at:
* send a pull request on:
* compose a friendly e-mail to:

# Examples

The following examples demonstrate the functionality of a select sample of **syllable** functions.

## Count Syllables In a String

Counts the number of syllables for each word in a string.

```{r}
count_string("I like chicken and eggs for breakfast")
```

## Count Syllables In a Vector of Strings

```{r}
sents <- c("I like chicken.", "I want eggs benidict for breakfast.")
count_vector(sents)

Map(function(x, y) setNames(x, y),
count_vector(sents),
strsplit(gsub("[^a-z ]", "", tolower(sents)), "\\s+")
)
```

## Sum the Syllables In a Vector of Strings by Grouping Variable(s)

```{r}
dat <- data.frame(
text = c("I like chicken.", "I want eggs benedict for breakfast.", "Really?"),
group = c("A", "B", "A")
)
sum_vector_by(dat$text, dat$group)
```

## Tally the Short/Poly-Syllabic Words by Group(s)

```{r}
dat <- data.frame(
text = c("I like excellent chicken.", "I want eggs benedict now.", "Really?"),
group = c("A", "B", "A")
)
tally_both_vector_by(dat$text, dat$group)

with(presidential_debates_2012, tally_both_vector_by(dialogue, person))
```

## Readability Word Statistics by Grouping Variable(s)

```{r}
with(presidential_debates_2012, readability_word_stats_by(dialogue, list(person, time)))
```

## Visualize Poly Syllable Distributions

```{r}
if (!require("pacman")) install.packages("pacman")
pacman::p_load(dplyr, ggplot2, scales)

tally_both_vector(presidential_debates_2012$dialogue) %>%
mutate(Duration = 1:length(poly)) %>%
rowwise() %>%
filter((short + poly) > 4) %>%
mutate(
short = short/(short+poly),
poly = 1 - short,
size = poly > .3
) %>%
ggplot(aes(Duration, poly)) +
geom_text(aes(label = Duration, size = size, color = size)) +
coord_flip() +
scale_size_manual(values = c(1.5, 2.5), guide=FALSE) +
scale_color_manual(values = c("grey75", "black"), guide=FALSE) +
scale_x_reverse() +
scale_y_continuous(label = scales::percent) +
ylab("Poly-syllabic") +
xlab("Duration (sentences)") +
theme_bw()
```

## Visualize Poly Syllable Distributions by Group

```{r}
if (!require("pacman")) install.packages("pacman")
pacman::p_load(dplyr, ggplot2, tidyr, scales)

with(presidential_debates_2012, tally_both_vector_by(dialogue, list(person, time))) %>%
mutate(
person_time = paste(person, time, sep = "-"),
short = short/(short+poly),
poly = 1 - short
) %>%
arrange(poly) %>%
mutate(person_time = factor(person_time, levels = person_time)) %>%
gather(type, prop, c(short, poly)) %>%
ggplot(aes(person_time, weight = prop, fill = type)) +
geom_bar() +
coord_flip() +
scale_y_continuous(label = scales::percent) +
scale_fill_discrete(name="Syllable\nType") +
xlab("Person & Time") +
ylab("Usage") +
theme_bw()
```