https://github.com/trinker/syllable

A Small Collection of Syllable Counting Functions
https://github.com/trinker/syllable

count-syllables r readability syllable-counts text-mining

Last synced: 4 months ago
JSON representation

A Small Collection of Syllable Counting Functions

Host: GitHub
URL: https://github.com/trinker/syllable
Owner: trinker
Created: 2015-08-02T02:10:21.000Z (almost 10 years ago)
Default Branch: master
Last Pushed: 2019-02-17T18:34:43.000Z (over 6 years ago)
Last Synced: 2025-02-28T21:57:18.123Z (4 months ago)
Topics: count-syllables, r, readability, syllable-counts, text-mining
Language: R
Size: 1.63 MB
Stars: 11
Watchers: 4
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.Rmd

Awesome Lists containing this project

README

        ---

title: "syllable"

date: "`r format(Sys.time(), '%d %B, %Y')`"

output:

  md_document:

    toc: true      

---

```{r, echo=FALSE}

desc <- suppressWarnings(readLines("DESCRIPTION"))

regex <- "(^Version:\\s+)(\\d+\\.\\d+\\.\\d+)"

loc <- grep(regex, desc)

ver <- gsub(regex, "\\2", desc[loc])

verbadge <- sprintf('', ver, ver)

pacman::p_load(syllable, knitr)

```

```{r, echo=FALSE}

knit_hooks$set(htmlcap = function(before, options, envir) {

  if(!before) {

    paste('
',options$htmlcap,"",sep="")

    }

    })

knitr::opts_knit$set(self.contained = TRUE, cache = FALSE)

knitr::opts_chunk$set(fig.path = "tools/figure/")

```

[![Project Status: Inactive – The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.](https://www.repostatus.org/badges/latest/inactive.svg)](https://www.repostatus.org/#inactive)

[![Build Status](https://travis-ci.org/trinker/syllable.svg?branch=master)](https://travis-ci.org/trinker/syllable)

[![Coverage Status](https://coveralls.io/repos/trinker/syllable/badge.svg?branch=master)](https://coveralls.io/r/trinker/syllable?branch=master)

[![DOI](https://zenodo.org/badge/5398/trinker/syllable.svg)](https://zenodo.org/badge/latestdoi/5398/trinker/syllable)

[![](http://cranlogs.r-pkg.org/badges/syllable)](https://cran.r-project.org/package=syllable)

`r verbadge`

![](tools/syllable_logo/r_syllable.png)

**syllable** is a small collection of tools for counting syllables and polysyllables.  The tools rely 

primarily on [**data.table**](https://CRAN.R-project.org/package=data.table) hash table lookups, resulting in fast syllable counting.

# Main Functions 

The main functions follow the format of `action_object`.  

## Actions

The following table outlines the actions.  Example Output correspond to this string: `"I like chicken sandwiches."`.

| Action       | Description                | Returns               | Example Output              |

|--------------|----------------------------|-----------------------|-----------------------------|

| `count`      | One integer per word       | A vector per string   | 1, 1, 2, 3                  |

| `sum`        | Sum of syllable counts     | An integer per string | 7                           |

| `tally`\*      | Sum of syllable attributes | An integer per string | pollysyllable tallies = 1   |

\* The addition of `_mono`, `_di`, `_poly` `_short` (monosyllabic + disyllabic), or `_both` (short & pollysyllabic) to `tally` allows the user specify what syllable attribute is being tallied.

## Objects 

The following table outlines the objects acted upon:

| Object       | Description                     | Example                        |

|--------------|---------------------------------|--------------------------------|

| `string`     | A character string              | `"I like chicken sandwiches."` |

| `vector`\*   | A vector of character strings   | `c("I like it.", "Look out!")` |

\* The addition of `_by` to `vector` allows the user to aggregate by one or more vectors of grouping variables.

## Putting It Together

The function `count_vector` will provide a vector of integer counts for each word in a string.  For this reason `count_vector` will return a `list` of integer vector counts. 

```{r}

count_vector(c("I like it.", "Look out!"))

```

Each of the main functions is optimized to do its task efficiently.  While one could use `sum(count_vector(x))` and achieve the same results as `sum_vector(x)` it would be less efficient.  

The available syllable functions that follow the format of `action_object` are:

```{r, results='asis', echo=FALSE, comment=NA, warning=FALSE, htmlcap="Available Variable Functions"}

p_load(pander, xtable, dplyr)

avaible_syllable_funs() %>%

    xtable() %>%

    print(type = 'html', include.colnames = FALSE, include.rownames = FALSE,

        html.table.attributes = '')

#matrix(c(sprintf("`%s`", vect), blanks), ncol=4) %>%

#    pandoc.table(format = "markdown", caption = "Available variable functions.")

```

# Installation

To download the development version of **syllable**:

Download the [zip ball](https://github.com/trinker/syllable/zipball/master) or [tar ball](https://github.com/trinker/syllable/tarball/master), decompress and run `R CMD INSTALL` on it, or use the **pacman** package to install the development version:

```r

if (!require("pacman")) install.packages("pacman")

pacman::p_load_gh(

    'trinker/lexicon',

    'trinker/textclean',

    'trinker/textshape',

    'trinker/syllable'

)

```

# Contact

You are welcome to:

* submit suggestions and bug-reports at: 

* send a pull request on: 

* compose a friendly e-mail to: 

# Examples

The following examples demonstrate the functionality of a select sample of **syllable** functions.

## Count Syllables In a String

Counts the number of syllables for each word in a string.

```{r}

count_string("I like chicken and eggs for breakfast")

```

## Count Syllables In a Vector of Strings

```{r}

sents <- c("I like chicken.", "I want eggs benidict for breakfast.")

count_vector(sents)

Map(function(x, y) setNames(x, y),

   count_vector(sents),

   strsplit(gsub("[^a-z ]", "", tolower(sents)), "\\s+")

)

```

## Sum the Syllables In a Vector of Strings by Grouping Variable(s)

```{r}

dat <- data.frame(

   text = c("I like chicken.", "I want eggs benedict for breakfast.", "Really?"),

   group = c("A", "B", "A")

)

sum_vector_by(dat$text, dat$group)

```

## Tally the Short/Poly-Syllabic Words by Group(s)

```{r}

dat <- data.frame(

   text = c("I like excellent chicken.", "I want eggs benedict now.", "Really?"),

   group = c("A", "B", "A")

)

tally_both_vector_by(dat$text, dat$group)

with(presidential_debates_2012, tally_both_vector_by(dialogue, person))

```

## Readability Word Statistics by Grouping Variable(s)

```{r}

with(presidential_debates_2012, readability_word_stats_by(dialogue, list(person, time)))

```

## Visualize Poly Syllable Distributions

```{r}

if (!require("pacman")) install.packages("pacman")

pacman::p_load(dplyr, ggplot2, scales)

tally_both_vector(presidential_debates_2012$dialogue) %>%

    mutate(Duration = 1:length(poly)) %>%

    rowwise() %>%

    filter((short + poly) > 4) %>%

    mutate(

        short = short/(short+poly),

        poly = 1 - short,

        size = poly > .3

    ) %>%

    ggplot(aes(Duration, poly)) +

        geom_text(aes(label = Duration, size = size, color = size)) +

        coord_flip() +

        scale_size_manual(values = c(1.5, 2.5), guide=FALSE) +

        scale_color_manual(values = c("grey75", "black"), guide=FALSE) +

        scale_x_reverse() +

        scale_y_continuous(label = scales::percent) +

        ylab("Poly-syllabic") +

        xlab("Duration (sentences)") +

        theme_bw() 

```

## Visualize Poly Syllable Distributions by Group 

```{r}

if (!require("pacman")) install.packages("pacman")

pacman::p_load(dplyr, ggplot2, tidyr, scales)

with(presidential_debates_2012, tally_both_vector_by(dialogue, list(person, time))) %>%

    mutate(

        person_time = paste(person, time, sep = "-"),

        short = short/(short+poly),

        poly = 1 - short

    ) %>%

    arrange(poly) %>%

    mutate(person_time = factor(person_time, levels = person_time)) %>%

    gather(type, prop, c(short, poly)) %>%

    ggplot(aes(person_time, weight = prop, fill = type)) +

        geom_bar() +

        coord_flip() +        

        scale_y_continuous(label = scales::percent) +

        scale_fill_discrete(name="Syllable\nType") +

        xlab("Person & Time") +

        ylab("Usage") +

        theme_bw()

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/trinker/syllable

Awesome Lists containing this project

README