{"id":17951229,"url":"https://github.com/kupolak/textstat","last_synced_at":"2025-07-04T23:07:12.017Z","repository":{"id":62558889,"uuid":"157257685","full_name":"kupolak/textstat","owner":"kupolak","description":"Ruby gem to calculate statistics from text to determine readability, complexity and grade level of a particular corpus.","archived":false,"fork":false,"pushed_at":"2024-07-23T11:04:41.000Z","size":248,"stargazers_count":34,"open_issues_count":16,"forks_count":10,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-05-17T04:35:11.218Z","etag":null,"topics":["flesch-kincaid-grade","flesch-reading-ease","reading","ruby","smog","statistics","text-processing","textstat","translation"],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kupolak.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-11-12T18:26:42.000Z","updated_at":"2025-05-13T20:21:32.000Z","dependencies_parsed_at":"2023-12-21T01:54:45.433Z","dependency_job_id":"5f5193d3-4425-43c2-894d-258254c554cc","html_url":"https://github.com/kupolak/textstat","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/kupolak/textstat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kupolak%2Ftextstat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kupolak%2Ftextstat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kupolak%2Ftextstat
/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kupolak%2Ftextstat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kupolak","download_url":"https://codeload.github.com/kupolak/textstat/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kupolak%2Ftextstat/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260548149,"owners_count":23026252,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["flesch-kincaid-grade","flesch-reading-ease","reading","ruby","smog","statistics","text-processing","textstat","translation"],"created_at":"2024-10-29T09:44:47.244Z","updated_at":"2025-07-04T23:07:11.991Z","avatar_url":"https://github.com/kupolak.png","language":"Ruby","readme":"# Textstat \nRuby gem to calculate statistics from text to determine readability, complexity and grade level of a particular corpus.\n\n## Table of Contents\n\n- [Usage](#usage)\n- [Installation](#installation)\n- [List of Functions](#list-of-functions)\n  - [Basic Functions](#basic-functions)\n    - [Char Count](#char-count)\n    - [Lexicon Count](#lexicon-count)\n    - [Syllable Count](#syllable-count)\n    - [Sentence Count](#sentence-count)\n    - [Average sentence length](#average-sentence-length)\n    - [Average syllables per word](#average-syllables-per-word)\n    - [Average letters per word](#average-letters-per-word)\n    - [Difficult words](#difficult-words)\n  - [Advanced Formulas](#advanced-formulas)\n    - [The Flesch Reading Ease 
formula](#the-flesch-reading-ease-formula)\n    - [The Flesch-Kincaid Grade Level](#the-flesch-kincaid-grade-level)\n    - [The Fog Scale (Gunning FOG Formula)](#the-fog-scale-gunning-fog-formula)\n    - [The SMOG Index](#the-smog-index)\n    - [Automated Readability Index](#automated-readability-index)\n    - [The Coleman-Liau Index](#the-coleman-liau-index)\n    - [Linsear Write Formula](#linsear-write-formula)\n    - [Dale-Chall Readability Score](#dale-chall-readability-score)\n    - [Lix Readability Formula](#lix-readability-formula)\n    - [FORCAST Readability Formula](#forcast-readability-formula)\n    - [Powers-Sumner-Kearl Readability Formula](#powers-sumner-kearl-readability-formula)\n    - [SPACHE Readability Formula](#spache-readability-formula)\n    - [Readability Consensus based upon all the above tests](#readability-consensus-based-upon-all-the-above-tests)\n- [Contributing](#contributing)\n- [Development setup](#development-setup)\n\n# Usage\n\n```ruby\nrequire 'textstat'\n\ntest_data = %(\n         Playing games has always been thought to be important to \n        the development of well-balanced and creative children \n        however, what part, if any, they should play in the lives \n        of adults has never been researched that deeply. I believe \n        that playing games is every bit as important for adults \n        as for children. 
Not only is taking time out to play games \n        with our children and other adults valuable to building \n        interpersonal relationships but is also a wonderful way \n        to release built up tension.\n)\n\n\nTextStat.char_count(test_data)\nTextStat.lexicon_count(test_data)\nTextStat.syllable_count(test_data)\nTextStat.sentence_count(test_data)\nTextStat.avg_sentence_length(test_data)\nTextStat.avg_syllables_per_word(test_data)\nTextStat.avg_letter_per_word(test_data)\nTextStat.avg_sentence_per_word(test_data)\nTextStat.difficult_words(test_data)\n\n\nTextStat.flesch_reading_ease(test_data)\nTextStat.flesch_kincaid_grade(test_data)\nTextStat.gunning_fog(test_data)\nTextStat.smog_index(test_data)\nTextStat.automated_readability_index(test_data)\nTextStat.coleman_liau_index(test_data)\nTextStat.linsear_write_formula(test_data)\nTextStat.dale_chall_readability_score(test_data)\nTextStat.lix(test_data)\nTextStat.forcast(test_data)\nTextStat.powers_sumner_kearl(test_data)\nTextStat.spache(test_data)\n\nTextStat.text_standard(test_data)\n```\n\nThe argument (text) for all the defined functions remains the same -\ni.e., the text for which statistics need to be calculated.\n\n# Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem 'textstat'\n```\n\nAnd then execute:\n\n     bundle\n\nOr install it yourself as:\n\n     gem install textstat\n\n# List of Functions\n\n## Basic functions\n\n### Char Count\n\n```ruby\nTextStat.char_count(text, ignore_spaces = true)\n```\n\nCalculates the number of characters present in the text.\nOptional `ignore_spaces` specifies whether we need to take spaces into account while counting chars.\nDefault value is `true`.\n\n### Lexicon Count\n\n```ruby\nTextStat.lexicon_count(text, remove_punctuation = true)\n```\n\nCalculates the number of words present in the text.\nOptional `remove_punctuation` specifies whether we need to take\npunctuation symbols into account while counting lexicons.\nDefault value is 
`true`, which removes the punctuation\nbefore counting lexicon items.\n\n### Syllable Count\n\n```ruby\nTextStat.syllable_count(text, language = 'en_us')\n```\n\nReturns the number of syllables present in the given text.\n\nUses the Ruby gem [text-hyphen](https://github.com/halostatue/text-hyphen)\nfor syllable calculation. Optional `language` specifies which language dictionary to use.\n\nDefault is `'en_us'`.\n\n### Sentence Count\n\n```ruby\nTextStat.sentence_count(text)\n```\n\nReturns the number of sentences present in the given text.\n\n### Average sentence length\n\n```ruby\nTextStat.avg_sentence_length(text)\n```\n\nReturns the average sentence length of the given text.\n\n### Average syllables per word\n\n```ruby\nTextStat.avg_syllables_per_word(text, language = 'en_us')\n```\n\nReturns the average syllables per word in the given text.\n\n### Average letters per word\n\n```ruby\nTextStat.avg_letter_per_word(text)\n```\n\nReturns the average letters per word in the given text.\n\n### Difficult words\n\n```ruby\nTextStat.difficult_words(text, language = 'en_us')\n```\n\nReturns the number of difficult words in the given text.\nOptional `language` specifies which language dictionary to use.\n\nDefault is `'en_us'`.\n\n## Advanced formulas\n\n### The Flesch Reading Ease formula\n\n```ruby\nTextStat.flesch_reading_ease(text, language = 'en_us')\n```\n\nReturns the Flesch Reading Ease Score.\n\nThe following table can be helpful to assess the ease of\nreadability in a document.\n\nThe table is an _example_ of values. While the\nmaximum score is 121.22, there is no limit on how low\nthe score can be. 
A negative score is valid.\n\n| Score  | Difficulty       |\n|--------|------------------|\n| 90-100 | Very Easy        |\n| 80-89  | Easy             |\n| 70-79  | Fairly Easy      |\n| 60-69  | Standard         |\n| 50-59  | Fairly Difficult |\n| 30-49  | Difficult        |\n| 0-29   | Very Confusing   |\n\n\u003e Further reading on\n[Wikipedia](https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests#Flesch_reading_ease)\n\n### The Flesch-Kincaid Grade Level\n\n```ruby\nTextStat.flesch_kincaid_grade(text, language = 'en_us')\n```\n\nReturns the Flesch-Kincaid Grade of the given text. This is a grade\nformula in that a score of 9.3 means that a ninth grader would be able to\nread the document.\n\n\u003e Further reading on\n[Wikipedia](https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests#Flesch%E2%80%93Kincaid_grade_level)\n\n### The Fog Scale (Gunning FOG Formula)\n\n```ruby\nTextStat.gunning_fog(text, language = 'en_us')\n```\n\nReturns the FOG index of the given text. This is a grade formula in that\na score of 9.3 means that a ninth grader would be able to read the document.\n\n\u003e Further reading on\n[Wikipedia](https://en.wikipedia.org/wiki/Gunning_fog_index)\n\n### The SMOG Index\n\n```ruby\nTextStat.smog_index(text, language = 'en_us')\n```\n\nReturns the SMOG index of the given text. This is a grade formula in that\na score of 9.3 means that a ninth grader would be able to read the document.\n\nTexts of fewer than 30 sentences are statistically invalid, because\nthe SMOG formula was normed on 30-sentence samples. 
textstat requires at least\n3 sentences for a result.\n\n\u003e Further reading on\n[Wikipedia](https://en.wikipedia.org/wiki/SMOG)\n\n### Automated Readability Index\n\n```ruby\nTextStat.automated_readability_index(text)\n```\n\nReturns the ARI (Automated Readability Index) which outputs\na number that approximates the grade level needed to\ncomprehend the text.\n\nFor example, if the ARI is 6.5, then the grade level to comprehend\nthe text is 6th to 7th grade.\n\n\u003e Further reading on\n[Wikipedia](https://en.wikipedia.org/wiki/Automated_readability_index)\n\n### The Coleman-Liau Index\n\n```ruby\nTextStat.coleman_liau_index(text)\n```\n\nReturns the grade level of the text using the Coleman-Liau Formula. This is\na grade formula in that a score of 9.3 means that a ninth grader would be\nable to read the document.\n\n\u003e Further reading on\n[Wikipedia](https://en.wikipedia.org/wiki/Coleman%E2%80%93Liau_index)\n\n### Linsear Write Formula\n\n```ruby\nTextStat.linsear_write_formula(text, language = 'en_us')\n```\n\nReturns the grade level using the Linsear Write Formula. This is\na grade formula in that a score of 9.3 means that a ninth grader would be\nable to read the document.\n\n\u003e Further reading on\n[Wikipedia](https://en.wikipedia.org/wiki/Linsear_Write)\n\n### Dale-Chall Readability Score\n\n```ruby\nTextStat.dale_chall_readability_score(text, language = 'en_us')\n```\n\nDifferent from other tests, since it uses a lookup table\nof the most commonly used 3000 English words. 
Thus it returns\nthe grade level using the New Dale-Chall Formula.\n\n| Score        | Understood by                                |\n|--------------|----------------------------------------------|\n| 4.9 or lower | average 4th-grade student or lower           |\n| 5.0–5.9      | average 5th or 6th-grade student             |\n| 6.0–6.9      | average 7th or 8th-grade student             |\n| 7.0–7.9      | average 9th or 10th-grade student            |\n| 8.0–8.9      | average 11th or 12th-grade student           |\n| 9.0–9.9      | average 13th to 15th-grade (college) student |\n\n\u003e Further reading on\n[Wikipedia](https://en.wikipedia.org/wiki/Dale%E2%80%93Chall_readability_formula)\n\n### Lix Readability Formula\n\n```ruby\nTextStat.lix(text)\n```\n\nReturns the grade level of the text using the Lix Formula.\n\u003e Further reading on\n[Wikipedia](https://en.wikipedia.org/wiki/Lix_(readability_test))\n\n\n### FORCAST Readability Formula\n\n```ruby\nTextStat.forcast(text, language = 'en_us')\n```\n\nReturns the grade level of the text using the FORCAST Readability Formula.\n\u003e Further reading on\n[readabilityformulas.com](https://readabilityformulas.com/forcast-readability-results.php)\n\n### Powers-Sumner-Kearl Readability Formula\n\n```ruby\nTextStat.powers_sumner_kearl(text, language = 'en_us')\n```\n\nReturns the grade level of the text using the Powers-Sumner-Kearl Readability Formula.\n\u003e Further reading on\n[readabilityformulas.com](https://readabilityformulas.com/powers-sumner-kear-readability-formula.php)\n\n\n### SPACHE Readability Formula\n\n```ruby\nTextStat.spache(text, language = 'en_us')\n```\n\nReturns the grade level of the text using the Spache Readability Formula.\n\u003e Further reading on\n[Wikipedia](https://en.wikipedia.org/wiki/Spache_readability_formula)\n\n\n### Readability Consensus based upon all the above tests\n\n```ruby\nTextStat.text_standard(text, float_output = false)\n```\n\nBased upon all the above tests, returns 
the estimated school\ngrade level required to understand the text.\n\nOptional `float_output` allows the score to be returned as a\n`float`. Defaults to `false`.\n\nLanguages supported:\n- US English\n- UK English\n- Catalan\n- Czech\n- Danish\n- Spanish\n- Estonian\n- Finnish\n- French\n- Hungarian\n- Indonesian\n- Icelandic\n- Italian\n- Latin\n- Dutch (Nederlands)\n- Bokmål (Norwegian)\n- Polish\n- Portuguese\n- Russian\n- Swedish\n\n# Contributing\n\nIf you find any problems, you should open an\n[issue](https://github.com/kupolak/textstat/issues).\n\nIf you can fix an issue you've found, or another issue, you should open\na [pull request](https://github.com/kupolak/textstat/pulls).\n\n1. Fork this repository on GitHub to start making your changes to the master\nbranch (or branch off of it).\n2. Write a test which shows that the bug was fixed or that the feature works as expected.\n3. Send a pull request!\n\n# Development setup\n\n```bash\ngit clone https://github.com/kupolak/textstat.git  # Clone the repo from your fork\ncd textstat\nbundle  # Install all dependencies\n\n# Make changes\nrspec spec  # Run tests\n```\n","funding_links":[],"categories":["Natural Language Processing"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkupolak%2Ftextstat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkupolak%2Ftextstat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkupolak%2Ftextstat/lists"}