{"id":42186644,"url":"https://github.com/paceaux/methodius","last_synced_at":"2026-01-26T22:27:48.376Z","repository":{"id":46749262,"uuid":"515311630","full_name":"paceaux/methodius","owner":"paceaux","description":"A utility for analyzing text on the web","archived":false,"fork":false,"pushed_at":"2025-09-19T03:25:08.000Z","size":371,"stargazers_count":5,"open_issues_count":7,"forks_count":2,"subscribers_count":1,"default_branch":"develop","last_synced_at":"2025-10-29T09:37:36.233Z","etag":null,"topics":["bigram","ngram","parse","split","tokenize","trigram"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/paceaux.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-07-18T19:15:10.000Z","updated_at":"2025-09-18T02:30:11.000Z","dependencies_parsed_at":"2025-09-18T04:11:41.745Z","dependency_job_id":null,"html_url":"https://github.com/paceaux/methodius","commit_stats":{"total_commits":25,"total_committers":1,"mean_commits":25.0,"dds":0.0,"last_synced_commit":"316ff92bef98d4e9e35eabf17a78f915f2ef72b2"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/paceaux/methodius","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paceaux%2Fmethodius","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paceaux%2Fmethodius/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/paceaux%2Fmethodius/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paceaux%2Fmethodius/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/paceaux","download_url":"https://codeload.github.com/paceaux/methodius/tar.gz/refs/heads/develop","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paceaux%2Fmethodius/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28789738,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-26T21:49:50.245Z","status":"ssl_error","status_checked_at":"2026-01-26T21:48:29.455Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigram","ngram","parse","split","tokenize","trigram"],"created_at":"2026-01-26T22:27:47.775Z","updated_at":"2026-01-26T22:27:48.364Z","avatar_url":"https://github.com/paceaux.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Methodius (an NGram utility)\n\nA utility for analyzing frequency of text chunks on the web.\n\nSupply a bit o' text to the Methodius class, and let it determine your bigrams, trigrams, ngrams, letter-frequencies, word frequencies, bigram relationships, and create ngram trees. 
\n\n[![Hippocratic License HL3-LAW-MEDIA-MIL-SOC-SV](https://img.shields.io/static/v1?label=Hippocratic%20License\u0026message=HL3-LAW-MEDIA-MIL-SOC-SV\u0026labelColor=5e2751\u0026color=bc8c3d)](https://firstdonoharm.dev/version/3/0/law-media-mil-soc-sv.html)\n\n![npm](https://img.shields.io/npm/dm/methodius)\n\n## Example\n\n```JavaScript\nconst { Methodius } = require('methodius');\n// or import { Methodius } from 'methodius';\n\nconst udhr1 = `\nAll human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.\n`;\nconst nGrams = new Methodius(udhr1);\n\nconst topLetters = nGrams.getTopLetters(10);\nconst topWords = nGrams.getTopWords(10);\n\n```\n\n# API\n\n## `Methodius`\nGlobal Class\n\n`new Methodius(text)`\n\n**Parameters**\n| name      | type  | Description   |\n| ---       |---    | ---           |\n| text    | string       |     raw text to be analyzed          |\n\n### Static Members\n#### `Punctuations`\ncharacters to ignore when analyzing text\nperiod, comma, semicolon, colon, bang, question mark, interrobang, Spanish bang+, parens, bracket, brace, single quote, some spaces\n\n`\\\\.,;:!?‽¡¿⸘()\\\\[\\\\]{}\u003c\u003e’'…\\\"\\n\\t\\r`\n\n#### `wordSeparators`\ncharacters to ignore AND CONSUME when trying to find words\nem-dash, period, comma, semicolon, colon, bang, question mark, interrobang, Spanish bang+, parens, bracket, brace, single quote, space\n\n`—\\\\.,;:!?‽¡¿⸘()\\\\[\\\\]{}\u003c\u003e…\"\\\\s`\n\n\n### Static Methods\n#### `hasPunctuation(string)`\n determines if string contains punctuation \n \n**Parameters**\n| name      | type  | Description   |\n| ---       |---    | ---           |\n| string    | string       |               |\n\n**Returns**\n`boolean`\n\n#### `hasSymbols(string)`\n determines if string contains symbols \n \n**Parameters**\n| name      | type  | Description   |\n| ---       |---    | ---           |\n| string    | 
string       |               |\n\n**Returns**\n`boolean`\n\n#### `hasSpace(string)`\n determines if a string has a space \n\n**Parameters**\n| name  | type  | Description   |\n| ---           |---        | ---           |\n| string        | string    |                |\n\n**Returns**\n`boolean`\n\n#### `sanitizeText(string)`\n lowercases text and removes diacritics and other characters that would throw off n-gram analysis \n\n**Parameters**\n| name  | type  | Description   |\n| ---           |---    | ---           |\n| string        |string       |               |\n\n**Returns**\n`string`\n\n#### `getWords(text)`\n extracts an array of words from a string \n\n**Parameters**\n| name  | type  | Description   |\n| ---       |---    | ---           |\n| text      | string       |               |\n\n**Returns**\n`Array\u003cstring\u003e`\n\n#### `getNGrams(text, gramSize)`\n gets ngrams from text \n\n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n|  text     | string       |               |\n|  gramSize     | number       | default = 2              |\n\n**Returns**\n`Array\u003cstring\u003e`\n\n#### `getMeanWordSize(wordArray)`\n Gets the average (mean) size of a word\n\n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n|  wordArray     | string[]       |               |\n\n**Returns**\n`number`\n\n#### `getMedianWordSize(wordArray)`\n Gets the median (middle) size of a word\n\n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n|  wordArray     | string[]       |               |\n\n**Returns**\n`number`\n\n\n#### `getWordNGrams(text, gramSize)`\nGets word n-grams (2-word pairs by default) from text.\n\nNote: This doesn't use sentence punctuation as a boundary. 
Should it?\n\n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n|   text     | string      |               |\n|   gramSize     | number      |    default=2           |\n\n**Returns**\n`Array\u003cstring\u003e`\n\n#### `getFrequencyMap(ngramArray)`\n converts an array of strings into a map of those strings and their number of occurrences \n\n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n| ngramArray       | `Array\u003cstring\u003e`       |               |\n\n**Returns**\n`Map\u003cstring, number\u003e`\n\n#### `getPercentMap(frequencyMap)`\n converts a frequency map into a map of percentages \n\n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n|    frequencyMap   | `Map\u003cstring, number\u003e`      |               |\n\n**Returns**\n`Map\u003cstring, number\u003e`\n\n#### `getTopGrams(frequencyMap, limit)`\n filters a frequency map down to a small subset of its most frequent entries \n \n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n| frequencyMap      |   `Map\u003cstring, number\u003e`    |               |\n| limit      |   number   |     default=20          |\n\n**Returns**\n`Map\u003cstring, number\u003e`\n\n#### `getIntersection(iterable1, iterable2)`\nReturns an array of items that occur in both iterables\n \n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n| iterable1      |   `Map|Array`    |               |\n| iterable2      |   `Map|Array`    |               |\n\n**Returns**\n`Array\u003cany\u003e` \nAn array of items that occur in both iterables. 
If sent a map, it compares the keys.\n\n#### `getUnion(iterable1, iterable2)`\nReturns an array that is the union of two iterables\n\n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n| iterable1      |   `Map|Array`    |               |\n| iterable2      |   `Map|Array`    |               |\n\n**Returns**\n`Array\u003cany\u003e` \nAn array of the items that occur in either iterable. \n\n#### `getDisjunctiveUnion(iterable1, iterable2)`\nReturns an array of arrays of the unique items in either iterable. Also known as the symmetric difference\n \n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n| iterable1      |   `Map|Array`    |               |\n| iterable2      |   `Map|Array`    |               |\n\n**Returns**\n`Array\u003cArray\u003cany\u003e\u003e` \nAn array of arrays of the unique items. The first inner array holds the items unique to the first parameter; the second holds the items unique to the second parameter\n\n#### `getDifference(iterable1, iterable2)`\nReturns an array of items that are unique only to the first parameter. \n\n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n| iterable1      |   `Map|Array`    |               |\n| iterable2      |   `Map|Array`    |               |\n\n**Returns**\n`Array\u003cany\u003e`\nAn array of items unique only to the first parameter\n\n#### `getComparison(iterable1, iterable2)`\nReturns a map containing various comparisons between two iterables\n \n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n| iterable1      |   `Map|Array`    |               |\n| iterable2      |   `Map|Array`    |               |\n\n**Returns**\n`Map\u003cstring, Array\u003cany\u003e\u003e` \nA map containing various comparisons between two iterables. 
Those comparisons will be arrays of intersection, disjunctiveUnion, difference, and union.\n\n#### `getWordPlacementForNGram(ngram, wordsArray)`\ndetermines the placement of a single ngram in an array of words\n \n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n| ngram      |   `string`    |               |\n| wordsArray      |   `Array\u003cstring\u003e`    |               |\n\n**Returns**\n`Map\u003cstring, number\u003e` \na map with the keys 'start', 'middle', and 'end' whose values correspond to how often the provided ngram occurs in each position\n\n#### `getWordPlacementForNGrams(ngrams, wordsArray)`\ndetermines the placement of ngrams in an array of words\n \n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n| ngrams      |   `Array\u003cstring\u003e`    |               |\n| wordsArray      |   `Array\u003cstring\u003e`    |               |\n\n**Returns**\n`Map\u003cstring, Map\u003cstring, number\u003e\u003e` \na map keyed by ngram, where each value is a map containing start, middle, and end counts\n\n#### `getNgramCollections(wordArray, ngramSize)`\ngets ngrams from an array of words\n \n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n| wordArray      |   `Array\u003cstring\u003e`    |     an array of words          |\n| ngramSize      |   `number`    | default = 2. The size of the ngrams to return               |\n\n**Returns**\n`Array\u003cArray\u003cstring\u003e\u003e` \nAn array containing arrays of ngrams; each inner array corresponds to a word. 
\n\n#### `getNgramSiblings(searchText, ngramCollections, siblingSize)`\nusing a collection returned from getNgramCollections, searches for a string and returns what comes before and after it\n \n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n| searchText      |   `string`    |     the string to search for          |\n| ngramCollections      |   `Array\u003cstring\u003e|Array\u003cArray\u003cstring\u003e\u003e`    | an array of ngrams, or an nGramCollection               |\n| siblingSize      |   `number`    | default = 1. How many siblings to find in front or behind               |\n\n**Returns**\n`Map\u003c'before'|'after',Map\u003cstring, number\u003e\u003e` \na Map with the keys 'before' and 'after' which contain maps of what comes before and after\n\n**Example**\n```JavaScript\n        const words = ['revolution', 'nation'];\n        // one array of bigrams per word\n        const ngramCollections = Methodius.getNgramCollections(words, 2);\n        // what comes directly before and after the bigram 'io'\n        const ioSiblings = Methodius.getNgramSiblings('io', ngramCollections);\n        /* \n        new Map([\n          ['before', new Map([\n            ['ti', 2]\n          ])],\n          ['after', new Map([\n            ['on', 2]\n          ])]\n        ])\n        */\n```\n\n#### `getRelatedNgrams(words, ngrams, ngramSize)`\nGets the ngrams that will occur before or after other ngrams. Useful for finding patterns of ngrams.\n\n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n| words      |   `Array\u003cstring\u003e`    |     an array of words to evaluate          |\n| ngrams      |   `Map\u003cstring, number\u003e`    | a frequency map of ngrams               |\n| ngramSize      |   `number`    | default = 2. the size of the ngram              |\n\n**Returns**\n\n`Map\u003cstring, number\u003e` A frequency map of how often ngrams occurred before or after other ngrams\n\n**Example**\n\nThis requires several steps. 
You'll need an array of words and a frequency map of ngrams.\n\n```JavaScript\n    // letter bigrams from the raw text\n    const ngrams = Methodius.getNGrams('the revolution of the nation was on television. It was about pollution and the terrible situation ', 2);\n    // count each bigram, then keep the 5 most frequent\n    const frequencyMap = Methodius.getFrequencyMap(ngrams);\n    const topNgrams = Methodius.getTopGrams(frequencyMap, 5);\n    const words = ['the', 'revolution', 'of', 'the', 'nation', 'was', 'on', 'television', 'it', 'was', 'about', 'pollution', 'and', 'the', 'terrible', 'situation' ];\n    const relatedNgrams = Methodius.getRelatedNgrams(words, topNgrams, 2, 5);\n```\n\n#### `getNgramTreeCollection(words)`\n\nGets a nested map of maps that breaks down unique words into their smallest ngrams\n\n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n| words      |   `Array\u003cstring\u003e`    |     an array of words to evaluate          |\n\n**Returns**\n\n`Map\u003cstring, Array\u003cstring\u003e| Map\u003cstring, \u003cArray|string\u003e\u003e` A nested map of maps that breaks down unique words into their smallest ngrams.\n\n### Instance Members\n#### `sanitizedText`\nlowercased text with diacritics removed\n\n`string`\n#### `letters`\n an array of letters in the text\n\n`Array\u003cstring\u003e`\n#### `words`\n an array of words in the text\n\n `Array\u003cstring\u003e`\n#### `bigrams`\n an array of letter bigrams in the text\n\n  `Array\u003cstring\u003e`\n#### `trigrams`\n an array of letter trigrams in the text\n\n `Array\u003cstring\u003e`\n#### `uniqueLetters`\n an array of unique letters in the text\n\n `Array\u003cstring\u003e`\n#### `uniqueBigrams`\n an array of unique bigrams in the text\n\n `Array\u003cstring\u003e`\n#### `uniqueTrigrams`\n an array of unique trigrams in the text\n\n `Array\u003cstring\u003e`\n#### `letterPositions`\na map of placements of letters within words\n\n `Map\u003cstring, Map\u003cstring, number\u003e\u003e`\n#### `bigramPositions`\na map of placements of bigrams within words\n\n 
`Map\u003cstring, Map\u003cstring, number\u003e\u003e`\n#### `trigramPositions`\na map of placements of trigrams within words\n\n `Map\u003cstring, Map\u003cstring, number\u003e\u003e`\n#### `uniqueWords`\n an array of unique words in the text\n\n  `Array\u003cstring\u003e`\n#### `letterFrequencies`\n a map of letter frequencies in the sanitized text\n\n  `Map\u003cstring, number\u003e`\n#### `bigramFrequencies`\n a map of bigram frequencies in the sanitized text\n\n  `Map\u003cstring, number\u003e`\n#### `trigramFrequencies`\n a map of trigram frequencies in the sanitized text\n\n  `Map\u003cstring, number\u003e`\n#### `wordFrequencies`\n a map of word frequencies in the sanitized text\n\n  `Map\u003cstring, number\u003e`\n#### `letterPercentages`\n a map of letter percentages in the sanitized text\n\n  `Map\u003cstring, number\u003e`\n#### `bigramPercentages`\n a map of bigram percentages in the sanitized text\n\n  `Map\u003cstring, number\u003e`\n#### `trigramPercentages`\n a map of trigram percentages in the sanitized text\n\n  `Map\u003cstring, number\u003e`\n#### `wordPercentages`\n a map of word percentages in the sanitized text\n\n  `Map\u003cstring, number\u003e`\n\n#### `meanWordSize`\n The average size of a word\n  \n  `number`\n\n#### `medianWordSize`\n The median (middle) size of a word\n\n `number`\n\n#### `ngramTreeCollection`\nA nested map of maps that breaks down unique words into their smallest ngrams.\n\n### Instance Methods\n\n#### `getLetterNGrams(size)`\ngets an array of customizable ngrams in the text\n\n**Parameters**\n| name          | type  | Description   |\n| ---           |---    | ---           |\n|    size   | `number`      | default = 2  size of the n-gram to return       |\n\n**Returns**\n`Array\u003cstring\u003e`\n\n#### `getTopLetters(limit)`\n a map of the most used letters in the text\n\n**Parameters**\n| name          | type  | Description   |\n| ---           |---    | ---           |\n|    limit   | `number`      | default = 20  number of top letters to return     
  |\n\n**Returns**\n`Map\u003cstring, number\u003e`\n\n#### `getTopBigrams(limit)`\n a map of the most used bigrams in the text\n\n**Parameters**\n| name          | type  | Description   |\n| ---           |---    | ---           |\n|    limit   | `number`      | default = 20  number of top bigrams to return       |\n\n**Returns**\n`Map\u003cstring, number\u003e`\n\n#### `getTopTrigrams(limit)`\n a map of the most used trigrams in the text\n\n**Parameters**\n| name          | type  | Description   |\n| ---           |---    | ---           |\n|    limit   | `number`      | default = 20  number of top trigrams to return       |\n\n**Returns**\n`Map\u003cstring, number\u003e`\n\n#### `getTopWords(limit)`\n a map of the most used words in the text\n\n**Parameters**\n| name          | type  | Description   |\n| ---           |---    | ---           |\n|    limit   | `number`      | default = 20  number of top words to return       |\n\n**Returns**\n`Map\u003cstring, number\u003e`\n\n\n#### `compareTo(methodius)`\nCompare this methodius instance to another\n\n**Parameters**\n| name          | type  | Description   |\n| ---           |---    | ---           |\n|    methodius   | `Methodius`      | another Methodius instance       |\n\n**Returns**\n`Map\u003cstring, Map\u003e`\nA map of property names and their comparisons (intersection, disjunctiveUnions, etc) for a set of properties\n\n\n#### `getRelatedTopNgrams(ngramSize, limit)`\nGets the ngrams that will occur before or after other ngrams based on what the most frequent ngrams are. Useful for finding patterns of ngrams.\n\n**Parameters**\n| name  | type  | Description   |\n| ---   |---    | ---           |\n| ngramSize      |   `number`    | default = 2. the size of the ngram              |\n| limit      |   `number`    | default = 20. 
the number of top ngrams to use              |\n\n**Returns**\n\n`Map\u003cstring, number\u003e` A frequency map of how often the most common ngrams occurred before or after other common ngrams\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaceaux%2Fmethodius","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpaceaux%2Fmethodius","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaceaux%2Fmethodius/lists"}