https://github.com/wooorm/trigram-utils
A few language trigram utilities
- Host: GitHub
- URL: https://github.com/wooorm/trigram-utils
- Owner: wooorm
- License: mit
- Created: 2014-09-19T10:32:34.000Z (over 10 years ago)
- Default Branch: main
- Last Pushed: 2022-11-20T10:50:54.000Z (over 2 years ago)
- Last Synced: 2025-04-17T17:18:26.041Z (about 2 months ago)
- Topics: clean, trigram, tuple
- Language: JavaScript
- Homepage:
- Size: 73.2 KB
- Stars: 11
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: readme.md
- Funding: funding.yml
- License: license
# trigram-utils
[![Build][build-badge]][build]
[![Coverage][coverage-badge]][coverage]
[![Downloads][downloads-badge]][downloads]
[![Size][size-badge]][size]

A few language trigram utilities.
## Contents
* [What is this?](#what-is-this)
* [When should I use this?](#when-should-i-use-this)
* [Install](#install)
* [Use](#use)
* [API](#api)
  * [`clean(value)`](#cleanvalue)
  * [`trigrams(value)`](#trigramsvalue)
  * [`asDictionary(value)`](#asdictionaryvalue)
  * [`asTuples(value)`](#astuplesvalue)
  * [`tuplesAsDictionary(tuples)`](#tuplesasdictionarytuples)
* [Types](#types)
* [Compatibility](#compatibility)
* [Security](#security)
* [Related](#related)
* [Contribute](#contribute)
* [License](#license)

## What is this?
This package contains a few utilities that can help when working with
trigram-based natural language detection (a trigram is an n-gram where each
slice is 3 characters).

## When should I use this?
Probably not often, except when you want to create something like [franc][]
but build it on something other than UDHR (the Universal Declaration of Human
Rights, which franc's trigrams are based on).

## Install
This package is [ESM only][esm].
In Node.js (version 14.14+, 16.0+), install with [npm][]:

```sh
npm install trigram-utils
```

In Deno with [`esm.sh`][esmsh]:
```js
import * as trigramUtils from 'https://esm.sh/trigram-utils@2'
```

In browsers with [`esm.sh`][esmsh]:

```html
<script type="module">
  import * as trigramUtils from 'https://esm.sh/trigram-utils@2?bundle'
</script>
```
## Use
```js
import {clean, trigrams, asDictionary, asTuples, tuplesAsDictionary} from 'trigram-utils'

clean(' t@rololol ') // => 't rololol'

trigrams(' t@rololol ')
// => [' t ', 't r', ' ro', 'rol', 'olo', 'lol', 'olo', 'lol', 'ol ']

asDictionary(' t@rololol ')
// => {'ol ': 1, lol: 2, olo: 2, rol: 1, ' ro': 1, 't r': 1, ' t ': 1}

const tuples = asTuples(' t@rololol ')
// => [
//   ['ol ', 1],
//   ['rol', 1],
//   [' ro', 1],
//   ['t r', 1],
//   [' t ', 1],
//   ['lol', 2],
//   ['olo', 2]
// ]

tuplesAsDictionary(tuples)
// => {olo: 2, lol: 2, ' t ': 1, 't r': 1, ' ro': 1, rol: 1, 'ol ': 1}
```

## API
This package exports the identifiers `clean`, `trigrams`,
`asDictionary`, `asTuples`, and `tuplesAsDictionary`.
There is no default export.

### `clean(value)`
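A minimal sketch of what `clean` does, for illustration only; it is not the package's source. The name `cleanSketch` is hypothetical, and the set of replaced characters (the ASCII range `!` through `@`, which covers common punctuation, symbols, and the digits `0`-`9`) is an assumption. It is enough to reproduce the `' t@rololol '` example from the Use section.

```js
// Hypothetical re-creation of `clean`; not the package source.
// Assumption: the "useless" characters are the ASCII range U+0021..U+0040,
// i.e. !"#$%&'()*+,-./ the digits 0-9 and :;<=>?@
function cleanSketch(value) {
  return (
    String(value)
      // Replace each run of those characters with a single space.
      .replace(/[\u0021-\u0040]+/g, ' ')
      // Collapse every run of white space into one space.
      .replace(/\s+/g, ' ')
      .trim()
      .toLowerCase()
  )
}

cleanSketch(' t@rololol ') // => 't rololol'
cleanSketch('Hello, World 42!') // => 'hello world'
```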
Clean a value (`string`).
Replaces some punctuation, symbols, and numbers (which are useless for language
detection) with spaces, then collapses white space, trims, and lowercases.
Returns the cleaned `string`.

### `trigrams(value)`
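A minimal sketch of how padded trigrams can be produced from a value, assuming the same hypothetical cleaning rule as `cleanSketch` (ASCII `!` through `@` replaced with spaces). `trigramsSketch` is a hypothetical name, not the package's API or source; the one-space padding on each side is read off the example in the Use section.

```js
// Hypothetical cleaning step (assumed rule, see `clean` above).
function cleanSketch(value) {
  return String(value)
    .replace(/[\u0021-\u0040]+/g, ' ')
    .replace(/\s+/g, ' ')
    .trim()
    .toLowerCase()
}

// Hypothetical sketch of `trigrams`; not the package source.
function trigramsSketch(value) {
  // Pad with one space on each side so word boundaries show up in
  // the first and last trigram (for example ' t ' and 'ol ').
  const padded = ' ' + cleanSketch(value) + ' '
  const result = []
  // Slide a 3-character window over the padded string.
  for (let index = 0; index + 3 <= padded.length; index++) {
    result.push(padded.slice(index, index + 3))
  }
  return result
}

trigramsSketch(' t@rololol ')
// => [' t ', 't r', ' ro', 'rol', 'olo', 'lol', 'olo', 'lol', 'ol ']
```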
From a value (`string`), make clean trigrams, padded with a space on each
side (see [`n-gram`][n-gram]) (`Array<string>`).

### `asDictionary(value)`
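A minimal sketch of the counting step, assuming a list of trigrams like the one `trigrams(value)` returns is already available. `countTrigrams` is a hypothetical helper for illustration, not part of this package.

```js
// Hypothetical helper: count how often each trigram occurs.
// Input is an array of trigram strings, e.g. the output of `trigrams(value)`.
function countTrigrams(trigramList) {
  const dictionary = {}
  for (const trigram of trigramList) {
    // Start at 0 for a new trigram, then add one occurrence.
    dictionary[trigram] = (dictionary[trigram] || 0) + 1
  }
  return dictionary
}

countTrigrams([' t ', 't r', ' ro', 'rol', 'olo', 'lol', 'olo', 'lol', 'ol '])
// => {' t ': 1, 't r': 1, ' ro': 1, rol: 1, olo: 2, lol: 2, 'ol ': 1}
```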
From a value (`string`), get clean trigrams as a dictionary
(`Record<string, number>`): keys are trigrams, values are occurrence counts.

### `asTuples(value)`
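A minimal sketch of turning a trigram dictionary into `[trigram, count]` tuples. `dictionaryToTuples` is a hypothetical name; sorting by ascending count is an assumption read off the example output in the Use section, and ties are not guaranteed to come out in the same order as the package's.

```js
// Hypothetical helper: dictionary -> array of [trigram, count] tuples.
// Assumption: tuples are sorted by ascending occurrence count.
function dictionaryToTuples(dictionary) {
  // `Object.entries` yields [key, value] pairs; `sort` is stable
  // (ES2019+), so equal counts keep their insertion order here.
  return Object.entries(dictionary).sort((a, b) => a[1] - b[1])
}

dictionaryToTuples({' t ': 1, lol: 2, 't r': 1, olo: 2})
// => [[' t ', 1], ['t r', 1], ['lol', 2], ['olo', 2]]
```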
From a value (`string`), get clean trigrams with their occurrence counts as
tuples (`Array<[string, number]>`): the first index (`0`) is the trigram, the
second (`1`) its occurrence count.

### `tuplesAsDictionary(tuples)`
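A minimal sketch of the same conversion using the built-in `Object.fromEntries`, which accepts an iterable of `[key, value]` pairs. `tuplesToDictionary` is a hypothetical name for illustration, not the package's implementation.

```js
// Hypothetical equivalent of `tuplesAsDictionary`: each [trigram, count]
// pair becomes one key/value entry in a plain object.
function tuplesToDictionary(tuples) {
  return Object.fromEntries(tuples)
}

tuplesToDictionary([['ol ', 1], ['lol', 2], ['olo', 2]])
// => {'ol ': 1, lol: 2, olo: 2}
```

If the same trigram appears twice in the input, the later pair overwrites the earlier one.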
Turn trigram tuples (`Array<[string, number]>`) into a dictionary
(`Record<string, number>`).

## Types
This package is fully typed with [TypeScript][].
It exports the additional types `TrigramTuple`, `TrigramTuples`, and
`TrigramDictionary`.

## Compatibility
This package is at least compatible with all maintained versions of Node.js.
As of now, that is Node.js 14.14+ and 16.0+.
It also works in Deno and modern browsers.

## Security
This package is safe: it only performs string processing and does not execute
its input.
## Related
* [`wooorm/trigrams`](https://github.com/wooorm/trigrams)
— trigrams for 400+ languages based on UDHR
* [`words/n-gram`](https://github.com/words/n-gram)
— get n-grams from text
* [`wooorm/franc`][franc]
— natural language detection

## Contribute
Yes please!
See [How to Contribute to Open Source][contribute].

## License

[MIT][license] © [Titus Wormer][author]

[build-badge]: https://github.com/wooorm/trigram-utils/workflows/main/badge.svg
[build]: https://github.com/wooorm/trigram-utils/actions
[coverage-badge]: https://img.shields.io/codecov/c/github/wooorm/trigram-utils.svg
[coverage]: https://codecov.io/github/wooorm/trigram-utils
[downloads-badge]: https://img.shields.io/npm/dm/trigram-utils.svg
[downloads]: https://www.npmjs.com/package/trigram-utils
[size-badge]: https://img.shields.io/bundlephobia/minzip/trigram-utils.svg
[size]: https://bundlephobia.com/result?p=trigram-utils
[npm]: https://docs.npmjs.com/cli/install
[esmsh]: https://esm.sh
[license]: license
[author]: https://wooorm.com
[esm]: https://gist.github.com/sindresorhus/a39789f98801d908bbc7ff3ecc99d99c
[typescript]: https://www.typescriptlang.org
[contribute]: https://opensource.guide/how-to-contribute/
[n-gram]: https://github.com/words/n-gram
[franc]: https://github.com/wooorm/franc