https://github.com/drupol/phpngrams
Get N-Grams !
- Host: GitHub
- URL: https://github.com/drupol/phpngrams
- Owner: drupol
- License: mit
- Created: 2018-02-05T14:55:35.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-07-24T06:42:55.000Z (about 6 years ago)
- Last Synced: 2025-02-01T20:44:59.382Z (8 months ago)
- Topics: n-grams, ngrams
- Language: PHP
- Homepage: https://not-a-number.io/phpngrams
- Size: 170 KB
- Stars: 8
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## PHPNgrams
PHP N-Grams library
## Introduction
In the fields of computational linguistics, machine-learning and probability, an n-gram is a contiguous sequence of n items from a given sample of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus. When the items are words, n-grams may also be called shingles.
An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (or, less commonly, a "digram"); size 3 is a "trigram". Larger sizes are sometimes named after the value of n, e.g., "four-gram", "five-gram", and so on. (More on [Wikipedia](https://en.wikipedia.org/wiki/N-gram))
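To make the definition concrete, here is a minimal sketch in plain PHP that slides a window of size n over a list of words. The `slidingNgrams()` helper is hypothetical and written only for illustration; it is not part of this library and does not claim to mirror its internals.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of PHPNgrams): yields every contiguous
 * window of $n items taken from $items.
 */
function slidingNgrams(array $items, int $n): Generator
{
    $items = array_values($items);
    $count = count($items);

    // A window needs at least one item.
    if ($n < 1) {
        return;
    }

    // Stop as soon as a full window no longer fits.
    for ($i = 0; $i + $n <= $count; $i++) {
        yield array_slice($items, $i, $n);
    }
}

$words = explode(' ', 'to be or not to be');

// Word bigrams ("shingles" of size 2).
$bigrams = iterator_to_array(slidingNgrams($words, 2), false);

foreach ($bigrams as $bigram) {
    echo implode(' ', $bigram), PHP_EOL;
}
```

With six words and n = 2 this produces five bigrams; in general a sequence of k items contains k - n + 1 n-grams when k >= n.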
## Requirements
* PHP >= 7.0
## Installation
Add this library to your project with Composer:
`composer require drupol/phpngrams`
The library provides two classes:
* NGrams
* NGramsCyclic

and one trait:
* NGramsTrait
## Usage
```php
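<?php

declare(strict_types=1);

include 'vendor/autoload.php';

// Namespace assumed from the package name (drupol/phpngrams).
use drupol\phpngrams\NGrams;
use drupol\phpngrams\NGramsCyclic;

$string = 'hello world';

// Prefer preg_split() over str_split() for UTF-8 strings.
$chars = preg_split('/(?!^)(?=.)/u', $string);

// Regular n-grams: every contiguous window of 3 characters.
$ngrams = (new NGrams())->ngrams($chars, 3);

print_r(iterator_to_array($ngrams));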
/*
[
0 =>
[
0 => 'h',
1 => 'e',
2 => 'l',
],
1 =>
[
0 => 'e',
1 => 'l',
2 => 'l',
],
2 =>
[
0 => 'l',
1 => 'l',
2 => 'o',
],
3 =>
[
0 => 'l',
1 => 'o',
2 => ' ',
],
4 =>
[
0 => 'o',
1 => ' ',
2 => 'w',
],
5 =>
[
0 => ' ',
1 => 'w',
2 => 'o',
],
6 =>
[
0 => 'w',
1 => 'o',
2 => 'r',
],
7 =>
[
0 => 'o',
1 => 'r',
2 => 'l',
],
8 =>
[
0 => 'r',
1 => 'l',
2 => 'd',
],
];
*/

$string = 'hello world';

// Better use preg_split() than str_split() in case of UTF8 strings.
$chars = preg_split('/(?!^)(?=.)/u', $string);

// Cyclic n-grams: the windows wrap around from the end to the start.
$ngrams = (new NGramsCyclic())->ngrams($chars, 3);

print_r(iterator_to_array($ngrams));
/*
[
0 => [
0 => 'h',
1 => 'e',
2 => 'l',
],
1 => [
0 => 'e',
1 => 'l',
2 => 'l',
],
2 => [
0 => 'l',
1 => 'l',
2 => 'o',
],
3 => [
0 => 'l',
1 => 'o',
2 => ' ',
],
4 => [
0 => 'o',
1 => ' ',
2 => 'w',
],
5 => [
0 => ' ',
1 => 'w',
2 => 'o',
],
6 => [
0 => 'w',
1 => 'o',
2 => 'r',
],
7 => [
0 => 'o',
1 => 'r',
2 => 'l',
],
8 => [
0 => 'r',
1 => 'l',
2 => 'd',
],
9 => [
0 => 'l',
1 => 'd',
2 => 'h',
],
10 => [
0 => 'd',
1 => 'h',
2 => 'e',
],
];
*/
```
To keep the memory footprint as low as possible, the library returns Generators. If you want the complete result as an array, use [iterator_to_array()](https://secure.php.net/manual/en/function.iterator-to-array.php).
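The difference between consuming a Generator lazily and materialising it can be sketched as follows. To stay runnable without the library installed, the example uses a hypothetical `cyclicNgrams()` stand-in; it only imitates the wrap-around output shown above and is not the library's implementation.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in (not the library's code): yields one window of
 * $n items per starting position, wrapping around at the end.
 */
function cyclicNgrams(array $items, int $n): Generator
{
    $items = array_values($items);
    $count = count($items);

    if ($n < 1 || $count === 0) {
        return;
    }

    for ($i = 0; $i < $count; $i++) {
        $window = [];
        for ($j = 0; $j < $n; $j++) {
            // The modulo wraps indexes past the end back to the start.
            $window[] = $items[($i + $j) % $count];
        }
        yield $window;
    }
}

$chars = preg_split('//u', 'hello world', -1, PREG_SPLIT_NO_EMPTY);

// Lazy: windows are built one at a time, and the loop can stop early.
$firstTwo = [];
foreach (cyclicNgrams($chars, 3) as $index => $ngram) {
    if ($index === 2) {
        break;
    }
    $firstTwo[] = implode('', $ngram);
}

// Eager: build every window at once and keep them all in memory.
$all = iterator_to_array(cyclicNgrams($chars, 3), false);
```

Iterating with `foreach` suits large inputs or early exits, while `iterator_to_array()` is convenient when the full list is small enough to hold in memory.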
## API
Find the complete API documentation at [https://not-a-number.io/phpngrams](https://not-a-number.io/phpngrams).
## Code quality and tests
Every time changes are introduced into the library, [Travis CI](https://travis-ci.org/drupol/phpngrams/builds) runs the tests.
The library has tests written with [PHPSpec](http://www.phpspec.net/).
Feel free to check them out in the `spec` directory. Run `composer phpspec` to trigger the tests.
[PHPInfection](https://github.com/infection/infection) is used for mutation testing, to ensure that the code is properly tested. Run `composer infection` to launch it.
## Contributing
Feel free to contribute to this library by sending GitHub pull requests. I usually respond quickly :-)