https://github.com/maxpatiiuk/porter-stemming
TypeScript implementation of the Porter Stemmer algorithm
porter stemmer stemming
Last synced: 7 months ago
- Host: GitHub
- URL: https://github.com/maxpatiiuk/porter-stemming
- Owner: maxpatiiuk
- License: mit
- Created: 2023-04-30T01:14:18.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-06-23T01:39:03.000Z (over 1 year ago)
- Last Synced: 2025-03-18T12:47:17.223Z (7 months ago)
- Topics: porter, stemmer, stemming
- Language: TypeScript
- Homepage:
- Size: 662 KB
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Porter Stemmer
This is a TypeScript implementation
of [The Porter Stemming Algorithm](https://tartarus.org/martin/PorterStemmer/),
a popular and efficient algorithm used for word stemming in information
retrieval and natural language processing.

Word stemming is the process of reducing a word to its base or root form, making
it easier to identify related words and analyze texts more effectively.

## Installation
To install the package, run the following command:
```bash
npm install porterstem
```

## Usage
To use the Porter Stemming Algorithm in your TypeScript or JavaScript project,
import the `stem` function from the package and call it on a single word, or
map it over an array of words:

```typescript
import { stem } from 'porterstem';

// Single word
const word = 'running';
const stemmedWord = stem(word);
console.log(stemmedWord); // Output: 'run'

// Array of words
const words = ['jumps', 'jumped', 'jumping'];
const stemmedWords = words.map(word => stem(word));
console.log(stemmedWords); // Output: ['jump', 'jump', 'jump']
```

## About the Porter Stemming Algorithm
The Porter Stemming Algorithm, developed by Martin Porter in 1980, is an
algorithm used for stemming words in the English language. It works by removing
the common morphological and inflectional endings from words, such as plurals,
past tenses, and gerunds.

The algorithm consists of five phases of word reductions applied sequentially.
Each phase contains a set of rules that define how to remove or replace a suffix
based on the word's structure and length. The result is a stemmed word that
represents the base or root form of the input word.

## Meta
Inspired by the [stemmer](https://www.npmjs.com/package/stemmer) package.
The algorithm does not use mutation and is type-safe.
No external dependencies.
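
To make the non-mutating, typed style concrete, here is a minimal sketch of a single rule phase: Step 1a of Porter's published algorithm, which handles plural endings. It is an illustration only and is not taken from this package's source; the names `Rule`, `STEP_1A_RULES`, and `step1a` are hypothetical, and the conditions used by later phases are left out.

```typescript
// One rewrite rule: if the word ends with `suffix`, swap it for `replacement`.
// (Hypothetical type, introduced for this illustration only.)
type Rule = readonly [suffix: string, replacement: string];

// Step 1a rules from Porter's description, ordered longest suffix first
// so that the first rule that matches is also the longest match.
const STEP_1A_RULES: readonly Rule[] = [
  ['sses', 'ss'], // caresses -> caress
  ['ies', 'i'], // ponies -> poni
  ['ss', 'ss'], // caress -> caress (unchanged)
  ['s', ''], // cats -> cat
];

// Pure function: builds a new string and never modifies its input.
function step1a(word: string): string {
  const rule = STEP_1A_RULES.find(([suffix]) => word.endsWith(suffix));
  if (rule === undefined) return word; // no plural ending, nothing to do
  const [suffix, replacement] = rule;
  return word.slice(0, word.length - suffix.length) + replacement;
}

console.log(['caresses', 'ponies', 'caress', 'cats'].map(step1a));
// Output: ['caress', 'poni', 'caress', 'cat']
```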
Correctness is validated using the
[vocabulary](https://tartarus.org/martin/PorterStemmer/voc.txt)
and [output pairs](https://tartarus.org/martin/PorterStemmer/output.txt)
provided by Martin Porter.
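
A check of this kind could be sketched as follows. This is not the project's actual test code: `Stemmer`, `Mismatch`, `parseWordList`, and `findMismatches` are hypothetical names, and the sketch assumes both files list one entry per line in the same order. Loading the files is left out, so the function takes their contents as strings.

```typescript
// A stemmer is any function from a word to its stem.
type Stemmer = (word: string) => string;

interface Mismatch {
  readonly word: string;
  readonly expected: string;
  readonly actual: string;
}

// Split a newline-separated word list, dropping padding and blank lines.
function parseWordList(text: string): readonly string[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Stem every vocabulary word and report each one whose result differs
// from the expected stem at the same position in the second list.
function findMismatches(
  stem: Stemmer,
  vocabularyText: string,
  expectedText: string,
): readonly Mismatch[] {
  const words = parseWordList(vocabularyText);
  const expectedStems = parseWordList(expectedText);
  if (words.length !== expectedStems.length) {
    throw new Error(
      `Line count differs: ${words.length} vs ${expectedStems.length}`,
    );
  }
  return words
    .map((word, index) => ({
      word,
      expected: expectedStems[index] ?? '',
      actual: stem(word),
    }))
    .filter(({ expected, actual }) => expected !== actual);
}

// Toy stand-in for a real stemmer, used only to exercise the harness.
const toyStem: Stemmer = (word) => word.replace(/s$/, '');
const mismatches = findMismatches(toyStem, 'cats\ndogs\nponies\n', 'cat\ndog\nponi\n');
console.log(mismatches.length); // 1: 'ponies' gives 'ponie', expected 'poni'
```

In real use, the two strings would be the contents of the vocabulary and output files, and `toyStem` would be replaced by the package's `stem` function; an empty result means every word matched.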