https://github.com/andrianllmm/aklanon-stemmer

A Python library for Aklanon word stemming.

Host: GitHub
URL: https://github.com/andrianllmm/aklanon-stemmer
Owner: andrianllmm
License: gpl-3.0
Created: 2024-08-09T10:58:04.000Z
Default Branch: main
Last Pushed: 2024-08-09T11:25:48.000Z
Last Synced: 2025-01-02T11:44:08.538Z
Topics: aklanon, language-processing, nlp, stemmer
Language: Python
Homepage:
Size: 38.1 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# AklStemmer

**A Python library for Aklanon word stemming**

## About

AklStemmer is a library that finds the root form of Aklanon words. It works on
inflected words, including mixed Aklanon-English terms and words not found in
dictionaries. It removes affixes, reduces repeated syllables, and applies
transformation rules to generate possible root forms. These candidates are
filtered against a list of valid words and a set of conditions. The best root is
then chosen based on how much was changed during the process.
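
The generate, filter, and select flow described above can be illustrated with a toy sketch. This is not AklStemmer's implementation: the affix lists, the tiny lexicon, and the function names `toy_candidates` and `toy_stem` are hypothetical, infixes and reduplication are left out, and preferring the least-changed candidate is an assumption about how "how much was changed" is scored.

```python
# Toy illustration only: the affix lists, lexicon, and scoring below are
# hypothetical and are not taken from AklStemmer.
PREFIXES = ("nag", "mag", "ga")
SUFFIXES = ("on", "an")
LEXICON = {"sueat", "basa", "gisi"}


def toy_candidates(word):
    """Generate (stem, chars_removed) pairs by stripping known affixes."""
    candidates = [(word, 0)]  # the unchanged word is always a candidate
    for pre in ("",) + PREFIXES:
        for suf in ("",) + SUFFIXES:
            if not (pre or suf):
                continue  # nothing to strip; already covered above
            if word.startswith(pre) and word.endswith(suf):
                stem = word[len(pre): len(word) - len(suf)]
                if stem:
                    candidates.append((stem, len(pre) + len(suf)))
    return candidates


def toy_stem(word):
    """Keep candidates found in the lexicon and pick the least-changed one."""
    valid = [c for c in toy_candidates(word) if c[0] in LEXICON]
    if not valid:
        return word  # fall back to the input when nothing validates
    return min(valid, key=lambda c: c[1])[0]


print(toy_stem("nagsueat"))  # sueat
print(toy_stem("gision"))    # gisi
```

Filtering against a word list is what keeps over-stripped fragments from being returned; a word with no valid candidate simply comes back unchanged in this sketch.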

## Installation

```sh
pip install git+https://github.com/andrianllmm/aklanon-stemmer.git@main
```

## Usage

AklStemmer is a standalone library. Import it with
`from aklstemmer import stemmer`.

Use `get_stem` to get the root of a word. It takes a word and returns its stem
as a `Stem` object: essentially a string that carries the affixes,
reduplication, transformations, and other details as additional attributes.

```python
from aklstemmer import stemmer

# Stem a single inflected word
stem = stemmer.get_stem("nagsueat")
print(stem)
# Output: 'sueat'
```

Since `get_stem` returns a `Stem` object, the properties used in the stemming
process can be accessed as attributes.

```python
prefix = stem.pre
print(prefix)
# Output: 'nag'

suffix = stem.suf
print(suffix)
# Output: None
```
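
To show what "a string with additional attributes" can look like in Python, here is a minimal sketch of a `str` subclass. The class name `ToyStem` is hypothetical, and the real `Stem` class may be built differently; only the attribute names `pre` and `suf` come from the example above.

```python
class ToyStem(str):
    """A string that also records removed affixes (hypothetical sketch)."""

    def __new__(cls, value, pre=None, suf=None):
        # str is immutable, so extra data is attached in __new__
        obj = super().__new__(cls, value)
        obj.pre = pre
        obj.suf = suf
        return obj


stem = ToyStem("sueat", pre="nag")
print(stem == "sueat")     # True: it compares like a plain string
print(stem.upper())        # SUEAT: string methods still work
print(stem.pre, stem.suf)  # nag None
```

An object like this can be passed anywhere a plain string is expected, while the stemming details stay available to callers that need them.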

Use `get_stems` to get the root of each word in a text. This takes a text and
returns the stem of each word as a list of `Stem` objects.

```python
stems = stemmer.get_stems("nagsueat, binasa, ag gision")
print(stems)
# Output: ['sueat', 'basa', 'at', 'gisi']
```

Use `get_stem_candidates` to get all the stem candidates of a word. It takes a
word and returns the possible stems as a list of `Stem` objects. This is useful
for loose checking, since candidate selection is not perfect.

```python
candidates = stemmer.get_stem_candidates("bukot")
print(candidates)
# Output: ['bukot', 'buko', 'bukon']
```
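
One way to do the loose checking mentioned above is to accept a word when any of its candidates equals the root you expect, rather than relying on the single selected stem. The helper name `loosely_matches` is hypothetical and not part of AklStemmer; the sketch works on a plain list of strings so it runs without the library installed.

```python
def loosely_matches(candidates, expected_root):
    """Return True if any candidate stem equals the expected root."""
    return expected_root in {str(c) for c in candidates}


# Candidate list copied from the example output above; in practice it would
# come from stemmer.get_stem_candidates("bukot").
candidates = ["bukot", "buko", "bukon"]
print(loosely_matches(candidates, "buko"))  # True
print(loosely_matches(candidates, "buk"))   # False
```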

## Accuracy

The accuracy hasn't been tested yet.

## Contributing

Contributions are welcome! To get started:

1. Fork the project
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a pull request

## Issues

Found a bug or issue? Report it on the
[issues page](https://github.com/andrianllmm/aklanon-stemmer/issues).
