Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jakartaresearch/maleo
Wrapper library for text cleansing, preprocessing in NLP
Last synced: 5 days ago
- Host: GitHub
- URL: https://github.com/jakartaresearch/maleo
- Owner: jakartaresearch
- License: MIT
- Created: 2020-08-31T12:16:48.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2021-04-08T13:35:56.000Z (over 3 years ago)
- Last Synced: 2024-11-07T15:52:04.845Z (13 days ago)
- Topics: indonesian-language, machine-learning, nlp, nlp-library
- Language: Python
- Homepage: https://jakartaresearch.github.io/maleo/
- Size: 147 KB
- Stars: 17
- Watchers: 4
- Forks: 0
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Maleo
Wrapper library for text cleansing, preprocessing and POS tagging in NLP

## Docs
https://jakartaresearch.github.io/maleo/

## Overview of features
- Scanner: get insights about your text dataset (e.g. number of characters, words, emojis)
- Remove hyperlinks, punctuation, stopwords, emoticons, etc.
- Extract hashtags and prices from text
- Convert email, phone number, date to
- Convert Indonesian slang to formal words
- Convert emoji to word or
- Convert word to number
- Predict Part-of-Speech (POS) tags

## Installation
```
pip install maleo
```

## Getting Started
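The usage example in this section calls methods on a variable named `df` without defining it. Since it accesses `df.text` and passes the column name `'text'`, it is presumably a pandas DataFrame with a `text` column. A minimal sketch of such a DataFrame, assuming pandas is installed, could look like this (the sample rows are made up for illustration and do not come from the project):

```python
import pandas as pd

# Hypothetical sample data: any DataFrame with a string column named "text"
# should match the shape the usage example expects (an assumption, not
# something the README states).
df = pd.DataFrame(
    {
        "text": [
            "saya mau pergi beli makan siang dulu",
            "gw lg otw nih 😂 #macet",
            "promo hari ini cuma Rp 25.000 https://example.com",
        ]
    }
)

# Quick sanity check before handing the data to any cleaning step.
print(df.shape)
print(df.text.head())
```

In practice `df` would more likely be loaded from a file, for example with `pd.read_csv`.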
```python
from maleo.wizard import Wizard
from maleo.pos_tag import POS

wiz = Wizard()
pos = POS()

# df is assumed to be a pandas DataFrame with a 'text' column
wiz.scanner(df, 'text')
wiz.emoji_to_word(df.text)
wiz.slang_to_formal(df.text)

# Predict POS tags for a single Indonesian sentence
pos.predict('saya mau pergi beli makan siang dulu', output_pair=False)
```

## Universal POS tags
https://universaldependencies.org/u/pos/index.html

## Contributor
- Ruben Stefanus