https://github.com/jfilter/german-preprocessing
🇩🇪 Preprocess German texts to do some serious natural-language processing.
german nlp package python
- Host: GitHub
- URL: https://github.com/jfilter/german-preprocessing
- Owner: jfilter
- License: mit
- Created: 2019-07-30T10:09:22.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-12-09T05:18:39.000Z (almost 2 years ago)
- Last Synced: 2024-10-12T18:49:57.366Z (about 1 month ago)
- Topics: german, nlp, package, python
- Language: Python
- Homepage:
- Size: 37.1 KB
- Stars: 10
- Watchers: 6
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# German Preprocessing [![Build Status](https://travis-ci.com/jfilter/german-preprocessing.svg?branch=master)](https://travis-ci.com/jfilter/german-preprocessing) [![PyPI](https://img.shields.io/pypi/v/german.svg)](https://pypi.org/project/german/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/german.svg)](https://pypi.org/project/german/)
Preprocess German texts to do some serious natural-language processing.
- [clean texts](https://github.com/jfilter/clean-text)
- remove stopwords (as defined by [spaCy](https://github.com/explosion/spaCy/blob/master/spacy/lang/de/stop_words.py))
- [lemmatize](https://github.com/jfilter/german-lemmatizer)
- lower-case, remove all punctuation, and replace digits with "0"

## Installation
`pip install german`
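
The last step in the feature list (lower-casing, punctuation removal, digit replacement) can be illustrated with a small standard-library sketch. This is not the package's implementation: the helper name `normalize` is hypothetical, and it assumes that every single digit becomes "0" and that only ASCII punctuation is stripped.

```python
import re
import string

# Hypothetical helper for illustration only; not part of the `german` package.
# Maps every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(text: str) -> str:
    """Lower-case, replace each digit with "0", and strip ASCII punctuation."""
    text = text.lower()
    # Assumption: each digit is replaced individually, so "2019" becomes "0000".
    text = re.sub(r"\d", "0", text)
    text = text.translate(_PUNCT_TO_SPACE)
    # Collapse the whitespace left behind by removed punctuation.
    return " ".join(text.split())


print(normalize("Julia trinkt 2 Tassen Tee, jeden Tag!"))
# julia trinkt 0 tassen tee jeden tag
```

Stopword removal and lemmatization are left out here because they depend on the external resources linked in the feature list.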
## Usage
```python
from german import preprocess

preprocess(['Johannes war einer von vielen guten Schülern.', 'Julia trinkt gern Tee.'], remove_stop=True)
# ['johannes gut schüler', 'julia trinken tee']
```

## License
MIT.
## Sponsoring
This work was created as part of a [project](https://github.com/jfilter/ptf) that was funded by the German [Federal Ministry of Education and Research](https://www.bmbf.de/en/index.html).