https://github.com/anze3db/corpus_cleaner
- Host: GitHub
- URL: https://github.com/anze3db/corpus_cleaner
- Owner: anze3db
- Created: 2021-02-14T18:56:33.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2021-02-15T12:47:16.000Z (almost 4 years ago)
- Last Synced: 2024-11-01T04:42:40.666Z (about 2 months ago)
- Language: Python
- Size: 10.7 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Corpus Cleaner
Prepare text for tagging with TreeTagger.
```
# 1. Set the paths to tree-tagger and slovenian-utf8.par in __main__.py, or copy both files to the root directory
# 2. Install the Poetry dependencies
# 3. Run `poetry run python cleaner`
```
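As a rough illustration of what "preparing text for tagging" can involve, the sketch below splits raw text into one token per line, which is the input layout TreeTagger works with. It is not taken from this repository: the function name `prepare_for_tagging` and the tokenization rule are assumptions made for the example, and the actual cleaner may do more (or something different).

```python
import re
import unicodedata

# Hypothetical helper, NOT the repository's implementation.
# Assumption: a token is either a run of word characters or a single
# punctuation character; whitespace is discarded.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def prepare_for_tagging(text: str) -> str:
    """Return the text with one token per line."""
    # Normalize to NFC so accented letters such as "Ž" are single code points.
    normalized = unicodedata.normalize("NFC", text)
    tokens = TOKEN_RE.findall(normalized)
    return "\n".join(tokens)


if __name__ == "__main__":
    print(prepare_for_tagging("Živjo, svet!"))
```

The output of such a step could then be written to a file and passed to the `tree-tagger` binary together with the `slovenian-utf8.par` parameter file mentioned above.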