Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ahmetaa/zemberek-nlp
NLP tools for Turkish.
- Host: GitHub
- URL: https://github.com/ahmetaa/zemberek-nlp
- Owner: ahmetaa
- License: other
- Created: 2013-06-03T06:04:14.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2024-07-25T09:39:36.000Z (6 months ago)
- Last Synced: 2025-01-02T02:03:57.303Z (10 days ago)
- Topics: language, morphology, nlp, turkish, zemberek-nlp
- Language: Java
- Homepage:
- Size: 52 MB
- Stars: 1,170
- Watchers: 109
- Forks: 210
- Open Issues: 55
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- awesome-turkish-nlp - ahmetaa/zemberek-nlp - The best-known Turkish NLP tool. (Libraries / Java)
- turkish-nlp-resources - Zemberek-NLP - Provides Natural Language Processing tools for Turkish. (Tools/Libraries)
README
Zemberek-NLP
============

**This project is now in slow maintenance mode.**

Zemberek-NLP provides Natural Language Processing tools for Turkish.
Latest version is 0.17.1 (July 23rd, 2019). [Change Log](CHANGELOG.md)
Please read the [FAQ](https://github.com/ahmetaa/zemberek-nlp/wiki/FAQ) for common questions.
## Modules
| Module | Maven Id | Description |
|------------|----------|---------|
| [Core](core) | zemberek-core | Special Collections, Hash functions and helpers. |
| [Morphology](morphology) | zemberek-morphology | Turkish morphological analysis, disambiguation and word generation. |
| [Tokenization](tokenization) | zemberek-tokenization | Turkish Tokenization and sentence boundary detection. |
| [Normalization](normalization) | zemberek-normalization | Basic spell checker, word suggestion. Noisy text normalization. |
| [NER](ner) | zemberek-ner | Turkish Named Entity Recognition. |
| [Classification](classification)| zemberek-classification | Text classification based on Java port of fastText project. |
| [Language Identification](lang-id)| zemberek-lang-id | Fast identification of text language. |
| [Language Modeling](lm) | zemberek-lm | Provides a language model compression algorithm. |
| [Applications](apps) | zemberek-apps | Console applications |
| [gRPC Server](grpc) | zemberek-grpc | gRPC server for access from other languages. |
| [Examples](examples) | zemberek-examples | Usage examples. |

## Usage
### Maven
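Add this repository to the `<repositories>` section of your `pom.xml` file:
```xml
<repository>
  <id>zemberek-repo</id>
  <name>zemberek Maven Repo on Github</name>
  <url>https://raw.github.com/ahmetaa/maven-repo/master</url>
</repository>
```

Then add the modules you need to the `<dependencies>` section (for example, morphology):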
```xml
<dependency>
  <groupId>zemberek-nlp</groupId>
  <artifactId>zemberek-morphology</artifactId>
  <version>0.17.1</version>
</dependency>
```

### Jar distributions
The [Google Drive page](https://drive.google.com/#folders/0B9TrB39LQKZWSjNKdVcwWUxxUm8) contains jar files for different versions.
**zemberek-full.jar** is a single jar that contains all modules and dependencies.
When it is run directly with `java -jar zemberek-full.jar`, it lists the available console applications.
### For Developers
The [Zemberek For Developers](https://github.com/ahmetaa/zemberek-nlp/wiki/Zemberek-For-Developers) wiki page explains
how to compile the code and generate jar files from the project.

### Examples
There is an [examples](examples) module in the code for usage examples.
Also, there is a separate project with the same examples that uses Zemberek-NLP as Maven modules:
[Turkish-nlp-examples](https://github.com/ahmetaa/turkish-nlp-examples)

## Known Issues and Limitations
- The NER module does not provide a model yet.
- The library is not well-tested for multi-threaded usage.

Please see the issues section for further issues and feel free to create new ones.
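
On the multi-threading limitation above: one conservative workaround, sketched here under stated assumptions, is to funnel every call to a shared analyzer through a single lock. The `SerializedAnalyzer` class is hypothetical and not part of Zemberek-NLP; the wrapped `Function` merely stands in for whichever library call you need to guard.

```java
import java.util.Objects;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Zemberek-NLP): wraps any single-argument
 * function so that only one thread at a time can execute it.
 * The delegate is a placeholder for a library call that may not be thread-safe.
 */
final class SerializedAnalyzer<I, O> implements Function<I, O> {
    private final Function<I, O> delegate;
    private final ReentrantLock lock = new ReentrantLock();

    SerializedAnalyzer(Function<I, O> delegate) {
        this.delegate = Objects.requireNonNull(delegate, "delegate");
    }

    @Override
    public O apply(I input) {
        lock.lock();               // block until no other thread is inside the delegate
        try {
            return delegate.apply(input);
        } finally {
            lock.unlock();         // always release, even if the delegate throws
        }
    }
}
```

A wrapper created once, for example `new SerializedAnalyzer<String, Integer>(String::length)`, can then be shared between threads. This trades throughput for safety; whether it is sufficient depends on which parts of the library hold shared mutable state, which this README does not specify.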
## License
Code is licensed under the Apache License, Version 2.0.

## Citing
If you use this project in an academic publication, please refer to this site.
## Acknowledgements
Please refer to the contributors.txt file.