https://github.com/dongjinleekr/beanpiece
A Java binding to Google SentencePiece
- Host: GitHub
- URL: https://github.com/dongjinleekr/beanpiece
- Owner: dongjinleekr
- License: apache-2.0
- Created: 2017-10-25T02:10:22.000Z
- Default Branch: master
- Last Pushed: 2018-06-28T21:07:24.000Z
- Last Synced: 2023-06-30T04:41:35.826Z
- Topics: google-sentencepiece, java-bindings, natural-language-processing, neural-machine-translation, word-segmentation
- Language: C++
- Size: 235 KB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Beanpiece: A Java binding to [Google SentencePiece](https://github.com/google/sentencepiece)
=====
[Build Status](https://travis-ci.org/dongjinleekr/beanpiece)
[Coverage](http://codecov.io/github/dongjinleekr/beanpiece?branch=master)
[Maven Central](https://maven-badges.herokuapp.com/maven-central/com.dongjinlee/beanpiece)
SentencePiece is an unsupervised text tokenizer and detokenizer, developed by Google. Beanpiece provides a Java API to SentencePiece.
# Compatibility
As of version 0.2, this library is API-compatible with SentencePiece [commit 1ff5904 (Apr 1, 2018)](https://github.com/google/sentencepiece/commit/1ff5904e6606c2ece00d52fd419c9e199ce56596).
# How to build
The following tools are required to build Beanpiece:
- sbt
- A g++ compiler with C++11 support.
To build the project, run:
```sh
sbt package
```
This runs every required task, from copying the shared libraries to compiling and packaging the Java source code.
# Note for Windows/Mac Users
As of version 0.2, the project contains `libsentencepiece.so` for Linux (amd64) only. As a result, the built jar will not run on OS X or Windows; support for those platforms is planned for version 0.3.
Until then, please build the SentencePiece shared library yourself and copy it into the directory matching your platform:
- windows: `/library/windows/[i386|amd64|ppc]`
- osx: `/library/osx/[i386|amd64|ppc]`
After that, you can build the project as described above.
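How the jar picks one of these per-platform directories at runtime is not described in this README. As a rough illustration only, the hypothetical helper below (not part of Beanpiece's API) maps the standard JVM system properties `os.name` and `os.arch` onto a `library/<os>/<arch>` path. The directory names it produces (`windows`, `osx`, `linux`, and `i386`/`amd64`/`ppc`) are assumptions modeled on the list above, not a confirmed layout.

```java
import java.util.Locale;

/**
 * Hypothetical helper, NOT part of Beanpiece: resolves a library/<os>/<arch>
 * directory from the JVM's os.name and os.arch properties.
 * The directory names are assumptions for illustration only.
 */
public final class NativeLibraryPath {

    private NativeLibraryPath() {}

    /** Normalizes an os.name value to "windows", "osx", or "linux" (assumed names). */
    public static String osDirectory(String osName) {
        String name = osName.toLowerCase(Locale.ROOT);
        if (name.startsWith("windows")) {
            return "windows";
        }
        if (name.startsWith("mac") || name.contains("darwin")) {
            return "osx";
        }
        if (name.startsWith("linux")) {
            return "linux";
        }
        throw new UnsupportedOperationException("Unsupported OS: " + osName);
    }

    /** Normalizes an os.arch value to "i386", "amd64", or "ppc". */
    public static String archDirectory(String osArch) {
        switch (osArch.toLowerCase(Locale.ROOT)) {
            case "amd64":
            case "x86_64": // macOS JVMs report x86_64 instead of amd64
                return "amd64";
            case "x86":
            case "i386":
            case "i686":
                return "i386";
            case "ppc":
            case "powerpc":
                return "ppc";
            default:
                throw new UnsupportedOperationException("Unsupported architecture: " + osArch);
        }
    }

    /** Joins the normalized OS and architecture into a relative directory path. */
    public static String libraryDirectory(String osName, String osArch) {
        return "library/" + osDirectory(osName) + "/" + archDirectory(osArch);
    }

    public static void main(String[] args) {
        try {
            // Directory for the JVM this program is currently running on.
            System.out.println(libraryDirectory(
                    System.getProperty("os.name"), System.getProperty("os.arch")));
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
        // Platform-specific file name the JVM would use for a library named "sentencepiece",
        // e.g. libsentencepiece.so on Linux (the name Beanpiece actually loads is not confirmed here).
        System.out.println(System.mapLibraryName("sentencepiece"));
    }
}
```

Throwing on unrecognized values, rather than falling back to a default directory, keeps a mismatched native library from being picked up silently.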