Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hswick/jutsu.nlp
Library for doing natural language processing such as word embedding
https://github.com/hswick/jutsu.nlp
Last synced: 6 days ago
JSON representation
Library for doing natural language processing such as word embedding
- Host: GitHub
- URL: https://github.com/hswick/jutsu.nlp
- Owner: hswick
- Created: 2017-07-28T04:15:53.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-11-30T18:51:24.000Z (almost 6 years ago)
- Last Synced: 2024-09-16T17:13:57.001Z (2 months ago)
- Language: Clojure
- Size: 245 KB
- Stars: 10
- Watchers: 3
- Forks: 3
- Open Issues: 2
-
Metadata Files:
- Readme: README.MD
Awesome Lists containing this project
README
# jutsu.nlp
Clojure library meant for word embedding using the deeplearning4j library under the hood.
Word2Vec and Doc2Vec features are fully functional. Project is currently in a stable state.
API is subject to change in future versions.
Pull requests are welcome!
# Usage
To install add this to your dependencies:
```clojure
[hswick/jutsu.nlp "0.1.0"]
```To use jutsu.nlp:
```clojure
(:require '[jutsu.nlp.core :as nlp]
'[jutsu.nlp.util :as util]);;Configure your Word2Vec model
(def w2v (nlp/word-2-vec "path/to/text-file"
{:min-word-frequency 5;;You can also input an option map
:iterations 1 ;;To set certain parameters
:layer-size 100
:seed 42
:window-size 5}));;This trains the model on the data given
(nlp/fit! w2v);;Write the word2vec model to memory
(nlp/write-word-vectors w2v "word_vectors.csv");;Load a word2vec model from memory
(def w2v-2 (nlp/read-word-vectors (clojure.java.io/file "word_vectors.csv")));;If you want stopping and stemming initialize word2vec like this
(require '[jutsu.nlp.sentence-iterator :as iter]
'[jutsu.nlp.tokenization :as token])(nlp/word-2-vec
(iter/default-iterator (util/absolute-path "neuromancer.txt"))
(token/default-tokenizer-factory (token/common-stemmer-preprocessor))
{:min-word-frequency 6
:stopwords (nlp/stop-words)
:window-size 10
:layer-size 150});;If you want to input a directory initialize like this
(def w2v4 (nlp/word-2-vec
(iter/dir-iterator "path/to/dir")
(token/default-tokenizer-factory)
{:min-word-frequency 6
:window-size 10
:layer-size 150
:stopwords (nlp/stop-words)}))
```# Dev
Run `boot night` to startup nightlight and begin editing your project in a browser.
Run `boot test-code` to run tests.
## License
Copyright © 2017 FIXME
Distributed under the Eclipse Public License either version 1.0 or (at
your option) any later version.