Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/hswick/jutsu.nlp

Library for doing natural language processing such as word embedding
https://github.com/hswick/jutsu.nlp

Last synced: 6 days ago
JSON representation

Library for doing natural language processing such as word embedding

Awesome Lists containing this project

README

        

# jutsu.nlp

Clojure library meant for word embedding using the deeplearning4j library under the hood.

Word2Vec and Doc2Vec features are fully functional. Project is currently in a stable state.

API is subject to change in future versions.

Pull requests are welcome!

# Usage

To install add this to your dependencies:

```clojure
[hswick/jutsu.nlp "0.1.0"]
```

To use jutsu.nlp:
```clojure
(:require '[jutsu.nlp.core :as nlp]
'[jutsu.nlp.util :as util])

;;Configure your Word2Vec model
(def w2v (nlp/word-2-vec "path/to/text-file"
{:min-word-frequency 5;;You can also input an option map
:iterations 1 ;;To set certain parameters
:layer-size 100
:seed 42
:window-size 5}))

;;This trains the model on the data given
(nlp/fit! w2v)

;;Write the word2vec model to memory
(nlp/write-word-vectors w2v "word_vectors.csv")

;;Load a word2vec model from memory
(def w2v-2 (nlp/read-word-vectors (clojure.java.io/file "word_vectors.csv")))

;;If you want stopping and stemming initialize word2vec like this
(require '[jutsu.nlp.sentence-iterator :as iter]
'[jutsu.nlp.tokenization :as token])

(nlp/word-2-vec
(iter/default-iterator (util/absolute-path "neuromancer.txt"))
(token/default-tokenizer-factory (token/common-stemmer-preprocessor))
{:min-word-frequency 6
:stopwords (nlp/stop-words)
:window-size 10
:layer-size 150})

;;If you want to input a directory initialize like this
(def w2v4 (nlp/word-2-vec
(iter/dir-iterator "path/to/dir")
(token/default-tokenizer-factory)
{:min-word-frequency 6
:window-size 10
:layer-size 150
:stopwords (nlp/stop-words)}))
```

# Dev

Run `boot night` to startup nightlight and begin editing your project in a browser.

Run `boot test-code` to run tests.

## License

Copyright © 2017 FIXME

Distributed under the Eclipse Public License either version 1.0 or (at
your option) any later version.