Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hswick/one-hot-encoder
Tiny one hot encoding library for the Clojure community
- Host: GitHub
- URL: https://github.com/hswick/one-hot-encoder
- Owner: hswick
- License: epl-1.0
- Created: 2016-12-20T08:11:40.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2016-12-24T19:11:35.000Z (almost 8 years ago)
- Last Synced: 2024-06-15T08:22:44.216Z (5 months ago)
- Language: Clojure
- Size: 7.81 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# one-hot-encoder
A tiny Clojure library designed to perform feature extraction on categorical data using one hot encoding.
https://en.wikipedia.org/wiki/One-hot

## Installation

To install with Leiningen:

```clojure
[hswick/one-hot-encoder "0.1.0"]
```

## Usage
```clojure
(ns foo.clojure
  (:require [one-hot-encoder.core :refer :all]))

(def test-data [["Sample" "Category" "Numerical"]
                [1 "Human" 1]
                [2 "Human" 1]
                [3 "Penguin" 2]
                [4 "Octopus" 3]
                [5 "Alien" 4]
                [6 "Octopus" 3]
                [7 "Alien" 4]])

(def cols (distinct (map #(nth % 1) (rest test-data))))

(encode cols "Human")
;;=> [1 0 0 0]

(encode cols "Human" "Penguin")
;;=> [1 1 0 0]

(encode-coll cols ["Human" "Penguin"])
;;=> [1 1 0 0]

(encode-table cols [["Human" "Penguin"]
                    ["Human" "Penguin"]
                    ["Human" "Penguin"]
                    ["Human" "Penguin"]
                    ["Human" "Penguin"]])
;;=> ([1 1 0 0] [1 1 0 0] [1 1 0 0] [1 1 0 0] [1 1 0 0])

(def enc (encode ["Human" "Bro" "Foo"] "Human"))

(decode ["Human" "Bro" "Foo"] enc)
;;=> ("Human")
```

Implementation taken from top answer [here](https://www.quora.com/What-is-one-hot-encoding-and-when-is-it-used-in-data-science)
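
For readers curious how an encoder of this kind could work, here is a minimal sketch in plain Clojure. It is not the library's source code: the helper names `one-hot` and `one-hot-decode` are hypothetical and chosen only for illustration, and the sketch assumes each category maps to one slot in the output vector, in the order the categories are given.

```clojure
;; Hypothetical helpers for illustration only; these are NOT part of
;; the one-hot-encoder API.

(defn one-hot
  "Returns a vector with one slot per category: 1 if that category
  appears among `values`, 0 otherwise."
  [categories & values]
  (let [present (set values)]
    (mapv #(if (contains? present %) 1 0) categories)))

(defn one-hot-decode
  "Returns the categories whose slot in `encoding` is 1."
  [categories encoding]
  (keep-indexed (fn [i bit]
                  (when (= 1 bit)
                    (nth categories i)))
                encoding))

(def categories ["Human" "Penguin" "Octopus" "Alien"])

(one-hot categories "Human")
;;=> [1 0 0 0]

(one-hot categories "Human" "Penguin")
;;=> [1 1 0 0]

(one-hot-decode categories [1 1 0 0])
;;=> ("Human" "Penguin")
```

Turning the values into a set first keeps each membership check cheap, and the position of each slot depends only on the order of `categories`, so the same category list must be used for both encoding and decoding.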
## License

Copyright © 2016 FIXME

Distributed under the Eclipse Public License either version 1.0 or (at
your option) any later version.