https://github.com/ocramz/record-encode
Generic encoding of record types
- Host: GitHub
- URL: https://github.com/ocramz/record-encode
- Owner: ocramz
- License: bsd-3-clause
- Created: 2018-08-06T04:29:29.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-01-27T20:08:55.000Z (almost 6 years ago)
- Last Synced: 2024-05-01T23:26:03.754Z (6 months ago)
- Topics: categorical-data, categorical-features, data-analysis, data-mining, data-science, generic-programming, machine-learning, one-hot-encode, preprocessing
- Language: Haskell
- Size: 39.1 KB
- Stars: 2
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# record-encode
## Encoding categorical variables
[![Build Status](https://travis-ci.org/ocramz/record-encode.png)](https://travis-ci.org/ocramz/record-encode)
[![Hackage](https://img.shields.io/hackage/v/record-encode.svg)](https://hackage.haskell.org/package/record-encode)

This library provides generic machinery to encode values of some algebraic type as points in a vector space.
Values of a sum type (e.g. enumerations) are also called "categorical" variables in statistics, because they encode a choice between a number of discrete categories.
Many data science and machine learning algorithms, however, rely on a purely numerical representation of data. The code that converts values of a static type into numbers is often "boilerplate", i.e. largely repeated and not informative.
The `encodeOneHot` function provided here is a generic utility function (i.e. defined once and for all) to compute the one-hot representation of any sum type.
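To make the idea concrete, here is a minimal sketch of one-hot encoding for a plain enumeration. It relies on the standard `Enum` and `Bounded` classes rather than the generic machinery this library uses, and the names `Colour` and `oneHotDense` are hypothetical, chosen only for illustration.

```haskell
-- Illustrative sketch only: this is NOT the library's implementation.
-- `Colour` and `oneHotDense` are hypothetical names.
data Colour = Red | Green | Blue
  deriving (Show, Eq, Enum, Bounded)

-- | Dense one-hot vector: a 1 at the constructor's position, 0 elsewhere.
oneHotDense :: (Enum a, Bounded a) => a -> [Int]
oneHotDense x = [ if fromEnum c == i then 1 else 0 | c <- universe ]
  where
    i = fromEnum x
    -- Every constructor of the type, in declaration order.
    universe = [minBound .. maxBound] `asTypeOf` [x]
```

Under these assumptions, `oneHotDense Green` places the single 1 in the second of three positions. The approach only covers types with `Enum` and `Bounded` instances, which is one reason a generic solution is attractive.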
# Usage example
```
{-# language DeriveGeneric #-}

import qualified GHC.Generics as G
import qualified Generics.SOP as SOP
import Data.Record.Encode

data X = A | B | C deriving (G.Generic)
instance SOP.Generic X
```

```
> encodeOneHot B
OH {oDim = 3, oIx = 1}
```

Please refer to the documentation of `Data.Record.Encode` for more examples and details.
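The result printed above records only a dimension (`oDim`) and an active index (`oIx`). Expanding such a pair into a dense vector might look like the sketch below. `Sparse`, `sDim`, `sIx` and `toDense` are hypothetical stand-ins defined here for illustration; they are not part of `Data.Record.Encode`.

```haskell
-- Hypothetical stand-in for a (dimension, index) one-hot description.
-- This is not the library's own type.
data Sparse = Sparse { sDim :: Int, sIx :: Int }
  deriving (Show, Eq)

-- | Expand to a dense list with a single 1 at position `sIx`.
-- Assumes 0 <= sIx < sDim; an out-of-range index yields all zeros.
toDense :: Sparse -> [Int]
toDense (Sparse d i) = [ if j == i then 1 else 0 | j <- [0 .. d - 1] ]
```

Keeping the compact pair and expanding it only when needed avoids allocating mostly-zero vectors for types with many constructors.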
# Acknowledgements
Thanks to Gagandeep Bhatia (@gagandeepb) for his Google Summer of Code 2018 work on [`Frames-beam`](https://github.com/gagandeepb/Frames-beam), Mark Karpov (@mrkkrp) for his Template Haskell tutorial, Anthony Cowley (@acowley) for [`Frames`](https://hackage.haskell.org/package/Frames), and @mniip on Freenode #haskell for helping me better understand what can be done with generic programming.