https://github.com/giagiannis/data-profiler
Data profiler is an attempt to model the behavior of a given operator for a set of datasets.
bhattacharyya-coefficient data-modeling data-profiling data-science dataset machine-learning similarity-matrix
- Host: GitHub
- URL: https://github.com/giagiannis/data-profiler
- Owner: giagiannis
- License: apache-2.0
- Created: 2016-06-01T08:47:18.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2019-01-09T13:26:12.000Z (about 7 years ago)
- Last Synced: 2024-06-20T07:57:06.805Z (almost 2 years ago)
- Topics: bhattacharyya-coefficient, data-modeling, data-profiling, data-science, dataset, machine-learning, similarity-matrix
- Language: Go
- Size: 1.48 MB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
data-profiler [Build Status](https://travis-ci.org/giagiannis/data-profiler) [Go Report Card](https://goreportcard.com/report/github.com/giagiannis/data-profiler) [Coverage Status](https://coveralls.io/github/giagiannis/data-profiler?branch=master) [Docker Hub](https://hub.docker.com/r/ggian/data-profiler/)
=============
__data-profiler__ is a Go project that transforms a set of datasets based on a set of characteristics (distribution similarity, correlation, etc.) in order to model, using Machine Learning techniques, the behavior of an operator applied on top of them.
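
To make "distribution similarity" more concrete: the project's topics mention the Bhattacharyya coefficient, which scores the overlap of two discrete distributions between 0 (no shared bins) and 1 (identical). The snippet below is a minimal, illustrative sketch of that measure and is not taken from the __data-profiler__ code base. The function name `bhattacharyya`, the histogram inputs, and the assumption that both histograms share the same bins are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// bhattacharyya is a hypothetical helper (not part of data-profiler).
// It returns the Bhattacharyya coefficient of two histograms defined over
// the same bins. Counts are normalized to probabilities first, so the
// result is 1 for identical distributions and 0 when no bin is shared.
func bhattacharyya(p, q []float64) (float64, error) {
	if len(p) == 0 || len(p) != len(q) {
		return 0, errors.New("histograms must be non-empty and of equal length")
	}
	var sumP, sumQ float64
	for i := range p {
		if p[i] < 0 || q[i] < 0 {
			return 0, errors.New("bin counts must be non-negative")
		}
		sumP += p[i]
		sumQ += q[i]
	}
	if sumP == 0 || sumQ == 0 {
		return 0, errors.New("each histogram needs at least one observation")
	}
	// Sum over bins of sqrt(p_i * q_i), using normalized frequencies.
	var bc float64
	for i := range p {
		bc += math.Sqrt((p[i] / sumP) * (q[i] / sumQ))
	}
	return bc, nil
}

func main() {
	// Illustrative bin counts for two datasets (made-up values).
	a := []float64{10, 20, 30, 40}
	b := []float64{40, 30, 20, 10}

	bc, err := bhattacharyya(a, b)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("Bhattacharyya coefficient: %.4f\n", bc)
}
```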

Installation
------------
You have two ways of installing __data-profiler__:
1. Through Go:
```bash
# GOPATH must be set
~> go get github.com/giagiannis/data-profiler
```
2. Using Docker:
```bash
~> docker pull ggian/data-profiler
```
Usage
-----
__data-profiler__ can be used both through a CLI and a Web interface.
1. CLI
You can access the CLI client through the __data-profiler-utils__ binary.
```bash
~> $GOPATH/bin/data-profiler-utils
```
Running the binary without arguments prints an overview of the available actions.
__Note:__ use this client only if you know how data-profiler works.
2. Web UI
First run the Docker container, providing a directory with the dataset files.
```bash
~> docker run -v /src/datasets:/datasets -p 8080:8080 -d ggian/data-profiler
```
This command mounts the host's _/src/datasets_ directory at _/datasets_ inside the container and forwards the host's port 8080 to the container's port 8080. Once the container has started, open _http://dockerhost:8080_ (replace _dockerhost_ with the address of the machine running Docker) and insert the first set of datasets for analysis.
License
-------
Apache License v2.0 (see the [LICENSE](LICENSE) file for details)
Contact
-------
Giannis Giannakopoulos ggian@cslab.ece.ntua.gr