https://github.com/brahle/data_partitioner


Host: GitHub
URL: https://github.com/brahle/data_partitioner
Owner: brahle
License: lgpl-3.0
Created: 2017-04-10T01:21:32.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2017-04-10T01:53:56.000Z (about 8 years ago)
Last Synced: 2025-02-18T12:44:47.031Z (3 months ago)
Language: Python
Size: 9.77 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Data Partitioner

A simple project for consistently partitioning a data set into two parts: a test set and a training
set. It also includes helper methods for partitioning into more than two groups of elements.

# Installation

The easiest way to install this module is to install it via `pip`:

```
$ pip install data_partitioner
```

# Usage

Using this module is simple. The main class (`DatasetSuplier`) offers two methods: `training_set()`
returns the training set and `test_set()` returns the test set. Both methods are consistent: no
matter how many times you call them on the same object, they return the same set of elements.
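To make the consistency guarantee concrete, here is a minimal standalone sketch of one way a repeatable split can be obtained. It does not use `data_partitioner` and is not taken from its source: the helpers `stable_fraction` and `split` are hypothetical names, and the rule "an element goes to the training set when its value falls below the training fraction" is an assumption made for illustration.

```python
import hashlib


def stable_fraction(index: int) -> float:
    """Map an element index to a repeatable value in [0, 1).

    Hypothetical helper: hashes the index so the same index always
    yields the same value, with no hidden random state.
    """
    digest = hashlib.sha256(str(index).encode("utf-8")).digest()
    # Keep the top 53 bits so the division is exact and stays below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def split(dataset, training_fraction=0.8):
    """Assign each element to training or test based on its index (assumed rule)."""
    training, test = [], []
    for index, element in enumerate(dataset):
        if stable_fraction(index) < training_fraction:
            training.append(element)
        else:
            test.append(element)
    return training, test


dataset = ["a", "b", "c", "d", "e", "f"]
first = split(dataset)
second = split(dataset)
print(first == second)  # the split is identical on every call
```

Because the decision depends only on the element's index, repeated calls cannot disagree, which is the property the methods above promise.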

You can specify two configuration options:

- `training_percent` - the share of the dataset used for the training set, given as a fraction
  between 0 and 1. It defaults to `0.8`.
- `partitioning_function` - the function used to partition the dataset.
  - It defaults to `data_partitioner.pseudorandom_function`, which randomly assigns every element
    of the dataset to either the test set or the training set.
  - Another built-in option is `data_partitioner.LinearFakeRandomFunction`, which makes sure that
    no element of the training set comes after any element of the test set.
  - You can also write this callable yourself. It takes one parameter as input: the index of the
    element currently considered.
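The ordering guarantee of the linear option can be pictured with a small standalone sketch. It does not call `data_partitioner`, and the README does not say what the callable should return, so the contract used here is an assumption: the function maps an index to a value in [0, 1), and an element is placed in the training set when that value is below the training fraction. `make_linear_function` and `split` are hypothetical names.

```python
def make_linear_function(size: int):
    """Return a function mapping index -> index / size, a value in [0, 1).

    Hypothetical stand-in for a linear partitioning function; the real
    library's return contract may differ.
    """
    def linear(index: int) -> float:
        return index / size
    return linear


def split(dataset, assign, training_fraction=0.8):
    """Split by comparing each index's assigned value with the fraction (assumed rule)."""
    training = [e for i, e in enumerate(dataset) if assign(i) < training_fraction]
    test = [e for i, e in enumerate(dataset) if assign(i) >= training_fraction]
    return training, test


dataset = list(range(10))
training, test = split(dataset, make_linear_function(len(dataset)))
print(training)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(test)      # [8, 9]
```

Since the assigned value grows with the index, every training element precedes every test element, which is useful for ordered data such as time series.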

# Example

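```
from data_partitioner import DatasetSuplier


# Placeholders: replace these with your own training and evaluation code.
def do_train(value):
    pass


def do_evaluate(value):
    pass


dataset = [
    ('Alice', 10, 23, 401),
    ('Bob', 20, 40, 812),
    ('Christine', 41, 92, 533),
    ('Dave', 843, 12, -5),
    ('Elizabeth', 682, 33, -7),
    ('Fred', 95, 642, 34),
]

suplier = DatasetSuplier(dataset)

# training_set() returns the same elements on every call, so each of the
# 100 passes trains on an identical set.
for iteration in range(100):
    for element in suplier.training_set():
        do_train(element[1])

# Evaluate on the elements that were held out of training.
for element in suplier.test_set():
    do_evaluate(element[1])
```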
