https://github.com/msoukharev/pydatamocker

Powerful data mocker for Python

Host: GitHub
URL: https://github.com/msoukharev/pydatamocker
Owner: msoukharev
License: mit
Created: 2021-07-25T03:19:14.000Z (almost 5 years ago)
Default Branch: main
Last Pushed: 2022-06-20T02:39:05.000Z (about 4 years ago)
Last Synced: 2025-09-29T05:20:30.153Z (9 months ago)
Topics: data-mocking, python
Language: Python
Homepage:
Size: 48.4 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: CODEOWNERS


# Pydatamocker

Create lots of rich mock data.

![Build](https://github.com/msoukharev/pydatamocker/actions/workflows/python_package.yaml/badge.svg)

## About

Pydatamocker can generate relational data containing values of various data types and distributions, using random generation and sampling.
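
In pydatamocker itself, tables and fields are configured through `Table` and `Schema` (see the code example). As a library-independent illustration of what "relational" mock data means, the following standard-library sketch uses hypothetical `Users` and `Orders` tables and keeps referential integrity by sampling each child row's foreign key from the parent table's primary keys. It is not pydatamocker's implementation.

```python
import random

rng = random.Random(42)  # fixed seed so runs are reproducible

def mock_users(n):
    """Hypothetical parent table: n users with unique integer primary keys."""
    return [{"Id": i, "Status": rng.choice(["Active", "Inactive"])} for i in range(1, n + 1)]

def mock_orders(n, users):
    """Hypothetical child table: each UserId foreign key is sampled from existing user ids."""
    user_ids = [u["Id"] for u in users]
    return [{"Id": i, "UserId": rng.choice(user_ids)} for i in range(1, n + 1)]

users = mock_users(100)
orders = mock_orders(500, users)

# Every foreign key resolves to a parent row, so joining the tables loses no orders
known_ids = {u["Id"] for u in users}
assert all(o["UserId"] in known_ids for o in orders)
```

Sampling keys from the parent table, rather than generating them independently, is what guarantees that every child row points at a real parent.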

### Datasets

The package bundles a few datasets in `.pkl` files. To sample one, use the `dataset` field type and pass the dataset's `name`, as in the code example.

| Dataset | Description | Count |
|:-------:|:-----------:|:-----:|
| `first_name` | Collection of given names | ~20,000 |
| `last_name` | Collection of family names | ~20,000 |
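
The internal layout of the bundled `.pkl` files is not described here. Assuming each one holds a plain sequence of strings, sampling a column from it amounts to loading the pickle and drawing values at random, as in this standard-library sketch. `load_dataset`, `sample_dataset`, and the stand-in file are hypothetical. Only unpickle files you trust, since `pickle.load` can execute arbitrary code.

```python
import pickle
import random
import tempfile
from pathlib import Path

def load_dataset(path):
    """Load a pickled collection of values (assumed: a plain sequence of strings)."""
    with open(path, "rb") as f:
        return list(pickle.load(f))

def sample_dataset(values, k, rng):
    """Draw k values uniformly at random, with replacement."""
    return rng.choices(values, k=k)

# Demo with a tiny stand-in file; the real bundled datasets hold ~20,000 names each
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "first_name.pkl"
    path.write_bytes(pickle.dumps(["Ada", "Grace", "Linus", "Guido"]))
    names = load_dataset(path)
    sample = sample_dataset(names, 10, random.Random(0))

print(sample)
```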

### Code example

```python
from pydatamocker import Table, Schema

sch = Schema()
users = sch.add(Table('Users', 1_000))

# Sample values from the bundled datasets
users.field('FirstName', { 'dataset': { 'name': 'first_name' } })
users.field('LastName', { 'dataset': { 'name': 'last_name', 'restrict': 3 } })

# Numeric fields drawn from random distributions
users.field('Age', { 'binomial': { 'n': 40, 'p': 0.7 } })
users.field('SpouseAge', { 'normal': { 'mean': 40, 'std': 10 } })

# Fields drawn from a fixed set of values
users.field('Status', { 'enum': { 'values': ['Active', 'Inactive', 'Pending confirmation'],
    'counts': [23, 69, 3], 'shuffle': True } })
users.field('Bucket', { 'enum': { 'values': ['1', '2', '3', '4', '5', '6'], 'shuffle': False } })
users.field('Grade', { 'enum': { 'values': [1.5, 2.7, 3.3, 4], 'shuffle': True } })

# Datetimes and dates drawn uniformly between 'min' and 'max'
users.field('LastLogin', { 'uniform': { 'min': '2015-02-13T08:10:30', 'max': '2021-10-30T19:30:43' } })
users.field('RegisteredDate', { 'uniform': { 'min': '2015-02-13', 'max': '2021-10-30', 'format': 'date' } })

# A constant base value transformed by a list of filters;
# the last filter references the Grade field of the same table
users.field('ConstField', { 'const': 10, 'filters': [
    { 'multiply': { 'const': 20 } },
    { 'subtract': { 'normal': { 'mean': 40, 'std': 10 } } },
    { 'floor': 10 },
    { 'round': 1 },
    { 'multiply': { 'ref': (users._name, 'Grade') } }
]})

# Generate the data, then inspect the first rows
sch.sample()
df = users.getData()
print(df.head(5))
```
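
To make some of the field specs above concrete, here is a standard-library sketch of the kind of values the `binomial`, `normal`, `enum`, and `uniform` specs describe for `Age`, `SpouseAge`, `Grade`, and `LastLogin`. It illustrates the distributions only and is not pydatamocker's implementation; the helper names are hypothetical, and treating an `enum` without `counts` as a uniform choice among its values is an assumption.

```python
import random
from datetime import datetime

rng = random.Random(7)  # fixed seed so runs are reproducible

def binomial(n, p):
    """Number of successes in n independent trials, each succeeding with probability p."""
    return sum(rng.random() < p for _ in range(n))

def uniform_datetime(lo, hi):
    """A datetime drawn uniformly between two ISO 8601 timestamps."""
    start, end = datetime.fromisoformat(lo), datetime.fromisoformat(hi)
    return start + (end - start) * rng.random()

rows = [
    {
        "Age": binomial(40, 0.7),              # mean n * p = 28
        "SpouseAge": rng.gauss(40, 10),        # normal, mean 40, std 10
        "Grade": rng.choice([1.5, 2.7, 3.3, 4]),  # assumed uniform enum choice
        "LastLogin": uniform_datetime("2015-02-13T08:10:30", "2021-10-30T19:30:43"),
    }
    for _ in range(1_000)
]
```

With 1,000 rows, the average `Age` should land close to the binomial mean of 40 × 0.7 = 28.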

### CLI

You can also generate data from the command line. To generate multiple tables, pass a directory containing the build configs to the `--in` option.

```shell
python -m pydatamocker --in input-file-or-directory-path --out output-directory-path
```
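
To run the CLI from a Python script, for example as a build step, you can invoke the module with `subprocess`. This sketch uses only the `--in` and `--out` options shown above; `build_command` and `run_pydatamocker` are hypothetical helpers, and creating the output directory beforehand is a precaution, since whether the CLI creates it itself is not stated here.

```python
import subprocess
import sys
from pathlib import Path

def build_command(in_path, out_dir):
    """Assemble the CLI call, using the current interpreter so the same environment is used."""
    return [sys.executable, "-m", "pydatamocker",
            "--in", str(in_path), "--out", str(out_dir)]

def run_pydatamocker(in_path, out_dir):
    """Run the CLI; raises CalledProcessError if it exits with a non-zero status."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_command(in_path, out_dir), check=True)
```

For example, `run_pydatamocker("configs", "mock-output")` would process every build config in a hypothetical `configs` directory.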
