https://github.com/msoukharev/pydatamocker
Powerful data mocker for python
https://github.com/msoukharev/pydatamocker
data-mocking python
Last synced: 5 months ago
JSON representation
Powerful data mocker for python
- Host: GitHub
- URL: https://github.com/msoukharev/pydatamocker
- Owner: msoukharev
- License: mit
- Created: 2021-07-25T03:19:14.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2022-06-20T02:39:05.000Z (almost 4 years ago)
- Last Synced: 2025-09-29T05:20:30.153Z (9 months ago)
- Topics: data-mocking, python
- Language: Python
- Homepage:
- Size: 48.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: CODEOWNERS
Awesome Lists containing this project
README
# Pydatamocker
Create lots of rich mock data.

## About
Pydatamocker can generate relational data with values of various data types and distributions using random generation and sampling
### Datasets
The package bundles a few datasets in `.pkl` files. They can be sampled by specifying `dataset`.
| Dataset | Description | Count |
|:-------:|:-----------:|:-----:|
| first_name | Collection of given names | ~ 20'000 |
| last_name | Collection of family names | ~ 20'000 |
### Code example
```python
from pydatamocker import Table, Schema
sch = Schema()
users = sch.add(Table('Users', 1_000))
users.field('FirstName', { 'dataset': { 'name': 'first_name' } })
users.field('LastName', { 'dataset': {
'name': 'last_name',
'restrict': 3
}})
users.field('Age', { 'binomial': { 'n': 40, 'p': 0.7 } })
users.field('SpouseAge', { 'normal': { 'mean': 40, 'std': 10 } })
users.field('Status', { 'enum': { 'values': ['Active', 'Inactive', 'Pending confirmation'],
'counts': [23, 69, 3], 'shuffle': True } })
users.field('Bucket', { 'enum': { 'values': ['1', '2', '3', '4', '5', '6'], 'shuffle': False } })
users.field('Grade', { 'enum': { 'values': [1.5, 2.7, 3.3, 4], 'shuffle': True } })
users.field('LastLogin', { 'uniform': { 'min': '2015-02-13T8:10:30', 'max': '2021-10-30T19:30:43' } })
users.field('RegisteredDate', { 'uniform': { 'min': '2015-02-13', 'max': '2021-10-30', 'format': 'date' } })
users.field('ConstField', { 'const': 10, 'filters': [
{
'multiply': {
'const': 20
}
},
{
'subtract': {
'normal': {
'mean': 40,
'std': 10
}
}
},
{
'floor': 10
},
{
'round': 1
},
{
'multiply': {
'ref': (users._name, 'Grade')
}
}
]})
sch.sample()
df = users.getData()
df.head(5)
```
### CLI
You can create data through CLI by running the following command. To generate multiple tables, the
argument to the `--in` command must be a directory containing the build configs.
```text
python -m pydatamocker --in input-file-or-directory-path --out output-directory-path
```