{"id":26258966,"url":"https://github.com/walkframe/datafactory","last_synced_at":"2025-07-06T01:03:05.429Z","repository":{"id":62566577,"uuid":"38204836","full_name":"walkframe/datafactory","owner":"walkframe","description":"datafactory makes flexible data.","archived":false,"fork":false,"pushed_at":"2021-10-14T15:49:25.000Z","size":37,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-06-02T01:12:01.078Z","etag":null,"topics":["csv","demodata","factory","json","python","test-data","test-data-generator","testdata"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/walkframe.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-06-28T15:59:03.000Z","updated_at":"2025-04-08T17:10:33.000Z","dependencies_parsed_at":"2022-11-03T17:47:49.110Z","dependency_job_id":null,"html_url":"https://github.com/walkframe/datafactory","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/walkframe/datafactory","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/walkframe%2Fdatafactory","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/walkframe%2Fdatafactory/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/walkframe%2Fdatafactory/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/walkframe%2Fdatafactory/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/walkframe","download_url":"https://codeload.github.com/walkframe/datafactory/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/walkframe%2Fdatafactory/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263271611,"owners_count":23440404,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","demodata","factory","json","python","test-data","test-data-generator","testdata"],"created_at":"2025-03-13T22:15:27.198Z","updated_at":"2025-07-06T01:03:05.366Z","avatar_url":"https://github.com/walkframe.png","language":"Python","readme":".. image:: https://badge.fury.io/py/datafactory.svg\n  :target: https://badge.fury.io/py/datafactory\n\n.. image:: https://github.com/walkframe/datafactory/workflows/master/badge.svg\n  :target: https://github.com/walkframe/datafactory/actions\n\n.. image:: https://img.shields.io/badge/code%20style-black-000000.svg\n  :target: https://github.com/python/black\n\n.. image:: https://codecov.io/gh/walkframe/datafactory/branch/master/graph/badge.svg\n  :target: https://codecov.io/gh/walkframe/datafactory\n\n.. image:: https://img.shields.io/badge/License-Apache%202.0-blue.svg\n  :target: https://opensource.org/licenses/Apache-2.0\n\n\nOverview\n===========\n`datafactory` makes **flexible** data according to the given rules.\n\nThe features are divided into `field`, `model`, `container`, and `formatter`.\nIf you compare it to a DB, fields are columns, models are records, and containers are tables.\n\nThe great thing about the `datafactory` is its flexibility in type specification. \nContainers can also be nested.\n\n`formatter` supports data formatting and file output.\n\nRequirements\n============\n\n- Python 3.5 or later.\n\n\nInstall\n=======\n\n.. code-block:: sh\n\n  $ pip install datafactory\n\n\nUsage\n=====\n\nBasic Example\n-------------\n\n.. code-block:: python3\n\n  In [1]: import datafactory\n \n  In [2]: model = datafactory.Model({\n     ...:     'id': datafactory.IncrementField(),\n     ...:     'x': datafactory.CycleField(['a', 'b', 'c']),\n     ...:     # BLANK will be omit.\n     ...:     'option': datafactory.ChoiceField([True, False, datafactory.BLANK]),\n     ...: })\n \n  In [3]: container = datafactory.Container(model, 5, render=True)\n \n  In [4]: container\n  Out[4]:\n  [{'id': 1, 'x': 'a'},\n   {'id': 2, 'x': 'b', 'option': False},\n   {'id': 3, 'x': 'c', 'option': True},\n   {'id': 4, 'x': 'a'},\n   {'id': 5, 'x': 'b'}]\n \n  # specify rewrite=True, if file already exists.\n  In [5]: datafactory.JsonFormatter(container).write('/tmp/test.json', rewrite=True)\n \n  In [6]: !cat /tmp/test.json\n  [\n   {\n    \"x\": \"a\",\n    \"id\": 1\n   },\n   {\n    \"x\": \"b\",\n    \"id\": 2,\n    \"option\": false\n   },\n   {\n    \"x\": \"c\",\n    \"id\": 3,\n    \"option\": true\n   },\n   {\n    \"x\": \"a\",\n    \"id\": 4\n   },\n   {\n    \"x\": \"b\",\n    \"id\": 5\n   }\n  ]\n\nTSV Example\n-----------\n\n.. code-block:: python3\n\n  In [1]: import datafactory\n \n  In [2]: model = datafactory.ListModel([\n     ...:     datafactory.IncrementField(start=10, step=5),\n     ...:     datafactory.HashOfField(2, 'md5'),  # hashing value of the third column.\n     ...:     datafactory.ChoiceField(['foo', 'bar', 'baz']),\n     ...:     datafactory.CycleField(range(0, 30, 10)),\n     ...: ]).ordering(2)  # render at first index:2(third column)\n \n  # IterContainer is saving memory, because generating an element each time.\n  In [3]: container = datafactory.IterContainer(model, 10)  # repeat 10 times.\n \n  In [4]: datafactory.CsvFormatter(\n     ...:     container,\n     ...:     delimiter='\\t',\n     ...:     header=['id', 'hash-of-name', 'name', 'value']\n     ...: ).write('/tmp/test.csv', rewrite=True)\n \n  In [5]: !cat /tmp/test.csv\n  id\thash-of-name\tname\tvalue\n  10\tacbd18db4cc2f85cedef654fccc4a4d8\tfoo\t0\n  15\tacbd18db4cc2f85cedef654fccc4a4d8\tfoo\t10\n  20\t73feffa4b7f6bb68e44cf984c85f6e88\tbaz\t20\n  25\tacbd18db4cc2f85cedef654fccc4a4d8\tfoo\t0\n  30\tacbd18db4cc2f85cedef654fccc4a4d8\tfoo\t10\n  35\t73feffa4b7f6bb68e44cf984c85f6e88\tbaz\t20\n  40\t73feffa4b7f6bb68e44cf984c85f6e88\tbaz\t0\n  45\t73feffa4b7f6bb68e44cf984c85f6e88\tbaz\t10\n  50\t37b51d194a7513e45b56f6524f2d51f2\tbar\t20\n  55\t37b51d194a7513e45b56f6524f2d51f2\tbar\t0\n\nCustom Example\n--------------\nIf object is callable, it stores execution result.\n\nModel\n~~~~~\n\n.. code-block:: python3\n\n In [1]: import datafactory\n\n In [2]: def square(k, i):\n    ...:     return k * i\n    ...:\n\n In [3]: container = datafactory.DictContainer(square)\n\n In [4]: container(['a', 'b', 'c', 'd', 'e'])\n Out[4]: {'a': '', 'b': 'b', 'c': 'cc', 'd': 'ddd', 'e': 'eeee'}\n\n\nField\n~~~~~~~\n\n.. code-block:: python3\n\n In [1]: import datafactory\n\n In [2]: model = datafactory.Model({\n    ...:    'col1': (lambda r, i: i),\n    ...:    'col2': (lambda r: r['col1'] + 1),\n    ...:    'col3': (lambda r: r['col2'] * 2),\n    ...:    'col4': 100,  # fixed value\n    ...: }).ordering('col1', 'col2', 'col3')\n\n In [3]: container = datafactory.ListContainer(model)\n\n In [4]: container(4)\n Out[4]:\n [{'col1': 0, 'col2': 1, 'col3': 2, 'col4': 100},\n  {'col1': 1, 'col2': 2, 'col3': 4, 'col4': 100},\n  {'col1': 2, 'col2': 3, 'col3': 6, 'col4': 100},\n  {'col1': 3, 'col2': 4, 'col3': 8, 'col4': 100}]\n\n\nLimited number of element Example\n---------------------------------\n\n.. code-block:: python3\n\n In [1]: import datafactory\n\n In [2]: model = datafactory.Model({\n    ...:     # x: a is 1times limited. / b is 2times limited. / c is 3times limited.\n    ...:     'x': datafactory.PickoutField({'a': 1, 'b': 2, 'c': 3}, missing=None),\n    ...:     # y: a is 2times limited. / b and c is 1times limited.\n    ...:     'y': datafactory.PickoutField(['a', 'a', 'b', 'c'], missing='*'),\n    ...:     # z: a and b can't be selected. / c is 5times limited.\n    ...:     'z': datafactory.PickoutField(['c']*5, missing=None),\n    ...: })\n\n In [3]: container = datafactory.ListContainer(model)\n\n In [4]: container(6)\n Out[4]:\n [{'x': 'a', 'y': 'a', 'z': 'c'},\n  {'x': 'c', 'y': 'b', 'z': 'c'},\n  {'x': 'c', 'y': 'a', 'z': 'c'},\n  {'x': 'b', 'y': 'c', 'z': 'c'},\n  {'x': 'c', 'y': '*', 'z': 'c'},\n  {'x': 'b', 'y': '*', 'z': None}]\n\n\nCombination Example\n-------------------\nTo generate the testdata that combines multiple elements\ncan be achieved by using the repeat-argument of `CycleField` and `SequenceField`.\n\n.. code-block:: python3\n\n In [1]: import datafactory\n\n In [2]: l0 = ['a', 'b']\n\n In [3]: l1 = ['a', 'b', 'c']\n\n In [4]: l2 = ['a', 'b', 'c', 'd']\n\n In [5]: model = datafactory.ListModel([\n    ...:     datafactory.SequenceField(l0, repeat=len(l1)*len(l2), missing=datafactory.ESCAPE),\n    ...:     datafactory.CycleField(l1, repeat=len(l2)),\n    ...:     datafactory.CycleField(l2),\n    ...: ])\n\n In [6]: container = datafactory.Container(model)\n\n # by specifying the ESCAPE to missing-argument\n # automatically detect end of elements and escape before reaching 10000.\n In [7]: container(10000)\n Out[7]:\n [['a', 'a', 'a'],\n  ['a', 'a', 'b'],\n  ['a', 'a', 'c'],\n  ['a', 'a', 'd'],\n  ['a', 'b', 'a'],\n  ['a', 'b', 'b'],\n  ['a', 'b', 'c'],\n  ['a', 'b', 'd'],\n  ['a', 'c', 'a'],\n  ['a', 'c', 'b'],\n  ['a', 'c', 'c'],\n  ['a', 'c', 'd'],\n  ['b', 'a', 'a'],\n  ['b', 'a', 'b'],\n  ['b', 'a', 'c'],\n  ['b', 'a', 'd'],\n  ['b', 'b', 'a'],\n  ['b', 'b', 'b'],\n  ['b', 'b', 'c'],\n  ['b', 'b', 'd'],\n  ['b', 'c', 'a'],\n  ['b', 'c', 'b'],\n  ['b', 'c', 'c'],\n  ['b', 'c', 'd']]\n\nnested example\n--------------\n\n.. code-block:: python3\n\n In [1]: import datafactory\n\n In [2]: model = datafactory.Model({\n    ...:     'a': datafactory.ListModel([\n    ...:         datafactory.CycleField(['b', 'c']),\n    ...:         datafactory.CycleField(['d', 'e']),\n    ...:     ]),\n    ...:     datafactory.ChoiceField(['f', 'g', 'h']): datafactory.DictContainer(lambda x: x * 2, 5)\n    ...: })\n\n In [3]: datafactory.Container(model, 10, render=True)\n Out[3]:\n [{'a': ['b', 'd'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},\n  {'a': ['c', 'e'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},\n  {'a': ['b', 'd'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},\n  {'a': ['c', 'e'], 'g': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},\n  {'a': ['b', 'd'], 'f': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},\n  {'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},\n  {'a': ['b', 'd'], 'g': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},\n  {'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},\n  {'a': ['b', 'd'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}},\n  {'a': ['c', 'e'], 'h': {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}}]\n\n- `b` and `c` appear at first of a list in `a` key alternatively.\n- `d` and `e` appear at second of a list in `a` key alternatively.\n\ndatetime Utility\n----------------\n\nchoice\n~~~~~~\n\nrandom choice between start and end.\n\n.. code-block:: python3\n\n In [1]: from datafactory.utils.datetime import choice\n\n\n In [2]: choice(1988, '2015-11-11T11:11:11.111111')\n Out[2]: datetime.datetime(2009, 11, 30, 23, 25, 43, 240031)\n\n # tuple: datetime(*tuple), dict: datetime(**dict)\n In [3]: choice((1988, 5, 22), {'year': 2015, 'month': 11, 'day': 11})\n Out[3]: datetime.datetime(1996, 7, 1, 11, 14, 59, 314809)\n\n In [4]: from datetime import datetime, date\n\n In [5]: choice(date(1988, 5, 22), datetime(2015, 11, 11, 11, 11, 11))\n Out[5]: datetime.datetime(2011, 3, 23, 19, 39, 14, 476901)\n\ngenerator\n~~~~~~~~~\n\ngenerator that generate the datetime object at regular intervals.\n\n.. code-block:: python3\n\n In [1]: from datetime import timedelta\n In [2]: from datafactory.utils.datetime import generator\n\n # if you omit end-argument, then it creates an object infinitely.\n In [3]: g = generator(start=2015, interval=timedelta(days=1, hours=12))\n\n In [4]: next(g)\n Out[4]: datetime.datetime(2015, 1, 1, 0, 0)\n\n In [5]: next(g)\n Out[5]: datetime.datetime(2015, 1, 2, 12, 0)\n\n In [6]: next(g)\n Out[6]: datetime.datetime(2015, 1, 4, 0, 0)\n\n In [7]: next(g)\n Out[7]: datetime.datetime(2015, 1, 5, 12, 0)\n\nrange\n~~~~~\n\ngenerate list object that includes regularly generated datetime objects element.\n\n.. code-block:: python3\n\n In [1]: from datetime import timedelta\n In [2]: from datafactory.utils.datetime import range\n\n In [3]: range(2015, '2015/2/1')\n Out[3]:\n [datetime.datetime(2015, 1, 1, 0, 0),\n  datetime.datetime(2015, 1, 2, 0, 0),\n  datetime.datetime(2015, 1, 3, 0, 0),\n  datetime.datetime(2015, 1, 4, 0, 0),\n  datetime.datetime(2015, 1, 5, 0, 0),\n  datetime.datetime(2015, 1, 6, 0, 0),\n  datetime.datetime(2015, 1, 7, 0, 0),\n  datetime.datetime(2015, 1, 8, 0, 0),\n  datetime.datetime(2015, 1, 9, 0, 0),\n  datetime.datetime(2015, 1, 10, 0, 0),\n  datetime.datetime(2015, 1, 11, 0, 0),\n  datetime.datetime(2015, 1, 12, 0, 0),\n  datetime.datetime(2015, 1, 13, 0, 0),\n  datetime.datetime(2015, 1, 14, 0, 0),\n  datetime.datetime(2015, 1, 15, 0, 0),\n  datetime.datetime(2015, 1, 16, 0, 0),\n  datetime.datetime(2015, 1, 17, 0, 0),\n  datetime.datetime(2015, 1, 18, 0, 0),\n  datetime.datetime(2015, 1, 19, 0, 0),\n  datetime.datetime(2015, 1, 20, 0, 0),\n  datetime.datetime(2015, 1, 21, 0, 0),\n  datetime.datetime(2015, 1, 22, 0, 0),\n  datetime.datetime(2015, 1, 23, 0, 0),\n  datetime.datetime(2015, 1, 24, 0, 0),\n  datetime.datetime(2015, 1, 25, 0, 0),\n  datetime.datetime(2015, 1, 26, 0, 0),\n  datetime.datetime(2015, 1, 27, 0, 0),\n  datetime.datetime(2015, 1, 28, 0, 0),\n  datetime.datetime(2015, 1, 29, 0, 0),\n  datetime.datetime(2015, 1, 30, 0, 0),\n  datetime.datetime(2015, 1, 31, 0, 0),\n  datetime.datetime(2015, 2, 1, 0, 0)]\n\n # +-3 hour noise, +5 minute noise\n In [4]: range(2015, '2015-01-15', hours=3, minutes=(0, 5))\n Out[4]:\n [datetime.datetime(2015, 1, 1, 3, 1),\n  datetime.datetime(2015, 1, 2, 0, 3),\n  datetime.datetime(2015, 1, 3, 2, 0),\n  datetime.datetime(2015, 1, 3, 22, 2),\n  datetime.datetime(2015, 1, 4, 22, 3),\n  datetime.datetime(2015, 1, 6, 0, 2),\n  datetime.datetime(2015, 1, 7, 0, 4),\n  datetime.datetime(2015, 1, 8, 0, 4),\n  datetime.datetime(2015, 1, 8, 21, 3),\n  datetime.datetime(2015, 1, 9, 22, 0),\n  datetime.datetime(2015, 1, 11, 0, 0),\n  datetime.datetime(2015, 1, 11, 22, 1),\n  datetime.datetime(2015, 1, 12, 22, 5),\n  datetime.datetime(2015, 1, 14, 3, 0),\n  datetime.datetime(2015, 1, 15, 2, 5)]\n\n # it is able to specify minus direction as interval.\n In [5]: range(start='2015-5-22', end='2015-04-22', interval=timedelta(days=-1))\n Out[5]:\n [datetime.datetime(2015, 5, 22, 0, 0),\n  datetime.datetime(2015, 5, 21, 0, 0),\n  datetime.datetime(2015, 5, 20, 0, 0),\n  datetime.datetime(2015, 5, 19, 0, 0),\n  datetime.datetime(2015, 5, 18, 0, 0),\n  datetime.datetime(2015, 5, 17, 0, 0),\n  datetime.datetime(2015, 5, 16, 0, 0),\n  datetime.datetime(2015, 5, 15, 0, 0),\n  datetime.datetime(2015, 5, 14, 0, 0),\n  datetime.datetime(2015, 5, 13, 0, 0),\n  datetime.datetime(2015, 5, 12, 0, 0),\n  datetime.datetime(2015, 5, 11, 0, 0),\n  datetime.datetime(2015, 5, 10, 0, 0),\n  datetime.datetime(2015, 5, 9, 0, 0),\n  datetime.datetime(2015, 5, 8, 0, 0),\n  datetime.datetime(2015, 5, 7, 0, 0),\n  datetime.datetime(2015, 5, 6, 0, 0),\n  datetime.datetime(2015, 5, 5, 0, 0),\n  datetime.datetime(2015, 5, 4, 0, 0),\n  datetime.datetime(2015, 5, 3, 0, 0),\n  datetime.datetime(2015, 5, 2, 0, 0),\n  datetime.datetime(2015, 5, 1, 0, 0),\n  datetime.datetime(2015, 4, 30, 0, 0),\n  datetime.datetime(2015, 4, 29, 0, 0),\n  datetime.datetime(2015, 4, 28, 0, 0),\n  datetime.datetime(2015, 4, 27, 0, 0),\n  datetime.datetime(2015, 4, 26, 0, 0),\n  datetime.datetime(2015, 4, 25, 0, 0),\n  datetime.datetime(2015, 4, 24, 0, 0),\n  datetime.datetime(2015, 4, 23, 0, 0),\n  datetime.datetime(2015, 4, 22, 0, 0)]\n\ncommon\n~~~~~~\n\n**noise**\n\nIt is possible to specify the gap between the actual time as noise parameters.\nallow to specify the noise parameters are “datetimes.generator” and “datetimes.range” functions.\n\n`**noise` is specified in the kwargs format and they are not required.\n\nThe available keys are same with timedelta-args.\n\n- days\n- hours\n- minute\n- seconds\n- microseconds\n\n**argtype**\n\nThe acceptable arguments as the other than datetime type are the following.\n\n:int: It is evaluated as a `year`.\n:str: It is parsed as `datetime` from the numeric part of the string.\n:tuple: It will be passed into `datetime` args.\n:dict: It will be passed into `datetime` kwargs.\n:date: It will be converted `datetime` type.\n\nhistory\n-------\n\n1.0.x\n~~~~~\nInitialize.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwalkframe%2Fdatafactory","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwalkframe%2Fdatafactory","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwalkframe%2Fdatafactory/lists"}