{"id":16549580,"url":"https://github.com/uiur/demae","last_synced_at":"2025-12-13T19:42:34.996Z","repository":{"id":62567586,"uuid":"103126367","full_name":"uiur/demae","owner":"uiur","description":"A framework to build a machine learning batch","archived":false,"fork":false,"pushed_at":"2017-10-20T09:26:21.000Z","size":14,"stargazers_count":7,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-01T17:01:53.140Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/uiur.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-09-11T11:14:04.000Z","updated_at":"2024-09-04T05:06:45.000Z","dependencies_parsed_at":"2022-11-03T17:00:59.051Z","dependency_job_id":null,"html_url":"https://github.com/uiur/demae","commit_stats":null,"previous_names":["uiureo/demae"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uiur%2Fdemae","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uiur%2Fdemae/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uiur%2Fdemae/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uiur%2Fdemae/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/uiur","download_url":"https://codeload.github.com/uiur/demae/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238681882,"owners_count":19512862,"icon_url":"https://github.com/githu
b.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-11T19:29:56.131Z","updated_at":"2025-10-28T16:32:10.746Z","avatar_url":"https://github.com/uiur.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# demae\n[![Build Status](https://travis-ci.org/uiureo/demae.svg?branch=master)](https://travis-ci.org/uiureo/demae)\n[![PyPI version](https://badge.fury.io/py/demae.svg)](https://badge.fury.io/py/demae)\n\ndemae is a framework for building batch programs that use machine learning.\nIt makes it easier to deploy your ML model into production.\n\nMain features:\n\n- handle data source and destination easily\n- support parallel execution\n- print stats of execution time\n\nThis example fetches input from S3, transforms it, and pushes the output to S3.\n\n`S3 -\u003e transform -\u003e S3`\n\n```python\nimport re\n\nfrom demae import Base\nfrom demae.source import S3Source\nfrom demae.dest import S3Dest\n\n\"\"\"\nrequires `source`, `dest` and `transform` to be implemented\n\"\"\"\nclass Batch(Base):\n    \"\"\"\n    Set data source\n\n    This reads input from files with the given prefix in the specified bucket.\n    Input files must be in TSV format.\n    \"\"\"\n    source = S3Source(\n        bucket='bucket',\n        prefix='{env}/example_input/{date}/example_input.tsv',\n        columns=['id', 'text'],\n    )\n\n    \"\"\"\n    Specify the output destination in S3.\n\n    key_map : a function (input key -\u003e output key)\n\n    This example maps input:\n      from: development/example_input/2017-12-24/example_input.0000_part_00.gz\n      to:   
development/example_output/2017-12-24/example_output.0000_part_00.gz\n    \"\"\"\n    dest = S3Dest(\n        key_map=lambda key: re.sub('_input', '_output', key)\n    )\n\n    \"\"\"\n    Write your inference code here\n    data : pandas DataFrame\n        columns are automatically set from source.columns.\n    must return an array-like object (DataFrame, numpy array or list)\n    \"\"\"\n    def transform(self, data):\n        # `predict` stands for your own inference function\n        output = predict(data['text'])\n        return output\n\n```\n\nTo run:\n\n```python\nbatch = Batch(\n  env='development',\n  date='2017-02-13'\n)\nbatch.run()\n```\n\n## Parallel execution\nParallel execution is enabled by setting the environment variables named in `parallel_env`.\n\nEach batch process handles only its corresponding part of the input.\n\n\n```python\nsource = S3Source(\n    bucket='bucket',\n    prefix='development/foo/foo.tsv',\n    columns=['id', 'text'],\n    parallel_env={'index': 'PARALLEL_INDEX', 'size': 'PARALLEL_SIZE'},\n)\n```\n\nFor example,\ninput files: `input.tsv.part0` `input.tsv.part1` `input.tsv.part2`\n\nWhen `PARALLEL_INDEX=1` and `PARALLEL_SIZE=3` are provided, the batch handles only `input.tsv.part1`.\n\n\n## License\n\nMIT\n\nThis software was developed while working for [Cookpad Inc.](https://github.com/cookpad)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuiur%2Fdemae","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fuiur%2Fdemae","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuiur%2Fdemae/lists"}