{"id":26412787,"url":"https://github.com/alteryx/compose","last_synced_at":"2025-05-14T18:02:30.881Z","repository":{"id":38080941,"uuid":"163425829","full_name":"alteryx/compose","owner":"alteryx","description":"A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning.","archived":false,"fork":false,"pushed_at":"2025-03-31T22:05:37.000Z","size":5338,"stargazers_count":505,"open_issues_count":23,"forks_count":47,"subscribers_count":26,"default_branch":"main","last_synced_at":"2025-05-14T18:02:08.351Z","etag":null,"topics":["ai","automl","data-labeling","data-science","labeling","labeling-tool","machine-learning","prediction-engineering","prediction-problem","training-data"],"latest_commit_sha":null,"homepage":"https://compose.alteryx.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alteryx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"contributing.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-12-28T15:45:37.000Z","updated_at":"2025-03-27T15:21:23.000Z","dependencies_parsed_at":"2023-02-10T22:16:16.264Z","dependency_job_id":"b4d7b574-a630-483e-aff4-100cb21b182a","html_url":"https://github.com/alteryx/compose","commit_stats":{"total_commits":330,"total_committers":14,"mean_commits":"23.571428571428573","dds":0.6787878787878787,"last_synced_commit":"709a7b70e514da5e62b37c364c32821fd412650b"},"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/alteryx%2Fcompose","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alteryx%2Fcompose/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alteryx%2Fcompose/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alteryx%2Fcompose/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alteryx","download_url":"https://codeload.github.com/alteryx/compose/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254198453,"owners_count":22030964,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","automl","data-labeling","data-science","labeling","labeling-tool","machine-learning","prediction-engineering","prediction-problem","training-data"],"created_at":"2025-03-17T22:09:18.452Z","updated_at":"2025-05-14T18:02:30.863Z","avatar_url":"https://github.com/alteryx.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\u003cimg width=50% src=\"https://raw.githubusercontent.com/alteryx/compose/main/docs/source/images/compose.png\" alt=\"Compose\" /\u003e\u003c/p\u003e\n\u003cp align=\"center\"\u003e\u003ci\u003e\"Build better training examples in a fraction of the time.\"\u003c/i\u003e\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/alteryx/compose/actions?query=workflow%3ATests\" target=\"_blank\"\u003e\n        \u003cimg src=\"https://github.com/alteryx/compose/workflows/Tests/badge.svg\" alt=\"Tests\" /\u003e\n    \u003c/a\u003e\n    \u003ca 
href=\"https://codecov.io/gh/alteryx/compose\"\u003e\n        \u003cimg src=\"https://codecov.io/gh/alteryx/compose/branch/main/graph/badge.svg?token=mDz4ueTUEO\"/\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://compose.alteryx.com/en/stable/?badge=stable\" target=\"_blank\"\u003e\n        \u003cimg src=\"https://readthedocs.com/projects/feature-labs-inc-compose/badge/?version=stable\u0026token=5c3ace685cdb6e10eb67828a4dc74d09b20bb842980c8ee9eb4e9ed168d05b00\"\n            alt=\"ReadTheDocs\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://badge.fury.io/py/composeml\" target=\"_blank\"\u003e\n        \u003cimg src=\"https://badge.fury.io/py/composeml.svg?maxAge=2592000\" alt=\"PyPI Version\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://stackoverflow.com/questions/tagged/compose-ml\" target=\"_blank\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/questions-on_stackoverflow-blue.svg?\" alt=\"StackOverflow\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://pepy.tech/project/composeml\" target=\"_blank\"\u003e\n        \u003cimg src=\"https://pepy.tech/badge/composeml/month\" alt=\"PyPI Downloads\" /\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\u003chr\u003e\n\n[Compose](https://compose.alteryx.com) is a machine learning tool for automated prediction engineering. It allows you to structure prediction problems and generate labels for supervised learning. An end user defines an outcome of interest by writing a *labeling function*, then runs a search to automatically extract training examples from historical data. Its result is then provided to [Featuretools](https://docs.featuretools.com/) for automated feature engineering and subsequently to [EvalML](https://evalml.alteryx.com/) for automated machine learning. 
The workflow of an applied machine learning engineer then becomes:\n\n\u003cbr\u003e\u003cp align=\"center\"\u003e\u003cimg width=90% src=\"https://raw.githubusercontent.com/alteryx/compose/main/docs/source/images/workflow.png\" alt=\"Compose\" /\u003e\u003c/p\u003e\u003cbr\u003e\n\nBy automating the early stage of the machine learning pipeline, our end user can easily define a task and solve it. See the [documentation](https://compose.alteryx.com) for more information.\n\n## Installation\nInstall with pip\n\n```\npython -m pip install composeml\n```\n\nor from the Conda-forge channel on [conda](https://anaconda.org/conda-forge/composeml):\n\n```\nconda install -c conda-forge composeml\n```\n\n### Add-ons\n\n**Update checker** - Receive automatic notifications of new Compose releases\n\n```\npython -m pip install \"composeml[update_checker]\"\n```\n\n## Example\n\u003e Will a customer spend more than 300 in the next hour of transactions?\n\nIn this example, we automatically generate new training examples from a historical dataset of transactions.\n\n```python\nimport composeml as cp\ndf = cp.demos.load_transactions()\ndf = df[df.columns[:7]]\ndf.head()\n```\n\n\u003ctable border=\"0\" class=\"dataframe\"\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: right;\"\u003e\n      \u003cth\u003etransaction_id\u003c/th\u003e\n      \u003cth\u003esession_id\u003c/th\u003e\n      \u003cth\u003etransaction_time\u003c/th\u003e\n      \u003cth\u003eproduct_id\u003c/th\u003e\n      \u003cth\u003eamount\u003c/th\u003e\n      \u003cth\u003ecustomer_id\u003c/th\u003e\n      \u003cth\u003edevice\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e298\u003c/td\u003e\n      \u003ctd\u003e1\u003c/td\u003e\n      \u003ctd\u003e2014-01-01 00:00:00\u003c/td\u003e\n      \u003ctd\u003e5\u003c/td\u003e\n      \u003ctd\u003e127.64\u003c/td\u003e\n      \u003ctd\u003e2\u003c/td\u003e\n      
\u003ctd\u003edesktop\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e10\u003c/td\u003e\n      \u003ctd\u003e1\u003c/td\u003e\n      \u003ctd\u003e2014-01-01 00:09:45\u003c/td\u003e\n      \u003ctd\u003e5\u003c/td\u003e\n      \u003ctd\u003e57.39\u003c/td\u003e\n      \u003ctd\u003e2\u003c/td\u003e\n      \u003ctd\u003edesktop\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e495\u003c/td\u003e\n      \u003ctd\u003e1\u003c/td\u003e\n      \u003ctd\u003e2014-01-01 00:14:05\u003c/td\u003e\n      \u003ctd\u003e5\u003c/td\u003e\n      \u003ctd\u003e69.45\u003c/td\u003e\n      \u003ctd\u003e2\u003c/td\u003e\n      \u003ctd\u003edesktop\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e460\u003c/td\u003e\n      \u003ctd\u003e10\u003c/td\u003e\n      \u003ctd\u003e2014-01-01 02:33:50\u003c/td\u003e\n      \u003ctd\u003e5\u003c/td\u003e\n      \u003ctd\u003e123.19\u003c/td\u003e\n      \u003ctd\u003e2\u003c/td\u003e\n      \u003ctd\u003etablet\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e302\u003c/td\u003e\n      \u003ctd\u003e10\u003c/td\u003e\n      \u003ctd\u003e2014-01-01 02:37:05\u003c/td\u003e\n      \u003ctd\u003e5\u003c/td\u003e\n      \u003ctd\u003e64.47\u003c/td\u003e\n      \u003ctd\u003e2\u003c/td\u003e\n      \u003ctd\u003etablet\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\nFirst, we represent the prediction problem with a labeling function and a label maker.\n\n```python\ndef total_spent(ds):\n    return ds['amount'].sum()\n\nlabel_maker = cp.LabelMaker(\n    target_dataframe_index=\"customer_id\",\n    time_index=\"transaction_time\",\n    labeling_function=total_spent,\n    window_size=\"1h\",\n)\n```\n\nThen, we run a search to automatically generate the training examples.\n\n```python\nlabel_times = label_maker.search(\n    df.sort_values('transaction_time'),\n    num_examples_per_instance=2,\n    
minimum_data='2014-01-01',\n    drop_empty=False,\n    verbose=False,\n)\n\nlabel_times = label_times.threshold(300)\nlabel_times.head()\n```\n\n\u003ctable border=\"0\" class=\"dataframe\"\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: right;\"\u003e\n      \u003cth\u003ecustomer_id\u003c/th\u003e\n      \u003cth\u003etime\u003c/th\u003e\n      \u003cth\u003etotal_spent\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e1\u003c/td\u003e\n      \u003ctd\u003e2014-01-01 00:00:00\u003c/td\u003e\n      \u003ctd\u003eTrue\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e1\u003c/td\u003e\n      \u003ctd\u003e2014-01-01 01:00:00\u003c/td\u003e\n      \u003ctd\u003eTrue\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e2\u003c/td\u003e\n      \u003ctd\u003e2014-01-01 00:00:00\u003c/td\u003e\n      \u003ctd\u003eFalse\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e2\u003c/td\u003e\n      \u003ctd\u003e2014-01-01 01:00:00\u003c/td\u003e\n      \u003ctd\u003eFalse\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003e3\u003c/td\u003e\n      \u003ctd\u003e2014-01-01 00:00:00\u003c/td\u003e\n      \u003ctd\u003eFalse\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\nWe now have labels that are ready to use in [Featuretools](https://docs.featuretools.com/) to generate features.\n\n## Support\n\nThe Innovation Labs open source community is happy to provide support to users of Compose. Project support can be found in four places, depending on the type of question:\n\n1. For usage questions, use [Stack Overflow](https://stackoverflow.com/questions/tagged/compose-ml) with the `composeml` tag.\n2. For bugs, issues, or feature requests, start a GitHub [issue](https://github.com/alteryx/compose/issues/new).\n3. 
For discussion regarding development on the core library, use [Slack](https://join.slack.com/t/alteryx-oss/shared_invite/zt-182tyvuxv-NzIn6eiCEf8TBziuKp0bNA).\n4. For everything else, the core developers can be reached by email at open_source_support@alteryx.com\n\n## Citing Compose\nCompose is built upon a newly defined part of the machine learning process: prediction engineering. If you use Compose, please consider citing this paper:\nJames Max Kanter, Owen Gillespie, Kalyan Veeramachaneni. [Label, Segment, Featurize: a cross domain framework for prediction engineering.](https://dai.lids.mit.edu/wp-content/uploads/2017/10/Pred_eng1.pdf) IEEE DSAA 2016.\n\nBibTeX entry:\n\n```bibtex\n@inproceedings{kanter2016label,\n  title={Label, segment, featurize: a cross domain framework for prediction engineering},\n  author={Kanter, James Max and Gillespie, Owen and Veeramachaneni, Kalyan},\n  booktitle={2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)},\n  pages={430--439},\n  year={2016},\n  organization={IEEE}\n}\n```\n\n## Acknowledgements\n\nThe open source development has been supported in part by DARPA's Data-Driven Discovery of Models (D3M) program.\n\n## Alteryx\n\n**Compose** is an open source project maintained by [Alteryx](https://www.alteryx.com). We developed Compose to enable flexible definition of the machine learning task. To see the other open source projects we’re working on, visit [Alteryx Open Source](https://www.alteryx.com/open-source). 
If building impactful data science pipelines is important to you or your business, please get in touch.\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://www.alteryx.com/open-source\"\u003e\n    \u003cimg src=\"https://alteryx-oss-web-images.s3.amazonaws.com/OpenSource_Logo-01.png\" alt=\"Alteryx Open Source\" width=\"800\"/\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falteryx%2Fcompose","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falteryx%2Fcompose","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falteryx%2Fcompose/lists"}