https://github.com/alteryx/compose
A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning.
ai automl data-labeling data-science labeling labeling-tool machine-learning prediction-engineering prediction-problem training-data
- Host: GitHub
- URL: https://github.com/alteryx/compose
- Owner: alteryx
- License: bsd-3-clause
- Created: 2018-12-28T15:45:37.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2025-03-31T22:05:37.000Z (about 1 month ago)
- Last Synced: 2025-04-08T01:34:09.403Z (about 1 month ago)
- Topics: ai, automl, data-labeling, data-science, labeling, labeling-tool, machine-learning, prediction-engineering, prediction-problem, training-data
- Language: Python
- Homepage: https://compose.alteryx.com
- Size: 5.09 MB
- Stars: 505
- Watchers: 26
- Forks: 47
- Open Issues: 23
Metadata Files:
- Readme: README.md
- Contributing: contributing.md
- License: LICENSE
"Build better training examples in a fraction of the time."
[Compose](https://compose.alteryx.com) is a machine learning tool for automated prediction engineering. It allows you to structure prediction problems and generate labels for supervised learning. An end user defines an outcome of interest by writing a *labeling function*, then runs a search to automatically extract training examples from historical data. Its result is then provided to [Featuretools](https://docs.featuretools.com/) for automated feature engineering and subsequently to [EvalML](https://evalml.alteryx.com/) for automated machine learning. The workflow of an applied machine learning engineer then becomes:
By automating the early stage of the machine learning pipeline, our end user can easily define a task and solve it. See the [documentation](https://compose.alteryx.com) for more information.
## Installation
Install with pip:

```
python -m pip install composeml
```

or from the Conda-forge channel on [conda](https://anaconda.org/conda-forge/composeml):

```
conda install -c conda-forge composeml
```

### Add-ons
**Update checker** - Receive automatic notifications of new Compose releases

```
python -m pip install "composeml[update_checker]"
```

## Example
> Will a customer spend more than 300 in the next hour of transactions?

In this example, we automatically generate new training examples from a historical dataset of transactions.
```python
import composeml as cp
df = cp.demos.load_transactions()
df = df[df.columns[:7]]
df.head()
```
| transaction_id | session_id | transaction_time | product_id | amount | customer_id | device |
|---|---|---|---|---|---|---|
| 298 | 1 | 2014-01-01 00:00:00 | 5 | 127.64 | 2 | desktop |
| 10 | 1 | 2014-01-01 00:09:45 | 5 | 57.39 | 2 | desktop |
| 495 | 1 | 2014-01-01 00:14:05 | 5 | 69.45 | 2 | desktop |
| 460 | 10 | 2014-01-01 02:33:50 | 5 | 123.19 | 2 | tablet |
| 302 | 10 | 2014-01-01 02:37:05 | 5 | 64.47 | 2 | tablet |

First, we represent the prediction problem with a labeling function and a label maker.
```python
def total_spent(ds):
    return ds['amount'].sum()

label_maker = cp.LabelMaker(
    target_dataframe_index="customer_id",
    time_index="transaction_time",
    labeling_function=total_spent,
    window_size="1h",
)
```

Then, we run a search to automatically generate the training examples.
```python
label_times = label_maker.search(
df.sort_values('transaction_time'),
num_examples_per_instance=2,
minimum_data='2014-01-01',
drop_empty=False,
verbose=False,
)

label_times = label_times.threshold(300)
label_times.head()
```
| customer_id | time | total_spent |
|---|---|---|
| 1 | 2014-01-01 00:00:00 | True |
| 1 | 2014-01-01 01:00:00 | True |
| 2 | 2014-01-01 00:00:00 | False |
| 2 | 2014-01-01 01:00:00 | False |
| 3 | 2014-01-01 00:00:00 | False |

We now have labels that are ready to use in [Featuretools](https://docs.featuretools.com/) to generate features.
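
To make the search step less opaque, here is a minimal pure-Python sketch of the idea: for each customer, cut the transactions into consecutive one-hour windows starting at a minimum time, apply the labeling function to each window, and then threshold the totals. It is an illustration under simplifying assumptions, not Compose's implementation. The `search_labels` helper, the tuple-based records, and the toy amounts are all hypothetical and exist only for this sketch.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical toy records: (customer_id, transaction_time, amount).
transactions = [
    (1, datetime(2014, 1, 1, 0, 5), 200.0),
    (1, datetime(2014, 1, 1, 0, 40), 150.0),
    (1, datetime(2014, 1, 1, 1, 10), 80.0),
    (2, datetime(2014, 1, 1, 0, 15), 120.0),
    (2, datetime(2014, 1, 1, 1, 30), 310.0),
]


def total_spent(window):
    """Labeling function: sum the amounts in one window of transactions."""
    return sum(amount for _, _, amount in window)


def search_labels(records, labeling_function, window_size, minimum_time, num_examples):
    """Label consecutive fixed-size windows per customer (illustrative only)."""
    # Group the records by customer, in chronological order.
    by_customer = defaultdict(list)
    for record in sorted(records, key=lambda r: r[1]):
        by_customer[record[0]].append(record)

    labels = []
    for customer_id, rows in sorted(by_customer.items()):
        for i in range(num_examples):
            start = minimum_time + i * window_size
            end = start + window_size
            # Half-open window [start, end); an empty window still gets a label.
            window = [r for r in rows if start <= r[1] < end]
            labels.append((customer_id, start, labeling_function(window)))
    return labels


raw = search_labels(
    transactions,
    total_spent,
    window_size=timedelta(hours=1),
    minimum_time=datetime(2014, 1, 1),
    num_examples=2,
)

# Turn the totals into binary labels: did the customer spend more than 300?
binary = [(customer_id, start, value > 300) for customer_id, start, value in raw]
for row in binary:
    print(row)
```

With this toy data, customer 1 is labeled `True` for the first window (350 spent) and `False` for the second (80), while customer 2 is `False` and then `True`. Each label is paired with the start time of its window, which is the cutoff a downstream feature engineering step would need to respect to avoid using data from the window being predicted.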
## Support
The Innovation Labs open source community is happy to provide support to users of Compose. Project support can be found in four places depending on the type of question:
1. For usage questions, use [Stack Overflow](https://stackoverflow.com/questions/tagged/compose-ml) with the `composeml` tag.
2. For bugs, issues, or feature requests, start a GitHub [issue](https://github.com/alteryx/compose/issues/new).
3. For discussion regarding development on the core library, use [Slack](https://join.slack.com/t/alteryx-oss/shared_invite/zt-182tyvuxv-NzIn6eiCEf8TBziuKp0bNA).
4. For everything else, the core developers can be reached by email at [email protected]

## Citing Compose
Compose is built upon a newly defined part of the machine learning process — prediction engineering. If you use Compose, please consider citing this paper:
James Max Kanter, Owen Gillespie, Kalyan Veeramachaneni. [Label, Segment, Featurize: a cross domain framework for prediction engineering.](https://dai.lids.mit.edu/wp-content/uploads/2017/10/Pred_eng1.pdf) IEEE DSAA 2016.

BibTeX entry:
```bibtex
@inproceedings{kanter2016label,
title={Label, segment, featurize: a cross domain framework for prediction engineering},
author={Kanter, James Max and Gillespie, Owen and Veeramachaneni, Kalyan},
booktitle={2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)},
pages={430--439},
year={2016},
organization={IEEE}
}
```

## Acknowledgements
The open source development has been supported in part by DARPA's Data-Driven Discovery of Models (D3M) program.
## Alteryx
**Compose** is an open source project maintained by [Alteryx](https://www.alteryx.com). We developed Compose to enable flexible definition of the machine learning task. To see the other open source projects we're working on, visit [Alteryx Open Source](https://www.alteryx.com/open-source). If building impactful data science pipelines is important to you or your business, please get in touch.