{"id":14959088,"url":"https://github.com/fernandonieuwveldt/easyflow","last_synced_at":"2025-10-24T16:31:13.411Z","repository":{"id":37100223,"uuid":"304730836","full_name":"fernandonieuwveldt/easyflow","owner":"fernandonieuwveldt","description":"Easy Tensorflow/Keras feature Preprocessing Pipelines","archived":false,"fork":false,"pushed_at":"2024-04-16T13:42:30.000Z","size":872,"stargazers_count":9,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"develop","last_synced_at":"2025-01-31T03:12:27.453Z","etag":null,"topics":["keras-tensorflow","tensorflow","tensorflow-examples"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fernandonieuwveldt.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-10-16T20:21:34.000Z","updated_at":"2023-08-17T07:08:21.000Z","dependencies_parsed_at":"2024-09-22T08:30:26.265Z","dependency_job_id":null,"html_url":"https://github.com/fernandonieuwveldt/easyflow","commit_stats":{"total_commits":208,"total_committers":1,"mean_commits":208.0,"dds":0.0,"last_synced_commit":"e378970a5b38f5643e3e3d15e5dff5b88662b329"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fernandonieuwveldt%2Feasyflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fernandonieuwveldt%2Feasyflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fernandonieuwveldt%2Feasyflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fernandonieuwveldt%2Feasyflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fernandonieuwveldt","download_url":"https://codeload.github.com/fernandonieuwveldt/easyflow/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237999679,"owners_count":19399920,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["keras-tensorflow","tensorflow","tensorflow-examples"],"created_at":"2024-09-24T13:18:49.542Z","updated_at":"2025-10-24T16:31:12.763Z","avatar_url":"https://github.com/fernandonieuwveldt.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# EasyFlow: Keras Feature Preprocessing Pipelines\n\n![Keras logo](https://s3.amazonaws.com/keras.io/img/keras-logo-2018-large-1200.png)\n\n# Table of Contents\n1. [About EasyFlow](#about-easyflow)\n2. [Motivation](#motivation)\n3. [Installation](#installation)\n4. [Example](#example)\n5. [Tutorials](#tutorials)\n\n---\n\n## About EasyFlow\n\nThe `EasyFlow` package provides an interface similar to SKLearn's Pipeline API for building feature preprocessing pipelines, so that a full training and inference pipeline can be built natively in Keras. All pipelines are implemented as Keras layers.\n\n---\n\n## Motivation\n\nThere is a need for an interface that mimics the SKLearn Pipeline API, with classes such as `Pipeline`, `FeatureUnion` and `ColumnTransformer`, but implemented natively in Keras as Keras layers. 
The usual design pattern, especially for tabular data, is to preprocess the data with SKLearn first and then feed it to a Keras model. With `EasyFlow` you don't need to leave the Tensorflow/Keras ecosystem to build custom pipelines, and your preprocessing pipeline becomes part of your model architecture.\n\nThe main interfaces are:\n\n* `FeaturePreprocessor`: This layer applies feature preprocessing steps and returns a separate layer for each step supplied. This gives the user more flexibility when a more advanced network architecture is needed, for example a Wide and Deep network.\n* `FeatureUnion`: This layer is similar to `FeaturePreprocessor`, with an extra step that concatenates all layers into a single layer.\n\n---\n\n## Installation\n\n```bash\npip install easy-tensorflow\n```\n\n---\n\n## Example\n\nLet's look at a quick example:\n\n```python\nimport pandas as pd\nimport tensorflow as tf\nfrom tensorflow.keras.layers import Normalization, IntegerLookup\n\n# EasyFlow imports\nfrom easyflow.data import TensorflowDataMapper\nfrom easyflow.preprocessing import FeatureUnion\nfrom easyflow.preprocessing import (\n    FeatureInputLayer,\n    StringToIntegerLookup,\n)\n```\n\n### Read in data and map to a tf.data.Dataset\n\nUse the `TensorflowDataMapper` class to map a pandas DataFrame to a `tf.data.Dataset`.\n\n```python\nfile_url = \"http://storage.googleapis.com/download.tensorflow.org/data/heart.csv\"\ndataframe = pd.read_csv(file_url)\nlabels = dataframe.pop(\"target\")\n\nbatch_size = 32\ndataset_mapper = TensorflowDataMapper()\n# map the features and labels to a tf.data.Dataset\ndataset = dataset_mapper.map(dataframe, labels)\n# split into training and validation sets, then batch both\ntrain_data_set, val_data_set = dataset_mapper.split_data_set(dataset)\ntrain_data_set = train_data_set.batch(batch_size)\nval_data_set = val_data_set.batch(batch_size)\n```\n\n### Set constants\n\n```python\nNUMERICAL_FEATURES = ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', 'slope']\nCATEGORICAL_FEATURES = ['sex', 'cp', 'fbs', 'restecg', 'exang', 'ca']\n# thal is 
represented as a string\nSTRING_CATEGORICAL_FEATURES = ['thal']\n\n# dtype of each input feature, passed to FeatureInputLayer to create the model inputs\ndtype_mapper = {\n    \"age\": tf.float32,\n    \"sex\": tf.float32,\n    \"cp\": tf.float32,\n    \"trestbps\": tf.float32,\n    \"chol\": tf.float32,\n    \"fbs\": tf.float32,\n    \"restecg\": tf.float32,\n    \"thalach\": tf.float32,\n    \"exang\": tf.float32,\n    \"oldpeak\": tf.float32,\n    \"slope\": tf.float32,\n    \"ca\": tf.float32,\n    \"thal\": tf.string,\n}\n```\n\n### Set up the preprocessing layer using FeatureUnion\n\nThis is the main part where `EasyFlow` fits in. We can now set up a feature preprocessing pipeline as a Keras layer with only a few lines of code.\n\n```python\n# each step is a (name, preprocessing layer, feature list) tuple\nfeature_preprocessor_list = [\n    ('numeric_encoder', Normalization(), NUMERICAL_FEATURES),\n    ('categorical_encoder', IntegerLookup(output_mode='multi_hot'), CATEGORICAL_FEATURES),\n    ('string_encoder', StringToIntegerLookup(), STRING_CATEGORICAL_FEATURES)\n]\n\npreprocessor = FeatureUnion(feature_preprocessor_list)\n# learn the layers' state (e.g. normalization statistics, vocabularies) from the training data\npreprocessor.adapt(train_data_set)\n\nfeature_layer_inputs = FeatureInputLayer(dtype_mapper)\npreprocessing_layer = preprocessor(feature_layer_inputs)\n```\n\n### Set up network\n\n```python\n# set up a simple network\nx = tf.keras.layers.Dense(128, activation=\"relu\")(preprocessing_layer)\nx = tf.keras.layers.Dropout(0.5)(x)\noutputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)\nmodel = tf.keras.Model(inputs=feature_layer_inputs, outputs=outputs)\nmodel.compile(\n    optimizer=tf.keras.optimizers.Adam(),\n    loss=tf.keras.losses.BinaryCrossentropy(),\n    metrics=[tf.keras.metrics.BinaryAccuracy(name='accuracy'), tf.keras.metrics.AUC(name='auc')])\n```\n\n### Fit model\n\n```python\nhistory = model.fit(train_data_set, validation_data=val_data_set, epochs=10)\n```\n\n---\n\n## Tutorials\n\n### Migrate an Sklearn training Pipeline to Tensorflow Keras: [![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/fernandonieuwveldt/easyflow/blob/develop/examples/migrating_from_sklearn_to_keras/migrate_sklearn_pipeline.ipynb)\n* In this notebook we look at ways to migrate an Sklearn training pipeline to Tensorflow Keras. There are a few reasons to move from Sklearn to Tensorflow, for example keeping the preprocessing pipeline inside the model so that training and inference use the same steps.\n\n### Single Input Multiple Output Preprocessor: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/fernandonieuwveldt/easyflow/blob/develop/examples/single_input_multiple_output/single_input_multiple_output_preprocessor.ipynb)\n* In this example we show how to apply different transformations and preprocessing steps to the same feature, which is a single-input, multiple-output feature transformation scenario.\n\n### Preprocessing module quick intro: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/fernandonieuwveldt/easyflow/blob/develop/examples/preprocessing_example/preprocessing_example.ipynb)\n* The `easyflow.preprocessing` module contains functionality similar to Sklearn's `Pipeline`, `FeatureUnion` and `ColumnTransformer`. This notebook is a quick introduction.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffernandonieuwveldt%2Feasyflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffernandonieuwveldt%2Feasyflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffernandonieuwveldt%2Feasyflow/lists"}