https://github.com/avaiga/dask_taipy_bigdata_demo
Building smart Big Data pipelines with Dask & Taipy (DEMO)
- Host: GitHub
- URL: https://github.com/avaiga/dask_taipy_bigdata_demo
- Owner: Avaiga
- License: apache-2.0
- Created: 2023-09-11T05:06:46.000Z (about 2 years ago)
- Default Branch: develop
- Last Pushed: 2023-09-11T05:07:14.000Z (about 2 years ago)
- Last Synced: 2023-09-11T06:24:35.354Z (about 2 years ago)
- Language: Jupyter Notebook
- Size: 21.5 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# Intelligent Big Data Pipelines with Dask & Taipy
## Table of Contents
- [Usage](#usage)
- [About Taipy Core and this Project](#about-taipy-core-and-this-project)
- [Demo Type](#demo-type)
- [Directory Structure](#directory-structure)
- [License](#license)
- [Contributing](#contributing)
- [Code of Conduct](#code-of-conduct)

## Usage
This README outlines the details and usage of the Dask & Taipy DEMO for building Big Data pipelines.
## About Taipy Core and this Project
Taipy is a Python library aimed at creating Business Applications. More information can be found on our [website](https://www.taipy.io).
This project, Intelligent Big Data Pipelines with Dask & Taipy, serves as a comprehensive demonstration of advanced data manipulation techniques for larger-than-memory datasets. It showcases how to construct robust data pipelines and highlights innovative features of Taipy Core, such as the ability to skip tasks within pipelines for optimized performance.
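To make the idea of skipping tasks concrete, here is a minimal conceptual sketch in plain Python. It is not Taipy Core's API or implementation; it assumes that a task can be skipped when its inputs are unchanged since the last run. The names `SkippableTask` and `total_spend` are hypothetical.

```python
import hashlib
import json


class SkippableTask:
    """Runs a function only when its inputs changed since the last run."""

    def __init__(self, func):
        self.func = func
        self._last_key = None
        self._last_result = None
        self.run_count = 0  # number of times func actually executed

    def _fingerprint(self, inputs):
        # Hash a canonical JSON form of the inputs to detect changes.
        payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def __call__(self, **inputs):
        key = self._fingerprint(inputs)
        if key == self._last_key:
            return self._last_result  # inputs unchanged: skip the work
        self._last_result = self.func(**inputs)
        self._last_key = key
        self.run_count += 1
        return self._last_result


def total_spend(purchases):
    """Hypothetical task body: sum a list of purchase amounts."""
    return sum(purchases)


task = SkippableTask(total_spend)
first = task(purchases=[10, 20, 30])       # executes the function
second = task(purchases=[10, 20, 30])      # same inputs: skipped
third = task(purchases=[10, 20, 30, 40])   # inputs changed: executes again
```

In a pipeline with expensive steps over large datasets, avoiding re-execution of unchanged steps is where most of the time savings would come from.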
### Demo Type
- **Level**: Beginner/Intermediate
- **Topic**: Dask, Taipy-Core, Data Pipeline
- **Components/Controls**:
  - Taipy Core: DataNode, Pipeline, Scenario

## How to Run
The demo works with Python versions greater than 3.8. Install the dependencies from the `Pipfile` and run `app.ipynb` in the `src/` folder. A `requirements.txt` file is also available in the `src/` folder.
## Introduction
### Data Preprocessing and Customer Scoring
This step reads customer data and calculates a score for each customer. The goal is to categorize customers based on different metrics.
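The scoring step can be sketched as follows. The demo itself uses Dask; this sketch uses only the Python standard library to show the logic of processing data one chunk at a time. The column names (`total_spend`, `num_purchases`), the weights, and the sample data are all assumptions for illustration, not the demo's actual metrics.

```python
import csv
import io
from itertools import islice

# Hypothetical sample data; the demo's real columns may differ.
SAMPLE_CSV = """customer_id,total_spend,num_purchases
1,1200.0,12
2,150.0,3
3,640.0,8
"""


def read_in_chunks(handle, chunk_size):
    """Yield lists of row dicts so only one chunk is held in memory at a time."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def score_customer(row, spend_weight=0.01, purchase_weight=1.0):
    """Hypothetical score: weighted sum of total spend and purchase count."""
    return (spend_weight * float(row["total_spend"])
            + purchase_weight * int(row["num_purchases"]))


def score_all(handle, chunk_size=2):
    """Map each customer_id to its score, reading the file chunk by chunk."""
    scores = {}
    for chunk in read_in_chunks(handle, chunk_size):
        for row in chunk:
            scores[int(row["customer_id"])] = score_customer(row)
    return scores


scores = score_all(io.StringIO(SAMPLE_CSV))
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle.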
### Feature Engineering and Segmentation
Additional features are added and customers are segmented into 'High Value' and 'Low Value' based on their scores.
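A minimal sketch of this step in plain Python is shown below. The derived feature (`avg_order_value`), the field names, and the threshold of 10.0 are assumptions chosen for illustration; the demo may use different features and a different cut-off.

```python
def add_features(customer):
    """Return a copy with a derived feature (hypothetical: average order value)."""
    enriched = dict(customer)
    purchases = customer["num_purchases"]
    enriched["avg_order_value"] = (
        customer["total_spend"] / purchases if purchases else 0.0
    )
    return enriched


def segment(customer, threshold=10.0):
    """Label a customer by comparing the score to a hypothetical threshold."""
    return "High Value" if customer["score"] >= threshold else "Low Value"


# Hypothetical input rows that already carry a score.
customers = [
    {"customer_id": 1, "total_spend": 1200.0, "num_purchases": 12, "score": 24.0},
    {"customer_id": 2, "total_spend": 150.0, "num_purchases": 3, "score": 4.5},
]

segmented = [{**add_features(c), "segment": segment(c)} for c in customers]
```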
### Segment Analysis
Performs detailed analysis on each customer segment.
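One plausible form of per-segment analysis is a group-by aggregation. The sketch below uses only the standard library; the chosen statistics (count, mean score, total spend) and the field names are assumptions, since the source does not list the metrics the demo computes.

```python
from collections import defaultdict
from statistics import mean


def analyze_segments(customers):
    """Group customers by segment and compute summary statistics per group."""
    groups = defaultdict(list)
    for customer in customers:
        groups[customer["segment"]].append(customer)
    return {
        name: {
            "count": len(rows),
            "mean_score": mean(r["score"] for r in rows),
            "total_spend": sum(r["total_spend"] for r in rows),
        }
        for name, rows in groups.items()
    }


# Hypothetical segmented customers.
segmented_customers = [
    {"segment": "High Value", "score": 24.0, "total_spend": 1200.0},
    {"segment": "High Value", "score": 16.0, "total_spend": 800.0},
    {"segment": "Low Value", "score": 4.5, "total_spend": 150.0},
]

summary = analyze_segments(segmented_customers)
```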
### Additional Analysis
Conducts further statistical analyses based on segment analysis.
## Directory Structure
- `src/`: Contains the source code of the demo.
- `src/algos`: Contains the functions executed as tasks by Taipy.
- `src/config`: Contains the configuration files.
- `src/data`: Contains the data files used in the application.
- `CODE_OF_CONDUCT.md`: Code of conduct for members and contributors.
- `CONTRIBUTING.md`: Instructions for contributing to the project.
- `INSTALLATION.md`: Installation instructions.
- `LICENSE`: The Apache 2.0 License.
- `Pipfile`: Used by Pipenv to manage project dependencies.
- `README.md`: Current file.

## License
Copyright 2023 Avaiga Private Limited
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at [http://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0.txt).
## Contributing
To contribute to this project, please refer to the [`CONTRIBUTING.md`](CONTRIBUTING.md) file.
## Code of Conduct
For details on how to engage with the community, please refer to [`CODE_OF_CONDUCT.md`](CODE_OF_CONDUCT.md).