Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Projects in Awesome Lists tagged with big-data-processing

A curated list of projects in awesome lists tagged with big-data-processing .

https://github.com/souvik-databricks/dlt-with-debug

A lightweight helper utility which allows developers to do interactive pipeline development by having a unified source code for both DLT run and Non-DLT interactive notebook run.

big-data big-data-processing databricks delta-live-tables dlt etl etl-pipeline python3 spark

Last synced: 01 Nov 2024

https://github.com/anirban166/big-data-ft.-genomics

Analysis, organization and querying of large genomic datasets using C++, Monsoon and various data structures.

big-data-processing bioinformatics data-structures-and-algorithms genomic-sequences

Last synced: 13 Oct 2024

https://github.com/sayamalt/steel-energy-consumption-prediction-using-pyspark

Successfully established a machine learning model using PySpark which can precisely predict the energy consumption of the steel industry, up to an r2 score of approximately 99.5%.

apache-spark big-data-analytics big-data-processing cross-validation data-visualization exploratory-data-analysis hyperparameter-tuning machine-learning model-training-and-evaluation python regression spark sql

Last synced: 11 Oct 2024

https://github.com/khanovico/energy-data-analysis

This is the cloud model analyzing real world dataset with BigQuery and other big-data analyzing tools. I implemented docker image for running this app on cross-platform environments.

big-data-processing bigquery docker google-app-engine jupyter-notebook mlflow python scikit-learn seaborn xgboost

Last synced: 10 Oct 2024

https://github.com/turnipdo/docker-spark-setup

Setting up a Spark cluster in a Docker environment for improved repeatability and reliability. This project includes a simple transformation on a dataset containing approximately 31 million rows.

big-data-processing docker-container setup spark

Last synced: 11 Oct 2024