Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with big-data-processing
A curated list of projects in awesome lists tagged with big-data-processing.
https://github.com/souvik-databricks/dlt-with-debug
A lightweight helper utility that lets developers build pipelines interactively, using a single source code base for both DLT runs and non-DLT interactive notebook runs.
big-data big-data-processing databricks delta-live-tables dlt etl etl-pipeline python3 spark
Last synced: 01 Nov 2024
https://github.com/anirban166/big-data-ft.-genomics
Analysis, organization and querying of large genomic datasets using C++, Monsoon and various data structures.
big-data-processing bioinformatics data-structures-and-algorithms genomic-sequences
Last synced: 13 Oct 2024
https://github.com/sayamalt/steel-energy-consumption-prediction-using-pyspark
A machine learning model built with PySpark that predicts the energy consumption of the steel industry, reaching an R² score of approximately 0.995 (99.5%).
apache-spark big-data-analytics big-data-processing cross-validation data-visualization exploratory-data-analysis hyperparameter-tuning machine-learning model-training-and-evaluation python regression spark sql
Last synced: 11 Oct 2024
https://github.com/khanovico/energy-data-analysis
A cloud-based model that analyzes a real-world dataset with BigQuery and other big-data analysis tools. I implemented a Docker image for running this app in cross-platform environments.
big-data-processing bigquery docker google-app-engine jupyter-notebook mlflow python scikit-learn seaborn xgboost
Last synced: 10 Oct 2024
https://github.com/turnipdo/docker-spark-setup
Setting up a Spark cluster in a Docker environment for improved repeatability and reliability. This project includes a simple transformation on a dataset containing approximately 31 million rows.
big-data-processing docker-container setup spark
Last synced: 11 Oct 2024
https://github.com/srking501/csc8101_coursework
Summative coursework for CSC8101 (Engineering for AI).
apache-parquet apache-spark azure-databricks big-data big-data-analytics big-data-processing data-science databri databricks-notebooks delta-file nyc-taxi-dataset parquet-files pyspark
Last synced: 12 Oct 2024