Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists tagged with big-data-analytics

A curated list of projects in awesome lists tagged with big-data-analytics.

https://github.com/ICT-BDA/EasyML

Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.

big-data-analytics learning-platform machine-learning machine-learning-platform machine-learning-studio

Last synced: 30 Oct 2024

https://github.com/ict-bda/easyml

Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.

big-data-analytics learning-platform machine-learning machine-learning-platform machine-learning-studio

Last synced: 21 Dec 2024

https://github.com/mahmoudparsian/pyspark-tutorial

PySpark-Tutorial provides basic algorithms using PySpark

big-data big-data-analytics data-algorithms pyspark spark spark-dataframes spark-rdd

Last synced: 21 Dec 2024

https://github.com/alibaba/v6d

vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)

big-data-analytics cloud-native cncf distributed distributed-comp distributed-systems graph-analytics in-memory-storage shared-memory sig-storage tag-storage

Last synced: 14 Dec 2024

https://github.com/v6d-io/v6d

vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)

big-data-analytics cloud-native cncf distributed distributed-comp distributed-systems graph-analytics in-memory-storage shared-memory sig-storage tag-storage

Last synced: 28 Oct 2024

https://github.com/mrxujiang/v6.dooring.public

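A large-screen data visualization solution that provides a visual editing engine, helping individuals and businesses easily build their own custom visualization dashboard applications.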

antv big-data big-data-analytics bigdata dooring low-code lowcode nodejs react webgl2

Last synced: 15 Dec 2024

https://github.com/lithops-cloud/lithops

A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀

big-data big-data-analytics cloud-computing data-processing distributed kubernetes multicloud multiprocessing object-storage parallel python serverless serverless-computing serverless-functions

Last synced: 15 Dec 2024

https://github.com/rouyang2017/SISSO

A data-driven method combining symbolic regression and compressed sensing for accurate & interpretable models.

big-data-analytics compressed-sensing machine-learning material-science symbolic-regression

Last synced: 13 Nov 2024

https://github.com/Ashish7129/Graph_Sampling

Graph Sampling is a Python package containing various approaches that sample the original graph according to different sample sizes.

big-data big-data-analytics breadth-first-search data-mining graphs induction network network-analysis network-science networkx python random-walk sample sampling social-network-analysis subgraph

Last synced: 27 Nov 2024

https://github.com/archivesunleashed/aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

analysis apache-spark big-data big-data-analytics dataframe digital-humanities hadoop network-graphing pyspark python3 scala spark text-extraction webarchives

Last synced: 18 Dec 2024

https://github.com/u2i/egis

Egis - a handy Ruby interface for AWS Athena

aws aws-athena big-data big-data-analytics ruby ruby-gem

Last synced: 24 Nov 2024

https://github.com/ingef/conquery

Visual, interactive queries against big databases

big-data big-data-analytics java

Last synced: 17 Nov 2024

https://github.com/arakat-community/arakat

ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform

big-data-analytics business-intelligence cloud-native-applications data-pipelines distributed-systems docker docker-swarm predictive-maintenance

Last synced: 14 Nov 2024

https://github.com/wittline/pyspark-on-aws-emr

The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.

aws aws-emr big-data big-data-analytics dataengineering ec2-spot ec2-spot-instances emr-cluster pyspark python spark wordcloud-generator

Last synced: 14 Oct 2024

https://github.com/azure/azurekusto

R interface to Azure Data Explorer, aka Kusto

azure azure-data-explorer azure-sdk-r big-data-analytics kusto r

Last synced: 07 Oct 2024

https://github.com/seeratawan01/autocapture.js

Build your own analytics: a single library that grabs every click, touch, page view, and fill, forever.

analytics autocapture big-data-analytics events heatmap user-behavior-analytics user-behaviour user-events user-interaction

Last synced: 09 Nov 2024

https://github.com/jdvelasq/courses

Supporting material for courses, Facultad de Minas, Universidad Nacional de Colombia

analytics big-data big-data-analytics data-science training-materials

Last synced: 05 Dec 2024

https://github.com/fiware/tutorials.big-data-flink

📘 FIWARE 305: Real-time Processing of Context Data using Apache Flink

apache-flink big-data-analytics fiware fiware-cosmos flink orion-flink-connector tutorial

Last synced: 17 Nov 2024

https://github.com/n1ghtf1re/map-of-emergency-incidents

Emergency Map lets you effectively visualize multi-dimensional information through an intuitive interface. The code is easily modified for use in a variety of areas. The use of color mixing technology enhances the perception and analysis of information.

big-data big-data-analytics big-data-visualization bigdata color-mixing colors data data-analytics data-science data-visualization data-visualization-challenges data-visualization-simpler mysql open-source-project php student-project

Last synced: 27 Oct 2024

https://github.com/adityakamble49/loss-ratio-prediction

Predicting Loss Ratios for Auto Insurance Portfolios - ITCS 6100 Big Data Analytics for Competitive Advantage

big-data big-data-analytics data-science insurance jupyter-notebook politics python

Last synced: 27 Nov 2024

https://github.com/asavinov/bistro

A general-purpose data analysis engine radically changing the way batch and stream data is processed

analytics big-data-analytics edge-analytics iot stream-analytics stream-processing

Last synced: 30 Oct 2024

https://github.com/ren294/log-analysis-project

This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.

apache-kafka apache-nifi apache-spark big-data big-data-analytics cassandra cassandra-driver data-engineering data-science grafana hadoop hadoop-hdfs hive powerbi spark-rdd spark-sql spark-streaming

Last synced: 11 Oct 2024

https://github.com/bydevmar/master_masd_fpo

This GitHub repository gathers all the courses, practical work (TP), tutorials (TD), projects, and exercises from my master's program in applied mathematics for data science. Browse it for a complete view of my academic path and a detailed perspective on my learning in this field.

acp afc algebra big-data-analytics dashboards data-analysis datascience economics english graph-theory latex linear-algebra non-linear-algebra probability prog python scientific-research software-package statistics

Last synced: 17 Nov 2024

https://github.com/ren294/covid-data-process

This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.

airflow aws aws-ec2 aws-quicksight big-data big-data-analytics covid19-data docker docker-compose hadoop-hdfs hdfs hive kafka nifi pipeline redpanda spark spark-sql spark-streaming sparksql

Last synced: 11 Oct 2024

https://github.com/dgkanatsios/gameanalyticseventhubfunctionscosmosdatalake

Big data reference architecture and implementation for an online multiplayer game

big-data big-data-analytics data-lake-analytics event-hubs lambda-architecture

Last synced: 08 Nov 2024

https://github.com/angeligareta/spark-flight-prediction

Assignment for the Cloud Computing and Big Data Ecosystems Design course that aims to predict flight arrival time using Apache Spark and Scala.

apache-spark big-data big-data-analytics cloud-computing scala upm

Last synced: 22 Nov 2024

https://github.com/garystafford/dataproc-java-demo

Demonstration of Google Cloud Dataproc for running Spark jobs with Java

big-data-analytics dataproc gcp google java spark

Last synced: 06 Dec 2024

https://github.com/aveek-saha/cricket-score-predictor

A big data application to predict the outcome of a T20 cricket match.

big-data big-data-analytics clustering pyspark spark spark-mllib

Last synced: 05 Nov 2024

https://github.com/hrolive/big-data-analysis-with-hadoop-and-rhadoop

Covers the foundations of “Big Data” processing by introducing the Hadoop distributed computing architecture, and provides an introductory-level tutorial on Big Data analysis using Hadoop, RHadoop, and the R libraries parallel, doParallel, foreach, and Rmpi.

big-data big-data-analytics hadoop hdfs hpc hpc-clusters jupyter mapreduce mpi python r rstudio unix

Last synced: 09 Nov 2024

https://github.com/srlozano/tinder-big-data-analysis

Big Data Analysis of Tinder done at Universitat Rovira i Virgili and Universitat Politècnica de Catalunya · BarcelonaTech

big-data big-data-analytics data-science dating-app mongodb python

Last synced: 30 Nov 2024

https://github.com/fiware/tutorials.big-data-spark

📘 FIWARE 306: Real-time Processing of Context Data using Apache Spark

apache-spark big-data-analytics fiware fiware-cosmos orion-spark-connector spark tutorial

Last synced: 17 Nov 2024

https://github.com/msusazureaccelerators/workplace-intelligence-accelerator

The Workplace Intelligence Accelerator leverages machine learning and big data analytics to combine and transform data, allowing customers to easily identify factors that influence how people work in their organization.

accelerator ai artificial-intelligence azure azure-devops big-data-analytics human-resources m365 machine-learning microsoft ml power-bi workplace-analytics

Last synced: 26 Nov 2024

https://github.com/abroniewski/idlecompute-data-management-architecture

Implementation of a big data management and analysis backbone architecture using PySpark for distributed and scalable data ingestion and MLlib for machine learning analysis. Part of Big Data Management and Analytics (BDMA) program.

bdma big-data big-data-analytics bigdata dataops hadoop-hdfs machine-learning parquet pipeline pyspark-mllib

Last synced: 12 Nov 2024

https://github.com/harshoza36/movielens_pyspark

MovieLens Dataset analysis using Hadoop and Pyspark

big-data-analytics hadoop movielens movielens-data-analysis pyspark spark spark-sql

Last synced: 12 Nov 2024

https://github.com/bayunova28/sas_visual_data_mining_machine_learning

This repository contains my weekly projects from the Big Data Analytics II course at my college.

big-data-analytics big-data-projects data-science machine-learning neural-network

Last synced: 18 Dec 2024

https://github.com/jabhij/predictionmodels_heartdiseases

Comparison of various Machine Learning algorithms for Heart Diseases (Heart Attack) prediction.

big-data-analytics bigdata data-visualization datamodeling decision-tree hive knn logistic-regression machine-learning mapreduce naviebayes random-forest svm

Last synced: 16 Nov 2024

https://github.com/hatoonguls/big-data-analytics

The repository contains big data analytics projects using Apache Spark, SQL, and machine learning models.

apache-spark big-data-analytics machine-learning-algorithms python

Last synced: 16 Nov 2024

https://github.com/aanujkhurana/bigdata-analysis

Social media big data analysis for Eminem (music artist), using RStudio and R

big-data big-data-analytics r-programming-language rstudio

Last synced: 11 Nov 2024

https://github.com/jowilf/big-data-showcase

This repository contains a project showcasing the use of Big Data technologies in processing and visualizing real-time data from an eCommerce electronics store using tools such as Apache Kafka, Spark Streaming, Spark SQL, HBase, and Plotly

big-data-analytics hbase kafka plotly-dash spark-sql spark-streaming

Last synced: 14 Nov 2024

https://github.com/rakeshkanneeswaran/project-cytosine-guanine-gc-percentage-in-genome-sequence

This repository on GitHub contains a Python program that uses data science techniques to calculate the percentage of cytosine and guanine in a genome sequence. Cytosine and guanine are two of the four nucleotide bases found in DNA, and their percentage can be used as a measure of the overall composition of a genome.

big-data-analytics data-science data-visualization genome-sequencing matplotlib pandas-library

Last synced: 13 Nov 2024

https://github.com/fosfrancesco/tweet-popularity

Predict the number of retweets that a tweet about a specific museum will have.

big-data-analytics machine-learning

Last synced: 13 Nov 2024

https://github.com/madhurimarawat/big-data-analytics

This repository demonstrates big data processing, visualization, and machine learning using tools such as Hadoop, Spark, Kafka, and Python.

big-data big-data-analytics big-data-analytics-techniques hadoop-hdfs hadoop-installation hadoop-mapreduce python

Last synced: 14 Nov 2024

https://github.com/arxiver/airbnb-eda-and-regression

Big data exploration and analysis on Airbnb dataset as well as regression model for price prediction of entities

airbnb analysis big-data big-data-analytics bigdata eda python regression regression-models visualization xgboost

Last synced: 15 Nov 2024

https://github.com/jkhan01/kafka-spark-stream

Project and workaround repository for generating a producer stream to a Kafka cluster, then consuming and processing it.

apache-kafka apache-spark big-data big-data-analytics maven pyspark

Last synced: 30 Nov 2024

https://github.com/r13i/cheapest-phone-call

Small challenge to find the best phone operator to use based on call price

big-data big-data-analytics cheapest data-analysis data-cruncher pandas phone-number pricelist

Last synced: 07 Dec 2024

https://github.com/nathanvilbert/kaiture-agriculture-business-reports-with-power-bi

The project "Kaiture-Agriculture-Business-Reports-with-Power-BI" focuses on utilizing Business Intelligence to optimize agricultural yield and productivity. By integrating Power BI for data analysis, this project provides comprehensive insights into crop production patterns, market trends, and key factors affecting yield.

big-data-analytics data-visualization power-bi sas swot-analysis

Last synced: 07 Dec 2024

https://github.com/bryanfks-dev/klempoken-analysis

Analysis and forecasting model for Klempoken MSMEs

big-data-analytics data-analysis data-forecast data-visualization

Last synced: 14 Dec 2024

https://github.com/nickenshidqia/big_data_analytics_kimia_farma

A Big Data Analytics project that sets the challenge of creating a data mart design and a dashboard for Kimia Farma

big-data-analytics dashboard data-analyst looker-studio postgresql

Last synced: 05 Nov 2024

https://github.com/hanif-syazul/analyzing-kimia-farma-sales-performance-with-gcp

This repository contains the final project for the Rakamin Big Data Analytics Internship. It includes a complete dashboard of Kimia Farma's sales performance analysis from 2020 to 2023.

big-data-analytics bigquery internship-project kimia-farma looker-studio rakamin sql

Last synced: 21 Nov 2024

https://github.com/h-fuzzy-logic/technical-writing

Technical writing samples. Includes walkthroughs and tutorials around data engineering and cloud architectures.

big-data big-data-analytics cloud data-engineering

Last synced: 15 Dec 2024

https://github.com/vara-co/home_sales

Module 22 challenge: Using Google Colab to work on Big Data queries with PySpark SQL, parquet, and cache partitions

big-data big-data-analytics cache google-colab google-colaboratory parquet pyspark pyspark-sql

Last synced: 07 Dec 2024

https://github.com/ixgnoy/visualize_movie_with_rating

Visualization using Hadoop.

big-data big-data-analytics hadoop query

Last synced: 08 Dec 2024

https://github.com/nataliabeltranarg/nosql_graphdatabases_neo4j

This repository showcases a practical exercise on graph databases using Neo4j, covering tasks like graph creation, evolution, querying, and similarity algorithms

big-data-analytics cypher-query-language datamanagement neo4j nosql python synthetic-data

Last synced: 10 Oct 2024

https://github.com/zmyzheng/browserassistant

Big Data & Cloud Computing project for recommendation, cluster analysis, and data visualization with Hadoop and Spark, deployed in an auto-scaling cloud environment.

angular big-data-analytics cloud cluster-analysis data-visualization elasticsearch flask hadoop recommendation-system spark spring-boot

Last synced: 11 Dec 2024

https://github.com/yash22222/olympic-games-analytics-using-apache-spark

The "Olympic Games Analytics Using Apache Spark Databricks" project explores data from the Olympic Games (1896-2016) to identify trends and insights. Using Apache Spark for big data processing and Databricks for visualization, the project analyzes key factors like top-performing countries and athlete attributes, showcasing real-world analytics.

apache apache-kafka apache-spark big-data-analytics csv data data-analytics data-visualization databricks excel mysql olympics regions

Last synced: 08 Dec 2024

https://github.com/mrham17/spotify_streaming_analytics

Project is stable & documentation will be completed soon. Thank you for your understanding and patience.

big-data-analytics data-analysis google-colab music-data r-programming spotify streaming-analytics

Last synced: 04 Dec 2024

https://github.com/smohanta23/uber_data-engineering_etl-project

This project demonstrates a comprehensive data engineering workflow using the Uber information dataset. It covers the full spectrum of data engineering pipelines, from data transformation to deployment on Google Cloud, with a focus on creating a scalable and insightful data model.

big-data-analytics bigquery cloudcomputing computeengine dashboard-application dataengineering datainsights datamodelling datapipeline datascience datavisualization etl-pipeline gcp-project googlecloudplatform mage opensource python uber uber-api

Last synced: 21 Nov 2024

https://github.com/zmyzheng/stack_overflow_qa_assistant

Big Data Analysis project with recommendation, cluster analysis and graph database

big-data-analytics cluster-analysis data-visualization graph-database hadoop mahout recommendation-system

Last synced: 11 Dec 2024

https://github.com/wlun001/youtube-video-analysis

YouTube video analysis based on datasets on Kaggle

big-data-analytics dataset kaggle scala spark

Last synced: 17 Dec 2024

https://github.com/saraasgari99/customer-big-data-analytics

Comprehensive exploratory & predictive analysis of customer behavior in e-commerce using big data analytics, visualization, and machine learning

big-data-analytics exploratory-data-analysis exploratory-data-visualizations machine-learning pandas pca python random-forest sklearn

Last synced: 07 Nov 2024

https://github.com/bayunova28/sas_viya_programming

This repository contains my weekly projects from the Big Data Analytics II course at my college.

big-data-analytics big-data-projects data-science sas sas-viya

Last synced: 18 Dec 2024

https://github.com/rbalbinotti/prevendo_cons_energia_carros

Course - Big Data Analytics with R and Microsoft Azure Machine Learning - Final Project

azure big-data-analytics machine-learning r

Last synced: 08 Nov 2024

https://github.com/abdurrehman7452/search-engine-utilising-hadoop-mapreduce-technology-with-python-on-wikipedia-articles

Developing a Naive Search Engine Utilising Apache Hadoop MapReduce Technology on a dataset in comma-separated values (CSV) format containing around 5 million Wikipedia articles provided by Wikimedia, as part of an assignment for the Fundamental of Big Data Analytics (DS2004) course.

apache-hadoop big-data-analytics data-science hadoop-mapreduce mapreduce mapreduce-python search-engine wikimedia wikipedia wikipedia-articles

Last synced: 09 Nov 2024

https://github.com/srosalino/six_degrees_of_separation_and_engineering_the_perfect_cast

Leveraging PySpark to analyze the IMDB database, answer various queries, and develop machine learning models to predict a movie's popularity based on its cast

big-data big-data-analytics databricks pyspark pyspark-mllib

Last synced: 10 Nov 2024

https://github.com/sayamalt/steel-energy-consumption-prediction-using-pyspark

A machine learning model built with PySpark that predicts the energy consumption of the steel industry, reaching an R² score of approximately 99.5%.

apache-spark big-data-analytics big-data-processing cross-validation data-visualization exploratory-data-analysis hyperparameter-tuning machine-learning model-training-and-evaluation python regression spark sql

Last synced: 16 Nov 2024

https://github.com/mituskillologies/bigdata-ait-sep24

Programs conducted at Army Institute of Technology, Pune in training on Big Data Analytics during September 2024.

apache-hadoop apache-spark big-data big-data-analytics hadoop spark

Last synced: 16 Nov 2024

https://github.com/aalkiyumi/project-4-big-data-analysis-with-pyspark-on-weather-data

In this project, I analyzed weather data from the NCEI Global Surface Summary of Day dataset using PySpark in Jupyter Notebook. Tasks included data cleaning, statistical analysis, and forecasting for temperature, wind speed, precipitation, and extreme weather events. The project also predicts future weather patterns for Cincinnati and Florida.

big-data-analytics cs5165 data-analysis data-cleaning data-engineering data-science introduction-to-cloud-computing jupyter-notebook machine-learning precipitation-analysis predictive-modeling pyspark statistical-analysis temperature-forecasting time-series-forecasting uc uc2026 university-of-cincinnati wind-speed-data

Last synced: 23 Nov 2024