Projects in Awesome Lists tagged with mapreduce-python
A curated list of projects in awesome lists tagged with mapreduce-python.
https://github.com/mahmoudparsian/big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
algorithms apache-hadoop apache-spark big-data data-algorithms data-analysis data-engineering data-partition data-transformation glossary mapreduce mapreduce-algorithm mapreduce-python monoid partitioning-algorithms pyspark pyspark-algorithms-book santa-clara-university spark-dataframes spark-rdd
Last synced: 12 Apr 2025
https://github.com/krishnadey30/newsheadlines
This repository contains code that extracts meaningful information from a news headline dataset.
hadoop hadoop-mapreduce mapreduce-python news-dataset python
Last synced: 18 Mar 2025
https://github.com/yoongoing/bigdata_pyspark
⚡️Let's do data mining with Spark, an open MapReduce platform⚡️
bigdata dataminig jupyter-notebook mapreduce mapreduce-python pyspark spark
Last synced: 04 Apr 2025
https://github.com/thevinh-ha-1710/big-data-pipeline-design
This project builds a data pipeline implementing the ETL process.
big-data etl-pipeline json mapreduce-python mongodb-database
Last synced: 26 Feb 2025
https://github.com/raphael-jin/edfs
Emulation-based System for Distributed File storage and Parallel Computation
distributed-computing distributed-systems mapreduce-python servrless
Last synced: 25 Mar 2025
https://github.com/aryangupta-09/kmeans-using-mapreduce
K-means clustering algorithm using MapReduce.
distributed-systems grpc grpc-python k-means k-means-algorithm k-means-clustering k-means-implementation k-means-implementation-in-python kmeans kmeans-algorithm kmeans-clustering kmeans-clustering-algorithm map-reduce mapreduce mapreduce-algorithm mapreduce-python protobuf-python protobuf3 protocol-buffers remote-communication
Last synced: 23 Feb 2025
https://github.com/antoinewg/ocr-page-rank
PageRank algorithm using Hadoop Streaming
hadoop-streaming mapreduce-python pagerank-algorithm
Last synced: 09 Apr 2025
https://github.com/lesiaukr/goit-algo2-hw-06
Master's | Design & Analysis of Algorithms | Fundamentals of Parallel Computing and the MapReduce Model
goit-algo2-hw-06 mapreduce-python matplotlib python threadpoolexecutor
Last synced: 24 Apr 2025
https://github.com/abdurrehman7452/search-engine-utilising-hadoop-mapreduce-technology-with-python-on-wikipedia-articles
A naive search engine built with Apache Hadoop MapReduce on a comma-separated values (CSV) dataset of around 5 million Wikipedia articles provided by Wikimedia, developed as an assignment for the Fundamental of Big Data Analytics (DS2004) course.
apache-hadoop big-data-analytics data-science hadoop-mapreduce mapreduce mapreduce-python search-engine wikimedia wikipedia wikipedia-articles
Last synced: 22 Feb 2025
https://github.com/aditeyabaral/mapreduce-word2vec
Implementation of Word2Vec for large datasets as a Map-Reduce Job using Hadoop Streaming.
hadoop-streaming machine-learning mapreduce-python mapreduce-word2vec nlp word-embeddings word2vec
Last synced: 09 Mar 2025
https://github.com/ashwinpn/wikisea
Search Engine for Wikipedia.
mapreduce-python search-engine wikipedia
Last synced: 05 Mar 2025
https://github.com/vigneshss-07/bigdata_technologies
This repo contains all technical knowledge and implementation of big data technologies.
big-data hadoop hadoop-hdfs hbase hive hive-metastore kafka mapreduce-python pyspark spark sparksql
Last synced: 05 Mar 2025
https://github.com/bayunova28/spotify_lyrics
This repository contains my personal project for building MapReduce jobs with Apache Hadoop.
apache-derby apache-hadoop apache-hive hadoop-mapreduce mapreduce-python spotify
Last synced: 05 Apr 2025
https://github.com/yevheniidatsenko/goit-algo2-hw-06
🗒️ Home Task - Design and Analysis of Algorithms (Fundamentals of Parallel Computing and the MapReduce Model)
goit-algo2-hw-06 mapreduce-python matplotlib python
Last synced: 27 Feb 2025