An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with mapreduce-python

A curated list of projects in awesome lists tagged with mapreduce-python.

https://github.com/krishnadey30/newsheadlines

This repository contains code that extracts meaningful information from a news headline dataset.

hadoop hadoop-mapreduce mapreduce-python news-dataset python

Last synced: 18 Mar 2025

https://github.com/yoongoing/bigdata_pyspark

⚡️Let's do data mining using Spark, an open-source MapReduce platform⚡️

bigdata dataminig jupyter-notebook mapreduce mapreduce-python pyspark spark

Last synced: 04 Apr 2025

https://github.com/thevinh-ha-1710/big-data-pipeline-design

This project builds a data pipeline implementing the ETL process.

big-data etl-pipeline json mapreduce-python mongodb-database

Last synced: 26 Feb 2025

https://github.com/raphael-jin/edfs

Emulation-based System for Distributed File Storage and Parallel Computation

distributed-computing distributed-systems mapreduce-python servrless

Last synced: 25 Mar 2025

https://github.com/antoinewg/ocr-page-rank

PageRank algorithm using Hadoop Streaming

hadoop-streaming mapreduce-python pagerank-algorithm

Last synced: 09 Apr 2025

https://github.com/lesiaukr/goit-algo2-hw-06

Master's | Design & Analysis of Algorithms | Fundamentals of Parallel Computing and the MapReduce Model

goit-algo2-hw-06 mapreduce-python matplotlib python threadpoolexecutor

Last synced: 24 Apr 2025

https://github.com/abdurrehman7452/search-engine-utilising-hadoop-mapreduce-technology-with-python-on-wikipedia-articles

A naive search engine built with Apache Hadoop MapReduce on a comma-separated values (CSV) dataset of around 5 million Wikipedia articles provided by Wikimedia, developed as an assignment for the Fundamental of Big Data Analytics (DS2004) course.

apache-hadoop big-data-analytics data-science hadoop-mapreduce mapreduce mapreduce-python search-engine wikimedia wikipedia wikipedia-articles

Last synced: 22 Feb 2025

https://github.com/aditeyabaral/mapreduce-word2vec

Implementation of Word2Vec for large datasets as a Map-Reduce Job using Hadoop Streaming.

hadoop-streaming machine-learning mapreduce-python mapreduce-word2vec nlp word-embeddings word2vec

Last synced: 09 Mar 2025

https://github.com/ashwinpn/wikisea

Search Engine for Wikipedia.

mapreduce-python search-engine wikipedia

Last synced: 05 Mar 2025

https://github.com/vigneshss-07/bigdata_technologies

This repo contains technical knowledge and implementations of big data technologies.

big-data hadoop hadoop-hdfs hbase hive hive-metastore kafka mapreduce-python pyspark spark sparksql

Last synced: 05 Mar 2025

https://github.com/bayunova28/spotify_lyrics

This repository contains my personal project to generate MapReduce jobs using Apache Hadoop.

apache-derby apache-hadoop apache-hive hadoop-mapreduce mapreduce-python spotify

Last synced: 05 Apr 2025

https://github.com/yevheniidatsenko/goit-algo2-hw-06

🗒️ Home Task - Design and Analysis of Algorithms (Fundamentals of Parallel Computing and the MapReduce Model)

goit-algo2-hw-06 mapreduce-python matplotlib python

Last synced: 27 Feb 2025