https://github.com/tr-reny/mapreduce_wikipedia

A MapReduce-based approach to compute the relative frequencies of the words that occur in 100,000 Wikipedia documents and output the top 100 word pairs, sorted in decreasing order of relative frequency.
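For reference, the conventional definition of the relative frequency of a word pair (A, B) is given below. The README does not say what counts as a pair, so this assumes that B co-occurs with A when B immediately follows A, and that N(A, B) counts those occurrences across all documents:

```latex
f(B \mid A) = \frac{N(A, B)}{\sum_{B'} N(A, B')}
```

The denominator is the number of times A is followed by any word, so for a fixed A the frequencies f(B | A) sum to 1.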

README

          


MapReduce_Wikipedia


Description


The goal of the project is to explore a set of 100,000 Wikipedia documents, [100KWikiText.txt](https://web.njit.edu/~chasewu/Courses/Spring2022/CS644BigData/HW/100KWikiText.txt), in which each line contains the plain text extracted from one Wikipedia document.
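The job can be sketched in Python as an in-memory simulation of the map, shuffle, and reduce phases. This is a minimal sketch under stated assumptions, not the repository's implementation: it uses the "stripes" pattern (for each word A, the mapper emits a counter of the words that follow A), treats each input line as one document, tokenizes with a hypothetical lowercase alphanumeric regex, and defines co-occurrence as adjacency. The names `mapper`, `reducer`, and `run_job` are illustrative.

```python
import heapq
import re
from collections import Counter, defaultdict

# Assumption: a token is a run of lowercase ASCII letters or digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def mapper(line):
    """Map one document (one input line) to stripes: word A -> Counter of words following A."""
    words = TOKEN_RE.findall(line.lower())
    stripes = defaultdict(Counter)
    # Assumption: B co-occurs with A when B immediately follows A.
    for a, b in zip(words, words[1:]):
        stripes[a][b] += 1
    yield from stripes.items()


def reducer(word, stripes):
    """Merge every stripe emitted for `word` and yield ((word, b), f(b | word))."""
    total = Counter()
    for stripe in stripes:
        total.update(stripe)
    marginal = sum(total.values())  # number of times `word` is followed by any word
    for b, count in total.items():
        yield (word, b), count / marginal


def run_job(lines, k=100):
    """Simulate map -> shuffle -> reduce in memory and return the top-k word pairs."""
    # Shuffle phase: group all stripes by their key word.
    shuffled = defaultdict(list)
    for line in lines:
        for word, stripe in mapper(line):
            shuffled[word].append(stripe)
    results = (
        pair_freq
        for word, stripes in shuffled.items()
        for pair_freq in reducer(word, stripes)
    )
    # Decreasing relative frequency; ties broken alphabetically so output is deterministic.
    return heapq.nsmallest(k, results, key=lambda item: (-item[1], item[0]))
```

To run it on the dataset, stream the file line by line, for example `with open("100KWikiText.txt", encoding="utf-8") as f: top = run_job(f)`. Note that any word followed by only one distinct word scores 1.0, so you may want to require a minimum count for A before ranking.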


MapReduce_Wikipedia is released under the MIT license.


