https://github.com/antoinewg/ocr-page-rank
PageRank algorithm using Hadoop Streaming
hadoop-streaming mapreduce-python pagerank-algorithm
- Host: GitHub
- URL: https://github.com/antoinewg/ocr-page-rank
- Owner: antoinewg
- Created: 2019-09-11T21:34:41.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2019-09-11T21:41:58.000Z (almost 6 years ago)
- Last Synced: 2025-02-14T23:49:21.559Z (4 months ago)
- Topics: hadoop-streaming, mapreduce-python, pagerank-algorithm
- Language: Python
- Homepage:
- Size: 438 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 🔍PageRank
### Setup
- `hadoop fs -put _movies/graph/adj_list /input` places the adjacency list in the HDFS `/input` folder.
### Commands
- `hadoop jar hadoop-streaming.jar -input /input/adj_list -output /out -mapper ./mapper.py -reducer reducer.py && hadoop fs -cat /out/part-00000` runs the MapReduce algorithm and prints its output. (Use the _adj_ file for smaller tests.)
- `hadoop jar hadoop-streaming.jar -input /out/part-00000 -output /maxs -mapper ./max_page_rank.py && hadoop fs -cat /maxs/part-00000` shows the top 20 results.
## Top 20 results
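For readers unfamiliar with how a PageRank iteration splits into a map step and a reduce step, here is a minimal sketch. It is not the repository's `mapper.py` or `reducer.py`: the record format (`<node>`, `<rank>`, and comma-separated out-links, separated by tabs), the damping factor of 0.85, and the function names are all assumptions made for illustration. The shuffle phase is simulated locally so the logic can be tried without a cluster, and rank held by pages with no out-links is simply dropped.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor; the repository may use another value


def map_line(line):
    """Emit (key, value) pairs for one input record.

    Assumed record format: "<node>\t<rank>\t<comma-separated out-links>".
    """
    node, rank, links = line.rstrip("\n").split("\t")
    targets = [t for t in links.split(",") if t]
    # Pass the graph structure through so the next iteration can reuse it.
    yield node, "LINKS:" + ",".join(targets)
    # Share the current rank equally among the out-links.
    for target in targets:
        yield target, "RANK:%r" % (float(rank) / len(targets))


def reduce_node(node, values, n_nodes):
    """Combine all values emitted for one node into its updated record."""
    links, incoming = "", 0.0
    for value in values:
        kind, _, payload = value.partition(":")
        if kind == "LINKS":
            links = payload
        else:
            incoming += float(payload)
    rank = (1.0 - DAMPING) / n_nodes + DAMPING * incoming
    return "%s\t%.6f\t%s" % (node, rank, links)


def run_iteration(lines, n_nodes):
    """Simulate map -> shuffle -> reduce locally (hypothetical helper)."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            grouped[key].append(value)
    return [reduce_node(k, grouped[k], n_nodes) for k in sorted(grouped)]
```

Calling `run_iteration(["A\t0.333333\tB,C", "B\t0.333333\tC", "C\t0.333333\tA"], 3)` returns one updated record per node. In a real Hadoop Streaming job, `map_line` and `reduce_node` would live in separate scripts that read from standard input and write tab-separated lines to standard output.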
After the first iteration:
```
1138 0.000126
1139 0.000126
1140 0.000126
1141 0.000126
1142 0.000126
1143 0.000126
1170 0.000123
1197 0.000126
1198 0.000126
1199 0.000126
12 0.000134
1200 0.000126
1209 0.000126
1210 0.000126
3701 0.000147
3966 0.000120
4 0.000120
524 0.000120
5322 0.000128
93 0.000183
```
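The listing above appears to be ordered by page id as text rather than by score. As a hedged sketch of how such output could be ranked, the snippet below selects the highest-scoring pages with `heapq.nlargest`. The function name `top_pages` and the whitespace-separated `<page_id> <rank>` line format are assumptions for illustration, not the behaviour of `max_page_rank.py`.

```python
import heapq


def top_pages(lines, n=20):
    """Return the n (page_id, rank) pairs with the highest rank."""
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        pairs.append((parts[0], float(parts[1])))
    return heapq.nlargest(n, pairs, key=lambda pair: pair[1])


# A few rows copied from the results above.
sample = ["93 0.000183", "3701 0.000147", "12 0.000134",
          "5322 0.000128", "1170 0.000123"]
for page_id, rank in top_pages(sample, n=3):
    print("%s\t%.6f" % (page_id, rank))
```

With these five rows and `n=3`, the pages printed are 93, 3701, and 12, in that order.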