https://github.com/niloth-p/page-ranker-with-node-graphs
Implementation of the page rank algorithm with node graphs for visualization of the data
networkx nodegraph pagerank-algorithm
- Host: GitHub
- URL: https://github.com/niloth-p/page-ranker-with-node-graphs
- Owner: Niloth-p
- License: mit
- Created: 2017-11-25T18:11:01.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2020-08-08T22:27:41.000Z (almost 5 years ago)
- Last Synced: 2025-01-08T05:07:30.373Z (5 months ago)
- Topics: networkx, nodegraph, pagerank-algorithm
- Language: Python
- Size: 1.01 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README
- License: LICENSE
README
This program ranks all the web pages in a given dataset according to the PageRank algorithm.
It implements the topic-specific PageRank algorithm on large graphs (a minimal sketch follows the run command below).
At the end, it displays graphs of the nodes and their edges in several variations, according to the needs of the user and the size of the node graph, i.e. the number of pages.
Command to run the program:
python PageRanker.py
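PageRanker.py itself is not reproduced here, but as an illustration of the topic-specific (personalized) variant, networkx's built-in pagerank() accepts a personalization vector that biases the random jump toward a topic set. The graph, topic set, and damping factor below are illustrative only, not taken from the repository:

```python
import networkx as nx

# Toy directed graph; replace with the graph built from your dataset.
G = nx.DiGraph()
G.add_edges_from([(1, 2), (2, 3), (3, 1), (3, 4), (4, 1)])

# Bias the random jump toward a chosen topic set of pages; nodes outside
# the topic set get zero jump probability (networkx normalizes the values).
topic_nodes = {1, 4}
personalization = {n: (1.0 if n in topic_nodes else 0.0) for n in G.nodes}

ranks = nx.pagerank(G, alpha=0.85, personalization=personalization)
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```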
PACKAGES REQUIRED:
The packages time, decimal, linecache, collections, random, matplotlib, and networkx are used in this program.
To install any required packages, type
pip install <package-name>
(of these, only matplotlib and networkx are outside the Python standard library)
FILES IN THE REPOSITORY:
IR Assignment-2.pdf - the document describing the problem given as the assignment for the course Information Retrieval
DesignDoc.pdf - The Design document
hollins.dat - the dataset that I have used
PageRank1.txt, rankednodesreadable.txt - files generated by the program during runtime, attached for reference
PageRanker.py - the program file
ExamplePlots folder - categorized figures of some plots drawn by the program
MY DATASET:
I have used the corpus from:
Kenneth Massey's Information Retrieval webpage:
Hollins.edu webbot crawl from January 2004
6012 nodes (webpages), 23875 edges (links)
my data file - hollins.dat
IMPLEMENTATION ON OTHER DATASETS:
To run the PageRank algorithm on other datasets,
create a data file from the dataset in the following format:
1st line - #ofnodes|#ofedges
2nd to (N+1)th line - webpageID|URL
(N+2)th line till the end - node|node (directed edges)
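A minimal loader sketch for this format; load_dataset is a hypothetical helper, not a function from PageRanker.py, and the field delimiter is an assumption:

```python
import networkx as nx

def load_dataset(path):
    """Hypothetical loader for the format above (not from PageRanker.py).

    Fields are assumed whitespace-separated, as in hollins.dat; use
    line.split('|') instead if your file uses literal '|' delimiters.
    """
    graph = nx.DiGraph()
    urls = {}
    with open(path) as f:
        # 1st line: #ofnodes #ofedges (n_edges is read but not needed here).
        n_nodes, n_edges = map(int, f.readline().split())
        for _ in range(n_nodes):            # 2nd to (N+1)th line: ID URL
            node_id, url = f.readline().split()
            urls[int(node_id)] = url
        for line in f:                      # remaining lines: directed edges
            src, dst = map(int, line.split())
            graph.add_edge(src, dst)
    return graph, urls

# Example: graph, urls = load_dataset("hollins.dat")
```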
Then the global variable file has to be changed to the name of your data file.
RUNNING TIME: approximately 105 s for the algorithm
CONVERGENCE VALUE (the variable sumofdiffs in the program)
I have found that setting the convergence value to 0.5 (which is reached within the first iteration) already gives reasonably good results. For more accuracy, you can reduce this value to around 0.001, but that takes many iterations, and the total time will be roughly #ofiterations * runningtime (as specified above); note that the running time displayed in each run is the total running time of the algorithm.
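A sketch of the convergence loop this describes: iterate the rank update until the sum of absolute rank changes (the README's sumofdiffs) drops below the threshold. This is a generic power-iteration sketch with illustrative names, not the code in PageRanker.py; dangling nodes are ignored for simplicity:

```python
def pagerank_iterate(graph, damping=0.85, threshold=0.5):
    """Generic power-iteration sketch; `graph` is a networkx DiGraph."""
    n = graph.number_of_nodes()
    ranks = {node: 1.0 / n for node in graph}
    sumofdiffs = float("inf")
    iterations = 0
    while sumofdiffs > threshold:
        new_ranks = {}
        for node in graph:
            # Sum rank contributions from pages linking to this node.
            incoming = sum(ranks[p] / graph.out_degree(p)
                           for p in graph.predecessors(node))
            new_ranks[node] = (1 - damping) / n + damping * incoming
        # Convergence check: total absolute rank change across all nodes.
        sumofdiffs = sum(abs(new_ranks[v] - ranks[v]) for v in graph)
        ranks = new_ranks
        iterations += 1
    return ranks, iterations
```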
The plots are set to be drawn from the last iteration by default.
USER INPUT:
Taking input n from the user for every graph individually:
1. All 6012 nodes (pretty messy)
2. n popular nodes
The first 10 nodes have only 6 nodes with edges between them
The first 20 nodes have 11 nodes sharing edges
The first 30 nodes have 14 nodes sharing edges
The first 50 nodes have 31 nodes sharing edges
3. 20 specific nodes
All of them share at least one edge
4. n random nodes
The higher the value of n, the higher the probability of getting nodes with common edges (see the sketch after this list)
On average, for n = 200 to 250, 30-40 nodes will be displayed.
All these graphs are displayed sequentially, i.e. the next opens on closing the previous one.
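As a sketch of option 4 above, one way to sample n random nodes and keep only those sharing an edge within the sample; the names are illustrative and this is not the repository's code:

```python
import random
import networkx as nx

def random_connected_sample(graph, n):
    """Sample n random nodes, keeping those sharing an edge in the sample."""
    sample = random.sample(list(graph.nodes), n)
    sub = graph.subgraph(sample)
    # Drop sampled nodes that have no edge to another sampled node.
    visible = [v for v in sub if sub.degree(v) > 0]
    return sub.subgraph(visible)

# Example: nx.draw(random_connected_sample(graph, 200)) typically leaves
# around 30-40 visible nodes on this dataset, per the estimate above.
```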
TO DRAW THE PLOTS OF OTHER DISTRIBUTIONS OF NODES:
Write a function to get the inputs for
-> list of edges,
-> the node size ratios,
-> and the mapping of the node labels with the list of nodes,
then call the draw_graph function with those objects as arguments, as sketched below.
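Since draw_graph's exact signature is not documented here, the sketch below assembles the three inputs and draws them with networkx and matplotlib directly; adapt the final call to PageRanker.py's draw_graph as needed. All names and values are illustrative:

```python
import matplotlib.pyplot as plt
import networkx as nx

def draw_distribution(edges, size_ratios, labels):
    """Draw a node distribution from the three inputs listed above."""
    g = nx.DiGraph()
    g.add_edges_from(edges)
    # Scale node sizes by the given ratios (default ratio 1.0).
    sizes = [300 * size_ratios.get(v, 1.0) for v in g.nodes]
    pos = nx.spring_layout(g)
    nx.draw(g, pos, node_size=sizes, labels=labels, with_labels=True)
    plt.show()

# Example with illustrative values:
draw_distribution(
    edges=[(1, 2), (2, 3), (3, 1)],
    size_ratios={1: 2.0, 2: 1.0, 3: 1.5},
    labels={1: "home", 2: "about", 3: "contact"},
)
```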