https://github.com/niloth-p/page-ranker-with-node-graphs
Implementation of the page rank algorithm with node graphs for visualization of the data
networkx nodegraph pagerank-algorithm
- Host: GitHub
- URL: https://github.com/niloth-p/page-ranker-with-node-graphs
- Owner: Niloth-p
- License: mit
- Created: 2017-11-25T18:11:01.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2020-08-08T22:27:41.000Z (almost 5 years ago)
- Last Synced: 2025-01-08T05:07:30.373Z (5 months ago)
- Topics: networkx, nodegraph, pagerank-algorithm
- Language: Python
- Size: 1.01 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README
- License: LICENSE
README
This program ranks all the web pages in a given dataset according to the PageRank algorithm.
It implements the topic-specific PageRank algorithm on large graphs (a minimal sketch follows the run command below).
At the end, it displays graphs of the nodes and their edges in several variations, according to the needs of the user and the size of the node graph, i.e. the number of pages.
Command to run the program:
python PageRanker.py
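PageRanker.py itself is not reproduced here, but as an illustration of the topic-specific (personalized) variant, networkx's built-in pagerank() accepts a personalization vector that biases the random jump toward a topic set. The graph, topic set, and damping factor below are illustrative only, not taken from the repository:

```python
import networkx as nx

# Toy directed graph; replace with the graph built from your dataset.
G = nx.DiGraph()
G.add_edges_from([(1, 2), (2, 3), (3, 1), (3, 4), (4, 1)])

# Bias the random jump toward a chosen topic set of pages; nodes outside
# the topic set get zero jump probability (networkx normalizes the values).
topic_nodes = {1, 4}
personalization = {n: (1.0 if n in topic_nodes else 0.0) for n in G.nodes}

ranks = nx.pagerank(G, alpha=0.85, personalization=personalization)
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```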
PACKAGES REQUIRED:
The packages time, decimal, linecache, collections, random, matplotlib, and networkx are used in this program.
To install any required packages, type
pip install <package-name>
(of these, only matplotlib and networkx are outside the Python standard library)
FILES IN THE REPOSITORY:
IR Assignment-2.pdf - the document describing the problem given as the assignment for the course Information Retrieval
DesignDoc.pdf - The Design document
hollins.dat - the dataset that I have used
PageRank1.txt, rankednodesreadable.txt - files generated by the program during runtime, attached for reference
PageRanker.py - the program file
ExamplePlots folder - categorized figures of some plots drawn by the program
MY DATASET:
I have used the corpus from:
Kenneth Massey's Information Retrieval webpage:
Hollins.edu webbot crawl from January 2004
6012 nodes (webpages), 23875 edges (links)
my data file - hollins.dat
IMPLEMENTATION ON OTHER DATASETS:
To run the PageRank algorithm on other datasets,
create a data file from the dataset in the following format:
1st line - #ofnodes|#ofedges
2nd to (N+1)th line - webpageID|URL
(N+2)th line till the end - node|node (directed edges)
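A minimal loader sketch for this format; load_dataset is a hypothetical helper, not a function from PageRanker.py, and the field delimiter is an assumption:

```python
import networkx as nx

def load_dataset(path):
    """Hypothetical loader for the format above (not from PageRanker.py).

    Fields are assumed whitespace-separated, as in hollins.dat; use
    line.split('|') instead if your file uses literal '|' delimiters.
    """
    graph = nx.DiGraph()
    urls = {}
    with open(path) as f:
        # 1st line: #ofnodes #ofedges (n_edges is read but not needed here).
        n_nodes, n_edges = map(int, f.readline().split())
        for _ in range(n_nodes):            # 2nd to (N+1)th line: ID URL
            node_id, url = f.readline().split()
            urls[int(node_id)] = url
        for line in f:                      # remaining lines: directed edges
            src, dst = map(int, line.split())
            graph.add_edge(src, dst)
    return graph, urls

# Example: graph, urls = load_dataset("hollins.dat")
```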
Then the global variable file has to be changed to the name of your data file.
RUNNING TIME: approximately 105 s for the algorithm
CONVERGENCE VALUE (the variable sumofdiffs in the program)
I have found that setting the convergence value to 0.5 (which is reached within the first iteration) already gives reasonably good results. For more accuracy, you can reduce this value to around 0.001, but that takes many iterations, and the total time will be roughly #ofiterations * runningtime (as specified above); note that the running time displayed in each run is the total running time of the algorithm.
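A sketch of the convergence loop this describes: iterate the rank update until the sum of absolute rank changes (the README's sumofdiffs) drops below the threshold. This is a generic power-iteration sketch with illustrative names, not the code in PageRanker.py; dangling nodes are ignored for simplicity:

```python
def pagerank_iterate(graph, damping=0.85, threshold=0.5):
    """Generic power-iteration sketch; `graph` is a networkx DiGraph."""
    n = graph.number_of_nodes()
    ranks = {node: 1.0 / n for node in graph}
    sumofdiffs = float("inf")
    iterations = 0
    while sumofdiffs > threshold:
        new_ranks = {}
        for node in graph:
            # Sum rank contributions from pages linking to this node.
            incoming = sum(ranks[p] / graph.out_degree(p)
                           for p in graph.predecessors(node))
            new_ranks[node] = (1 - damping) / n + damping * incoming
        # Convergence check: total absolute rank change across all nodes.
        sumofdiffs = sum(abs(new_ranks[v] - ranks[v]) for v in graph)
        ranks = new_ranks
        iterations += 1
    return ranks, iterations
```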
The plots are set to be drawn from the last iteration by default.
USER INPUT:
Taking input n from the user for every graph individually:
1. All 6012 nodes (pretty messy)
2. n popular nodes
The first 10 nodes have only 6 nodes with edges between them
The first 20 nodes have 11 nodes sharing edges
The first 30 nodes have 14 nodes sharing edges
The first 50 nodes have 31 nodes sharing edges
3. 20 specific nodes
All of them share at least one edge
4. n random nodes
The higher the value of n, the higher the probability of getting nodes with common edges (see the sketch after this list)
On average, for n = 200 to 250, 30-40 nodes will be displayed.
All these graphs are displayed sequentially, i.e. the next opens on closing the previous one.
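As a sketch of option 4 above, one way to sample n random nodes and keep only those sharing an edge within the sample; the names are illustrative and this is not the repository's code:

```python
import random
import networkx as nx

def random_connected_sample(graph, n):
    """Sample n random nodes, keeping those sharing an edge in the sample."""
    sample = random.sample(list(graph.nodes), n)
    sub = graph.subgraph(sample)
    # Drop sampled nodes that have no edge to another sampled node.
    visible = [v for v in sub if sub.degree(v) > 0]
    return sub.subgraph(visible)

# Example: nx.draw(random_connected_sample(graph, 200)) typically leaves
# around 30-40 visible nodes on this dataset, per the estimate above.
```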
TO DRAW THE PLOTS OF OTHER DISTRIBUTIONS OF NODES:
Write a function to get the inputs for
-> list of edges,
-> the node size ratios,
-> and the mapping of the node labels with the list of nodes,
then call the draw_graph function with those objects as arguments, as sketched below.
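Since draw_graph's exact signature is not documented here, the sketch below assembles the three inputs and draws them with networkx and matplotlib directly; adapt the final call to PageRanker.py's draw_graph as needed. All names and values are illustrative:

```python
import matplotlib.pyplot as plt
import networkx as nx

def draw_distribution(edges, size_ratios, labels):
    """Draw a node distribution from the three inputs listed above."""
    g = nx.DiGraph()
    g.add_edges_from(edges)
    # Scale node sizes by the given ratios (default ratio 1.0).
    sizes = [300 * size_ratios.get(v, 1.0) for v in g.nodes]
    pos = nx.spring_layout(g)
    nx.draw(g, pos, node_size=sizes, labels=labels, with_labels=True)
    plt.show()

# Example with illustrative values:
draw_distribution(
    edges=[(1, 2), (2, 3), (3, 1)],
    size_ratios={1: 2.0, 2: 1.0, 3: 1.5},
    labels={1: "home", 2: "about", 3: "contact"},
)
```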