Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bashmocha/pagerank-algorithm
PageRank (PR) is an algorithm used by Google Search to rank web pages in their search engine results
https://github.com/bashmocha/pagerank-algorithm
Last synced: about 1 month ago
JSON representation
PageRank (PR) is an algorithm used by Google Search to rank web pages in their search engine results
- Host: GitHub
- URL: https://github.com/bashmocha/pagerank-algorithm
- Owner: BashMocha
- Created: 2022-10-10T12:43:12.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-10-10T12:59:44.000Z (over 2 years ago)
- Last Synced: 2024-12-06T09:38:41.248Z (about 2 months ago)
- Language: Python
- Homepage:
- Size: 8.79 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# PageRank Algorithm in Python
## Task:
This algorithhm is one of the projects I've written for [CS50AI](https://cs50.harvard.edu/ai/2020/). You can check my [projects](https://github.com/CheesyFrappe/cs50-ai).
Using a Random Surfer Markov Chain and an Iterative Algorithm, write an AI to rank web pages by importance.## Background:
Search engines like Google display search results in order of importance using a page-ranking algorithm. The Google PageRank algorithm considers a website as more important (higher page rank), when it is linked to by other imporant websites. Links from less important websites have a lower importance weighting.
## Random Surfer Model:
One way to calculate the PageRanks of a corpus of webpages is to use a random surfer model. This models the behaviour of a web surfer who will start on a random web page and with some probability d (where d is known as the damping factor) randomly click any link on the page, or with with probability 1-d randomly go to any other page in the corpus. This model can be interpreted as a Markov Chain, with each page representing a state, and each page having its own transition model for the probability of moving to any other page next. By sampling a large number states from the Markov Chain, a distribution of the number of visits to each page in the corpus can be obtained, and from it, an estimate of the relative page ranking for each page from 0 to 1 can be calculated.
## Iterative Algorithm:
A page's PageRank can also be calculated using the following recursive mathematical expression:
where:
* PR(p) is the PageRank of a given page p
* d is the damping factor
* N is the number of pages in the corpus
* i is each possible page in the corpus
* PR(i) is the PageRank of page i
* NumLinks(i) is the number of links on page iIn order to calculate the PageRank for each page in the corpus, we can start with a basic PageRank for each page of 1 / N, and then iteratively calculate new PageRank values for each page. This continues until PageRank values converge (changes on an iteration are less than a threshold value).
## Usage:
Requires Python(3) to run:python pagerank.py (corpus_directory)