https://github.com/im-rises/page_rank
PageRank algorithm implemented in Python with Jupyter Notebook
https://github.com/im-rises/page_rank
google jupyter-notebook pagerank python search-algorithm
Last synced: 2 months ago
JSON representation
PageRank algorithm implemented in Python with Jupyter Notebook
- Host: GitHub
- URL: https://github.com/im-rises/page_rank
- Owner: Im-Rises
- License: mit
- Created: 2022-05-25T14:14:04.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-01-07T22:16:11.000Z (over 2 years ago)
- Last Synced: 2025-03-24T02:43:50.337Z (3 months ago)
- Topics: google, jupyter-notebook, pagerank, python, search-algorithm
- Language: Jupyter Notebook
- Homepage:
- Size: 856 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# page_rank
![]()
![]()
## Description
Implementation and explanation of the Google's Page Rank algorithm for website reference.
The file `page_rank_exercise` is an explanation of the page rank and the `page_rank_exercise.ipynb` is an application of the page rank on a big dataset.## Page rank
The page rank is a Google algorithm for website referencing in google search engine.
I made this program in a learning goal, its implementation is interesting to apprehend the basis of website reference.
![]()
> **Note**
> Nowadays Google use other powerful algorithms.## Quick start
To start each program, please install Python (at least version 3.8):
There is only one package needed:
- numpyYou can install it using the pip python package manager:
```bash
pip install numpy
```You'll also need an IDE like Pycharm, VsCode or any other to start the Jupyter Notebook files.
## Project architecture
The project is composed of two main files.
The first one is made for the basis learning of the Page Rank algorithm with the basis notions :
- website successors and predecessors referencing influence
- the spider-trap and teleport
- the dead endsThe second file is a test of the algorithm for a bigger set of website than the previous example.
It needs the `hollins.dat` file to work, of course this files is provided with the project.**Note**
> If you want to learn more about Google's current algorithm, check the Google link in the `documentations` section.## Output
The two jupyter notebook files will display the same way the output.
The website with the most important reference value will be displayed in first than the other more important website.
page_rank.ipynb:
```
1 : ('B', 0.2956000000000001)
2 : ('C', 0.28648000000000007)
3 : ('D', 0.26280000000000003)
4 : ('A', 0.15512000000000004)
```page_rank_exercise.ipynb:
```
1 : ('http://www.hollins.edu/', 0.0254580229107189)
2 : ('http://www.hollins.edu/admissions/visit/visit.htm', 0.01105341950078871)
3 : ('http://www.hollins.edu/about/about_tour.htm', 0.010282488037670167)
```## Documentations
Wikipedia:
Google:
Search Engine Land:
## Contributors
Quentin MOREL :
- @Im-Rises
-[](https://github.com/Im-Rises/page_rank/graphs/contributors)