https://github.com/im-rises/page_rank

PageRank algorithm implemented in Python with Jupyter Notebook

google jupyter-notebook pagerank python search-algorithm


Host: GitHub
URL: https://github.com/im-rises/page_rank
Owner: Im-Rises
License: mit
Created: 2022-05-25T14:14:04.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2023-01-07T22:16:11.000Z (over 2 years ago)
Last Synced: 2025-03-24T02:43:50.337Z (3 months ago)
Topics: google, jupyter-notebook, pagerank, python, search-algorithm
Language: Jupyter Notebook
Homepage:
Size: 856 KB
Stars: 4
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# page_rank

## Description

Implementation and explanation of Google's PageRank algorithm for ranking websites.
The notebook `page_rank.ipynb` explains the PageRank algorithm, and `page_rank_exercise.ipynb` applies it to a larger dataset.

## Page rank

PageRank is an algorithm Google developed to rank websites in its search engine.
I wrote this program as a learning exercise; implementing it is a good way to understand the basics of how websites are ranked.

[Image: PageRank illustration from Wikipedia]

> **Note**
> Nowadays Google uses other powerful algorithms.

## Quick start

To run the notebooks, install Python (version 3.8 or later).

Only one package is needed:
- numpy

You can install it with pip, the Python package manager:

```bash
pip install numpy
```

You'll also need an IDE such as PyCharm or VS Code, or any other tool that can open Jupyter Notebook files.

## Project architecture

The project consists of two main notebooks.

The first one teaches the basics of the PageRank algorithm and its core notions:

- how a website's successors and predecessors influence its rank
- spider traps and teleportation
- dead ends
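
To make these notions concrete, here is a minimal sketch of PageRank by power iteration. It is not the code from the notebooks: the function name `page_rank`, the parameters `beta`, `tol` and `max_iter`, and the small example graph are all assumptions chosen for illustration. Dead ends are handled here by treating them as if they linked to every page, which is one common convention among several.

```python
import numpy as np


def page_rank(links, beta=0.85, tol=1e-10, max_iter=100):
    """Rank pages by power iteration (illustrative sketch, hypothetical API).

    links maps each page to the list of pages it links to (its successors).
    beta is the probability of following a link; 1 - beta is the teleport probability.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    index = {page: i for i, page in enumerate(pages)}
    n = len(pages)

    # Column-stochastic matrix: m[j, i] is the probability of moving from page i to page j.
    m = np.zeros((n, n))
    for source, targets in links.items():
        for target in targets:
            m[index[target], index[source]] += 1.0 / len(targets)

    # Dead ends have an all-zero column; treat them as linking to every page.
    dead_ends = m.sum(axis=0) == 0
    m[:, dead_ends] = 1.0 / n

    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # Teleport: with probability 1 - beta, jump to a random page.
        # This keeps a spider trap from absorbing all of the rank.
        new_rank = beta * (m @ rank) + (1.0 - beta) / n
        converged = np.abs(new_rank - rank).sum() < tol
        rank = new_rank
        if converged:
            break
    return {page: float(value) for page, value in zip(pages, rank)}


# Hypothetical graph: C links only to itself (spider trap), D has no links (dead end).
graph = {"A": ["B", "C"], "B": ["C"], "C": ["C"], "D": []}
ranks = page_rank(graph)
print(ranks)
```

In this example the spider trap `C` still ends up with the highest rank, but teleportation leaves every other page with a non-zero share, and the ranks sum to 1.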

The second file tests the algorithm on a larger set of websites than the first example.
It needs the `hollins.dat` file to work; this file is provided with the project.
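
As a rough idea of how such a dataset could be loaded, here is a sketch of a parser. The file layout is an assumption and is not confirmed by this README: a first line with the node and edge counts, then one `id url` line per node, then one `source_id target_id` line per edge. The function name `read_graph` and the sample data are hypothetical.

```python
import io


def read_graph(stream):
    """Parse a link graph from a text stream (illustrative sketch).

    Assumed layout (not confirmed by this README):
      line 1:           "<node_count> <edge_count>"
      next node_count:  "<node_id> <url>"
      next edge_count:  "<source_id> <target_id>"
    """
    node_count, edge_count = map(int, stream.readline().split())

    # Map each numeric id to its URL.
    urls = {}
    for _ in range(node_count):
        node_id, url = stream.readline().split()
        urls[int(node_id)] = url

    # Every URL starts with an empty successor list, so dead ends are kept.
    links = {url: [] for url in urls.values()}
    for _ in range(edge_count):
        source, target = map(int, stream.readline().split())
        links[urls[source]].append(urls[target])
    return links


# Tiny made-up sample in the assumed layout (not taken from hollins.dat).
sample = io.StringIO(
    "3 3\n"
    "1 http://example.org/\n"
    "2 http://example.org/a\n"
    "3 http://example.org/b\n"
    "1 2\n"
    "1 3\n"
    "2 1\n"
)
links = read_graph(sample)
print(links)
```

If the real file follows this layout, it could be read with `with open("hollins.dat") as f: links = read_graph(f)`. The result maps each URL to the list of URLs it links to.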

> **Note**
> If you want to learn more about Google's current algorithms, check the Google link in the `Documentations` section.

## Output

Both Jupyter notebooks display their output in the same format.

Websites are listed in descending order of rank, so the website with the highest PageRank value appears first.

page_rank.ipynb:

```
1 : ('B', 0.2956000000000001)
2 : ('C', 0.28648000000000007)
3 : ('D', 0.26280000000000003)
4 : ('A', 0.15512000000000004)
```

page_rank_exercise.ipynb:

```
1 : ('http://www.hollins.edu/', 0.0254580229107189)
2 : ('http://www.hollins.edu/admissions/visit/visit.htm', 0.01105341950078871)
3 : ('http://www.hollins.edu/about/about_tour.htm', 0.010282488037670167)
```
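
A listing in the format shown above can be produced by sorting the (website, rank) pairs in descending order of rank and printing each pair with its position. The sketch below is an assumption about how this could be done, not the code from the notebooks; the function name `print_ranking`, its `limit` parameter and the example values are made up for illustration.

```python
def print_ranking(ranks, limit=None):
    """Print (website, rank) pairs from highest to lowest rank (illustrative sketch)."""
    ordered = sorted(ranks.items(), key=lambda item: item[1], reverse=True)
    # limit=None prints every entry; limit=3 prints only the top three.
    for position, entry in enumerate(ordered[:limit], start=1):
        print(position, ":", entry)
    return ordered


# Hypothetical rank values, not the results of the notebooks.
example_ranks = {"A": 0.1, "B": 0.4, "C": 0.3, "D": 0.2}
ordered = print_ranking(example_ranks)
```

Passing `limit=3` would print only the top three entries, as in the `page_rank_exercise.ipynb` excerpt.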

## Documentations

Wikipedia:

Google:

Search Engine Land:

## Contributors

Quentin MOREL:

- @Im-Rises

[![GitHub contributors](https://contrib.rocks/image?repo=Im-Rises/page_rank)](https://github.com/Im-Rises/page_rank/graphs/contributors)
