https://github.com/velocitatem/modeling_search_engines
This repository contains the code and resources for understanding and implementing search engine algorithms, with a specific focus on Google's PageRank.
- Host: GitHub
- URL: https://github.com/velocitatem/modeling_search_engines
- Owner: velocitatem
- Created: 2024-04-06T15:26:47.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-05-16T12:01:06.000Z (7 months ago)
- Last Synced: 2024-05-17T12:30:10.539Z (7 months ago)
- Topics: google, pagerank, search-engine
- Language: Jupyter Notebook
- Homepage:
- Size: 2.01 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Modeling Search Engines with Linear Algebra
![Contributors](https://img.shields.io/badge/contributors-5-brightgreen.svg)
[![Open in Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/velocitatem/modeling_search_engines/blob/main/engine.ipynb)

Welcome to our project on **Modeling Search Engines Through Linear Algebra**! This repository contains the code and resources for understanding and implementing search engine algorithms, with a specific focus on Google's PageRank.

## Abstract

In an increasingly interconnected world, understanding networks is key to designing and improving technology. This project explores the mathematics behind search engines and leverages linear algebra to model user behavior on the web. We focus on the PageRank algorithm, which evaluates the link structure of web pages to determine their importance, offering a more reliable way of ranking web pages than traditional keyword-based methods.

## Contents

- **Code**: Jupyter notebook containing code for implementing and visualizing PageRank.

## Getting Started
### Prerequisites
Ensure you have the following installed:
- Python 3.11
- Jupyter Notebook
- Required Python packages: `numpy`, `matplotlib`, `networkx`

### Installation
Clone this repository:
```bash
git clone https://github.com/velocitatem/modeling_search_engines
```
Navigate to the project directory:
```bash
cd modeling_search_engines
```
Install the required packages:
```bash
pip install -r requirements.txt
```

### Running the Code
To explore the code and run the PageRank algorithm:
1. Open the Jupyter notebook:
```bash
jupyter notebook engine.ipynb
```
2. Follow the instructions within the notebook to execute the cells and visualize the results.

## Features
- **PageRank Implementation**: Understand and implement the PageRank algorithm using linear algebra.
- **Network Graphs**: Visualize web structures as directed graphs.
- **Interactive Visualization**: Explore an interactive network graph through a simple web interface.

## Visualization
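Before opening the interactive graph, it may help to see how a small web can be written down as a directed graph. The sketch below is illustrative only and is not taken from the notebook: the four-page `links` dictionary and the `adjacency_matrix` helper are hypothetical, and the convention that column `j` holds the outgoing links of page `j` is an assumption (the notebook may use the transpose).

```python
# Hypothetical four-page web: each key links to the pages in its list.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}


def adjacency_matrix(links):
    """Return (pages, matrix) with matrix[i][j] == 1 if page j links to page i."""
    pages = sorted(links)
    index = {page: i for i, page in enumerate(pages)}
    n = len(pages)
    matrix = [[0] * n for _ in range(n)]
    for source, targets in links.items():
        for target in targets:
            matrix[index[target]][index[source]] = 1
    return pages, matrix


pages, A = adjacency_matrix(links)

# Row sums count incoming links, column sums count outgoing links.
in_degree = {page: sum(A[i]) for i, page in enumerate(pages)}
out_degree = {page: sum(row[j] for row in A) for j, page in enumerate(pages)}

for page in pages:
    print(page, "in:", in_degree[page], "out:", out_degree[page])
```

Counting incoming links is only a crude importance score. PageRank refines it by weighting each link by the importance of the page it comes from.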
Our interactive network graph can be accessed [here](https://662a217970a974846a9569ac--magical-figolla-a3f256.netlify.app/). This graph showcases the interconnectedness of web pages and highlights the significance of each node based on PageRank.

## Theory & Formulas
The project dives into the theory behind PageRank, including:
- Adjacency matrices and transition matrices
- Probability transition and eigenvectors
- Handling sinks and infinite loops with damping factors

## Further Exploration
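To experiment with the ideas listed under Theory & Formulas without opening the notebook, the following is a minimal sketch of PageRank by power iteration. It is not the notebook's implementation, which may well use `numpy` and explicit matrices. The `pagerank` function, the example `links` graph, the tolerance, and the iteration cap are all assumptions made for illustration. The damping value of 0.85 is the commonly cited default. The sketch also assumes one common way of handling sinks: the rank of any page without outgoing links is redistributed evenly across all pages.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Approximate PageRank scores for a dict mapping page -> list of linked pages."""
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform distribution

    for _ in range(max_iter):
        # Rank currently held by sinks (pages with no outgoing links).
        sink_mass = sum(rank[page] for page in pages if not links[page])

        # Every page receives the random-jump share plus an even share of sink rank.
        base = (1.0 - damping) / n + damping * sink_mass / n
        new_rank = {page: base for page in pages}

        # Each page passes the damped remainder of its rank equally to its targets.
        for source in pages:
            targets = links[source]
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)

        change = sum(abs(new_rank[page] - rank[page]) for page in pages)
        rank = new_rank
        if change < tol:
            break

    return rank


# Hypothetical example graph; "D" is a sink because it links to nothing.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A", "D"],
    "D": [],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.4f}")
```

The scores sum to 1, so they can be read as the long-run probability that a random surfer is on each page. In matrix terms, the loop converges to the eigenvector with eigenvalue 1 of the damped transition matrix. In this example graph, page C, which receives links from both A and B, ends up with the highest score.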
The project also compares PageRank with modern search algorithms such as semantic search and personalized search engines, highlighting the strengths and limitations of each.

## Contributors

- Daniel Alves Rösel
- Isabel de Valenzuela
- Jaskaran Singh Ghai
- Aswin Subramanian Maheswaran
- Anna Payne