https://github.com/ankurgel/search_engine
A search engine for studying crawling, indexing, querying and different page ranking algorithms in Ruby.
- Host: GitHub
- URL: https://github.com/ankurgel/search_engine
- Owner: AnkurGel
- Created: 2014-05-14T09:15:35.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2014-05-15T18:16:08.000Z (about 11 years ago)
- Last Synced: 2025-02-08T04:26:54.969Z (4 months ago)
- Language: Ruby
- Size: 289 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
# Search Engine
This is a basic web search engine in **Ruby**, built to study the different components of search engine architecture:
- Web crawler (currently in progress)
- Indexer
- Query component

Another prime motive is to learn more about different page-ranking algorithms such as **PageRank**, **Frequency Ranking**, **Distance Ranking**, **Location Ranking**, etc.
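
To give a feel for the first of these algorithms, here is a minimal sketch of iterative PageRank. It is not the code in `pagerank.rb`; the `LINKS` graph, the `pagerank` method name, the damping factor of 0.85 and the fixed iteration count are all assumptions chosen for illustration.

```ruby
# Hypothetical link graph: each page maps to the pages it links to.
# Assumption: every link target also appears as a key.
LINKS = {
  "a" => ["b", "c"],
  "b" => ["c"],
  "c" => ["a"]
}.freeze

# Iterative PageRank. Each round, a page keeps a small base rank and
# receives a damped, equal share of the rank of every page linking to it.
def pagerank(links, damping: 0.85, iterations: 50)
  pages = links.keys
  n = pages.size
  ranks = pages.map { |page| [page, 1.0 / n] }.to_h

  iterations.times do
    new_ranks = pages.map { |page| [page, (1.0 - damping) / n] }.to_h
    links.each do |page, outgoing|
      # A page with no outgoing links spreads its rank over all pages.
      targets = outgoing.empty? ? pages : outgoing
      share = damping * ranks[page] / targets.size
      targets.each { |target| new_ranks[target] += share }
    end
    ranks = new_ranks
  end

  ranks
end

ranks = pagerank(LINKS)
ranks.sort_by { |_, rank| -rank }.each do |page, rank|
  puts format("%s %.4f", page, rank)
end
```

In this toy graph, page `c` is linked from both other pages, so it ends up ranked above `a` and `b`. The ranks always sum to 1, which makes them easy to compare across runs.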
This project also features a small _web application_ for better user experience and to visually analyze the ranking algorithms :wink:
The results look something like [this](https://dl.dropboxusercontent.com/u/102071534/scr2/Screenshot%20from%202014-05-12%2004%3A57%3A40.png) and [this](https://dl.dropboxusercontent.com/u/102071534/scr2/Screenshot%20from%202014-05-12%2004%3A17%3A50.png).
## Note
It is **NOT recommended** for use in any critical program. This project is a learning experiment and is likely to contain many bugs.
Feel free to improve the project by contributing tests, bug reports and improvements.
## To set up
`bundle install`
In [seeds.yaml](seeds.yaml), set up the initial seeds from which the crawling and indexing process begins.
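
The layout of `seeds.yaml` is not described here, so the following is only a sketch under the assumption that it holds a plain YAML list of URLs. The `SAMPLE_SEEDS` content and the `load_seeds` helper are hypothetical and are not part of the project; they just show how seed URLs could be read and sanity-checked before crawling.

```ruby
require "yaml"
require "uri"

# Hypothetical seeds content; the real seeds.yaml may be laid out differently.
SAMPLE_SEEDS = <<~YAML
  - http://example.com/
  - http://example.org/start
  - not a url
YAML

# Parse the YAML text and keep only entries that are absolute HTTP(S) URLs.
def load_seeds(yaml_text)
  entries = YAML.safe_load(yaml_text) || []
  entries.select do |entry|
    begin
      uri = URI.parse(entry.to_s)
      uri.is_a?(URI::HTTP) && !uri.host.nil?
    rescue URI::InvalidURIError
      false
    end
  end
end

seeds = load_seeds(SAMPLE_SEEDS)
puts seeds
```

Filtering out malformed entries up front keeps a typo in the seeds file from surfacing later as a failed request in the middle of a crawl.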
## To run
1. Begin by crawling and indexing pages: `ruby spider.rb`
2. After you have substantial data for ranking, compute **PageRank** iteratively: `ruby pagerank.rb`
3. Now, run the web application: `ruby web.rb`. This starts a local server on http://localhost:4567 by default.
4. Query and analyze! :sunglasses:

## Dependencies
- [**Nokogiri**](http://nokogiri.org/) - An HTML and XML parser with an awesome ability to traverse documents via XPath and CSS selectors.
- [**DataMapper**](http://datamapper.org) - A robust ORM in Ruby.
- [**Sinatra**](http://www.sinatrarb.com) - A lightweight framework for quickly creating web applications in Ruby.
- [**SQLite**](http://www.sqlite.org/) - One of the most popular and easily configurable database engines.
- **Debugger** - The Holy Grail.

## TODO
All TO-DOs are being tracked here - [Issue #1](https://github.com/AnkurGel/search_engine/issues/1).

Copyright (c) [Ankur Goel](http://github.com/AnkurGel) & [Nitish Sharma](https://github.com/sharma1nitish).