https://github.com/krisluczka/osse

Open Source Search Engine with built-in web/document crawler and an indexing method.

Host: GitHub
URL: https://github.com/krisluczka/osse
Owner: krisluczka
License: mit
Created: 2024-04-30T20:09:37.000Z (about 1 year ago)
Default Branch: master
Last Pushed: 2024-05-04T18:33:39.000Z (about 1 year ago)
Last Synced: 2025-04-15T08:59:10.119Z (about 1 month ago)
Topics: cpp, document-indexing, document-search, document-searching, indexing-engine, search-engine, web-crawler, web-crawling, web-indexer, web-indexing
Language: C++
Homepage:
Size: 58.6 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE.txt

README

# Open Source Search Engine

## Search engine
The engine uses a reverse index: sites are stored so that keywords act as keys. Whenever you type a word, the search engine returns every site indexed under that word.
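
As a rough illustration of the idea, here is a minimal sketch of a keyword-to-URL index. The `ReverseIndex` class and its `add` and `lookup` methods are hypothetical names chosen for this example and are not taken from the project's source.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical reverse index (not the project's actual class):
// each keyword is a key, and its value is the set of URLs indexed under it.
class ReverseIndex {
public:
    // Record that `url` was indexed under `keyword`.
    // Empty arguments are treated as a caller bug in this sketch.
    void add(const std::string& keyword, const std::string& url) {
        assert(!keyword.empty() && !url.empty());
        entries_[keyword].insert(url);
    }

    // Return every URL indexed under `keyword` in sorted order,
    // or an empty vector if the keyword is unknown.
    std::vector<std::string> lookup(const std::string& keyword) const {
        auto it = entries_.find(keyword);
        if (it == entries_.end()) return {};
        return std::vector<std::string>(it->second.begin(), it->second.end());
    }

private:
    std::unordered_map<std::string, std::set<std::string>> entries_;
};
```

With this layout a query is a single hash lookup on the typed word, and a `std::set` keeps each URL list free of duplicates.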

## Web crawler
To crawl the web, the crawler divides the work among threads. It follows every link it encounters in a given document and estimates each page's keywords. It stores them using a method called 'reverse index' (where a keyword functions as the key and the values are URLs). This makes lookups significantly faster than with normal indexing, because search engines use keywords to search for websites.
Currently there is no algorithm for ranking pages.
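
The crawl described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the web is replaced by an in-memory `FakeWeb` map, `estimate_keywords` is a placeholder heuristic (distinct lowercase words of at least four letters), and all names (`Page`, `FakeWeb`, `WorkerResult`, `crawl`) are hypothetical. Work is split level by level, so each thread writes only to its own result slot and no locking is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a fetched document: its text and outgoing links.
struct Page {
    std::string text;
    std::vector<std::string> links;
};

using FakeWeb = std::map<std::string, Page>;                        // URL -> page
using ReverseIndex = std::map<std::string, std::set<std::string>>;  // keyword -> URLs

// Placeholder keyword estimate (an assumption, not the project's method):
// distinct lowercase words of at least four letters.
std::set<std::string> estimate_keywords(const std::string& text) {
    std::set<std::string> keywords;
    std::string word;
    auto flush = [&] {
        if (word.size() >= 4) keywords.insert(word);
        word.clear();
    };
    for (unsigned char c : text) {
        if (std::isalpha(c)) word += static_cast<char>(std::tolower(c));
        else flush();
    }
    flush();
    return keywords;
}

// Per-thread results, merged on the main thread after join().
struct WorkerResult {
    std::vector<std::pair<std::string, std::string>> postings;  // (keyword, URL)
    std::vector<std::string> discovered;                        // links found
};

ReverseIndex crawl(const FakeWeb& web, const std::string& seed,
                   std::size_t num_threads) {
    assert(num_threads > 0);
    ReverseIndex index;
    std::set<std::string> visited{seed};
    std::vector<std::string> frontier{seed};

    while (!frontier.empty()) {
        const std::size_t workers = std::min(num_threads, frontier.size());
        std::vector<WorkerResult> results(workers);
        std::vector<std::thread> threads;
        for (std::size_t w = 0; w < workers; ++w) {
            threads.emplace_back([&, w] {
                // Worker w handles frontier entries w, w + workers, ...
                for (std::size_t i = w; i < frontier.size(); i += workers) {
                    auto it = web.find(frontier[i]);
                    if (it == web.end()) continue;  // link to an unknown page
                    for (const auto& kw : estimate_keywords(it->second.text))
                        results[w].postings.emplace_back(kw, frontier[i]);
                    for (const auto& link : it->second.links)
                        results[w].discovered.push_back(link);
                }
            });
        }
        for (auto& t : threads) t.join();

        // Merge: keyword is the key, URLs are the values; queue unseen links.
        std::vector<std::string> next;
        for (const auto& r : results) {
            for (const auto& [kw, url] : r.postings) index[kw].insert(url);
            for (const auto& link : r.discovered)
                if (visited.insert(link).second) next.push_back(link);
        }
        frontier = std::move(next);
    }
    return index;
}
```

The sketch needs C++17 and, on some toolchains, the `-pthread` flag. A real crawler would replace the `FakeWeb` lookup with an HTTP fetch and a link extractor.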
