https://github.com/maxhalford/inverted-index-search-engine
- Host: GitHub
- URL: https://github.com/maxhalford/inverted-index-search-engine
- Owner: MaxHalford
- Created: 2015-01-28T13:45:53.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2015-05-15T14:36:50.000Z (over 10 years ago)
- Last Synced: 2025-03-29T02:21:49.492Z (10 months ago)
- Language: Python
- Size: 652 KB
- Stars: 5
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Inverted index
## Theory
An inverted index is a data structure that indicates which documents contain a given word. Here the inverted index also stores the frequency of each word in each document.
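A minimal Python sketch of this structure, assuming the index lives in a plain in-memory dictionary where each word maps to a `{document name: frequency}` dictionary. The `tokenize` and `build_index` helpers and the toy documents are hypothetical illustrations rather than code from this repository, and the tokenizer (lower-case ASCII letters only) is a simplifying assumption.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into runs of ASCII letters (simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Map each word to a {document name: frequency} dictionary.

    `documents` is a dict {document name: raw text}.
    """
    index = defaultdict(dict)
    for name, text in documents.items():
        # Count every word of the document once, then record its frequency.
        for word, count in Counter(tokenize(text)).items():
            index[word][name] = count
    return dict(index)


docs = {
    "doc1": "In the beginning was the Word",
    "doc2": "the word was with God",
}
index = build_index(docs)
print(index["word"])  # {'doc1': 1, 'doc2': 1}
print(index["the"])   # {'doc1': 2, 'doc2': 1}
```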
The first step is to index a collection of documents to build the inverted index. Once the inverted index is stored, queries can be parsed and compared to each document to return pertinent results.
In this case I simply scored each document by the logarithm of the summed frequencies of the query words in that document. More sophisticated models exist, but they were not the point of this project. Check out the [Perl search engine](https://github.com/MaxHalford/Wikisid) I coded for more advanced scoring methods.
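The scoring rule can be sketched as follows, assuming an index that maps each word to a `{document name: frequency}` dictionary. `score` is a hypothetical helper, and the natural logarithm is an assumption, since the base used in the project is not stated. Documents that contain none of the query words are skipped because log(0) is undefined.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into runs of ASCII letters (simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def score(query, index):
    """Score each document by log(sum of the query words' frequencies in it).

    `index` maps word -> {document name: frequency}. Each distinct query word
    is counted once; documents matching no query word are left out.
    """
    totals = defaultdict(int)
    for word in set(tokenize(query)):
        for doc, freq in index.get(word, {}).items():
            totals[doc] += freq
    ranked = [(doc, math.log(total)) for doc, total in totals.items()]
    # Highest score first; ties broken alphabetically by document name.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


index = {
    "love": {"john": 3, "mark": 1},
    "neighbour": {"mark": 2, "luke": 1},
}
for doc, s in score("Love thy neighbour", index):
    print(f"{doc}: {s:.3f}")
# john: 1.099
# mark: 1.099
# luke: 0.000
```

Because the logarithm is monotonic, this ranking is identical to ranking by the raw summed frequency; the logarithm only compresses the scores.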
## Database
The inverted indexes are stored as JSON documents in MongoDB. Unfortunately, this means you will have to install MongoDB on your computer to run the project, but you don't need to know *anything* about MongoDB to do so. If you are interested in MongoDB, please check out the [IPython notebook tutorial](http://maxhalford.com/resources/notebooks/genetic-algorithms) I wrote.
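One possible layout for the index in MongoDB, sketched under assumptions: one MongoDB document per word, with its postings stored as a list of `{doc, freq}` sub-documents. Postings are a list rather than a dictionary keyed by document name because names such as `Matthew.txt` contain dots, which MongoDB has historically disallowed or made awkward in field names. The database and collection names (`search`, `index`), the field names and all three helpers are hypothetical, not this project's actual schema. The `pymongo` calls used (`MongoClient`, `delete_many`, `insert_many`, `create_index`, `find_one`) are standard, but they need `pymongo` installed and a MongoDB server running.

```python
def index_to_documents(index):
    """Turn {word: {doc: freq}} into a list of MongoDB-style documents, one per word."""
    return [
        {
            "word": word,
            "postings": [{"doc": doc, "freq": freq} for doc, freq in sorted(postings.items())],
        }
        for word, postings in sorted(index.items())
    ]


def store(index, uri="mongodb://localhost:27017", db_name="search", collection_name="index"):
    """Write the index to MongoDB (requires pymongo and a running server)."""
    # Imported lazily so that index_to_documents works without pymongo installed.
    from pymongo import MongoClient

    collection = MongoClient(uri)[db_name][collection_name]
    collection.delete_many({})  # start from an empty collection
    documents = index_to_documents(index)
    if documents:  # insert_many rejects an empty list
        collection.insert_many(documents)
    collection.create_index("word")  # fast lookup of a query word
    return collection


def lookup(collection, word):
    """Fetch the {document name: frequency} postings of one word, or {} if absent."""
    entry = collection.find_one({"word": word})
    return {} if entry is None else {p["doc"]: p["freq"] for p in entry["postings"]}
```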
## Interface
For the interface I used Qt. There is a very useful tool called Qt Designer with which you can design your interface by pointing and clicking. Once saved, this produces a .ui file which you can convert to Python with the ``uiToPython.sh`` script. This produces a Python file containing a class (normally called ``Ui_MainWindow``) that you will have to work with. The philosophy is that you connect interface actions (button clicks, scrolling, etc.) to Python functions.
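A minimal sketch of connecting interface actions to Python functions, assuming PyQt5 (the project may have used a different Qt binding). To keep it self-contained, the widgets are created by hand instead of through a Designer-generated ``Ui_MainWindow`` class; with a generated class you would call `Ui_MainWindow().setupUi(window)` and connect its widgets in the same way. `search` and `INDEX` are hypothetical stand-ins for the real search code.

```python
import sys

# Hypothetical toy index standing in for the one stored in MongoDB.
INDEX = {"love": {"john": 3, "mark": 1}, "neighbour": {"mark": 2, "luke": 1}}


def search(query, index):
    """Toy search: names of documents containing at least one query word."""
    hits = set()
    for word in query.lower().split():
        hits.update(index.get(word, {}))
    return sorted(hits)


def main():
    from PyQt5 import QtWidgets  # assumes PyQt5 is installed

    app = QtWidgets.QApplication(sys.argv)
    window = QtWidgets.QWidget()
    query_edit = QtWidgets.QLineEdit()
    search_button = QtWidgets.QPushButton("Search")
    results_list = QtWidgets.QListWidget()

    layout = QtWidgets.QVBoxLayout(window)
    for widget in (query_edit, search_button, results_list):
        layout.addWidget(widget)

    def on_search():
        # Slot: Qt calls this function whenever a connected signal fires.
        results_list.clear()
        results_list.addItems(search(query_edit.text(), INDEX))

    search_button.clicked.connect(on_search)     # button click -> Python function
    query_edit.returnPressed.connect(on_search)  # pressing Enter does the same
    window.show()
    sys.exit(app.exec_())
```

Calling `main()` opens the window; typing a query and clicking Search (or pressing Enter) fills the list with the matching documents.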
## Example
As an example, I indexed the New Testament. The following image shows an example query with the list of relevant documents.

## Running it yourself
- Install MongoDB, Python 3 and Qt.
- Run ``Storage.py`` to store the files in ``data/New Testament`` in MongoDB.
- Run ``main.py``.
- Profit!
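The storage step starts by reading the corpus from disk. A minimal sketch, assuming each file in the folder is one UTF-8 plain-text document named after its file; the real layout of ``data/New Testament`` and what ``Storage.py`` does with it may differ, and `load_corpus` is a hypothetical helper.

```python
from pathlib import Path


def load_corpus(folder):
    """Read every regular file in `folder` as UTF-8 text.

    Returns {document name: text}, using the file name without its extension
    as the document name (an assumption about how documents are named).
    """
    return {
        path.stem: path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).iterdir())
        if path.is_file()
    }
```

`load_corpus("data/New Testament")` would then return a `{document name: text}` dictionary ready to be indexed.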