https://github.com/noureldin2303/inverted-index-python

# Inverted-index-python
Writing a simple Inverted Index in Python

## What is an Inverted Index?

```
An inverted index is the data structure used to support full-text search over a set of documents.
It is a table with one entry per distinct word across all processed documents;
each entry holds a list of pairs: (document id, frequency of the term in that document).
```
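
As a rough illustration of that shape, the index can be pictured as a Python dictionary that maps each term to its list of (document id, frequency) pairs. The terms, ids, and the helper name `documents_containing` below are made up for the example and are not taken from this repository's code.

```python
# Hypothetical toy index: term -> list of (document id, term frequency) pairs.
inverted_index = {
    "python": [(0, 1), (2, 2)],
    "index": [(0, 1), (1, 1)],
    "search": [(1, 3)],
}


def documents_containing(index, term):
    """Return the ids of the documents whose postings list contains the term."""
    return [doc_id for doc_id, _freq in index.get(term, [])]


print(documents_containing(inverted_index, "python"))   # [0, 2]
print(documents_containing(inverted_index, "missing"))  # []
```

Looking up a term is a single dictionary access, which is what makes this structure suitable for full-text search.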

## How does it work?


* Collect the documents to be indexed (for now, I will use simple strings).
* Tokenize the text, turning each document into a list of tokens.
* Do linguistic preprocessing, producing a list of indexing terms.
* Index the documents that each term occurs in by creating an inverted index, consisting of a dictionary and postings.
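
The steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not necessarily how this repository implements it: the function names, the tiny stop-word list, and the sample documents are all invented for illustration, and "linguistic preprocessing" is reduced here to lowercasing and stop-word removal.

```python
import re
from collections import Counter, defaultdict

# Assumption: a tiny illustrative stop-word list, not a standard one.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "and", "to"}


def tokenize(text):
    """Split a document into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def preprocess(tokens):
    """Simplified preprocessing: drop stop words, keep everything else."""
    return [t for t in tokens if t not in STOP_WORDS]


def build_inverted_index(documents):
    """Map each term to a postings list of (doc_id, frequency) pairs."""
    index = defaultdict(list)
    for doc_id, text in enumerate(documents):
        counts = Counter(preprocess(tokenize(text)))
        for term, freq in counts.items():
            index[term].append((doc_id, freq))
    return dict(index)


# Step 1: collect the documents (simple strings, as in the list above).
documents = [
    "The quick brown fox",
    "The lazy dog and the quick dog",
    "A fox is in the garden",
]

index = build_inverted_index(documents)
print(index["quick"])  # [(0, 1), (1, 1)]
print(index["dog"])    # [(1, 2)]
print(index["fox"])    # [(0, 1), (2, 1)]
```

Because documents are processed in order of their ids, each postings list comes out sorted by document id, which is convenient if lists are later merged to answer multi-term queries.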

--------------------------------------------------------------------------------------------------

### For Example:

![inverted index](https://hdscorp--c.na74.content.force.com/servlet/servlet.ImageServer?id=0151J000004MM9EQAW&oid=00Do0000000IJig)