Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/noureldin2303/inverted-index-python
Writing a simple Inverted Index in Python
- Host: GitHub
- URL: https://github.com/noureldin2303/inverted-index-python
- Owner: Noureldin2303
- License: mit
- Created: 2023-03-15T15:18:56.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-04-07T14:05:18.000Z (almost 2 years ago)
- Last Synced: 2024-01-26T10:42:18.695Z (about 1 year ago)
- Topics: communityexchange, educative, github-campus-experts, inverted-index, learn, preprocessing, python, search-engine, student-vscode
- Language: Jupyter Notebook
- Homepage:
- Size: 12.7 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Security: SECURITY.md
README
# Inverted-index-python
Writing a simple Inverted Index in Python

## What is an Inverted Index?

An inverted index is the data structure used to support full-text search over a set of documents.
It consists of a large table with one entry for every distinct word found in the processed documents.
Each entry holds a list of pairs: (document ID, frequency of the term in that document).

## How does it work?
* Collect the documents to be indexed (I will use simple strings for now).
* Tokenize the text, turning each document into a list of tokens.
* Do linguistic preprocessing, producing a list of indexing terms.
* Index the documents that each term occurs in by creating an inverted index, consisting of a dictionary and postings.

--------------------------------------------------------------------------------------------------
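The four steps above can be sketched in a few lines of Python. This is a minimal sketch, not the notebook's own code: the sample `documents`, the `STOP_WORDS` set, and the helper names `tokenize`, `preprocess`, and `build_inverted_index` are all hypothetical. "Linguistic preprocessing" is reduced here to lowercasing plus stop-word removal; a fuller pipeline might also stem or lemmatize.

```python
import re
from collections import Counter

# Step 1: collect the documents (hypothetical sample strings keyed by doc id).
documents = {
    1: "The quick brown fox jumps over the lazy dog.",
    2: "The lazy dog sleeps, and the dog dreams.",
}

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "and", "over"}


def tokenize(text):
    """Step 2: split raw text into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[A-Za-z0-9']+", text)


def preprocess(tokens):
    """Step 3: lowercase tokens and drop stop words to get indexing terms."""
    lowered = (token.lower() for token in tokens)
    return [term for term in lowered if term not in STOP_WORDS]


def build_inverted_index(docs):
    """Step 4: map each term to a postings list of (doc_id, frequency) pairs."""
    index = {}
    for doc_id, text in docs.items():
        # Count how often each term occurs in this one document.
        counts = Counter(preprocess(tokenize(text)))
        for term, freq in counts.items():
            index.setdefault(term, []).append((doc_id, freq))
    return index


index = build_inverted_index(documents)
print(index["dog"])   # → [(1, 1), (2, 2)]
print(index["lazy"])  # → [(1, 1), (2, 1)]
```

A plain `dict` plays the role of the dictionary and each list of `(doc_id, frequency)` tuples is a postings list, matching the table described earlier. Because the documents are processed in ascending id order, every postings list comes out sorted by document id.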
### For example
![inverted index](https://hdscorp--c.na74.content.force.com/servlet/servlet.ImageServer?id=0151J000004MM9EQAW&oid=00Do0000000IJig)
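Because every term points straight to its postings, a multi-term query only needs to look up each term and combine the lists. The sketch below assumes boolean AND semantics (a document must contain every query term) and ranks hits by a naive score, the summed term frequency, which is not TF-IDF. The hand-written `index` literal and the `search` function are hypothetical; query terms are assumed to be preprocessed (e.g. lowercased) the same way as the documents.

```python
# Hypothetical hand-written index: term -> [(doc_id, frequency), ...]
index = {
    "lazy": [(1, 1), (2, 1)],
    "dog": [(1, 1), (2, 2), (3, 1)],
    "fox": [(1, 1)],
}


def search(index, query_terms):
    """Boolean AND search; ranks hits by summed term frequency (naive score)."""
    if not query_terms:
        return []
    # Turn each term's postings into {doc_id: freq}; an unknown term yields {}.
    per_term = [dict(index.get(term, [])) for term in query_terms]
    # Keep only documents that appear in every term's postings.
    hits = set(per_term[0]).intersection(*per_term[1:])
    # Naive relevance: total occurrences of the query terms in each document.
    scores = {doc_id: sum(p[doc_id] for p in per_term) for doc_id in hits}
    # Highest score first; ties broken by ascending doc id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


print(search(index, ["lazy", "dog"]))  # → [2, 1]
print(search(index, ["cat"]))          # → []
```

A term missing from the dictionary empties the intersection, so the query returns no documents, which is the expected behavior for AND semantics.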