
https://github.com/manel15279/intelligent-information-retrieval-system

An implementation of an Information Retrieval System that indexes documents and matches queries using vector space models (Jaccard, Cosine, Scalar), boolean model, and probabilistic model (BM25).

information-retrieval ntlk pyqt5 python


# Information Retrieval System: Indexing and Query Matching

## Overview

This project implements an Information Retrieval System (IRS) that indexes documents and matches queries against them using several retrieval models. It applies concepts from an information retrieval course and is tested on the LISA dataset.

## Features

- **Indexing**: Extracts terms, removes stopwords, and normalizes the remaining terms in each document using NLTK, then builds the descriptor and inverted files used for retrieval.
- **Query Matching**: Ranks documents with the vector space model (scalar product, cosine measure, Jaccard measure), matches them with the boolean model (AND, OR, NOT), and scores them with the BM25 probabilistic model.
- **Evaluation**: Compares the retrieval models using average precision, P@5, P@10, recall, and F-measure, and plots precision-recall curves.
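The indexing step can be sketched in a few lines of Python. This is a minimal, self-contained illustration, not the repository's code: a small hard-coded stopword list and a crude suffix stripper stand in for NLTK's stopword corpus and stemmer, and only raw term frequencies are stored (any weighting the project applies is not shown). The descriptor maps each document to its terms; the inverted file maps each term back to the documents that contain it.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for NLTK's English stopword list (assumption, not the project's list).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def normalize(token: str) -> str:
    """Crude suffix stripping; a placeholder for an NLTK stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def extract_terms(text: str) -> list[str]:
    """Tokenize, lowercase, drop stopwords, and normalize the remaining terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [normalize(t) for t in tokens if t not in STOPWORDS]


def build_index(docs: dict[int, str]):
    """Return (descriptor, inverted): doc -> {term: freq} and term -> {doc: freq}."""
    descriptor = {doc_id: dict(Counter(extract_terms(text))) for doc_id, text in docs.items()}
    inverted = defaultdict(dict)
    for doc_id, freqs in descriptor.items():
        for term, freq in freqs.items():
            inverted[term][doc_id] = freq
    return descriptor, dict(inverted)


# Toy documents for illustration only (not taken from LISA).
docs = {
    1: "Indexing of library documents",
    2: "Retrieval of documents in libraries and indexing systems",
}
descriptor, inverted = build_index(docs)
print(inverted["document"])  # {1: 1, 2: 1}
```

In a fuller version, NLTK's `stopwords.words("english")` and a stemmer such as `PorterStemmer` could replace the two placeholders; which normalizer the project actually uses is not stated here.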
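The three vector space measures can be sketched as follows, assuming documents and queries are represented as `{term: weight}` dictionaries and the query uses binary weights. The Jaccard function shown is the extended (Tanimoto) form for weighted vectors; whether the project uses exactly these normalizations is an assumption.

```python
import math


def scalar_product(query: dict[str, float], doc: dict[str, float]) -> float:
    """Inner product: sum of weight products over terms shared by query and document."""
    return sum(w * doc[t] for t, w in query.items() if t in doc)


def cosine(query: dict[str, float], doc: dict[str, float]) -> float:
    """Inner product divided by the product of the two vector norms."""
    dot = scalar_product(query, doc)
    norms = math.sqrt(sum(w * w for w in query.values())) * math.sqrt(
        sum(w * w for w in doc.values())
    )
    return dot / norms if norms else 0.0


def jaccard(query: dict[str, float], doc: dict[str, float]) -> float:
    """Extended (Tanimoto) Jaccard: dot / (|q|^2 + |d|^2 - dot)."""
    dot = scalar_product(query, doc)
    denom = sum(w * w for w in query.values()) + sum(w * w for w in doc.values()) - dot
    return dot / denom if denom else 0.0


# Binary query weights and toy document weights (illustrative values, not LISA data).
query = {"index": 1.0, "document": 1.0}
docs = {
    1: {"index": 0.5, "document": 0.5, "library": 1.0},
    2: {"retrieval": 1.0, "document": 0.25},
}
for measure in (scalar_product, cosine, jaccard):
    # Rank documents by decreasing similarity under each measure.
    ranking = sorted(docs, key=lambda d: measure(query, docs[d]), reverse=True)
    print(measure.__name__, ranking)
```

Here all three measures rank document 1 first; they differ in how strongly they penalize long or heavily weighted documents, which is what the project's evaluation compares.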
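The boolean model can be evaluated as set operations over an inverted file. The sketch below uses a hypothetical `boolean_search` helper that parses AND, OR, NOT, and parentheses with the usual precedence (NOT, then AND, then OR). It assumes operators are written in uppercase, that terms are looked up in lowercase, and that the index terms were normalized the same way as the query.

```python
def boolean_search(query: str, inverted: dict, all_docs: set[int]) -> set[int]:
    """Evaluate a boolean query such as 'index AND NOT (library OR catalog)'."""
    tokens = query.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        # Lowest precedence: OR is set union.
        result = parse_and()
        while peek() == "OR":
            take()
            result = result | parse_and()
        return result

    def parse_and():
        # AND is set intersection.
        result = parse_not()
        while peek() == "AND":
            take()
            result = result & parse_not()
        return result

    def parse_not():
        # Highest precedence: NOT is the complement w.r.t. all documents.
        token = take()
        if token == "NOT":
            return all_docs - parse_not()
        if token == "(":
            result = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return result
        # A plain term: the set of documents in its postings.
        return set(inverted.get(token.lower(), ()))

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return result


# Toy inverted file: term -> set of document IDs (illustrative only).
inverted = {"index": {1, 2}, "library": {1}, "retrieval": {2}, "document": {1, 2}}
all_docs = {1, 2, 3}
print(boolean_search("index AND NOT library", inverted, all_docs))  # {2}
print(boolean_search("NOT (index OR document)", inverted, all_docs))  # {3}
```

Because the boolean model only answers "match" or "no match", the result is an unranked set, unlike the vector space and BM25 models.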
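A minimal BM25 sketch over a `{doc_id: {term: frequency}}` descriptor follows. The defaults `k1=1.2` and `b=0.75` are common choices, not values taken from the project, and the IDF uses the classic Robertson-Spärck Jones form with base-10 logarithms. That form is zero or negative for terms found in half or more of the documents; other BM25 variants add 1 inside the logarithm to avoid this.

```python
import math


def bm25_scores(query_terms: list[str], descriptor: dict, k1: float = 1.2, b: float = 0.75) -> dict:
    """Okapi BM25 score of every document for the given query terms."""
    n_docs = len(descriptor)
    if n_docs == 0:
        return {}
    doc_len = {d: sum(freqs.values()) for d, freqs in descriptor.items()}
    avgdl = sum(doc_len.values()) / n_docs
    # Document frequency n_t: number of documents containing each query term.
    df = {t: sum(1 for freqs in descriptor.values() if t in freqs) for t in set(query_terms)}
    scores = {}
    for doc_id, freqs in descriptor.items():
        score = 0.0
        for term in query_terms:
            tf = freqs.get(term, 0)
            if tf == 0:
                continue  # absent terms contribute nothing
            idf = math.log10((n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are damped.
            length_norm = k1 * (1 - b + b * doc_len[doc_id] / avgdl)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores[doc_id] = score
    return scores


# Toy descriptor (illustrative values, not LISA data).
descriptor = {
    1: {"index": 2, "library": 1},
    2: {"retrieval": 1, "system": 3},
    3: {"library": 1, "catalog": 2},
    4: {"search": 1},
}
scores = bm25_scores(["index", "library"], descriptor)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0])  # 1
```

In this toy example "library" appears in two of the four documents, so its classic IDF is exactly zero and only "index" contributes to the ranking.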

## Usage

1. **Clone Repository**: Clone the repository to your local machine.
2. **Install Dependencies**: Install required dependencies using `pip install -r requirements.txt`.
3. **Prepare Data**: Download the LISA dataset from the University of Glasgow website and concatenate the collection files into a single file.
4. **Execute Application**: Launch the application with `python main.py`.
5. **Interact with GUI**: Use the PyQt5 graphical user interface to run indexing, search, query matching, and evaluation.
6. **View Results**: Evaluate the performance of different retrieval models and visualize precision-recall curves.
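The concatenation in the data-preparation step can be done with a short Python helper, sketched below. It is a hypothetical utility, not part of the repository, and it simply appends each file in the order given.

```python
from pathlib import Path


def concatenate(files, output) -> None:
    """Append the contents of each file, in the given order, to a single output file."""
    with open(output, "w", encoding="utf-8") as out:
        for path in files:
            # errors="replace" tolerates stray bytes that are not valid UTF-8.
            text = Path(path).read_text(encoding="utf-8", errors="replace")
            out.write(text)
            if not text.endswith("\n"):
                out.write("\n")  # keep the last record of one file apart from the next
```

A possible call is `concatenate(sorted(Path("lisa").glob("LISA[0-9]*")), "lisa_docs.txt")`, where the `lisa` directory, the `LISA[0-9]*` pattern, and the output name are placeholders to be matched against the files actually downloaded. Sorting keeps the documents in a predictable order.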
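The evaluation measures can be computed from a ranked list of document IDs and the set of relevant IDs taken from the test collection's relevance judgments. In this sketch, "average precision" is taken to be standard non-interpolated AP, the F-measure is the balanced F1, and the precision-recall curve uses 11-point interpolated precision; all three are assumptions about how the project defines them.

```python
def precision_recall(retrieved: list[int], relevant: set[int]) -> tuple[float, float]:
    """Precision and recall of an (unranked) list of retrieved documents."""
    hits = sum(1 for d in retrieved if d in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def precision_at_k(ranking: list[int], relevant: set[int], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant (P@5, P@10, ...)."""
    return sum(1 for d in ranking[:k] if d in relevant) / k


def average_precision(ranking: list[int], relevant: set[int]) -> float:
    """Precision at each relevant hit, averaged over all relevant documents."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def interpolated_pr_curve(ranking: list[int], relevant: set[int], levels: int = 11) -> list[float]:
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0 (for levels=11)."""
    if not relevant:
        return [0.0] * levels
    points, hits = [], 0  # (recall, precision) after each rank
    for rank, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    # Interpolated precision at level r = highest precision at any recall >= r.
    return [
        max((p for rec, p in points if rec >= i / (levels - 1)), default=0.0)
        for i in range(levels)
    ]


# Toy ranking and relevance judgments (illustrative only).
ranking = [3, 1, 7, 2, 9, 4, 5, 8, 6, 10]
relevant = {1, 2, 4}
print(precision_at_k(ranking, relevant, 5))   # 0.4
print(precision_at_k(ranking, relevant, 10))  # 0.3
print(average_precision(ranking, relevant))   # 0.5
```

The list returned by `interpolated_pr_curve` can be plotted against the recall levels 0.0 to 1.0 with any plotting library to draw one precision-recall curve per retrieval model.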