Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/manel15279/intelligent-information-retrieval-system
An implementation of an Information Retrieval System that indexes documents and matches queries using vector space models (Jaccard, Cosine, Scalar), boolean model, and probabilistic model (BM25).
information-retrieval ntlk pyqt5 python
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/manel15279/intelligent-information-retrieval-system
- Owner: manel15279
- Created: 2023-12-30T01:13:46.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-02-16T03:33:22.000Z (10 months ago)
- Last Synced: 2024-02-16T05:30:23.473Z (10 months ago)
- Topics: information-retrieval, ntlk, pyqt5, python
- Language: Python
- Homepage:
- Size: 15.8 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Information Retrieval System: Indexing and Query Matching
## Overview
This project implements an Information Retrieval System (IRS) that indexes documents and matches queries using various retrieval models. It aims to apply concepts learned in information retrieval courses, utilizing the LISA dataset for testing.
## Features
- **Indexing**: Extract terms, remove stopwords, and normalize terms in documents using NLTK. Build descriptor and inverted files to facilitate retrieval.
- **Query Matching**: Match queries with vector space measures (scalar product, cosine, Jaccard), the Boolean model (AND, OR, NOT), and the BM25 probabilistic model.
- **Evaluation**: Compare retrieval models based on average precision, P@5, P@10, recall, F-measure, and plot precision-recall curves.
## Usage
1. **Clone Repository**: Clone the repository to your local machine.
2. **Install Dependencies**: Install required dependencies using `pip install -r requirements.txt`.
3. **Prepare Data**: Obtain the LISA dataset from the University of Glasgow website and concatenate the files.
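4. **Execute Application**: Launch the application by running its entry-point script, for example `python main.py` (this README names both `app.py` and `main.py`; use whichever exists in your checkout).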
5. **Interact with GUI**: Use the graphical user interface to perform indexing, query search, query matching, and evaluation.
6. **View Results**: Evaluate the performance of different retrieval models and visualize precision-recall curves.
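
To make the pipeline behind the GUI more concrete, here is a minimal sketch of indexing, BM25 matching, and P@k evaluation. It is not the project's code: the function names, the tiny stopword set (a stand-in for NLTK's stopword list), the sample documents, and the `k1`/`b` defaults are all assumptions chosen for illustration, and the BM25 formula shown is one commonly used smoothed variant.

```python
import math
import re
from collections import Counter, defaultdict

# Illustrative stopword set (assumption); the project uses NLTK for this step.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}; also return document lengths."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        terms = tokenize(text)
        lengths[doc_id] = len(terms)
        for term, freq in Counter(terms).items():
            index[term][doc_id] = freq
    return index, lengths


def bm25_scores(query, index, lengths, k1=1.5, b=0.75):
    """Score every document that shares a term with the query using BM25."""
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        # Smoothed IDF; the "1 +" keeps the weight non-negative.
        idf = math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id, freq in postings.items():
            norm = freq + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * freq * (k1 + 1) / norm
    return dict(scores)


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k


# Hypothetical mini-collection, not taken from the LISA dataset.
docs = {
    1: "Indexing of library documents",
    2: "Boolean retrieval in library catalogues",
    3: "The history of printing",
}
index, lengths = build_inverted_index(docs)
scores = bm25_scores("library indexing", index, lengths)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, precision_at_k(ranking, {1}, 2))
```

Replacing `bm25_scores` with a cosine or Jaccard scorer over the same inverted index would let the same `precision_at_k` helper compare models, which is roughly what the evaluation step in the GUI is described as doing.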