https://github.com/bydmiller/recommender_system

This repository contains an implementation of a text document recommender system using Python. The system recommends similar documents based on vector representations and similarity calculations.

# 🚀 Recommender System


This repository contains an implementation of a text document recommender system using Python. The system recommends similar documents based on vector representations and similarity calculations.

# Overview

Recommender systems are a core application of statistical AI, and at their heart is a similarity calculation. This implementation represents each document as a vector and recommends the articles whose vectors are most similar to that of the document a user selects.
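
For reference, the cosine similarity listed under Features is conventionally defined as below. The symbols $\mathbf{a}$ and $\mathbf{b}$ (two document vectors with components $a_i$ and $b_i$) are notation introduced here, not taken from the repository:

$$
\operatorname{sim}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2} \, \sqrt{\sum_i b_i^2}}
$$

Because TF-IDF weights are non-negative, the score falls between 0 (no shared terms) and 1 (vectors pointing in the same direction).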

# Features

- Load text documents from various datasets (e.g., BBC news articles, scientific abstracts, Wikipedia articles).
- Preprocess text data to remove stopwords and perform lemmatization.
- Vectorize documents using TF-IDF vectorization.
- Calculate document similarity using cosine similarity.
- Generate recommendations based on the most similar documents to a selected document, while also including some less similar documents to provide diversity.
- Avoid recommending the same document or documents with the same title as the selected document.
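
The steps listed above could be sketched roughly as follows. This is not the repository's code: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `recommend`), the tiny stopword list, the sample documents, and the choice to draw the less similar picks at random are all assumptions made for illustration. Lemmatization is omitted, and the hand-rolled TF-IDF stands in for whatever vectorizer the project actually uses.

```python
import math
import random
from collections import Counter

# Tiny illustrative stopword list (assumption); a real run would use a fuller
# list and a lemmatizer, both of which are left out of this sketch.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w and w not in STOPWORDS]


def tfidf_vectors(texts):
    """Return one sparse {term: weight} dict per text, using tf * log(N / df)."""
    token_lists = [tokenize(t) for t in texts]
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return [
        {t: c * math.log(n_docs / doc_freq[t]) for t, c in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(docs, selected, n_similar=2, n_diverse=1, seed=0):
    """docs is a list of (title, text) pairs. Returns recommended titles:
    the most similar documents first, then randomly drawn less similar ones."""
    vectors = tfidf_vectors([text for _, text in docs])
    chosen_title = docs[selected][0]
    # Skip the selected document and any document sharing its title.
    candidates = [
        i for i, (title, _) in enumerate(docs)
        if i != selected and title != chosen_title
    ]
    ranked = sorted(
        candidates,
        key=lambda i: cosine(vectors[selected], vectors[i]),
        reverse=True,
    )
    top, rest = ranked[:n_similar], ranked[n_similar:]
    diverse = random.Random(seed).sample(rest, min(n_diverse, len(rest)))
    return [docs[i][0] for i in top + diverse]


# Made-up sample documents, for illustration only.
docs = [
    ("Cup final", "The football team won the cup final match"),
    ("League win", "The football team won the league match"),
    ("Rate rise", "The central bank raised interest rates"),
    ("Cup final", "Football fans celebrate the cup final"),
    ("Gene study", "Scientists map a gene in mice"),
    ("Bank profits", "Bank profits rise as interest rates climb"),
]
result = recommend(docs, selected=0, n_similar=1, n_diverse=1)
print(result)
```

In this sample the first recommendation is the other football article, the second is one of the unrelated articles chosen for diversity, and the second "Cup final" entry is never returned because it shares the selected document's title.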

# Usage

Clone the repository:
```bash
git clone https://github.com/bydmiller/recommender_system.git
cd recommender_system
```

Run the recommender system:
```bash
python recommender.py
```

# Dataset
The datasets used in this project include:

- BBC news articles
- Scientific abstracts
- Wikipedia articles

These datasets have been adapted for this task.

# License

This project is licensed under the MIT License. See the [LICENSE file](./LICENSE) for details.