
https://github.com/nourmorsy/topic_modelling



# Topic Modelling

## Overview
This project applies topic modeling to the arXiv dataset to uncover hidden thematic structure in academic research papers. Using natural language processing (NLP) and machine learning, it groups papers into topics, giving insight into prevailing research areas and trends in scientific literature.

---

## Dependencies
To run this project, ensure you have the following dependencies installed:
- Python 3.x
- Jupyter Notebook
- Libraries:
  - `pandas`
  - `numpy`
  - `scikit-learn` (imported as `sklearn`)
  - `nltk`
  - `gensim`
  - `matplotlib`
  - `seaborn`

You can install these dependencies using pip:
```bash
pip install pandas numpy scikit-learn nltk gensim matplotlib seaborn
```
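
After installing, a quick check that every library can be found may save a failed notebook run. The sketch below is not part of the repository. It only assumes the import names listed above (note that scikit-learn is imported as `sklearn`), and the helper `missing_packages` is a hypothetical name introduced for illustration.

```python
from importlib.util import find_spec

# Import names (not pip package names) of the libraries listed above.
REQUIRED = ["pandas", "numpy", "sklearn", "nltk", "gensim", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing:", ", ".join(missing) if missing else "none")
```

Anything reported as missing can be installed with pip before opening the notebook.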

---

## Usage and Files
This project is structured around a Jupyter Notebook for ease of use and reproducibility.

- **`topic_modelling.ipynb`**: The primary Jupyter Notebook that contains code for data loading, preprocessing, topic modeling, and visualization. Each section in the notebook guides you through the process step-by-step.

---

## Dataset Used

This project uses the arXiv dataset, which contains metadata for research papers hosted on arXiv. The dataset can be downloaded from Kaggle: [ArXiv](https://www.kaggle.com/datasets/Cornell-University/arxiv)
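
The metadata file is large, so it can help to stream it rather than load it all at once. The sketch below assumes the Kaggle download is a JSON Lines file (one paper per line) with fields such as `id`, `title`, `abstract`, and `categories`. The file name in the usage comment and the helper `iter_papers` are assumptions for illustration, so check them against your download.

```python
import json
from itertools import islice


def iter_papers(path, fields=("id", "title", "abstract", "categories"), limit=None):
    """Yield one dict per paper, keeping only the requested fields.

    Assumes one JSON object per line; blank lines are skipped.
    """
    with open(path, "r", encoding="utf-8") as handle:
        lines = (line for line in handle if line.strip())
        # islice stops after `limit` records; limit=None reads the whole file.
        for line in islice(lines, limit):
            record = json.loads(line)
            yield {name: record.get(name) for name in fields}


# Example usage (path is an assumption):
# papers = list(iter_papers("data/arxiv-metadata-oai-snapshot.json", limit=1000))
```

Passing a small `limit` keeps memory use low while you experiment. The resulting list of dicts can be handed to `pandas.DataFrame` if you prefer a table.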

## Running the Project

To run this project, follow these steps:

1. **Download the Dataset**: Download the arXiv dataset (see the link in the Dataset Used section) and place it in the `data/` directory within the project folder.
2. **Run the Project**: Launch the notebook from the project folder:
```bash
jupyter notebook topic_modelling.ipynb
```

---