https://github.com/nourmorsy/topic_modelling

jupyter-notebook numpy pandas python


Host: GitHub
URL: https://github.com/nourmorsy/topic_modelling
Owner: nourmorsy
Created: 2023-12-15T01:05:41.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-11-01T22:48:15.000Z (8 months ago)
Last Synced: 2025-01-13T14:31:43.652Z (6 months ago)
Topics: jupyter-notebook, numpy, pandas, python
Language: Jupyter Notebook
Homepage:
Size: 106 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Topic Modelling

## Overview
This project applies topic modeling to the ArXiv dataset to uncover hidden thematic structure in academic research papers. Using natural language processing (NLP) and machine learning, it groups papers into topics, giving insight into prevailing research areas and trends in the scientific literature.

---

## Dependencies
To run this project, ensure you have the following dependencies installed:
- Python 3.x
- Jupyter Notebook
- Libraries:
  - `pandas`
  - `numpy`
  - `scikit-learn` (imported as `sklearn`)
  - `nltk`
  - `gensim`
  - `matplotlib`
  - `seaborn`

You can install these dependencies using pip:
```bash
pip install pandas numpy scikit-learn nltk gensim matplotlib seaborn
```
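
To confirm that the libraries listed above are importable before opening the notebook, a small check like the following may help. It is a minimal sketch that uses only the standard library; the helper name `missing_modules` is hypothetical and not part of the project.

```python
from importlib.util import find_spec

# Import names for the libraries listed above. The scikit-learn
# distribution is imported under the name `sklearn`.
REQUIRED_MODULES = [
    "pandas",
    "numpy",
    "sklearn",
    "nltk",
    "gensim",
    "matplotlib",
    "seaborn",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the import names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Run it with the same Python interpreter that Jupyter uses, otherwise the result may not reflect the notebook's environment.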

---

## Usage and Files
This project is structured around a Jupyter Notebook for ease of use and reproducibility.

- **`topic_modelling.ipynb`**: The primary Jupyter Notebook that contains code for data loading, preprocessing, topic modeling, and visualization. Each section in the notebook guides you through the process step-by-step.

---

## Dataset Used

This project uses the ArXiv dataset, which contains metadata of research papers hosted on ArXiv. The dataset can be found and downloaded from Kaggle: [ArXiv](https://www.kaggle.com/datasets/Cornell-University/arxiv)
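
The Kaggle download is large, so it can be useful to peek at a few records before loading everything. The sketch below assumes the metadata ships as a JSON Lines file named `arxiv-metadata-oai-snapshot.json` (one JSON object per line) and that it sits in a `data` folder inside the project; both the path and the helper `iter_records` are assumptions, so adjust them to match your download.

```python
import json
from itertools import islice
from pathlib import Path

# Assumed location and file name of the Kaggle metadata snapshot.
DATA_FILE = Path("data") / "arxiv-metadata-oai-snapshot.json"


def iter_records(path, limit=None):
    """Yield one metadata record (a dict) per non-empty line, up to `limit`."""
    with open(path, encoding="utf-8") as handle:
        lines = (line for line in handle if line.strip())
        for line in islice(lines, limit):
            yield json.loads(line)


if __name__ == "__main__":
    if DATA_FILE.exists():
        # Print the identifier and title of the first few papers.
        for record in iter_records(DATA_FILE, limit=3):
            print(record.get("id"), "|", record.get("title"))
    else:
        print(f"Dataset not found at {DATA_FILE}")
```

Reading line by line keeps memory use low, since only the requested records are parsed.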

## Running the Project

To run this project, follow these steps:

1. **Download the Dataset**: Download the ArXiv dataset (see the link in the Dataset Used section) and place it in the `data/` directory inside the project folder.
2. **Run the Project**:
```bash
jupyter notebook topic_modelling.ipynb
```

---
