https://github.com/nourmorsy/topic_modelling
- Host: GitHub
- URL: https://github.com/nourmorsy/topic_modelling
- Owner: nourmorsy
- Created: 2023-12-15T01:05:41.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2023-12-15T01:06:19.000Z (11 months ago)
- Last Synced: 2023-12-15T02:55:07.015Z (11 months ago)
- Language: Jupyter Notebook
- Size: 104 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Topic Modelling
## Overview
This project applies topic modeling techniques to the ArXiv dataset to uncover hidden thematic structures in academic research papers. Using natural language processing (NLP) and machine learning, we analyze the dataset and group papers into topics, which gives insight into prevailing research areas and trends in the scientific literature.
---
## Dependencies
To run this project, ensure you have the following dependencies installed:
- Python 3.x
- Jupyter Notebook
- Libraries:
  - `pandas`
  - `numpy`
  - `scikit-learn` (imported as `sklearn`)
  - `nltk`
  - `gensim`
  - `matplotlib`
  - `seaborn`

You can install these dependencies using pip:
```bash
pip install pandas numpy scikit-learn nltk gensim matplotlib seaborn
```
---
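To confirm the installation worked before opening the notebook, a small check like the one below may help. It is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the list uses import names, so scikit-learn appears as `sklearn`.

```python
from importlib.util import find_spec

# Import names for the libraries listed above (assumption: the notebook
# imports them under these standard top-level names).
REQUIRED_MODULES = [
    "pandas", "numpy", "sklearn", "nltk", "gensim", "matplotlib", "seaborn",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

`find_spec` only locates each module without importing it, so the check stays fast even for heavy libraries.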
## Usage and Files
This project is structured around a Jupyter Notebook for ease of use and reproducibility.

- **`topic_modelling.ipynb`**: The primary Jupyter Notebook that contains code for data loading, preprocessing, topic modeling, and visualization. Each section in the notebook guides you through the process step-by-step.
---
## Dataset Used
This project uses the ArXiv dataset, which contains metadata of research papers hosted on ArXiv. The dataset can be found and downloaded from Kaggle: [ArXiv](https://www.kaggle.com/datasets/Cornell-University/arxiv)
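The metadata file is large, so reading it lazily can be more practical than loading it all at once. The sketch below assumes the Kaggle download is a single file with one JSON object per line and that each record has an `abstract` field; the file name `arxiv-metadata-oai-snapshot.json` and the helper `iter_records` are assumptions, so check them against your download.

```python
import json


def iter_records(path, limit=None):
    """Yield one metadata record (a dict) per non-empty line of the file."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if limit is not None and count >= limit:
                break
            yield json.loads(line)
            count += 1


# Example usage (assumed path and field name; adjust to your download):
# for record in iter_records("data/arxiv-metadata-oai-snapshot.json", limit=5):
#     print(record.get("abstract", "")[:80])
```

Passing a `limit` lets you prototype on a small sample before processing the full file.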
## Running the Project
To run this project, follow these steps:
1. **Download the Dataset**: Download the ArXiv dataset (see the link in the Dataset Used section) and place it in the `/data` directory within the project folder.
2. **Run the Project**:
```bash
jupyter notebook topic_modelling.ipynb
```
---