https://github.com/sam9111/bertopic-tamil


Host: GitHub
URL: https://github.com/sam9111/bertopic-tamil
Owner: sam9111
Created: 2023-03-24T17:02:18.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2023-05-02T20:13:36.000Z (about 2 years ago)
Last Synced: 2025-04-23T00:45:31.659Z (about 1 month ago)
Language: Jupyter Notebook
Size: 8.44 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# C16 - TOPIC MODELING APPROACH TO CONTENT-BASED RECOMMENDER SYSTEM FOR TAMIL NEWS ARTICLES

### Source for Dataset

https://www.kaggle.com/datasets/vijayabhaskar96/tamil-news-classification-dataset-tamilmurasu
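If you use the official Kaggle command-line client, the download and the `data` folder step of the instructions can be scripted. This is a minimal sketch, assuming the `kaggle` package is installed (`pip install kaggle`) and an API token is stored in `~/.kaggle/kaggle.json`. The `data` folder name follows the README; `download_dataset`, `KAGGLE_DATASET` and `DATA_DIR` are hypothetical names, not part of the repository.

```sh
# Hypothetical helper: fetch the TamilMurasu dataset into ./data.
# Assumes the Kaggle CLI is installed and ~/.kaggle/kaggle.json holds an API token.
KAGGLE_DATASET="vijayabhaskar96/tamil-news-classification-dataset-tamilmurasu"
DATA_DIR="data"

download_dataset() {
  # Make sure the data folder exists before downloading into it.
  mkdir -p "$DATA_DIR" || return 1
  # -p sets the target directory; --unzip extracts the downloaded archive there.
  kaggle datasets download -d "$KAGGLE_DATASET" -p "$DATA_DIR" --unzip
}
```

Source the file from the repository root and call `download_dataset`. The sketch does not rename anything, so check that the extracted file names match what the preprocessing notebook expects.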

### Software and Hardware requirements

1. Python 3.9 or later

2. Anaconda (to create the conda environment)
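One way to set up an environment that meets these requirements is sketched below. It is an illustrative example under stated assumptions, not a script from the repository: the environment name `bertopic-tamil` and the functions `python_version_ok` and `create_env` are hypothetical, and the activation line assumes a bash shell.

```sh
# Hypothetical environment name; any name works.
ENV_NAME="bertopic-tamil"

# Succeeds when a "major.minor[.patch]" version string is at least 3.9.
python_version_ok() {
  major=${1%%.*}      # text before the first dot
  rest=${1#*.}        # text after the first dot
  minor=${rest%%.*}   # drop any patch component
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 9 ]; }
}

create_env() {
  conda create -y -n "$ENV_NAME" python=3.9 || return 1
  # Load conda's shell functions so `conda activate` works from a sourced
  # script (bash shown here).
  eval "$(conda shell.bash hook)"
  conda activate "$ENV_NAME" || return 1
  version=$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')
  if ! python_version_ok "$version"; then
    echo "Python 3.9 or later is required, found $version" >&2
    return 1
  fi
}
```

`source` the file and run `create_env` so that the activation persists in your current shell, then continue with the `pip install` step.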

### Detailed instructions to execute the source code

1. Install all the required Python packages in a conda environment:

   ```sh
   pip install -r requirements.txt
   ```

2. Create a `data/` folder and place the downloaded dataset inside it.
3. Run the `preprocessing/tamilmurasu_preprocessing.ipynb` notebook to preprocess the dataset.
4. To test the adapted BERTopic model fine-tuned for this dataset, run `BERTopic/BERTopic_final.ipynb`.
5. To compare different Tamil word embeddings for BERTopic and LDA, run the files in the `comparison/` folder.
6. To test the LDA model, first run the `preprocessing/preprocessing_pipeline.ipynb` notebook, then run `comparison/LDA.ipynb`.
7. To test the recommender system, run `recommendation_system/recommender_system_experiments.ipynb`.
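To run the notebooks end to end without opening Jupyter, they can be executed headlessly with `jupyter nbconvert`. The sketch below is an assumption, not a script from the repository: it uses the folder name `comparison` from step 5, assumes each notebook depends only on those listed before it, and runs `preprocessing/preprocessing_pipeline.ipynb` before every notebook in `comparison/` (step 6 requires this only for the LDA notebook). `list_notebooks` and `run_all` are hypothetical names.

```sh
# Prints the notebooks in an order consistent with steps 3 to 7.
list_notebooks() {
  printf '%s\n' \
    preprocessing/tamilmurasu_preprocessing.ipynb \
    BERTopic/BERTopic_final.ipynb \
    preprocessing/preprocessing_pipeline.ipynb
  # Every notebook in comparison/, including the LDA notebook.
  for nb in comparison/*.ipynb; do
    if [ -e "$nb" ]; then
      printf '%s\n' "$nb"
    fi
  done
  printf '%s\n' recommendation_system/recommender_system_experiments.ipynb
}

# Executes each notebook in place and stops at the first failure.
run_all() {
  for nb in $(list_notebooks); do
    echo "Running $nb"
    # --inplace overwrites the notebook with its executed outputs;
    # timeout=-1 disables the per-cell timeout for long training cells.
    jupyter nbconvert --to notebook --execute --inplace \
      --ExecutePreprocessor.timeout=-1 "$nb" || return 1
  done
}
```

Source the file from the repository root with the conda environment active and call `run_all`. Because `--inplace` overwrites each notebook, copy or commit the notebooks first if you want to keep their original outputs.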
