https://github.com/sam9111/bertopic-tamil
- Host: GitHub
- URL: https://github.com/sam9111/bertopic-tamil
- Owner: sam9111
- Created: 2023-03-24T17:02:18.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-05-02T20:13:36.000Z (about 2 years ago)
- Last Synced: 2025-04-23T00:45:31.659Z (about 1 month ago)
- Language: Jupyter Notebook
- Size: 8.44 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# C16 - TOPIC MODELING APPROACH TO CONTENT-BASED RECOMMENDER SYSTEM FOR TAMIL NEWS ARTICLES
### Source for Dataset
https://www.kaggle.com/datasets/vijayabhaskar96/tamil-news-classification-dataset-tamilmurasu
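The dataset can also be fetched from the command line. The following is a minimal sketch that assumes the official Kaggle CLI is installed and configured with API credentials, and that the notebooks read the files from a `data/` folder at the repository root. The exact file names the notebooks expect are not stated here, so check them after extraction.

```sh
#!/bin/sh
# Dataset slug taken from the Kaggle URL above
DATASET="vijayabhaskar96/tamil-news-classification-dataset-tamilmurasu"

# Assumption: the notebooks look for the dataset in ./data relative to the repo root
mkdir -p data

if command -v kaggle >/dev/null 2>&1; then
  # Needs Kaggle API credentials (for example ~/.kaggle/kaggle.json)
  kaggle datasets download -d "$DATASET" -p data --unzip
else
  echo "kaggle CLI not found; download https://www.kaggle.com/datasets/$DATASET manually and extract it into data/" >&2
fi
```

If the CLI is missing, the script only creates the folder and prints the manual download URL.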
### Software and Hardware requirements
1. Python 3.9 or later
2. An Anaconda (conda) environment
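A minimal sketch for checking the Python requirement and preparing a conda environment, assuming a POSIX shell, GNU `sort -V`, and `python3` on the `PATH`. The helpers `version_ge` and `create_env` and the environment name `bertopic-tamil` are hypothetical and not part of the repository.

```sh
#!/bin/sh
# version_ge A B: succeeds when dotted version A >= B (hypothetical helper; uses GNU sort -V)
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the current interpreter meets the 3.9 minimum
PY_VERSION="$(python3 -c 'import platform; print(platform.python_version())')"
if version_ge "$PY_VERSION" 3.9; then
  echo "Python $PY_VERSION meets the 3.9 minimum"
else
  echo "Python '$PY_VERSION' is older than 3.9 or python3 is missing" >&2
fi

# create_env NAME: create a Python 3.9 conda env and install requirements.txt into it.
# 'conda run' avoids needing 'conda activate' inside a non-interactive script.
create_env() {
  conda create -n "$1" python=3.9 -y &&
    conda run -n "$1" pip install -r requirements.txt
}
# Usage (hypothetical env name), from the repository root:
#   create_env bertopic-tamil
```

Pinning `python=3.9` matches the stated minimum; a newer version may also work but is not confirmed by the README.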
### Detailed instructions to execute the source code
1. Install all the required Python packages in a conda environment:
   ```sh
   pip install -r requirements.txt
   ```
2. Create a `data/` folder at the repository root and place the downloaded dataset inside it
3. Run the `preprocessing/tamilmurasu_preprocessing.ipynb` notebook to preprocess the dataset
4. To test the adapted BERTopic model fine-tuned for this dataset, run `BERTopic/BERTopic_final.ipynb`
5. To compare different Tamil word embeddings for BERTopic and LDA, run the notebooks in the `comparison/` folder
6. To test the LDA model, first run the `preprocessing/preprocessing_pipeline.ipynb` notebook and then run `comparison/LDA.ipynb`
7. To test the recommender system, run `recommendation_system/recommender_system_experiments.ipynb`
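The notebooks can also be executed headlessly with `jupyter nbconvert`. This is a sketch, not a script shipped with the repository: it assumes Jupyter is installed in the active environment, that it is run from the repository root, and that every notebook runs without manual input. `preprocessing_pipeline.ipynb` is placed before the comparison notebooks because step 6 requires it before `comparison/LDA.ipynb`.

```sh
#!/bin/sh
# Notebooks in the order given in the steps above.
# comparison/*.ipynb expands to every notebook in that folder; if nothing matches,
# the literal pattern stays in the list and is skipped by the -f check below.
set -- \
  preprocessing/tamilmurasu_preprocessing.ipynb \
  preprocessing/preprocessing_pipeline.ipynb \
  BERTopic/BERTopic_final.ipynb \
  comparison/*.ipynb \
  recommendation_system/recommender_system_experiments.ipynb

for nb in "$@"; do
  if [ -f "$nb" ]; then
    # Execute all cells and write the outputs back into the same notebook file
    jupyter nbconvert --to notebook --execute --inplace "$nb" || echo "failed: $nb" >&2
  else
    echo "skipping $nb (not found)" >&2
  fi
done
```

Long training cells may hit nbconvert's execution timeout depending on the installed version; running the notebooks interactively in Jupyter, as the steps describe, avoids that.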