https://github.com/githubasr2001/ramayanakandas
https://github.com/githubasr2001/ramayanakandas
Last synced: 8 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/githubasr2001/ramayanakandas
- Owner: githubasr2001
- Created: 2024-06-02T03:21:20.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-02T03:24:45.000Z (over 1 year ago)
- Last Synced: 2025-03-31T05:34:48.657Z (10 months ago)
- Language: Jupyter Notebook
- Size: 585 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
## Project: Analysis of Ramayana Kandas
### Overview
This project involves a comprehensive analysis of five datasets representing different sections (Kandas) of the ancient Indian epic, the Ramayana. Each dataset contains content in Sanskrit and its corresponding explanation in English. The analyses performed include word frequency analysis, sentiment analysis, translation quality analysis, keyword extraction, and topic modeling. Additionally, a classification task is carried out to distinguish between short and long English explanations.
### Dataset Description
Each dataset consists of two main columns:
- **Content**: Contains the text in Sanskrit.
- **Explanation**: Contains the English translation or explanation of the corresponding Sanskrit text.
The datasets used in this project are:
- aranyakanda.csv
- ayodhyakand.csv
- balakanda.csv
- kishkindakanda.csv
- sundarakanda.csv
### Analyses Performed
#### Word Frequency Analysis
The word frequency analysis identifies the 20 most common words in both Sanskrit and English texts, providing insights into the recurring themes and terminologies.
#### Sentiment Analysis
Sentiment analysis is conducted on the English explanations to determine the polarity (positive/negative sentiment) and subjectivity (objective/subjective nature) of the texts. This helps in understanding the emotional tone conveyed in the explanations.
#### Translation Quality Analysis
Translation quality analysis compares the length of Sanskrit sentences with their English translations to evaluate consistency. Scatter plots visualize the relationship between the lengths of the original and translated texts.
#### Keyword Extraction and Topic Modeling
Keyword extraction uses TF-IDF vectorization to identify the top 20 keywords in the English explanations. Topic modeling, using Latent Dirichlet Allocation (LDA), discovers underlying topics in the text data.
### Classification Task
The classification task aims to distinguish between short and long English explanations. Various machine learning models were used, including:
- Logistic Regression
- Naive Bayes
- Support Vector Machine (SVM)
- Random Forest
- Gradient Boosting
#### Results of the Classification Task
The classification task yielded the following results:
- **Logistic Regression**: Best hyperparameters were C: 10 and solver: lbfgs, achieving an accuracy of 0.92.
- **Naive Bayes**: Best hyperparameter was alpha: 0.1, achieving an accuracy of 0.88.
- **Support Vector Machine (SVM)**: Best hyperparameters were C: 1 and kernel: linear, achieving an accuracy of 0.89.
- **Random Forest**: Best hyperparameters were n_estimators: 100, max_depth: None, and min_samples_split: 2, achieving an accuracy of 0.91.
- **Gradient Boosting**: Best hyperparameters were n_estimators: 100, learning_rate: 0.1, and max_depth: 3, achieving an accuracy of 0.90.
### How to Run the Project
1. **Clone the Repository**: Clone the project repository to your local machine.
```bash
git clone
cd
```
2. **Install Dependencies**: Install the required Python packages.
```bash
pip install pandas numpy matplotlib seaborn scikit-learn textblob
```
### Conclusion
This project integrates classical literature with modern data analysis techniques, offering insights into the linguistic characteristics, emotional tones, translation quality, and underlying topics in the Ramayana Kandas. The classification task demonstrates the applicability of machine learning models in distinguishing between short and long English explanations.