https://github.com/keerthanapalanikumar/prodigy-infotech

This repository contains data science projects from my Prodigy Infotech internship, including data visualization, cleaning and EDA on the Titanic dataset, a decision tree classifier for the Bank Marketing dataset, and Twitter sentiment analysis.

Host: GitHub
URL: https://github.com/keerthanapalanikumar/prodigy-infotech
Owner: KeerthanaPalanikumar
Created: 2024-06-16T04:39:53.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-06-16T05:22:56.000Z (over 1 year ago)
Last Synced: 2025-05-25T14:46:06.597Z (8 months ago)
Topics: data-cleaning-and-eda, data-visualization, decision-tree-classifier, sentiment-analysis
Language: Jupyter Notebook
Homepage:
Size: 4.59 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Internship ReadMe: Prodigy Infotech Data Science Tasks

## Overview:-
This repository contains the tasks completed during my internship at Prodigy Infotech. Each task demonstrates a different aspect of data science, including data visualization, data cleaning, exploratory data analysis, machine learning, and sentiment analysis. The tasks use various datasets to showcase different techniques and methods commonly used in data science projects.

# Task-01: Data Visualization

## Objective:-
Create a bar chart or histogram to visualize the distribution of a categorical or continuous variable, such as the ages or genders in a population.
## Dataset:-
World Bank Population Data: https://data.worldbank.org/indicator/SP.POP.TOTL
## Description:-
* Loaded the population data from the World Bank.
* Processed the data to extract the relevant categorical or continuous variable.
* Created a bar chart or histogram to visualize the distribution.
* Used Python libraries such as pandas for data manipulation and matplotlib/seaborn for visualization.
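
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the rows are placeholder values rather than real World Bank figures, and the column names (`Country Name`, `2022`) and the helper `plot_population_histogram` are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows for illustration only, NOT real World Bank figures.
population = pd.DataFrame({
    "Country Name": ["Country A", "Country B", "Country C",
                     "Country D", "Country E", "Country F"],
    "2022": [1.2e6, 3.5e6, 8.0e6, 2.1e7, 4.4e7, 9.0e7],
})


def plot_population_histogram(df, year_column, bins=5):
    """Plot a histogram of one year's population values (hypothetical helper)."""
    values = df[year_column].dropna()  # ignore countries with no value for that year
    fig, ax = plt.subplots(figsize=(6, 4))
    counts, _edges, _patches = ax.hist(values, bins=bins, edgecolor="black")
    ax.set_xlabel(f"Total population ({year_column})")
    ax.set_ylabel("Number of countries")
    ax.set_title("Distribution of total population")
    fig.tight_layout()
    return fig, counts


fig, counts = plot_population_histogram(population, "2022")
fig.savefig("population_histogram.png")
```

For a categorical variable, a bar chart of `df[column].value_counts()` would take the place of the histogram.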

# Task-02: Data Cleaning and Exploratory Data Analysis (EDA)

## Objective:-
Perform data cleaning and exploratory data analysis on a dataset to explore relationships between variables and identify patterns and trends.
## Dataset:-
Titanic Dataset from Kaggle: https://www.kaggle.com/c/titanic/data
## Description:-
* Loaded the Titanic dataset.
* Cleaned the data by handling missing values, encoding categorical variables, and normalizing numerical variables.
* Conducted EDA to explore the relationships between variables and identify patterns and trends.
* Visualized the data using various plots (e.g., scatter plots, box plots, heatmaps).
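
A minimal sketch of the cleaning steps listed above, under stated assumptions: the frame below holds a few made-up rows that only borrow the Titanic column names (`Survived`, `Sex`, `Age`, `Fare`, `Embarked`), and median/mode imputation plus min-max scaling are one reasonable choice, not necessarily what the notebook does. `clean_passengers` is a hypothetical helper.

```python
import pandas as pd

# Made-up rows using Titanic-style column names, for illustration only.
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Sex": ["male", "female", "female", "male", "female"],
    "Age": [20.0, 40.0, None, 30.0, 50.0],
    "Fare": [10.0, 80.0, 15.0, 20.0, 30.0],
    "Embarked": ["S", "C", None, "S", "S"],
})


def clean_passengers(df):
    """Impute missing values, encode categoricals, and scale numeric columns."""
    cleaned = df.copy()

    # Handle missing values: median for numeric, most frequent value for categorical.
    cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].median())
    cleaned["Embarked"] = cleaned["Embarked"].fillna(cleaned["Embarked"].mode()[0])

    # Encode categorical variables: binary map for Sex, one-hot columns for Embarked.
    cleaned["Sex"] = cleaned["Sex"].map({"male": 0, "female": 1})
    cleaned = pd.get_dummies(cleaned, columns=["Embarked"], dtype=int)

    # Normalize numerical variables to the [0, 1] range (min-max scaling).
    for column in ["Age", "Fare"]:
        low, high = cleaned[column].min(), cleaned[column].max()
        cleaned[column] = (cleaned[column] - low) / (high - low)
    return cleaned


cleaned = clean_passengers(sample)

# A correlation matrix is a common starting point for EDA; it can be passed
# to a heatmap plot to look for relationships between variables.
correlations = cleaned.corr()
```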

# Task-03: Decision Tree Classifier

## Objective:-
Build a decision tree classifier to predict whether a customer will purchase a product or service based on their demographic and behavioral data.
## Dataset:-
Bank Marketing Dataset from UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Bank+Marketing
## Description:-
* Loaded the Bank Marketing dataset.
* Preprocessed the data by encoding categorical variables and splitting the data into training and test sets.
* Built a decision tree classifier using scikit-learn.
* Evaluated the classifier's performance using metrics such as accuracy, precision, recall, and F1-score.
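
The workflow above might look roughly like this with scikit-learn. It is a sketch under assumptions: the data is synthetic and only mimics a few Bank Marketing style columns (`age`, `job`, `balance`, `y`), and the split ratio, `max_depth`, and the helper `train_and_evaluate` are illustrative choices rather than the notebook's settings.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder customers, NOT rows from the UCI dataset.
jobs = ["admin", "technician", "services", "retired"]
customers = pd.DataFrame([
    {
        "age": 20 + i,
        "job": jobs[i % 4],
        "balance": 100 * (i % 10),
        "y": "yes" if i % 10 >= 5 else "no",
    }
    for i in range(40)
])


def train_and_evaluate(df, target="y", seed=42):
    """Encode features, split the data, fit a decision tree, and report metrics."""
    # One-hot encode categorical columns; numeric columns pass through unchanged.
    features = pd.get_dummies(df.drop(columns=[target]), dtype=int)
    labels = (df[target] == "yes").astype(int)

    # Stratify so both classes appear in the training and test sets.
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, random_state=seed, stratify=labels
    )

    model = DecisionTreeClassifier(max_depth=3, random_state=seed)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)

    return {
        "accuracy": accuracy_score(y_test, predictions),
        "precision": precision_score(y_test, predictions, zero_division=0),
        "recall": recall_score(y_test, predictions, zero_division=0),
        "f1": f1_score(y_test, predictions, zero_division=0),
    }


metrics = train_and_evaluate(customers)
print(metrics)
```

Limiting `max_depth` is one simple way to keep the tree from memorizing the training set; the right value depends on the real data.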

# Task-04: Sentiment Analysis

## Objective:-
Analyze and visualize sentiment patterns in social media data to understand public opinion and attitudes towards specific topics or brands.
## Dataset:-
Twitter Entity Sentiment Analysis Dataset from Kaggle: https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis
## Description:-
* Loaded the Twitter sentiment analysis dataset.
* Preprocessed the data by cleaning text, tokenizing, and vectorizing.
* Analyzed sentiment patterns using natural language processing techniques.
* Visualized the sentiment distribution and identified key trends.
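
The preprocessing steps above can be sketched as follows. This is an illustration under assumptions: the tweets are invented, the column names `sentiment` and `text` are assumed, and a bag-of-words `CountVectorizer` stands in for whichever vectorizer the notebook uses. `clean_text` is a hypothetical helper.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Invented example tweets, NOT rows from the Kaggle dataset.
tweets = pd.DataFrame({
    "sentiment": ["Positive", "Negative", "Neutral", "Positive"],
    "text": [
        "Loving the new update!! https://example.com",
        "@support this game keeps crashing :(",
        "Patch notes are out today",
        "Great match tonight, great team #win",
    ],
})


def clean_text(text):
    """Lowercase a tweet and strip URLs, mentions, hashtags, and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)       # drop @mentions and #hashtags
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return " ".join(text.split())              # collapse repeated spaces


tweets["clean_text"] = tweets["text"].apply(clean_text)

# CountVectorizer tokenizes each cleaned tweet and builds a sparse
# document-term matrix of word counts, skipping common English stop words.
vectorizer = CountVectorizer(stop_words="english")
term_matrix = vectorizer.fit_transform(tweets["clean_text"])

# The sentiment distribution is the per-label count, ready for a bar chart.
sentiment_counts = tweets["sentiment"].value_counts()
```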

# Requirements
To run the notebooks and reproduce the results, the following Python libraries are required:

* Pandas
* NumPy
* Matplotlib
* Seaborn
* Scikit-learn

# Conclusion
This repository showcases my data science skills through various tasks involving data visualization, cleaning, exploratory analysis, machine learning, and sentiment analysis. Each task demonstrates my ability to work with different datasets and apply appropriate techniques to extract meaningful insights.
