Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/ugly-custard/codsoft

Project-based internship projects

Last synced: 2 months ago


Host: GitHub
URL: https://github.com/ugly-custard/codsoft
Owner: ugly-custard
Created: 2023-09-07T12:39:28.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2023-09-07T12:46:31.000Z (over 1 year ago)
Last Synced: 2023-10-02T11:31:11.860Z (over 1 year ago)
Language: Jupyter Notebook
Size: 252 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

## Codsoft internship projects

This repo lists the projects I worked on while doing the internship for Codsoft.

The projects are:
- Customer Churn Prediction
- Movie Genre Classification
- SMS Spam Detection

### Customer Churn Prediction
Dataset used: https://www.kaggle.com/datasets/shantanudhakadd/bank-customer-churn-prediction

In this project, I trained a Random Forest Classifier on historical customer
data, such as usage behavior and demographics, to predict customer churn.

On the original, imbalanced data the model reached an accuracy of 0.87.
Balancing the classes with random under-sampling lowered accuracy to 0.76,
while random over-sampling raised it to 0.94.

The feature analysis identified Age as the top predictor, followed by Balance,
Estimated Salary, and Credit Score, in that order.

In summary, the choice between under-sampling and over-sampling is a trade-off
between class balance and accuracy, and the four features above are the most
relevant ones for customer retention analysis.
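
To make the two resampling strategies concrete, here is a minimal, dependency-free sketch of random under- and over-sampling. It is not the notebook's code: the function name `random_resample`, the toy rows, and the 8-to-2 class split are hypothetical, and the notebook may well use a library resampler instead.

```python
import random
from collections import Counter


def random_resample(rows, labels, mode="over", seed=0):
    """Balance classes by random over- or under-sampling (hypothetical helper).

    mode="over":  duplicate minority rows (drawn with replacement) until every
                  class is as large as the largest class.
    mode="under": keep a random subset of each class (drawn without
                  replacement) as large as the smallest class.
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    sizes = [len(idx) for idx in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)

    chosen = []
    for idx in by_class.values():
        if len(idx) >= target:
            # Class is already large enough: take `target` distinct rows.
            chosen.extend(rng.sample(idx, target))
        else:
            # Keep every original row, then add random duplicates.
            chosen.extend(idx)
            chosen.extend(rng.choices(idx, k=target - len(idx)))
    rng.shuffle(chosen)
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


# Hypothetical toy data: 8 retained customers (0) and 2 churned customers (1).
rows = [[age] for age in (25, 31, 38, 42, 45, 52, 57, 60, 48, 63)]
labels = [0] * 8 + [1] * 2

x_over, y_over = random_resample(rows, labels, mode="over")
x_under, y_under = random_resample(rows, labels, mode="under")
print(Counter(y_over))   # both classes now have 8 rows
print(Counter(y_under))  # both classes now have 2 rows
```

The README does not say whether resampling was applied before or after the train/test split. If over-sampling happens before the split, copies of the same row can end up in both sets, which tends to inflate test accuracy, so a common choice is to resample only the training portion.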

### Movie Genre Classification
Dataset used: https://www.kaggle.com/datasets/hijest/genre-classification-dataset-imdb

In this project, I compared three classifiers (SGD, Multinomial NB, and
Logistic Regression) for classifying movies into 27 different genres based on
the movie synopsis.

Logistic Regression performed best, with an accuracy of 0.60, precision of
0.58, recall of 0.60, and an F1-score of 0.56. Of the three models compared,
it is therefore the best choice for this multi-class classification task.
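
The reported accuracy and recall are both 0.60, which is what you would see if the per-class scores were averaged by class support, since support-weighted recall always equals accuracy. The README does not state the averaging mode, so the sketch below is an assumption: it computes the four metrics with weighted averaging on hypothetical toy labels, and `weighted_scores` is an illustrative name rather than a function from the notebook.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)    # number of true examples per class
    predicted = Counter(y_pred)  # number of predictions per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        # Per-class scores; a class that is never predicted gets precision 0.
        p = hits[label] / predicted[label] if predicted[label] else 0.0
        r = hits[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        # Weight each class by its share of the true labels.
        weight = count / n
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(hits.values()) / n
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels, not taken from the IMDb dataset.
y_true = ["drama", "drama", "drama", "comedy", "comedy", "horror"]
y_pred = ["drama", "drama", "comedy", "comedy", "drama", "drama"]

scores = weighted_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Under this averaging, the F1-score can fall below both precision and recall, as it does in the reported results, because it is an average of per-class F1 values rather than the harmonic mean of the averaged precision and recall.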

### SMS Spam Detection
Dataset used: https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset

In this project, I compared two classifiers, SGD and Multinomial NB, for
identifying spam messages.

Both models scored 0.96 for accuracy, precision, recall, and F1-score, so on
this dataset they perform equally well for SMS spam detection.
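
For readers unfamiliar with Multinomial NB, the following is a small, dependency-free sketch of the idea: count words per class, apply Laplace smoothing, and pick the class with the highest log-probability. The class `TinyMultinomialNB`, its whitespace tokenizer, and the four training messages are hypothetical; the notebook presumably relies on a library implementation and a proper text vectorizer.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over word counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def _log_score(self, words, label):
        counts = self.word_counts[label]
        total = sum(counts.values()) + self.alpha * len(self.vocab)
        # Start from the log prior of the class.
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        for w in words:
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[w] + self.alpha) / total)
        return score

    def predict(self, text):
        words = text.lower().split()
        return max(self.class_counts,
                   key=lambda label: self._log_score(words, label))


# Hypothetical toy messages, not taken from the SMS Spam Collection.
texts = [
    "win a free prize now",
    "free entry claim your prize",
    "are we still meeting for lunch",
    "call me when you get home",
]
labels = ["spam", "spam", "ham", "ham"]

model = TinyMultinomialNB().fit(texts, labels)
print(model.predict("claim your free prize"))  # spam
print(model.predict("see you at lunch"))       # ham
```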

These are projects I have worked on for the project-based Machine Learning Internship by Codsoft.