https://github.com/theultrabadduck/dinoanalysis

Dino who tries to understand their future

data-analysis data-science datascience jupyter jupyter-notebook jupyter-notebooks jupyterlab python python3

# Dino analysis.

___

What’s as big as a dinosaur but weighs nothing?

*A dinosaur’s shadow.*

___

## 1) What is this repo for?

This repo collects analysis reports for several datasets I found. Each report covers problems, solutions and improvements. It may contain typos or errors, but it is my best effort to share what I have learned with everyone.

## 2) Project frame:

- A data problem: This shows what is interesting about the dataset, what each feature means, and any modifications that are needed (filling null values, etc.).
- Analysis: This illustrates the distribution of each feature and the relationships between features. It also includes data modification (transforming into a normal distribution, scaling, outlier treatment, etc.).
- Modeling: This runs the models. In most cases, I use Linear Regression, Logistic Regression, K-Nearest Neighbours, Decision Tree and Random Forest. If the problem becomes more complex, I will consider using a Neural Network.
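
The data modification steps named above (dropping nulls, transforming a skewed feature, treating outliers) can be sketched roughly as follows. This is a minimal, standard-library-only illustration and is not taken from the notebooks: the column `raw`, the helper names, the choice of `log1p` as the transform, and the 1.5 × IQR clipping rule are all assumptions made for the example.

```python
import math
import statistics


def drop_nulls(values):
    """Keep only entries that are not None (the 'dropping null' strategy)."""
    return [v for v in values if v is not None]


def log_transform(values):
    """Apply log(1 + x) to pull in a long right tail; assumes x >= 0."""
    return [math.log1p(v) for v in values]


def clip_outliers_iqr(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those bounds."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Hypothetical feature column with one missing entry and one extreme value.
raw = [1.0, 2.0, None, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 1000.0]

clean = drop_nulls(raw)                  # 9 values remain
transformed = log_transform(clean)       # right tail is compressed
clipped = clip_outliers_iqr(transformed) # the extreme value is pulled to the upper bound

print(len(clean))
print(max(transformed) > max(clipped))
```

Clipping keeps every row, which suits small datasets; dropping the outlying rows instead is an equally common choice.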

## 3) Who can read and use the code?

- Everyone can. Credit is not necessary but appreciated.
- However, you may need some basic knowledge of data analysis or data science to understand the projects.

## 4) Problems mentioned:

### Topic problems:

| Problem | Solved | Note |
|-----------------------------|--------|----------------------------|
| General Data Regression | ✓ | In Boston Housing Analysis |
| General Data Classification | ✓ | In Palmer Penguin Analysis (pretty simple) |
| Clustering | | |
| Anomaly Detection | | |
| Time series | | |
| Recommendation System | | Coming soon |
| Text classification | | |

### Data problems:

| Problem | Solved | Note |
|-----------------------------|--------|----------------------------|
| Null Treatment | | Only dropping null |
| Transforming Into Normal | ✓ | |
| Outlier Treatment | ✓ | |
| Data Imbalance | ✓ | Using SMOTE |
| Data Scaling | ✓ | Using Standard Scaler |
| K-fold Cross Validation | ✓ | |
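
To show how two of the rows above fit together, here is a minimal sketch of standard scaling inside k-fold cross validation, written with the standard library only. It is not the notebooks' code: `feature`, `k_fold_indices`, `fit_standard_scaler` and `apply_scaler` are hypothetical names. The scaler statistics are computed on the training folds only and then reused on the held-out fold, so no information from the test fold leaks into the scaling.

```python
import statistics


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, non-shuffled folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def fit_standard_scaler(values):
    """Return (mean, population std) computed from the given values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against a constant column
    return mean, std


def apply_scaler(values, mean, std):
    """Standardise values with previously fitted statistics."""
    return [(v - mean) / std for v in values]


# Hypothetical feature column.
feature = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

folds = list(k_fold_indices(len(feature), k=3))
for train_idx, test_idx in folds:
    # Fit on the training part only, then transform both parts.
    mean, std = fit_standard_scaler([feature[i] for i in train_idx])
    train_scaled = apply_scaler([feature[i] for i in train_idx], mean, std)
    test_scaled = apply_scaler([feature[i] for i in test_idx], mean, std)
    print(test_idx, [round(v, 2) for v in test_scaled])
```

The same ordering (fit on train, transform on test) applies to resampling methods such as SMOTE, which should only ever see the training folds.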

### Model problems:

| Problem | Solved | Note |
|-----------------------------|--------|----------------------------|
| Linear Regression | ✓ | |
| Logistic Regression | ✓ | Can handle multi-class labels |
| K-Nearest Neighbours | ✓ | |
| Decision Tree | ✓ | |
| Random Forest | ✓ | |
| Support Vector Machine | | |
| Naive Bayes | | |
| K-Means Clustering | | |
| Principal Component Analysis| | |
| DBSCAN | | |
| Apriori | | |
| Matrix Factorisation | | Coming soon |
| ARIMA and its alternatives | | |
| Neural Network | | |
| ... | | |
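
As a small illustration of one of the solved models, K-Nearest Neighbours classification can be sketched in a few lines of plain Python. This is a toy version under stated assumptions (Euclidean distance, simple majority vote, made-up `points` and `labels`), not the implementation used in the notebooks.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature toy data with two well-separated classes.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))  # prints a
print(knn_predict(points, labels, (5.1, 5.0)))  # prints b
```

Because the prediction depends on raw distances, features on very different scales should usually be standardised before this model is applied.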