Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/theultrabadduck/dinoanalysis

Dino who tries to understand their future
data-analysis data-science datascience jupyter jupyter-notebook jupyter-notebooks jupyterlab python python3

Last synced: about 2 months ago
Host: GitHub
URL: https://github.com/theultrabadduck/dinoanalysis
Owner: TheUltraBadDuck
Created: 2024-03-03T07:42:35.000Z (12 months ago)
Default Branch: main
Last Pushed: 2024-03-08T17:30:57.000Z (12 months ago)
Last Synced: 2024-11-08T19:19:15.580Z (3 months ago)
Topics: data-analysis, data-science, datascience, jupyter, jupyter-notebook, jupyter-notebooks, jupyterlab, python, python3
Language: Jupyter Notebook
Homepage:
Size: 3.01 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Dino analysis.

___

What’s as big as a dinosaur but weighs nothing?

*A dinosaur’s shadow.*

___

## 1) What is this repo for?

This repo collects analysis reports for several datasets I found. Each report walks through problems, solutions and improvements. It may contain some typos or errors, but it is my best effort to share what I have learned with everyone.

## 2) Project frame:

- A data problem: This shows what is interesting about the dataset, what each feature means, and any modifications that are necessary (filling null values, etc.).

- Analysis: This illustrates the distribution of each feature and the relationships between features. It also includes data modification (transforming towards a normal distribution, scaling, outlier treatment, etc.).

- Modeling: This runs the models. In most cases, I use Linear Regression, Logistic Regression, K-Nearest Neighbours, Decision Tree and Random Forest. If the problem becomes more complex, I will consider using a Neural Network.
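
The three stages above can be sketched on a toy example. This is only an illustration using the Python standard library, not code taken from the notebooks: the `rows` data, `fit_line` and `predict` are hypothetical names made up for this sketch, and the notebooks themselves may rely on libraries such as pandas or scikit-learn instead.

```python
import statistics

# Hypothetical toy rows: (feature, target); None marks a missing value.
rows = [(1.0, 3.0), (2.0, 5.0), (None, 4.0), (3.0, 7.0), (4.0, 9.0)]

# 1) Data problem: drop rows that contain a null value.
clean = [(x, y) for x, y in rows if x is not None and y is not None]

# 2) Analysis: standardise the feature to zero mean and unit variance.
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
mean_x = statistics.fmean(xs)
std_x = statistics.pstdev(xs)  # population standard deviation
xs_scaled = [(x - mean_x) / std_x for x in xs]

# 3) Modeling: one-feature linear regression by ordinary least squares.
def fit_line(features, targets):
    """Return (slope, intercept) of the least-squares line."""
    mf, mt = statistics.fmean(features), statistics.fmean(targets)
    sxx = sum((f - mf) ** 2 for f in features)
    sxy = sum((f - mf) * (t - mt) for f, t in zip(features, targets))
    slope = sxy / sxx
    return slope, mt - slope * mf

slope, intercept = fit_line(xs_scaled, ys)

def predict(x_raw):
    """Scale a raw feature value the same way as the training data, then predict."""
    return slope * ((x_raw - mean_x) / std_x) + intercept
```

Note that `predict` reuses the mean and standard deviation computed from the training data, so new values are scaled consistently with what the model was fitted on.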

## 3) Who can read and use the code?

- Everyone can. Credit is not necessary but appreciated.

- However, you may need basic knowledge of data analysis or data science to understand the projects.

## 4) Problems mentioned:

### Topic problems:

| Problem                     | Solved | Note                                       |
|-----------------------------|--------|--------------------------------------------|
| General Data Regression     | ✓      | In Boston Housing Analysis                 |
| General Data Classification | ✓      | In Palmer Penguin Analysis (pretty simple) |
| Clustering                  |        |                                            |
| Anomaly Detection           |        |                                            |
| Time series                 |        |                                            |
| Recommendation System       |        | Coming soon                                |
| Text classification         |        |                                            |

### Data problems:

| Problem                     | Solved | Note                       |
|-----------------------------|--------|----------------------------|
| Null Treatment              |        | Only dropping null         |
| Transforming Into Normal    | ✓      |                            |
| Outlier Treatment           | ✓      |                            |
| Data Imbalance              | ✓      | Using SMOTE                |
| Data Scaling                | ✓      | Using Standard Scaler      |
| K-fold Cross Validation     | ✓      |                            |
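
Two of the data problems listed above, outlier treatment and K-fold cross validation, can be sketched in a few lines. This is a minimal standard-library illustration under assumptions: the 1.5 × IQR fence is one common outlier rule and may not be the one used in the notebooks, and `iqr_bounds`, `drop_outliers` and `k_fold_indices` are hypothetical helper names. The split is not shuffled here; shuffling the indices first is usual in practice.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (low, high) fences based on the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]

def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

A model would be fitted on each `train` index set and scored on the matching `test` set, and the scores averaged across folds.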

### Model problems:

| Problem                      | Solved | Note                         |
|------------------------------|--------|------------------------------|
| Linear Regression            | ✓      |                              |
| Logistic Regression          | ✓      | Can handle multi-class label |
| K-Nearest Neighbours         | ✓      |                              |
| Decision Tree                | ✓      |                              |
| Random Forest                | ✓      |                              |
| Support Vector Machine       |        |                              |
| Naive Bayes                  |        |                              |
| K-Means Clustering           |        |                              |
| Principal Component Analysis |        |                              |
| DBSCAN                       |        |                              |
| Apriori                      |        |                              |
| Matrix Factorisation         |        | Coming soon                  |
| ARIMA and its alternatives   |        |                              |
| Neural Network               |        |                              |
| ...                          |        |                              |
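
As an illustration of one of the solved models, here is a minimal K-Nearest Neighbours classifier written with the standard library only. It is a sketch under assumptions, not the notebooks' implementation: `knn_predict` is a hypothetical name, Euclidean distance and simple majority voting are assumed, and the points and labels in the usage lines are made-up toy values.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
near_origin = knn_predict(points, labels, (0.5, 0.5))
near_far_group = knn_predict(points, labels, (5.5, 5.5))
```

Because the vote depends on raw distances, features on very different scales are usually standardised before applying this kind of model.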