https://github.com/theultrabadduck/dinoanalysis
Dino who tries to understand their future
data-analysis data-science datascience jupyter jupyter-notebook jupyter-notebooks jupyterlab python python3
- Host: GitHub
- URL: https://github.com/theultrabadduck/dinoanalysis
- Owner: TheUltraBadDuck
- Created: 2024-03-03T07:42:35.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-03-08T17:30:57.000Z (8 months ago)
- Last Synced: 2024-03-09T13:54:17.305Z (8 months ago)
- Topics: data-analysis, data-science, datascience, jupyter, jupyter-notebook, jupyter-notebooks, jupyterlab, python, python3
- Language: Jupyter Notebook
- Homepage:
- Size: 3.01 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Dino analysis.
___
What’s as big as a dinosaur but weighs nothing?
*A dinosaur’s shadow.*
___
## 1) What is this repo for?
This repo collects my analysis reports for several datasets I found. Each report covers problems, solutions and improvements. It may contain typos or errors, but it is my best effort to share what I have learned with everyone.
## 2) Project structure:
- A data problem: This shows what is interesting about the dataset, what each feature means, and any modifications needed (filling null values, etc.).
- Analysis: This illustrates the distribution of each feature and the relationships between features. It also includes data modification (transforming toward a normal distribution, scaling, outlier treatment, etc.).
- Modeling: This runs the models. In most cases, I use Linear Regression, Logistic Regression, K-Nearest Neighbours, Decision Tree and Random Forest. If a more complex model is needed, I will consider using a Neural Network.
## 3) Who can read and use the code?
- Everyone can. Credit is not necessary but appreciated.
- However, you may need basic knowledge of data analysis or data science to understand the projects.
## 4) Problems mentioned:
### Topic problems:
| Problem | Solved | Note |
|-----------------------------|--------|----------------------------|
| General Data Regression | ✓ | In Boston Housing Analysis |
| General Data Classification | ✓ | In Palmer Penguin Analysis (pretty simple) |
| Clustering | | |
| Anomaly Detection | | |
| Time series | | |
| Recommendation System | | Coming soon |
| Text classification | | |
### Data problems:
| Problem | Solved | Note |
|-----------------------------|--------|----------------------------|
| Null Treatment | | Only dropping null values |
| Transforming Into Normal | ✓ | |
| Outlier Treatment | ✓ | |
| Data Imbalance | ✓ | Using SMOTE |
| Data Scaling | ✓ | Using Standard Scaler |
| K-fold Cross Validation | ✓ | |
### Model problems:
| Problem | Solved | Note |
|-----------------------------|--------|----------------------------|
| Linear Regression | ✓ | |
| Logistic Regression | ✓ | Can handle multi-class label |
| K-Nearest Neighbours | ✓ | |
| Decision Tree | ✓ | |
| Random Forest | ✓ | |
| Support Vector Machine | | |
| Naive Bayes | | |
| K-Means Clustering | | |
| Principal Component Analysis| | |
| DBSCAN | | |
| Apriori | | |
| Matrix Factorisation | | Coming soon |
| ARIMA and its alternatives | | |
| Neural Network | | |
| ... | | |
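
As a rough illustration of three items marked as solved in the tables above (data scaling, K-fold cross validation and K-Nearest Neighbours), here is a minimal sketch that uses only the Python standard library. It is not the code from the notebooks: the function names (`standard_scale`, `knn_predict`, `k_fold_accuracy`) and the toy data are hypothetical and exist only for this example. The notebooks most likely rely on library implementations instead.

```python
import random
import statistics
from collections import Counter


def standard_scale(train_rows, test_rows):
    """Scale columns to zero mean and unit variance using training statistics only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; fall back to 1.0 for constant columns.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

    return scale(train_rows), scale(test_rows)


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k nearest training points."""
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def k_fold_accuracy(X, y, n_folds=5, k=3, seed=0):
    """Return the mean KNN accuracy over n_folds cross-validation folds."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Every n_folds-th shuffled index goes into the same fold.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in indices if i not in held_out]
        # Fit the scaler on the training part only to avoid leaking test statistics.
        train_X, test_X = standard_scale(
            [X[i] for i in train_idx], [X[i] for i in fold]
        )
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, row, k) == y[i]
            for row, i in zip(test_X, fold)
        )
        scores.append(correct / len(fold))
    return statistics.fmean(scores)


# Hypothetical toy data: two classes whose features live on very different scales.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(100, 10)] for _ in range(30)]
X += [[rng.gauss(5, 1), rng.gauss(200, 10)] for _ in range(30)]
y = ["a"] * 30 + ["b"] * 30

mean_accuracy = k_fold_accuracy(X, y)
print(f"Mean 5-fold accuracy: {mean_accuracy:.2f}")
```

The scaler is fitted inside each fold instead of once on the full dataset, so the held-out rows never influence the mean and standard deviation used for training. Scaling matters for K-Nearest Neighbours in particular, because an unscaled feature with a large range would dominate the distance.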