Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/darenasc/eda
https://github.com/darenasc/eda
Last synced: 23 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/darenasc/eda
- Owner: darenasc
- License: mit
- Created: 2023-10-28T20:58:09.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-15T16:26:35.000Z (about 1 month ago)
- Last Synced: 2024-10-16T22:16:04.240Z (about 1 month ago)
- Language: Jupyter Notebook
- Size: 3.14 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Exploratory Data Analysis
- 12.10.2024. Slides CorrelCon 2024 available [here](https://docs.google.com/presentation/d/18An6Y9cGu1lSO2enbFsOwY-xxaPBPh12tw7MJAZluOk/edit?usp=sharing).
- 11.12.2023. Slides DataKindUK SDS available [here](https://docs.google.com/presentation/d/1KWrMbF5Q-gffoc5sXn0B31lyiACdFHiiPWay0MI5Vdw/edit?usp=sharing).
- 11.11.2023. Slides CorrelCon 2023 available [here](https://docs.google.com/presentation/d/1mYtzt5Tfk_xbYSWUBRgiNH5TumJNUWlRVDMeyntee_w/edit?usp=sharing).## How to use this repository
You can clone this repository and use it locally as follows.
```bash
git clone https://github.com/darenasc/eda.git
cd eda
pip install pipenv
pipenv install Pipfile
```Or you can go to the [eda.ipynb](notebooks/eda_dkuk_sds.ipynb) notebook and open it in Colab.
# EDA Mindmap
Mindmap created with [freeplane](https://docs.freeplane.org).
![](eda.png)# Linux commands
Some useful commands for the terminal.```bash
# Explore directories
ls
# Explore content of files
cat
more
less
head
tail
# Count number of lines
wc -l
# Search in files
grep
# Get documentation of commands
man
# Download data or files
wget
curl
# Monitor resources
htop
btop
# Modify files
vim
sed
```# Python libraries
Some python libraries to explore data.
- [pandas](https://github.com/pandas-dev/pandas)
- [ydata-profiling](https://github.com/ydataai/ydata-profiling) (former pandas-profiling)
- [sweetviz](https://github.com/fbdesignpro/sweetviz)
- [pygwalker](https://github.com/Kanaries/pygwalker)
- [facets](https://github.com/pair-code/facets)
- [datapane](https://github.com/datapane/datapane)
- [streamlit](https://github.com/streamlit/streamlit)
- [gradio](https://github.com/gradio-app/gradio)
- [geopandas](https://github.com/geopandas/geopandas)
- [pysal](https://github.com/pysal/pysal)
- [networkx](https://github.com/networkx/networkx)
- [word_cloud](https://github.com/amueller/word_cloud)
- [great_expectations](https://github.com/great-expectations/great_expectations)
- [featuretools](https://github.com/alteryx/featuretools)
- [superset](https://github.com/apache/superset)
- [metabase](https://github.com/metabase/metabase)
- [openml-python](https://github.com/openml/openml-python)# EDA Checklist
- What are the formats?
- Are there files with problems? (can't be opened)
- How many files, tables, databases?
- Per item: How many columns and rows?
- Are there any encoding issues?
- Verify data types of columns: Discrete, Continuous, Dates, GIS, network, other.
- Univariate analysis
- Histogram
- Bar plot
- Boxplot
- Multivariate analysis
- Correlations
- Use target variable to visualize other features