https://github.com/amueller/dabl
Data Analysis Baseline Library
- Host: GitHub
- URL: https://github.com/amueller/dabl
- Owner: amueller
- License: bsd-3-clause
- Fork: true (dabl/dabl)
- Created: 2020-01-30T18:26:49.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2024-01-09T19:00:03.000Z (10 months ago)
- Last Synced: 2024-08-05T18:32:06.529Z (3 months ago)
- Language: Jupyter Notebook
- Homepage: https://dabl.github.io/
- Size: 115 MB
- Stars: 130
- Watchers: 5
- Forks: 9
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# dabl
[![CI](https://github.com/dabl/dabl/actions/workflows/ci.yml/badge.svg)](https://github.com/dabl/dabl/actions/workflows/ci.yml)
The data analysis baseline library.
- "Mr Sanchez, are you a data scientist?"
- "I dabl, Mr president."

Find more information on the [website](https://dabl.github.io/).
## Try it out
```
pip install dabl
```

or [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dabl/dabl/main)
## Current scope and upcoming features
This library is very much still under development. Current code focuses mostly on exploratory visualization and preprocessing.
There are also drop-in replacements for `GridSearchCV` and `RandomizedSearchCV` that use successive halving.
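
For readers unfamiliar with successive halving, the idea can be sketched in plain Python. This is a minimal illustration of the general technique, not dabl's implementation: the function name `successive_halving`, its parameters, and the `toy_score` function are all hypothetical and chosen only for this example. All candidates are scored with a small budget, only the best fraction survives, and the survivors are re-scored with a larger budget.

```python
def successive_halving(candidates, evaluate, min_budget=1, factor=3):
    """Illustrative successive halving (hypothetical helper, not dabl's API).

    candidates: list of hyperparameter settings.
    evaluate:   callable (candidate, budget) -> score, higher is better.
    Returns the winning candidate and a log of (budget, n_candidates) per round.
    """
    survivors = list(candidates)
    budget = min_budget
    rounds = []
    while len(survivors) > 1:
        rounds.append((budget, len(survivors)))
        # Score every surviving candidate with the current budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        # Keep the top 1/factor of the candidates (at least one) ...
        survivors = ranked[: max(1, len(ranked) // factor)]
        # ... and give them a larger budget in the next round.
        budget *= factor
    return survivors[0], rounds


def toy_score(candidate, budget):
    # Made-up objective: best at 0.3, with a penalty that shrinks as the
    # budget (e.g. number of training samples) grows.
    return -abs(candidate - 0.3) - 1.0 / budget


best, rounds = successive_halving([i / 10 for i in range(9)], toy_score)
print(best, rounds)
```

In a real search, `evaluate` would typically be a cross-validated score and the budget would be the number of training samples or boosting iterations, so most of the compute goes to the most promising settings.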
There are preliminary portfolios in the style of
[POSH
auto-sklearn](https://ml.informatik.uni-freiburg.de/papers/18-AUTOML-AutoChallenge.pdf)
to find strong models quickly. In essence that boils down to a quick search
over different gradient boosting models and other tree ensembles and
potentially kernel methods.

Check out the [website](https://dabl.github.io/dev/) and [example gallery](https://dabl.github.io/0.1.9/auto_examples/index.html) to get an idea of the visualizations that are available.
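
The portfolio idea mentioned above can be sketched roughly as follows. This is an assumption-laden illustration rather than dabl's code: `run_portfolio`, `PORTFOLIO`, `toy_evaluate`, and the scores in `TOY_SCORES` are hypothetical placeholders, not measured results. A portfolio here is simply a fixed, ordered list of model configurations that is tried in sequence, keeping the best one seen so far.

```python
def run_portfolio(portfolio, evaluate, max_evaluations=None):
    """Try a fixed, ordered list of configurations and return the best one.

    Hypothetical helper for illustration only.
    evaluate: callable config -> score, higher is better.
    """
    best_config, best_score = None, float("-inf")
    for i, config in enumerate(portfolio):
        # Optionally stop early so a quick answer is available.
        if max_evaluations is not None and i >= max_evaluations:
            break
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Placeholder configurations, ordered so that likely-strong models come first.
PORTFOLIO = [
    {"name": "gb_shallow", "model": "gradient_boosting", "max_depth": 3},
    {"name": "random_forest", "model": "random_forest", "n_estimators": 100},
    {"name": "gb_deep", "model": "gradient_boosting", "max_depth": 6},
    {"name": "svm_rbf", "model": "kernel_svm", "C": 1.0},
]

# Made-up scores standing in for cross-validation results.
TOY_SCORES = {"gb_shallow": 0.81, "random_forest": 0.78, "gb_deep": 0.84, "svm_rbf": 0.80}


def toy_evaluate(config):
    return TOY_SCORES[config["name"]]


best_config, best_score = run_portfolio(PORTFOLIO, toy_evaluate)
print(best_config["name"], best_score)
```

Because the list is fixed in advance, the search needs no tuning of its own; the ordering decides which models get evaluated when time is short.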
Stay Tuned!
## Related packages
### Lux
[Lux](https://github.com/lux-org/lux) is an awesome project for easy interactive visualization of pandas dataframes within notebooks.

### Pandas Profiling
[Pandas Profiling](https://github.com/pandas-profiling/pandas-profiling) can
provide a thorough summary of the data in only a single line of code. Using
`ProfileReport()`, you can generate an HTML report of your data
that can help you find correlations and identify missing data.

`dabl` focuses less on statistical measures of individual columns, and more on
providing a quick overview via visualizations, as well as convenient
preprocessing and model search for machine learning.