https://github.com/dabl/dabl

Data Analysis Baseline Library

Host: GitHub
URL: https://github.com/dabl/dabl
Owner: dabl
License: bsd-3-clause
Created: 2018-09-14T19:11:47.000Z (almost 8 years ago)
Default Branch: main
Last Pushed: 2024-12-16T23:28:59.000Z (over 1 year ago)
Last Synced: 2025-10-21T20:48:00.522Z (9 months ago)
Language: Jupyter Notebook
Homepage: https://dabl.github.io/
Size: 7.42 MB
Stars: 728
Watchers: 21
Forks: 103
Open Issues: 89
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

StarryDivineSky - dabl/dabl
awesome-python-data-science - dabl - Data Analysis Baseline Library (Exploration)

README

# dabl

[![CI](https://github.com/dabl/dabl/actions/workflows/ci.yml/badge.svg)](https://github.com/dabl/dabl/actions/workflows/ci.yml)

The data analysis baseline library.

- "Mr Sanchez, are you a data scientist?"
- "I dabl, Mr president."

Find more information on the [website](https://dabl.github.io/).

## Try it out

```
pip install dabl
```

or [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dabl/dabl/main)

## Current scope and upcoming features
This library is very much still under development. Current code focuses mostly on exploratory visualization and preprocessing.
There are also drop-in replacements for `GridSearchCV` and `RandomizedSearchCV` using successive halving.
There are preliminary portfolios in the style of
[POSH
auto-sklearn](https://ml.informatik.uni-freiburg.de/papers/18-AUTOML-AutoChallenge.pdf)
to find strong models quickly. In essence that boils down to a quick search
over different gradient boosting models and other tree ensembles and
potentially kernel methods.
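
To make the successive halving idea concrete, here is a minimal pure-Python sketch. It is not dabl's implementation: the function `successive_halving`, its parameters, and the `toy_score` stand-in for a cross-validated score are all hypothetical names chosen for illustration. The idea is to score every candidate on a small budget, keep the best fraction, and repeat with a larger budget.

```python
def successive_halving(candidates, evaluate, min_resources, max_resources, factor=3):
    """Illustrative successive halving (hypothetical helper, not dabl's API).

    evaluate(candidate, n_resources) must return a score, higher is better.
    Returns the winning candidate and a log of (n_candidates, n_resources).
    """
    survivors = list(candidates)
    resources = min_resources
    rounds = []
    while True:
        rounds.append((len(survivors), resources))
        # Rank the current candidates using the current budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, resources), reverse=True)
        if len(ranked) == 1 or resources >= max_resources:
            return ranked[0], rounds
        # Keep the top 1/factor of the candidates and grow the budget.
        survivors = ranked[: max(1, len(ranked) // factor)]
        resources = min(resources * factor, max_resources)


def toy_score(alpha, n_samples):
    # Assumed stand-in for a cross-validated score: the best value is
    # alpha = 0.3, and the estimate is pessimistic for small n_samples.
    return -(alpha - 0.3) ** 2 - 1.0 / n_samples


grid = [i / 10 for i in range(10)]  # ten candidate values for alpha
best, rounds = successive_halving(grid, toy_score, min_resources=20, max_resources=540)
print(best, rounds)
```

Most candidates are discarded after a cheap first round, so only a few are ever evaluated on the larger budgets. That is where the speedup over an exhaustive grid search comes from.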

Check out the [website](https://dabl.github.io/dev/) and [example gallery](https://dabl.github.io/0.1.9/auto_examples/index.html) to get an idea of the visualizations that are available.

Stay Tuned!

## Related packages

### Lux
[Lux](https://github.com/lux-org/lux) is an awesome project for easy interactive visualization of pandas dataframes within notebooks.

### Pandas Profiling
[Pandas Profiling](https://github.com/pandas-profiling/pandas-profiling) can
provide a thorough summary of the data in only a single line of code. Using the
`ProfileReport` class, you can generate an HTML report of your data
that can help you find correlations and identify missing data.

`dabl` focuses less on statistical measures of individual columns, and more on
providing a quick overview via visualizations, as well as convenient
preprocessing and model search for machine learning.
