https://github.com/skn0tt/datapact
Expectations on Your Data
- Host: GitHub
- URL: https://github.com/skn0tt/datapact
- Owner: Skn0tt
- License: mit
- Created: 2022-04-19T12:11:24.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-09-26T21:40:34.000Z (about 1 month ago)
- Last Synced: 2024-10-06T04:02:55.190Z (about 1 month ago)
- Language: Jupyter Notebook
- Homepage: https://datapact.netlify.app/
- Size: 4.26 MB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 36
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md
README
# `datapact` - pytest, but for dataframes
[![All Contributors](https://img.shields.io/badge/all_contributors-3-orange.svg?style=flat-square)](#contributors-)
`datapact` is a Python library for verifying your data.
```py
import datapact

# df is an existing Pandas or Dask DataFrame
dp = datapact.test(df)
dp.age.must.be_positive()
dp.name.should.not_be_empty()
```

It works with Pandas and Dask DataFrames, and has special support for Jupyter Notebooks.
![jupyter notebooks screenshot](./doc/jupyter_screenshot.png)
Here are some features:
- dozens of existing assertions, easy to add your own
- great in-editor documentation via docstrings + types
- two severity levels (`.should` for warnings, `.must` for failures)
- failure notifications via E-Mail, MS Teams, Slack or PagerDuty (via Datapact Track)

Get started here: https://datapact.dev
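To make the two severity levels concrete, here is a minimal, standard-library-only sketch of the idea: a soft check that only warns and a strict check that fails. It is not datapact's implementation. `ColumnCheck`, `ExpectationFailed`, and the `strict` flag are hypothetical names invented for this illustration, and raising an exception for `.must` is an assumption about how a failure might surface.

```python
import warnings
from dataclasses import dataclass


class ExpectationFailed(AssertionError):
    """Hypothetical error type for a violated strict (`must`-style) expectation."""


@dataclass
class ColumnCheck:
    """Toy stand-in for a column under test; not datapact's real class."""

    name: str
    values: list
    strict: bool  # True models `.must`, False models `.should`

    def _report(self, ok: bool, message: str) -> bool:
        if ok:
            return True
        if self.strict:
            # Strict expectations stop execution.
            raise ExpectationFailed(f"{self.name}: {message}")
        # Soft expectations only emit a warning and let the run continue.
        warnings.warn(f"{self.name}: {message}", stacklevel=3)
        return False

    def be_positive(self) -> bool:
        return self._report(
            all(v > 0 for v in self.values), "expected only positive values"
        )

    def not_be_empty(self) -> bool:
        return self._report(len(self.values) > 0, "expected at least one value")


if __name__ == "__main__":
    ColumnCheck("name", ["Ada"], strict=False).not_be_empty()  # passes silently
    ColumnCheck("age", [34, -1], strict=False).be_positive()   # emits a warning
    try:
        ColumnCheck("age", [34, -1], strict=True).be_positive()
    except ExpectationFailed as err:
        print(f"failed: {err}")
```

The point of the split is that a soft expectation can flag suspicious data without blocking a pipeline, while a strict one stops the run when the data is unusable.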
## `Datapact` Track
Datapact Track is an optional, browser-based data tracking service.
![Datapact Track dataset overview. code snippet for how to connect test suite to service](./doc/track_screenshot_dataset.png)
It's fully self-hostable via Docker and Postgres, and there's a hosted version available at `track.datapact.dev`.
Connecting your test suite is one line of code:
```py
dp.connect(
server="track.datapact.dev",
token="..." # get this from the UI
)
```

Datapact Track gives you:
- notifications via E-Mail, Slack, MS Teams and PagerDuty
- a central documentation of your datasets
- history of data expectations + reality
- data quality tracking

Try out Datapact Track at [track.datapact.dev](https://track.datapact.dev), or follow the [self-hosting guide](https://datapact.dev/track.html) to deploy your own instance.
## `datapact` vs [Great Expectations](https://greatexpectations.io)
Both datapact and Great Expectations help you improve data quality, but they take different approaches.
Great Expectations has its own JSON-based storage format for expectation suites, and it gives you a custom UI to edit them.
It's way bigger than datapact - in project size, project scope, but also in complexity.

`datapact` is a lot younger, community-run, and more of a _library_ than a _framework_.
The main differentiator is that it allows you to express your test suites in Python code, right alongside your other code.
This works in Python Scripts, Jupyter Notebooks, Pipeline Tests - everywhere that Python runs.
And by having your tests _in code_, you can co-locate them with the rest of your code, and version-control and review them just like everything else.

If you already know how to use Great Expectations, you should use it.
If you found its learning curve too steep, maybe look at `datapact` - it's designed to be easy to get started with, and intuitive to use.

## Contributors ✨
Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):
Simon Knott
π» π π€ π§
st-sch
π
Jeremias Dötterl
π
This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!