Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/great-expectations/great_expectations
Always know what to expect from your data.
- Host: GitHub
- URL: https://github.com/great-expectations/great_expectations
- Owner: great-expectations
- License: apache-2.0
- Created: 2017-09-11T00:18:46.000Z (almost 7 years ago)
- Default Branch: develop
- Last Pushed: 2024-04-12T20:28:09.000Z (2 months ago)
- Last Synced: 2024-04-13T21:52:06.232Z (2 months ago)
- Topics: cleandata, data-engineering, data-profilers, data-profiling, data-quality, data-science, data-unit-tests, datacleaner, datacleaning, dataquality, dataunittest, eda, exploratory-analysis, exploratory-data-analysis, exploratorydataanalysis, mlops, pipeline, pipeline-debt, pipeline-testing, pipeline-tests
- Language: Python
- Homepage: https://docs.greatexpectations.io/
- Size: 189 MB
- Stars: 9,420
- Watchers: 81
- Forks: 1,464
- Open Issues: 189
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING_CODE.md
- License: LICENSE
- Citation: CITATION.cff
- Codeowners: CODEOWNERS
Lists
- awesome-python-data-science - great_expectations - Always know what to expect from your data. (Data Validation / Synthetic Data)
- awesome-seml - Great Expectations - Data validation and testing with integration in pipelines. (Tooling)
- awesome-stars - great-expectations/great_expectations - Always know what to expect from your data. (Python)
- my-awesome-stars - great-expectations/great_expectations - Always know what to expect from your data. (Python)
- awesome-starred - great-expectations/great_expectations - Always know what to expect from your data. (data-science)
- awesome-data-quality - great-expectations - tool for data testing, documentation, and profiling. (Table of Contents / Frameworks and Libraries)
- awesome-d5-links - Great Expectations
- awesome-machine-learning-engineer - Great Expectations - How to test and document your data and data pipelines (30 min) (DevOps / Data pipelines)
- awesome-python-machine-learning-resources - GitHub - 12% open · ⏱️ 26.08.2022): (Data Pipelines and Stream Processing)
- awesome-python-data-science - great_expectations - A framework that helps teams save time and promote analytic integrity with a new twist on automated testing: pipeline tests. (Data Gathering / Ranking/Recommender)
- my-awesome - great-expectations/great_expectations - Always know what to expect from your data. (Python)
- awesome-python-data-science - great_expectations - Always know what to expect from your data. (Data Validation / NLP)
- awesome-stars - great_expectations - expectations | 9167 | (Python)
- best-of-python - GitHub - 11% open · ⏱️ 06.06.2024): (Data Pipelines & Streaming)
- awesome-python-machine-learning - Great Expectations - Great Expectations is a framework that helps teams save time and promote analytic integrity with a new twist on automated testing: pipeline tests. (Uncategorized / Uncategorized)
- project-awesome - great-expectations/great_expectations - Always know what to expect from your data. (Python)
- awesome-stars - great_expectations - expectations | 9579 | (Python)
- awesome-llmops - Great Expectations (Security / Observability)
- awesome-stars - great_expectations - expectations | 9592 | (Python)
- Awesome-LLMOps - Great Expectations - Always know what to expect from your data. (Observability)
- awesome-list - Great Expectations - Helps data teams eliminate pipeline debt, through data testing, documentation, and profiling. (Data Management & Processing / Database & Cloud Management)
- jimsghstars - great-expectations/great_expectations - Always know what to expect from your data. (Python)
- awesome-data-centric-ai - Great Expectations - Great Expectations helps data teams eliminate pipeline debt, through data testing, documentation, and profiling. (📊 Data Profiling)
- StarryDivineSky - great-expectations/great_expectations
README
[![Python Versions](https://img.shields.io/pypi/pyversions/great_expectations.svg)](https://pypi.python.org/pypi/great_expectations)
[![PyPI](https://img.shields.io/pypi/v/great_expectations)](https://pypi.org/project/great-expectations/#history)
[![PyPI Downloads](https://img.shields.io/pypi/dm/great-expectations)](https://pypistats.org/packages/great-expectations)
[![Build Status](https://img.shields.io/azure-devops/build/great-expectations/bedaf2c2-4c4a-4b37-87b0-3877190e71f5/1)](https://dev.azure.com/great-expectations/great_expectations/_build/latest?definitionId=1&branchName=develop)
[![pre-commit.ci Status](https://results.pre-commit.ci/badge/github/great-expectations/great_expectations/develop.svg)](https://results.pre-commit.ci/latest/github/great-expectations/great_expectations/develop)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5683574.svg)](https://doi.org/10.5281/zenodo.5683574)
[![Twitter Follow](https://img.shields.io/twitter/follow/expectgreatdata?style=social)](https://twitter.com/expectgreatdata)
[![Slack Status](https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social)](https://greatexpectations.io/slack)
[![Contributors](https://img.shields.io/github/contributors/great-expectations/great_expectations)](https://github.com/great-expectations/great_expectations/graphs/contributors)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v1.json)](https://github.com/charliermarsh/ruff)

Great Expectations
================================================================================

*Always know what to expect from your data.*

Important announcements regarding our upcoming 1.0 release
================================================================================
We’re planning a ton of work to take GX OSS to the next level as we officially graduate it to 1.0!

Our biggest goal is to improve the user and contributor experiences by **streamlining the API**, based on the feedback
we’ve gotten from the community (thank you!) over the years.

Learn more about our plans for 1.0 and how we’ll be making this transition in
our [blog post](https://greatexpectations.io/blog/the-next-step-for-gx-oss-1-0).

As we gear up for the launch of our 1.0 release early next year, we want to share an important update regarding our
current development process.

Temporary hold on PRs
--------------------------------------------------------------------------------

We’re temporarily pausing the acceptance of new pull requests (PRs). We’re going to be updating the API and codebase
frequently and significantly over the next few months, and we don’t want contributors to spend time and effort only to find
that we’ve just implemented a breaking change for their work.

Looking forward
--------------------------------------------------------------------------------
We deeply value the contributions and engagement of our community. Please hold onto your fantastic ideas and PRs until
after the 1.0 release, when we will be excited to resume accepting them. We appreciate your understanding and support
as we make this final push toward this exciting milestone. Please watch for updates in our
[Slack](https://greatexpectations.io/slack) community, and thank you for being a crucial part of our journey!

What is GX?
--------------------------------------------------------------------------------

Great Expectations (GX) helps data teams build a shared understanding of their data through quality testing,
documentation, and profiling.

Data practitioners know that testing and documentation are essential for managing complex data pipelines. GX makes it
possible for data science and engineering teams to quickly deploy extensible, flexible data quality testing into their
data stacks. Its human-readable documentation makes the results accessible to technical and nontechnical users.

See [Down with Pipeline Debt!](https://greatexpectations.io/blog/down-with-pipeline-debt-introducing-great-expectations/)
for an introduction to our philosophy of pipeline data quality testing.

Key features
--------------------------------------------------

### Seamless operation
GX fits into your existing tech stack, and can integrate with your CI/CD pipelines to add data quality exactly where you
need it. Connect to and validate your data wherever it already is, so you can focus on honing your Expectation Suites to
perfectly meet your data quality needs.

### Start fast
Get useful results quickly even for large data volumes. GX’s Data Assistants provide curated Expectations for different
domains, so you can accelerate your data discovery to rapidly deploy data quality throughout your pipelines.
Auto-generated Data Docs ensure your DQ documentation will always be up-to-date.

![data_assistant_plot_expectations_and_metrics](./docs/readme_assets/data_assistant_plot_expectations_and_metrics.png)

### Unified understanding
Expectations are GX’s workhorse abstraction: each Expectation declares an expected state of the data. The Expectation
library provides a flexible, extensible vocabulary for data quality—one that’s human-readable, meaningful for technical
and nontechnical users alike. Bundled into Expectation Suites, Expectations are the ideal tool for characterizing
exactly what you expect from your data.

- `expect_column_values_to_not_be_null`
- `expect_column_values_to_match_regex`
- `expect_column_values_to_be_unique`
- `expect_column_values_to_match_strftime_format`
- `expect_table_row_count_to_be_between`
- `expect_column_median_to_be_between`
- ...and [many more](https://greatexpectations.io/expectations)

### Secure and transparent
GX doesn’t ask you to exchange security for your insight. It processes your data in place, on your systems, so your
security and governance procedures can maintain control at all times. And because GX’s core is and always will be open
source, its complete transparency is the opposite of a black box.

### Data contracts support
Checkpoints are a transparent, central, and automatable mechanism for testing Expectations and evaluating your data
quality. Every Checkpoint run produces human-readable Data Docs reporting the results. You can also configure
Checkpoints to take Actions based on the results of the evaluation, like sending alerts and preventing low-quality data
from moving further in your pipelines.

![Image of data contract support](./docs/readme_assets/data-contract-support.png)

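To make the Expectation, Expectation Suite, and Checkpoint ideas above more concrete, here is a minimal standard-library sketch. It does not use the GX API: `ExpectationResult`, `not_null`, `row_count_between`, and `run_checkpoint` are hypothetical names invented for this illustration, and a real Checkpoint also produces Data Docs, which this toy leaves out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[object]]  # one record, keyed by column name


@dataclass
class ExpectationResult:
    expectation: str  # human-readable name of the check
    success: bool     # True when the data matched the expected state
    observed: int     # the number the check looked at (null count, row count, ...)


Check = Callable[[List[Row]], ExpectationResult]


def not_null(column: str) -> Check:
    """Toy stand-in for the idea behind expect_column_values_to_not_be_null."""
    def check(rows: List[Row]) -> ExpectationResult:
        nulls = sum(1 for row in rows if row.get(column) is None)
        return ExpectationResult(f"not_null({column})", nulls == 0, nulls)
    return check


def row_count_between(min_value: int, max_value: int) -> Check:
    """Toy stand-in for the idea behind expect_table_row_count_to_be_between."""
    def check(rows: List[Row]) -> ExpectationResult:
        count = len(rows)
        return ExpectationResult(
            f"row_count_between({min_value}, {max_value})",
            min_value <= count <= max_value,
            count,
        )
    return check


def run_checkpoint(
    rows: List[Row],
    suite: List[Check],
    on_failure: Callable[[List[ExpectationResult]], None],
) -> Tuple[bool, List[ExpectationResult]]:
    """Evaluate every check in the suite; call on_failure with the failed results, if any."""
    results = [check(rows) for check in suite]
    failed = [r for r in results if not r.success]
    if failed:
        on_failure(failed)  # the "Action": alert, stop the pipeline, etc.
    return not failed, results


# Usage: a suite of three checks against two rows, one with a missing email.
rows = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": None}]
suite = [not_null("id"), not_null("email"), row_count_between(1, 10)]
alerts: List[List[str]] = []
passed, results = run_checkpoint(
    rows, suite, on_failure=lambda failed: alerts.append([r.expectation for r in failed])
)
print(passed)  # False
print(alerts)  # [['not_null(email)']]
```

In a pipeline, the boolean returned by the hypothetical `run_checkpoint` is what would decide whether a batch moves downstream; the callback is where an alert would be sent.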
### Readable for collaboration
Everyone stays on the same page about your data quality with GX’s inspectable, shareable, and human-readable Data Docs.
You can publish Data Docs to the locations where you need them in a variety of formats, making it easy to integrate Data
Docs into your existing data catalogs, dashboards, and other reporting and data governance tools.

![Image of data docs](./docs/readme_assets/datadocs-update.jpg)

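As a rough illustration of what "human-readable" reporting means here, the sketch below turns a list of pass/fail results into a small Markdown report using only the standard library. This is not how GX renders Data Docs; `render_report` and its input shape (a list of name/success pairs) are assumptions made up for the example.

```python
from typing import List, Tuple


def render_report(title: str, results: List[Tuple[str, bool]]) -> str:
    """Render (expectation_name, success) pairs as a Markdown table with a summary line."""
    lines = [f"# {title}", "", "| Expectation | Status |", "| --- | --- |"]
    for name, success in results:
        status = "passed" if success else "FAILED"
        lines.append(f"| `{name}` | {status} |")
    passed = sum(1 for _, success in results if success)
    lines += ["", f"{passed} of {len(results)} expectations passed."]
    return "\n".join(lines)


report = render_report(
    "orders table",
    [("not_null(id)", True), ("not_null(email)", False)],
)
print(report)
```

Because the output is plain Markdown text, it could be written to a file or pasted into whatever catalog or dashboard a team already uses.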
Quick start
-------------------------------------------------------------

To see Great Expectations in action on your own data, install it using pip:

```
pip install great_expectations
```

and then run:

```python
import great_expectations as gx

context = gx.get_context()
```

(We recommend deploying within a virtual environment. If you’re not familiar with pip, virtual environments, notebooks,
or git, you may want to check out
the [Supporting Resources](https://docs.greatexpectations.io/docs/terms/supporting_resource/), which will teach you how
to get up and running in minutes.)

For full documentation, visit [https://docs.greatexpectations.io/](https://docs.greatexpectations.io/).
If you need help, hop into our [Slack channel](https://greatexpectations.io/slack)—there are always contributors
and other users there.

Integrations
-------------------------------------------------------------------------------
Great Expectations works with the tools and systems that you're already using with your data, including:

| Integration | Notes |
| --- | --- |
| DataHub | Data Catalog |
| AWS Glue | Data Integration |
| Athena | Data Source |
| AWS Redshift | Data Source |
| AWS S3 | Data Source |
| BigQuery | Data Source |
| Databricks | Data Source |
| Deepnote | Collaborative data notebook |
| Google Cloud Platform (GCP) | Data Source |
| Microsoft Azure Blob Storage | Data Source |
| Microsoft SQL Server | Data Source |
| MySQL | Data Source |
| Pandas | Data Source |
| PostgreSQL | Data Source |
| Snowflake | Data Source |
| Spark | Data Source |
| SQLite | Data Source |
| Trino | Data Source |
| Apache Airflow | Orchestrator |
| Flyte | Orchestrator |
| Meltano | Orchestrator |
| Prefect | Orchestrator |
| ZenML | Orchestrator |
| Slack | Plugin |
| Jupyter Notebooks | Utility |

What is GX _not_?
-------------------------------------------------------------

Great Expectations is _not_ a pipeline execution framework. Instead, it integrates seamlessly with DAG execution tools
like [Spark](https://spark.apache.org/), [Airflow](https://airflow.apache.org/), [dbt](https://www.getdbt.com/),
[Prefect](https://www.prefect.io/), [Dagster](https://github.com/dagster-io/dagster),
[Kedro](https://github.com/quantumblacklabs/kedro), [Flyte](https://flyte.org/), etc. GX carries out your data quality
pipeline testing while these tools execute the pipelines.

Great Expectations is _not_ a database or storage software. It processes your data in place, on your existing systems.
Expectations and Validation Results that GX produces are metadata about your data.

Great Expectations is _not_ a data versioning tool. If you want to bring your data itself under version control, check
out tools like [DVC](https://dvc.org/), [Quilt](https://github.com/quiltdata/quilt),
and [lakeFS](https://github.com/treeverse/lakeFS/).

Great Expectations is _not_ a language-agnostic platform. Instead, it follows the philosophy of “take the compute to the
data” by using the popular Python language to support native execution of Expectations in pandas, SQL (via SQLAlchemy),
and Spark environments.

Great Expectations is _not_ exclusive to Python programming environments. It can be invoked from the command line
without a Python environment. However, if you’re working in another ecosystem, you may want to explore
ecosystem-specific alternatives such as [assertR](https://github.com/ropensci/assertr) (for R environments)
or [TFDV](https://www.tensorflow.org/tfx/guide/tfdv) (for TensorFlow environments).

Who maintains Great Expectations?
-------------------------------------------------------------

Great Expectations OSS is under active development by GX Labs and the Great Expectations community.

What's the best way to get in touch with the Great Expectations team?
--------------------------------------------------------------------------------

If you have questions, comments, or just want to have a good old-fashioned chat about data quality, please hop on our
public [Slack channel](https://greatexpectations.io/slack) or post in
our [Discourse](https://discourse.greatexpectations.io/).

Can I contribute to the library?
--------------------------------------------------------------------------------

Absolutely. Yes, please.
See [Contributing code](https://github.com/great-expectations/great_expectations/blob/develop/CONTRIBUTING_CODE.md),
[Contributing Expectations](https://github.com/great-expectations/great_expectations/blob/develop/CONTRIBUTING_EXPECTATIONS.md),
[Contributing packages](https://github.com/great-expectations/great_expectations/blob/develop/CONTRIBUTING_PACKAGES.md),
or [Contribute to Great Expectations documentation](https://github.com/great-expectations/great_expectations/tree/develop/docs),
and please don't be shy with questions.

How do I stay up to date with Great Expectations?
--------------------------------------------------------------------------------

You can get updates on everything GX with our email
newsletter. [Subscribe here!](https://greatexpectations.io/newsletter?utm_source=github&utm_medium=social&utm_campaign=newsletter&utm_content=form)