{"id":14977357,"url":"https://github.com/serialbandicoot/great-assertions","last_synced_at":"2025-10-28T03:31:20.964Z","repository":{"id":44633627,"uuid":"407559266","full_name":"serialbandicoot/great-assertions","owner":"serialbandicoot","description":"This library is inspired by the Great Expectations library. The library has made the various expectations found in Great Expectations available when using the inbuilt python unittest assertions.","archived":false,"fork":false,"pushed_at":"2022-02-03T12:07:37.000Z","size":963,"stargazers_count":10,"open_issues_count":7,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-04-20T00:46:22.982Z","etag":null,"topics":["data-science","data-testing","databricks","great-expectations","jupyter-notebook","python","python3","quality-assurance","testing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/serialbandicoot.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-09-17T13:54:58.000Z","updated_at":"2024-03-04T14:00:02.000Z","dependencies_parsed_at":"2022-09-26T21:51:09.846Z","dependency_job_id":null,"html_url":"https://github.com/serialbandicoot/great-assertions","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serialbandicoot%2Fgreat-assertions","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serialbandicoot%2Fgreat-assertions/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serialbandicoot%2Fgreat-asse
rtions/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serialbandicoot%2Fgreat-assertions/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/serialbandicoot","download_url":"https://codeload.github.com/serialbandicoot/great-assertions/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238590593,"owners_count":19497351,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","data-testing","databricks","great-expectations","jupyter-notebook","python","python3","quality-assurance","testing"],"created_at":"2024-09-24T13:55:30.429Z","updated_at":"2025-10-28T03:31:15.916Z","avatar_url":"https://github.com/serialbandicoot.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"Great Assertions\n================\n\n|serialbandicoot| |flake8 Lint| |codecov| |CodeQL|\n\nThis library is inspired by the Great Expectations library. The library\nhas made the various expectations found in Great Expectations available\nwhen using the inbuilt python unittest assertions.\n\n\nInstall\n-------\n\n.. code:: bash\n\n    pip install great-assertions\n\nCode example Pandas\n-------------------\n\n.. 
code:: python\n\n    from great_assertions import GreatAssertions\n    import pandas as pd\n\n    class GreatAssertionTests(GreatAssertions):\n        def test_expect_table_row_count_to_equal(self):\n            df = pd.DataFrame({\"col_1\": [100, 200, 300], \"col_2\": [10, 20, 30]})\n            self.expect_table_row_count_to_equal(df, 3)\n\nCode example PySpark\n--------------------\n\n.. code:: python\n\n    from great_assertions import GreatAssertions\n    from pyspark.sql import SparkSession\n\n    class GreatAssertionTests(GreatAssertions):\n\n        def setUp(self):\n            self.spark = SparkSession.builder.getOrCreate()\n\n        def test_expect_table_row_count_to_equal(self):\n            df = self.spark.createDataFrame(\n                [\n                    {\"col_1\": 100, \"col_2\": 10},\n                    {\"col_1\": 200, \"col_2\": 20},\n                    {\"col_1\": 300, \"col_2\": 30},\n                ]\n            )\n            self.expect_table_row_count_to_equal(df, 3)\n\nList of available assertions\n----------------------------\n\n+--------------------------------------------------+---------------------+---------------------+\n|                                                  | Pandas              | PySpark             |\n+==================================================+=====================+=====================+\n| expect_table_row_count_to_equal                  | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_table_row_count_to_be_greater_than        | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_table_row_count_to_be_less_than           | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| 
expect_table_has_no_duplicate_rows               | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_column_value_to_equal                     | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_column_values_to_be_between               | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_column_values_to_match_regex              | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_column_values_to_be_in_set                | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_column_values_to_be_of_type               | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_table_columns_to_match_ordered_list       | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_table_columns_to_match_set                | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_date_range_to_be_more_than                | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_date_range_to_be_less_than                | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_date_range_to_be_between         
         | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_column_mean_to_be_between                 | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_column_value_counts_percent_to_be_between | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_frame_equal                               | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_column_has_no_duplicate_rows              | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_column_value_to_equal_if                  | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n| expect_column_value_to_be_greater_if             | :white_check_mark:: | :white_check_mark:: |\n+--------------------------------------------------+---------------------+---------------------+\n\nAssertion Descriptions\n----------------------\n\nFor a description of the assertions see `Assertion\nDefinitions \u003cdocs/assertion_definitions.md\u003e`__\n\nRunning the tests\n-----------------\n\nExecuting the tests still requires unittest. The following options have\nbeen tested with the examples provided.\n\nOption 1\n~~~~~~~~\n\n.. code:: python\n\n    import unittest\n    suite = unittest.TestLoader().loadTestsFromTestCase(GreatAssertionTests)\n    runner = unittest.TextTestRunner(verbosity=2)\n    runner.run(suite) \n\nOption 2\n~~~~~~~~\n\n.. 
code:: python\n\n    if __name__ == '__main__':\n        unittest.main()   \n\nPie Charts and Tables\n---------------------\n\nFor a more visual representation of the results when running in Databricks or Jupyter Notebooks, \nthe results can be output as tables, bar charts or pie charts.\n\n.. code:: python\n\n    import unittest\n    from great_assertions import GreatAssertionResult, GreatAssertions\n\n    class DisplayTest(GreatAssertions):\n        def test_pass1(self):\n            assert True is True\n\n        def test_fail(self):\n            assert \"Hello\" == \"World\"    \n\n    suite = unittest.TestLoader().loadTestsFromTestCase(DisplayTest)\n    test_runner = unittest.runner.TextTestRunner(resultclass=GreatAssertionResult)\n    result = test_runner.run(suite)\n\n    result.to_barh()  # Also available: result.to_pie()\n\n.. image:: docs/img/barh.png\n    :width: 300\n    :alt: Bar Horizontal\n\n.. code:: python\n\n    result.to_results_table()\n\n.. image:: docs/img/results_table.png\n    :width: 300\n    :alt: Results Table   \n\n.. code:: python\n\n    result.to_full_results_table()\n\n.. image:: docs/img/full_results_table.png\n    :width: 500\n    :alt: Full Results Table   \n\n\nRunning with XML-Runner\n-----------------------\n\nRunning with xml-runner works the same way as it normally does. \nHowever, methods such as to_results_table will not be available, because they rely on a different `resultclass`.   \n\n.. code:: python\n\n    import unittest\n    import xmlrunner\n    suite = unittest.TestLoader().loadTestsFromTestCase(DisplayTest)\n    test_runner = xmlrunner.XMLRunner(output=\"test-results\")\n    test_runner.run(suite)\n\nProduction Monitoring\n---------------------\n\nThe assertions provided by GA can also be used to validate any environment, including Production. \nCurrently GA only supports saving the results to Spark, for example on Databricks.\n\nOnce the run has completed, the results can be stored with the `save` method, as seen below.\n\n.. 
code:: python\n\n    import xmlrunner\n    suite = unittest.TestLoader().loadTestsFromTestCase(DisplayTest)\n    test_runner = xmlrunner.XMLRunner(output=\"test-results\")\n    result = test_runner.run(suite)\n    result.save(format=\"databricks\")\n\nThe image below shows a simple graph of the accumulation of tests over test run. \nHowever much more complex analysis can be performed with the extended data being generated by GA.\n\n.. image:: docs/img/reporting.png\n    :width: 500\n    :alt: No Tests Vs Test Run \n\nThe extended table of results contains the following:\n\n+--------------+-------------------+----------+-----------------------------------+-------+------+---------------------------------------------------------------------------------------------------------------------------------------+\n|        run_id|          timestamp|    method|                        information|test_id|status|                                                                                                                               extended|\n+--------------+-------------------+----------+-----------------------------------+-------+------+---------------------------------------------------------------------------------------------------------------------------------------+\n|20211222093029|2021-12-22 09:30:29|test_fail8|Traceback (most recent call last...|     13|  Fail|{\"id\": 13, \"name\": \"expect_date_range_to_be_less_than\", \"values\": {\"expected_max_date\": \"2019-05-13\", \"actual_max_date\": \"2019-05-13\"}}|\n+--------------+-------------------+----------+-----------------------------------+-------+------+---------------------------------------------------------------------------------------------------------------------------------------+\n|20211222093029|2021-12-22 09:30:29|test_fail9|Traceback (most recent call last...|     14|  Fail|{\"id\": 14, \"name\": \"expect_date_range_to_be_more_than\", \"values\": {\"expected_min_date\": \"2015-10-01\", 
\"actual_min_date\": \"2015-10-01\"}}|\n+--------------+-------------------+----------+-----------------------------------+-------+------+---------------------------------------------------------------------------------------------------------------------------------------+\n\nFrom the extended column you can get further details about the type of test that was executed and its results. \nFor example, the test `expect_table_row_count_to_be_less_than` asserts that the maximum row count is not breached. \n\nIn the code below, the expected maximum was 100 and the actual count was 205, which caused the test to fail. \nAnalysts can therefore query the extended data to get a picture of the size of the breach.\n\n.. code:: python\n\n    extended = {\n        \"id\": 2,\n        \"name\": \"expect_table_row_count_to_be_less_than\",\n        \"values\": {\n            \"exp_max_count\": 100,\n            \"act_count\": 205,\n        },\n    }\n\nIn production monitoring, these types of results can help prevent skewed results. \nFor example, suppose the expected values were within a range of 0-100 \nand you received an exceptionally large value. \n\nThe large value could skew business functionality such that a defect causes \ndamage, loss of income, or incorrect reporting to a downstream system.\n\nTherefore, GA allows you to provide benchmarks for production validation, and an \nexperienced analyst can create reports on top of the data.\n\nAn example of the extended dataset:\n\n.. image:: docs/img/extended_result_table.png\n    :width: 500\n    :alt: Extended Result Table\n\nNotes\n-----\n\nIf you get an arrows function warning when running in Databricks, this is \nbecause the toPandas() method is used by many of the assertions. The plan is \nto replace the Pandas conversion with pure PySpark code. If this is a problem, please raise \nan issue so this work can be prioritised. 
For now, it’s advisable to make sure the \ndatasets are not too big, as large datasets can cause the driver to crash.\n\nDevelopment\n-----------\n\nTo create a development environment, create a virtualenv and make a\ndevelopment installation.\n\n::\n\n    virtualenv ve\n    source ve/bin/activate\n\nTo run the tests, use pytest.\n\n::\n\n    (ve) pytest     \n\n.. |serialbandicoot| image:: https://circleci.com/gh/serialbandicoot/great-assertions.svg?style=svg\n   :target: LINK\n.. |flake8 Lint| image:: https://github.com/serialbandicoot/great-assertions/actions/workflows/flake8.yml/badge.svg\n   :target: https://github.com/serialbandicoot/great-assertions/actions/workflows/flake8.yml\n.. |codecov| image:: https://codecov.io/gh/serialbandicoot/great-assertions/branch/master/graph/badge.svg?token=OKBB0E5EUC\n   :target: https://codecov.io/gh/serialbandicoot/great-assertions\n.. |CodeQL| image:: https://github.com/serialbandicoot/great-assertions/workflows/CodeQL/badge.svg\n   :target: https://github.com/serialbandicoot/great-assertions/actions?query=workflow%3ACodeQL\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fserialbandicoot%2Fgreat-assertions","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fserialbandicoot%2Fgreat-assertions","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fserialbandicoot%2Fgreat-assertions/lists"}