{"id":14065503,"url":"https://github.com/frictionlessdata/goodtables-py","last_synced_at":"2025-07-29T20:33:11.381Z","repository":{"id":62459332,"uuid":"437775708","full_name":"frictionlessdata/goodtables-py","owner":"frictionlessdata","description":"Goodtables is a framework to validate tabular data [MAINTENANCE MODE]","archived":true,"fork":false,"pushed_at":"2021-12-13T07:29:54.000Z","size":7232,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-12-04T04:33:44.125Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/frictionlessdata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-12-13T07:28:19.000Z","updated_at":"2023-07-11T09:28:38.000Z","dependencies_parsed_at":"2022-11-02T00:45:32.550Z","dependency_job_id":null,"html_url":"https://github.com/frictionlessdata/goodtables-py","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/frictionlessdata/goodtables-py","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frictionlessdata%2Fgoodtables-py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frictionlessdata%2Fgoodtables-py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frictionlessdata%2Fgoodtables-py/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frictionlessdata%2Fgoodtables-py/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/frictionlessdata","d
ownload_url":"https://codeload.github.com/frictionlessdata/goodtables-py/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/frictionlessdata%2Fgoodtables-py/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267754854,"owners_count":24139435,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-13T07:04:31.685Z","updated_at":"2025-07-29T20:33:09.418Z","avatar_url":"https://github.com/frictionlessdata.png","language":"Python","readme":"# goodtables-py\n\n[![Travis](https://img.shields.io/travis/frictionlessdata/goodtables-py/master.svg)](https://travis-ci.org/frictionlessdata/goodtables-py)\n[![Coveralls](http://img.shields.io/coveralls/frictionlessdata/goodtables-py.svg?branch=master)](https://coveralls.io/r/frictionlessdata/goodtables-py?branch=master)\n[![PyPi](https://img.shields.io/pypi/v/goodtables.svg)](https://pypi.python.org/pypi/goodtables)\n[![Github](https://img.shields.io/badge/github-master-brightgreen)](https://github.com/frictionlessdata/goodtables-py)\n[![Gitter](https://img.shields.io/gitter/room/frictionlessdata/chat.svg)](https://gitter.im/frictionlessdata/chat)\n\nGoodtables is a framework to validate tabular data. It can check the structure\nof your data (e.g. 
all rows have the same number of columns), and its contents\n(e.g. all dates are valid).\n\n\u003e **[Important Notice]** `goodtables` was renamed to `frictionless` in version 3. The framework got various improvements and was extended to be a complete data solution. The change is not breaking for existing software, so no action is required. Please read the [Migration Guide](https://framework.frictionlessdata.io/docs/development/migration#from-goodtables) to start working with Frictionless for Python.\n\u003e - we continue to bug-fix `goodtables@2.x` in this [branch](https://github.com/frictionlessdata/goodtables-py/tree/goodtables), and it remains available on [PyPi](https://pypi.org/project/goodtables/) as before\n\u003e - please note that the `frictionless@3.x` API, which we are currently working on, is not stable\n\u003e - we will release `frictionless@4.x` by the end of 2020 as the first SemVer/stable version\n\n## Features\n\n* **Structural checks**: Ensure that there are no empty rows, no blank headers, etc.\n* **Content checks**: Ensure that the values have the correct types (\"string\", \"number\", \"date\", etc.), that their format is valid (\"string must be an e-mail\"), and that they respect the constraints (\"age must be a number greater than 18\").\n* **Support for multiple tabular formats**: CSV, Excel files, LibreOffice, Data Package, etc.\n* **Parallelized validations for multi-table datasets**\n* **Command line interface**\n\n## Contents\n\n\u003c!--TOC--\u003e\n\n  - [Getting Started](#getting-started)\n    - [Installing](#installing)\n    - [Running on CLI](#running-on-cli)\n    - [Running on Python](#running-on-python)\n  - [Documentation](#documentation)\n    - [Report](#report)\n    - [Checks](#checks)\n    - [Presets](#presets)\n    - [Data Quality Errors](#data-quality-errors)\n    - [Frequently Asked Questions](#frequently-asked-questions)\n  - [API Reference](#api-reference)\n    - [`cli`](#cli)\n    - 
[`validate`](#validate)\n    - [`preset`](#preset)\n    - [`check`](#check)\n    - [`Error`](#error)\n    - [`spec`](#spec)\n    - [`GoodtablesException`](#goodtablesexception)\n  - [Contributing](#contributing)\n  - [Changelog](#changelog)\n\n\u003c!--TOC--\u003e\n\n## Getting Started\n\n\u003e For faster goodtables-compatible validation of Pandas dataframes, take a look at https://github.com/ezwelty/goodtables-pandas-py\n\n### Installing\n\n```\npip install goodtables\npip install goodtables[ods]  # If you need LibreOffice's ODS file support\n```\n\n### Running on CLI\n\n```\ngoodtables data.csv\n```\n\nUse `goodtables --help` to see the different options.\n\n### Running on Python\n\n```python\nfrom goodtables import validate\n\nreport = validate('invalid.csv')\nreport['valid'] # false\nreport['table-count'] # 1\nreport['error-count'] # 3\nreport['tables'][0]['valid'] # false\nreport['tables'][0]['source'] # 'invalid.csv'\nreport['tables'][0]['errors'][0]['code'] # 'blank-header'\n```\n\nYou can read a more in-depth explanation of using goodtables with Python in\nthe [Documentation](#documentation) section. Check also\nthe [examples](examples) folder for other examples.\n\n## Documentation\n\nGoodtables validates your tabular dataset to find structural and content\nerrors. Suppose you have a file named `invalid.csv`. Let's validate it:\n\n```python\nreport = validate('invalid.csv')\n```\n\nWe could also pass a remote URI instead of a local path. It supports CSV, XLS,\nXLSX, ODS, JSON, and all other formats supported by the [tabulator][tabulator]\nlibrary.\n\n### Report\n\n\u003e The validation report follows the JSON Schema defined on [goodtables/schemas/report.json][validation-jsonschema].\n\nThe output of the `validate()` method is a report dictionary. It includes\ninformation on whether the data was valid, the error count, a list of table reports, which\nindividual checks failed, etc. 
A report looks like this:\n\n```json\n{\n    \"time\": 0.009,\n    \"error-count\": 1,\n    \"warnings\": [\n        \"Table \\\"data/invalid.csv\\\" inspection has reached 1 error(s) limit\"\n    ],\n    \"preset\": \"table\",\n    \"valid\": false,\n    \"tables\": [\n        {\n            \"errors\": [\n                {\n                    \"row-number\": null,\n                    \"message\": \"Header in column 3 is blank\",\n                    \"row\": null,\n                    \"column-number\": 3,\n                    \"code\": \"blank-header\"\n                }\n            ],\n            \"error-count\": 1,\n            \"headers\": [\n                \"id\",\n                \"name\",\n                \"\",\n                \"name\"\n            ],\n            \"scheme\": \"file\",\n            \"row-count\": 2,\n            \"valid\": false,\n            \"encoding\": \"utf-8\",\n            \"time\": 0.007,\n            \"schema\": null,\n            \"format\": \"csv\",\n            \"source\": \"data/invalid\"\n        }\n    ],\n    \"table-count\": 1\n}\n```\n\nThe errors fall into one of the following categories:\n\n- `source` - data can't be loaded or parsed\n- `structure` - general tabular errors like duplicate headers\n- `schema` - errors from checks against [Table Schema](http://specs.frictionlessdata.io/table-schema/)\n- `custom` - custom check errors\n\n### Checks\n\nA check is the main validation actor in goodtables. The list of enabled checks can\nbe changed using the `checks` and `skip_checks` arguments. 
Let's explore the options\nwith an example:\n\n```python\nreport = validate('data.csv') # by default structure and schema (if available) checks\nreport = validate('data.csv', checks=['structure']) # only structure checks\nreport = validate('data.csv', checks=['schema']) # only schema (if available) checks\nreport = validate('data.csv', checks=['bad-headers']) # check only 'bad-headers'\nreport = validate('data.csv', skip_checks=['bad-headers']) # exclude 'bad-headers'\n```\n\nBy default, a dataset is validated against all available Data Quality Spec\nerrors. Some checks can be unavailable for validation; for example, if the\nschema isn't provided, only the `structure` checks will be run.\n\n### Presets\n\nGoodtables supports different formats of tabular datasets, called\npresets. A tabular dataset is data that can be split into a list of data\ntables, as:\n\n![Dataset](data/dataset.png)\n\nWe can change the preset using the `preset` argument for `validate()`. By\ndefault, it'll be inferred from the source, falling back to `table`. To validate\na [data package][datapackage], we can do:\n\n```python\nreport = validate('datapackage.json') # implicit preset\nreport = validate('datapackage.json', preset='datapackage') # explicit preset\n```\n\nThis will validate all tabular resources in the datapackage.\n\nIt's also possible to validate a list of files using the \"nested\" preset. To do\nso, the first argument to `validate()` should be a list of dictionaries, where\neach key in the dictionary is named after a parameter of `validate()`. 
For example:\n\n```python\nreport = validate([{'source': 'data1.csv'}, {'source': 'data2.csv'}]) # implicit preset\nreport = validate([{'source': 'data1.csv'}, {'source': 'data2.csv'}], preset='nested') # explicit preset\n```\n\nThis is similar to:\n\n```python\nreport_data1 = validate('data1.csv')\nreport_data2 = validate('data2.csv')\n```\n\nThe difference is that goodtables validates multiple tables in parallel, so\nusing the \"nested\" preset should run faster.\n\n### Data Quality Errors\n\nBase report errors are standardized and described in the\n[Data Quality Spec](https://github.com/frictionlessdata/data-quality-spec/blob/master/spec.json).\n\n#### Source errors\n\nThe basic checks can't be disabled, as they deal with goodtables being able to read the files.\n\n| check | description |\n| --- | --- |\n| io-error | Data reading error because of an IO error. |\n| http-error | Data reading error because of an HTTP error. |\n| source-error | Data reading error because of unsupported or inconsistent contents. |\n| scheme-error | Data reading error because of an incorrect scheme. |\n| format-error | Data reading error because of an incorrect format. |\n| encoding-error | Data reading error because of an encoding problem. |\n\n#### Structure errors\n\nThese checks validate that the structure of the file is valid.\n\n| check | description |\n| --- | --- |\n| blank-header | There is a blank header name. All cells in the header row must have a value. |\n| duplicate-header | There are multiple columns with the same name. All column names must be unique. |\n| blank-row | Rows must have at least one non-blank cell. |\n| duplicate-row | Rows can't be duplicated. |\n| extra-value | A row has more columns than the header. |\n| missing-value | A row has fewer columns than the header. |\n\n#### Schema errors\n\nThese checks validate the contents of the file. To use them, you need to pass a [Table Schema][tableschema]. 
If you don't have a schema, goodtables can infer it if you use the `infer_schema` option.\n\nIf your schema only covers part of the data, you can use the `infer_fields` option to infer the remaining fields.\n\nLastly, if the order of the fields in the data is different from your schema's, enable the `order_fields` option.\n\n| check | description |\n| --- | --- |\n| schema-error | Schema is not valid. |\n| non-matching-header | The header's name in the schema is different from what's in the data. |\n| extra-header | The data contains a header not defined in the schema. |\n| missing-header | The data doesn't contain a header defined in the schema. |\n| type-or-format-error | The value can't be cast based on the schema type and format for this field. |\n| required-constraint | This field is a required field, but it contains no value. |\n| pattern-constraint | This field's value should conform to the defined pattern. |\n| unique-constraint | This field is a unique field, but it contains a value that has been used in another row. |\n| enumerable-constraint | This field's value should be equal to one of the values in the enumeration constraint. |\n| minimum-constraint | This field's value should be greater than or equal to the constraint value. |\n| maximum-constraint | This field's value should be less than or equal to the constraint value. |\n| minimum-length-constraint | The length of this field's value should be greater than or equal to the schema constraint value. |\n| maximum-length-constraint | The length of this field's value should be less than or equal to the schema constraint value. |\n\n#### Custom errors\n\n| check | description |\n| --- | --- |\n| [blacklisted-value](#blacklisted-value) | Ensure there are no cells with the blacklisted values. |\n| [deviated-value](#deviated-value) | Ensure numbers are within a number of standard deviations from the average. 
|\n| [foreign-key](#foreign-key) | Ensure foreign keys are valid within a data package. |\n| [sequential-value](#sequential-value) | Ensure numbers are sequential. |\n| [truncated-value](#truncated-value) | Detect values that were potentially truncated. |\n| [custom-constraint](#custom-constraint) | Defines a constraint based on the values of other columns (e.g. `value * quantity == total`). |\n\n##### blacklisted-value\n\nSometimes we have to check for values we don't want to have in our dataset. It accepts the following options:\n\n| option | type | description |\n| --- | --- | --- |\n| column | int/str | Column number or name |\n| blacklist | list of str | List of blacklisted values |\n\nConsider the following CSV file:\n\n```csv\nid,name\n1,John\n2,bug\n3,bad\n5,Alex\n```\n\nLet's check that the `name` column doesn't contain rows with `bug` or `bad`:\n\n```python\nfrom goodtables import validate\n\nreport = validate('data.csv', checks=[\n    {'blacklisted-value': {'column': 'name', 'blacklist': ['bug', 'bad']}},\n])\n# error on row 3 with code \"blacklisted-value\"\n# error on row 4 with code \"blacklisted-value\"\n```\n\n##### deviated-value\n\nThis check helps to find outliers in a numeric column. It accepts the following options:\n\n| option | type | description |\n| --- | --- | --- |\n| column | int/str | Column number or name |\n| average | str | Average type, either \"mean\", \"median\" or \"mode\" |\n| interval | int | Values must be inside the range `average ± standard deviation * interval` |\n\nConsider the following CSV file:\n\n```csv\ntemperature\n1\n-2\n7\n0\n1\n2\n5\n-4\n100\n8\n3\n```\n\nWe use `median` to get an average of the column values and allow an interval of 3 standard deviations. 
In our case the median is `2.0` and the standard deviation is `29.73`, so all valid values must be inside the `[-87.19, 91.19]` interval.\n\n```python\nreport = validate('data.csv', checks=[\n    {'deviated-value': {'column': 'temperature', 'average': 'median', 'interval': 3}},\n])\n# error on row 10 with code \"deviated-value\"\n```\n\n##### foreign-key\n\n\u003e Relative paths are supported here. This MUST be used only with trusted data sources.\n\nThis check validates foreign keys within a data package. Consider the data package defined below:\n\n```python\nDESCRIPTOR = {\n  'resources': [\n    {\n      'name': 'cities',\n      'data': [\n        ['id', 'name', 'next_id'],\n        [1, 'london', 2],\n        [2, 'paris', 3],\n        [3, 'rome', 4],\n        # [4, 'rio', None],\n      ],\n      'schema': {\n        'fields': [\n          {'name': 'id', 'type': 'integer'},\n          {'name': 'name', 'type': 'string'},\n          {'name': 'next_id', 'type': 'integer'},\n        ],\n        'foreignKeys': [\n          {\n            'fields': 'next_id',\n            'reference': {'resource': '', 'fields': 'id'},\n          },\n          {\n            'fields': 'id',\n            'reference': {'resource': 'people', 'fields': 'label'},\n          },\n        ],\n      },\n    }, {\n      'name': 'people',\n      'data': [\n        ['label', 'population'],\n        [1, 8],\n        [2, 2],\n        # [3, 3],\n        # [4, 6],\n      ],\n    },\n  ],\n}\n```\n\nRunning `goodtables` on it will raise a few `foreign-key` errors because we have commented out some rows in the data package's data:\n\n```python\nreport = validate(DESCRIPTOR, checks=['structure', 'schema', 'foreign-key'])\nprint(report)\n```\n\n```\n{'error-count': 2,\n 'preset': 'datapackage',\n 'table-count': 2,\n 'tables': [{'datapackage': '...',\n             'error-count': 2,\n             'errors': [{'code': 'foreign-key',\n                         'message': 'Foreign key \"[\\'next_id\\']\" violation in '\n                                    'row 4',\n                         'message-data': {'fields': ['next_id']},\n                         'row-number': 4},\n                        {'code': 'foreign-key',\n                         'message': 'Foreign key \"[\\'id\\']\" violation in row 4',\n                         'message-data': {'fields': ['id']},\n                         'row-number': 4}],\n             'format': 'inline',\n             'headers': ['id', 'name', 'next_id'],\n             'resource-name': 'cities',\n             'row-count': 4,\n             'schema': 'table-schema',\n             'source': 'inline',\n             'time': 0.031,\n             'valid': False},\n            {'datapackage': '...',\n             'error-count': 0,\n             'errors': [],\n             'format': 'inline',\n             'headers': ['label', 'population'],\n             'resource-name': 'people',\n             'row-count': 3,\n             'source': 'inline',\n             'time': 0.038,\n             'valid': True}],\n 'time': 0.117,\n 'valid': False,\n 'warnings': []}\n```\n\nIt experimentally supports external resource checks, for example for `foreignKey` definitions like these:\n\n```json\n{\"package\": \"../people/datapackage.json\", \"resource\": \"people\", \"fields\": \"label\"}\n{\"package\": \"http://example.com/datapackage.json\", \"resource\": \"people\", \"fields\": \"label\"}\n```\n\n##### sequential-value\n\nThis check is for the pretty common case when a column should have sequentially incrementing integers. 
It accepts the following options:\n\n| option | type | description |\n| --- | --- | --- |\n| column | int/str | Column number or name |\n\nConsider the following CSV file:\n\n```csv\nid,name\n1,one\n2,two\n3,three\n5,five\n```\n\nLet's check if the `id` column contains sequential integers:\n\n```python\nfrom goodtables import validate\n\nreport = validate('data.csv', checks=[\n    {'sequential-value': {'column': 'id'}},\n])\n# error on row 5 with code \"sequential-value\"\n```\n\n##### truncated-value\n\nSome database or spreadsheet software (like MySQL or Excel) can cut off values when saving. There are some well-known heuristics for finding these bad values. See https://github.com/propublica/guides/blob/master/data-bulletproofing.md for more detailed information.\n\nConsider the following CSV file:\n\n```csv\nid,amount,comment\n1,14000000,good\n2,2147483647,bad\n3,32767,bad\n4,234234234,bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbad\n```\n\nTo detect all probably truncated values, we can use the `truncated-value` check:\n\n```python\nreport = validate('data.csv', checks=[\n    'truncated-value',\n])\n# error on row 3 with code \"truncated-value\"\n# error on row 4 with code \"truncated-value\"\n# error on row 5 with code \"truncated-value\"\n```\n\n##### custom-constraint\n\nWith Table Schema we can create constraints for an individual field, but sometimes that's not enough. With a custom constraint, every row can be checked against a given limited Python expression in which variable names resolve to column values. See the list of [available operators](https://github.com/danthedeckie/simpleeval#operators). It accepts the following options:\n\n\u003cdl\u003e\n  \u003cdt\u003econstraint (str)\u003c/dt\u003e\n  \u003cdd\u003eConstraint definition (e.g. 
\u003ccode\u003ecol1 + col2 == col3\u003c/code\u003e\u003c/dd\u003e\n\u003c/dl\u003e\n\nConsider a CSV file like this:\n\n```csv\nid,name,salary,bonus\n1,Alex,1000,200\n2,Sam,2500,500\n3,Ray,1350,500\n4,John,5000,1000\n```\n\nLet's say our business rule is to be shy on bonuses:\n\n```python\nreport = validate('data.csv', checks=[\n    {'custom-constraint': {'constraint': 'salary \u003e bonus * 4'}},\n])\n# error on row 4 with code \"custom-constraint\"\n```\n\n### Frequently Asked Questions\n\n#### How can I add a new custom check?\n\nTo create a custom check, use the `check` decorator. This way a builtin check can be overridden (use a spec error code like `duplicate-row`) or a check for a custom error can be added (use the `type`, `context` and `position` arguments):\n\n```python\nfrom goodtables import validate, check, Error\n\n@check('custom-check', type='custom', context='body')\ndef custom_check(cells):\n    errors = []\n    for cell in cells:\n        message = 'Custom error on column {column_number} and row {row_number}'\n        # pass the message by keyword: the third positional argument\n        # of Error is row_number, not message\n        error = Error(\n            'custom-error',\n            cell,\n            message=message,\n        )\n        errors.append(error)\n    return errors\n\nreport = validate('data.csv', checks=['custom-check'])\n```\n\nRecommended steps:\n- discuss the proposed check in an issue first\n- select a name for the new check, like `possible-noise-text`\n- copy https://github.com/frictionlessdata/goodtables-py/blob/master/goodtables/contrib/checks/blacklisted_value.py to a new check module\n- add the new check module to the configuration - https://github.com/frictionlessdata/goodtables-py/blob/master/goodtables/config.py\n- write the actual code for the new check\n- write tests and a readme section for the new check\n\n#### How can I add support for a new tabular file type?\n\nTo create a custom preset, use the `preset` decorator. 
This way a builtin preset can be overridden or a custom preset can be added.\n\n```python\nfrom tabulator import Stream\nfrom tableschema import Schema\nfrom goodtables import validate, preset\n\n@preset('custom-preset')\ndef custom_preset(source, **options):\n    warnings = []\n    tables = []\n    for table in source:\n        try:\n            tables.append({\n                'source':  str(source),\n                'stream':  Stream(...),\n                'schema': Schema(...),\n                'extra': {...},\n            })\n        except Exception:\n            warnings.append('Warning message')\n    return warnings, tables\n\nreport = validate(source, preset='custom-preset')\n```\n\nFor now this documentation section is incomplete. Please see the builtin presets to learn more about the dataset extraction protocol.\n\n## API Reference\n\n### `cli`\n```python\ncli()\n```\nCommand-line interface\n\n```\nUsage: cli.py [OPTIONS] COMMAND [ARGS]...\n\nOptions:\n  --version  Show the version and exit.\n  --help     Show this message and exit.\n\nCommands:\n  validate*  Validate tabular files (default).\n  init       Init data package from list of files.\n```\n\n\n### `validate`\n```python\nvalidate(source, **options)\n```\nValidates a source file and returns a report.\n\n__Arguments__\n\n- __source (Union[str, Dict, List[Dict], IO])__:\n        The source to be validated.\n        It can be a local file path, URL, dict, list of dicts, or a\n        file-like object. If it's a list of dicts and the `preset` is\n        \"nested\", each of the dict's keys will be used as if it were passed\n        as a keyword argument to this method.\n\n        The file can be a CSV, XLS, JSON, or any other format supported by\n        `tabulator`_.\n- __checks (List[str])__:\n        List of check names to be enabled. They can be\n        individual check names (e.g. 
`blank-headers`), or check types (e.g.\n        `structure`).\n- __skip_checks (List[str])__:\n        List of check names to be skipped. They can\n        be individual check names (e.g. `blank-headers`), or check types\n        (e.g. `structure`).\n- __infer_schema (bool)__:\n        Infer schema if one wasn't passed as an argument.\n- __infer_fields (bool)__:\n        Infer schema for columns not present in the received schema.\n- __order_fields (bool)__:\n        Order source columns based on schema fields order.\n        This is useful when you don't want to validate that the data\n        columns' order is the same as the schema's.\n- __error_limit (int)__:\n        Stop validation if the number of errors per table exceeds this value.\n- __table_limit (int)__:\n        Maximum number of tables to validate.\n- __row_limit (int)__:\n        Maximum number of rows to validate.\n- __preset (str)__:\n        Dataset type: can be `table` (default), `datapackage`,\n        `nested` or custom. Usually, the preset can be inferred from the\n        source, so you don't need to define it.\n- __Any (Any)__:\n        Any additional arguments not defined here will be passed on,\n        depending on the chosen `preset`. 
If the `preset` is `table`, the\n        extra arguments will be passed on to `tabulator`_; if it is\n        `datapackage`, they will be passed on to the `datapackage`_\n        constructor.\n\n__Raises__\n- `GoodtablesException`: Raised on any non-tabular error.\n\n__Returns__\n\n`dict`: The validation report.\n\n\n### `preset`\n```python\npreset(name)\n```\nRegister a custom preset (decorator)\n\n__Example__\n\n\n```python\n@preset('custom-preset')\ndef custom_preset(source, **options):\n    # ...\n```\n\n__Arguments__\n- __name (str)__: preset name\n\n\n### `check`\n```python\ncheck(name, type=None, context=None, position=None)\n```\nRegister a custom check (decorator)\n\n__Example__\n\n\n```python\n@check('custom-check', type='custom', context='body')\ndef custom_check(cells):\n    # ...\n```\n\n__Arguments__\n- __name (str)__: check name\n- __type (str)__: has to be `custom`\n- __context (str)__: has to be `head` or `body`\n- __position (str)__: has to be `before:\u003ccheck-name\u003e` or `after:\u003ccheck-name\u003e`\n\n\n### `Error`\n```python\nError(self, code, cell=None, row_number=None, message=None, message_substitutions=None)\n```\nDescribes a validation check error\n\n__Arguments__\n- __code (str)__: The error code. Must be one from the spec.\n- __cell (dict, optional)__: The cell where the error occurred.\n- __row_number (int, optional)__: The row number where the error occurs.\n- __message (str, optional)__:\n        The error message. 
Defaults to the message from the Data Quality Spec.\n- __message_substitutions (dict, optional)__:\n        Dictionary with substitutions to be used when\n        generating the error message and description.\n\n__Raises__\n- `KeyError`: Raised if the error code isn't known.\n\n\n### `spec`\n\nA dictionary containing the loaded Data Quality Spec, which defines the known error codes and their messages.\n\n### `GoodtablesException`\n```python\nGoodtablesException(self, /, *args, **kwargs)\n```\nBase goodtables exception\n\n## Contributing\n\n\u003e The project follows the [Open Knowledge International coding standards](https://github.com/okfn/coding-standards).\n\nThe recommended way to get started is to create and activate a project virtual environment.\nTo install the package and development dependencies into the active environment:\n\n```bash\n$ make install\n```\n\nTo run tests with linting and coverage:\n\n```bash\n$ make test\n```\n\n## Changelog\n\nOnly breaking and the most important changes are described here. The full changelog and documentation for all released versions can be found in the nicely formatted [commit history](https://github.com/frictionlessdata/goodtables-py/commits/master).\n\n##### v2.5\n\n- Added `check.check_headers_hook` to support header checks for body-contexted checks (see https://github.com/frictionlessdata/goodtables-py/tree/v3 for native support)\n\n##### v2.4\n\n- Added integrity checks for data packages. 
If `resource.bytes` or `resource.hash` (sha256) is provided, it will be verified against the actual values\n\n##### v2.3\n\n- Added a [foreign keys check](#foreign-key)\n\n##### v2.2\n\n- Improved missing/non-matching-headers detection ([#298](https://github.com/frictionlessdata/goodtables-py/issues/298))\n\n##### v2.1\n\n- A new key added to the `error.to_dict` return: `message-data`\n\n##### v2.0\n\nBreaking changes:\n\n- The check method signature now only receives the current row's `cells` list\n- Checks raise errors by returning an array of `Error` objects\n- Cells have the row number in the `row-number` key\n- Files with a ZIP extension are presumed to be data packages, so `goodtables mydatapackage.zip` works\n- Improvements to the goodtables CLI ([#233](https://github.com/frictionlessdata/goodtables-py/issues/233))\n- New `goodtables init \u003cdata paths\u003e` command to create a new `datapackage.json` with the files passed as parameters and their inferred schemas.\n\nBug fixes:\n- Fix bug with the `truncated-values` check on date fields ([#250](https://github.com/frictionlessdata/goodtables-py/issues/250))\n\n##### v1.5\n\nNew API added:\n- Validation `source` can now be a `pathlib.Path`\n\n##### v1.4\n\nImproved behaviour:\n- rebased on Data Quality Spec v1\n- rebased on Data Package Spec v1\n- rebased on Table Schema Spec v1\n- treat primary key as required/unique field\n\n##### v1.3\n\nNew advanced checks added:\n- `blacklisted-value`\n- `custom-constraint`\n- `deviated-value`\n- `sequential-value`\n- `truncated-value`\n\n##### v1.2\n\nNew API added:\n- `report.preset`\n- `report.tables[].schema`\n\n##### v1.1\n\nNew API added:\n- `report.tables[].scheme`\n- `report.tables[].format`\n- `report.tables[].encoding`\n\n##### v1.0\n\nThis version includes various big changes. 
A migration guide is under development and will be published here.\n\n##### v0.6\n\nFirst version of `goodtables`.\n\n[tableschema]: https://specs.frictionlessdata.io/table-schema/\n[tabulator]: https://github.com/frictionlessdata/tabulator-py/\n[datapackage]: https://specs.frictionlessdata.io/data-package/ \"Data Package specification\"\n[semver]: https://semver.org/ \"Semantic Versioning\"\n[validation-jsonschema]: goodtables/schemas/report.json \"Validation Report JSON Schema\"\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrictionlessdata%2Fgoodtables-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffrictionlessdata%2Fgoodtables-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffrictionlessdata%2Fgoodtables-py/lists"}