{"id":18513137,"url":"https://github.com/cody-scott/arclint","last_synced_at":"2025-05-14T12:17:04.099Z","repository":{"id":175122029,"uuid":"279322737","full_name":"cody-scott/ArcLint","owner":"cody-scott","description":"A flexible tool to validate and improve your data in ArcGIS using regex and other methods","archived":false,"fork":false,"pushed_at":"2021-05-19T21:22:08.000Z","size":15,"stargazers_count":0,"open_issues_count":4,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-12-25T21:09:33.552Z","etag":null,"topics":["arcgis","arcgispro","data","lint","regex","validation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cody-scott.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-07-13T14:17:25.000Z","updated_at":"2021-05-19T21:22:10.000Z","dependencies_parsed_at":"2023-12-19T01:01:32.435Z","dependency_job_id":"5266787e-d862-4d4f-8e5b-55b353f9e039","html_url":"https://github.com/cody-scott/ArcLint","commit_stats":null,"previous_names":["namur007/arclint","cody-scott/arclint"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cody-scott%2FArcLint","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cody-scott%2FArcLint/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cody-scott%2FArcLint/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cody-scott%2FArcLint/manifests","owner_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/owners/cody-scott","download_url":"https://codeload.github.com/cody-scott/ArcLint/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239225853,"owners_count":19603189,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arcgis","arcgispro","data","lint","regex","validation"],"created_at":"2024-11-06T15:36:37.931Z","updated_at":"2025-02-17T03:19:38.987Z","avatar_url":"https://github.com/cody-scott.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **ArcLint**\n\nA tool that lets you create data validation rules in ArcGIS using flexible regex patterns and ranges.\n\nFlexible, repeatable and open: ArcLint uses regex patterns to validate your fields, or groups of fields, and helps clean and flag data issues in any table readable by ArcGIS.\n\n## **Why**\n\nMany of these data checks should be handled with domains. This tool does not replace domains, since many of these checks can be solved by using domains to restrict inputs.\n\nThat said, there are cases where a domain cannot fully validate data for cleaning. Examples include fields with blank values, leading or trailing whitespace in text, line returns and so forth. Domains are unable to flag these errors, which is where this tool can help improve data consistency and structure.\n\nAt their root, the regex patterns give you the flexibility to produce complex rules to flag your data. 
They must conform to the regex syntax of Python's re module.\n\n**[re link](https://docs.python.org/3/library/re.html)**\n\n## **Input**\n\nYour patterns and rules should be defined in a .json file with the following specifications. See the examples folder for examples of the different types.\n\n## **Structure**\n\n### **Rules**\n\nEach rule, as a dictionary, should at minimum contain a \"ruleName\". If the rule is not a reference to a global rule (described below), it should also contain a \"type\" key indicating what type of rule it is, in addition to the type-specific fields.\n\nSee below for the various rule types available. Each rule type expects different values beyond the two required keys shown below.\n\n    {\n        \"ruleName\": \"sample_rule\",\n        \"type\": \"regex\"\n    }\n\n### **Fields**\n\nAt the top level you define which fields you want to capture and apply the rules to. The root level should have a ***\"fields\"*** key whose value is an array containing the individual fields to validate.\n\n    {\n        \"fields\": []\n    }\n\nEach value of the array should follow the field structure, specifying the name of the field and an array of rules to process.\n\n    {\n        \"fieldName\": \"some field name\", # field name should be a string\n        \"rules\": [] # an array of rules to apply\n    }\n\n\nHere is an example of a regex rule that matches the text ***Site A*** within a ***SiteName*** field.\n\n    {\n    \"fields\": [\n        {\n            \"fieldName\": \"SiteName\",\n            \"rules\": [\n                {\n                    \"ruleName\": \"site_rule\",\n                    \"type\": \"regex\",\n                    \"pattern\": \"(Site A)\"\n                }\n            ]\n        }\n    ]\n    }\n\nEach rule should have a unique rule name for that field. 
Rule names may be duplicated across different fields, but must be unique within a single field.\n\n## **Global Rules**\n\nIf you would like to apply a rule to many different fields, create a global rule and reference it in each field's rules array. As with field rules, each global rule name must be unique within its scope.\n\nAt the root level of your JSON file, define a \"globalRules\" key with an array as its value.\n\n    {\n        \"globalRules\": []\n    }\n\nThe rules should follow the same structure as above, with a \"ruleName\", \"type\" and other required parameters.\n\nTo use a global rule, specify its ruleName as the ruleName value within your field's rules array. See below for an example.\n\nThe rule can then be shared across multiple fields by specifying it again.\n\n    {\n        \"globalRules\": [\n            {\n                \"ruleName\": \"build_year\",\n                \"type\": \"range\",\n                \"fromValue\": 1960,\n                \"toValue\": 2000\n            }\n        ],\n        \"fields\": [\n            { \n                \"fieldName\": \"SiteName\",\n                \"rules\": [\n                    {\n                        \"ruleName\": \"site_rule\",\n                        \"type\": \"regex\",\n                        \"pattern\": \"(Site A)\"\n                    }\n                ]\n            },\n            {\n                \"fieldName\": \"BuildYear\",\n                \"rules\": [\n                    {\n                        \"ruleName\": \"build_year\"\n                    }\n                ]\n            }\n        ]\n    }\n\n\n\n## **Rule Groups**\n\nSince rules are evaluated on a per-row, per-field basis, rules can be combined into rule groups to validate across many fields of a single row. An example of this would be validating that the site name is \"Site A\" and that it was built between 1960 and 2000. 
\n\nYou can also add a \"match\" parameter to require that all rules within the rule group match, or that any of them match.\n\nAs with fields and global rules, rule groups are defined under a ***\"ruleGroups\"*** key holding an array of your rule groups.\n\nEach group should have a ***groupName*** and an array of \"rules\" specifying which rules apply to that group.\n\nEach rule should be a dictionary specifying the field name and the rule within that field. Finally, a description can be added to provide details of the rule group.\n\nSee examples/rule_groups.json or examples/global_rule_groups.json for a full specification.\n\n    \"ruleGroups\": [\n            {\n                \"groupName\": \"site_group_rule\",\n                \"description\": \"Site A built between 1960 and 2000\",\n                \"rules\": [\n                    {\"fieldName\": \"SiteName\", \"ruleName\": \"site_rule\"},\n                    {\"fieldName\": \"BuildYear\", \"ruleName\": \"build_year\"}\n                ],\n                \"match\": \"all\"\n            }\n        ]\n\n\n## **Output**\n\nThe tool will output a results.json file to the specified folder, containing the OIDs of the rows with errors.\n\nIt lists the individual rows that had field errors as well as the rows that match your rule groups.\n\n    {\n        \"run_datetime\": \"2020-07-13 14:58:52\",\n        \"fields\": {\n            \"YR_INST\": [\n                {\n                    \"ruleName\": \"1963\",\n                    \"errorIDs\": [\n                        2,\n                        3,\n                        4,\n                        5,\n                        6,\n                        9,\n                        10,\n                        11\n                    ]\n                }\n            ]\n        },\n        \"groups\": {\n            \"facility_test\": {\n                \"errorIDs\": [\n                    5,\n                    6\n                ],\n              
  \"description\": \"Is it facility, in zone 2e and outside of range 1960 to 2000\"\n            }\n        }\n    }\n\n\n# Specifications.\n\n## **Nomenclature**\n\nkeys should follow camelCase structure.\n\n    {\n        \"globalRules\": [\u003crule\u003e], -\u003e series of rule objects\n        \"fields\": [\u003cfield object\u003e], -\u003e series of field objects\n        \"ruleGroups\": [\u003crule group objects\u003e] \u003e series of rule group objects\n    }\n\n### **Field Object**\n\n    {\n        \"fieldName\": \"\" -\u003e required name of field\n        \"rules\": [\u003crule\u003e] -\u003e \n    }\n\n### **Rule Group Object**\n\n    {\n        \"groupName\": \"\",\n        \"description\": \"\",\n        \"rules\": [\n            {\"fieldName\": \"\", \"ruleName\": \"\"} -\u003e this is an array of objects indicating the field name and the rule name within that field\n        ]\n    }\n    \n\n\n\n\n## **Rules**\n\n    Required\n    \"ruleName\": \"\" -\u003e string of rule name or global rule name. only one required if global rule used\n\n    Optional\n    \"output\": true or false -\u003e show this rule in output data. 
Default = true\n\n**Regex Rule**\n    \n    \"type\": \"regex\" -\u003e type of rule\n    \"pattern\": \"\" -\u003e regex pattern to use\n    \"flags\": [\n        \"IGNORECASE\",\n        \"LOCALE\",\n        \"MULTILINE\",\n        \"DOTMATCH\",\n        \"UNICODE\",\n        \"VERBOSE\"\n    ] -\u003e none or any of these flags\n\n\n**Range Rule**\n\n    \"type\": \"range\" -\u003e type of rule\n    \"fromValue\": 0 -\u003e number of from value,\n    \"toValue\": 0 -\u003e number of to value\n    \"outside\": true or false -\u003e boolean flag to mark values inside or outside of range as accepted\n    \n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcody-scott%2Farclint","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcody-scott%2Farclint","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcody-scott%2Farclint/lists"}