{"id":14065726,"url":"https://github.com/sr-murthy/oedtools","last_synced_at":"2025-04-11T05:32:11.360Z","repository":{"id":57448344,"uuid":"198267426","full_name":"sr-murthy/oedtools","owner":"sr-murthy","description":"File validation and data sampling toolkit for the Simplitium Open Exposure Data (OED) (re)insurance exposure data format","archived":true,"fork":false,"pushed_at":"2020-05-19T22:56:10.000Z","size":2414,"stargazers_count":4,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-26T08:48:58.023Z","etag":null,"topics":["catastrophe-modelling","csv-validator","exposure","insurance","oed","reinsurance","risk","simplitium","validation"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/oedtools/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sr-murthy.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.rst","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":["sr-murthy"],"custom":null}},"created_at":"2019-07-22T16:58:41.000Z","updated_at":"2024-11-23T15:52:26.000Z","dependencies_parsed_at":"2022-09-18T12:01:55.855Z","dependency_job_id":null,"html_url":"https://github.com/sr-murthy/oedtools","commit_stats":null,"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sr-murthy%2Foedtools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sr-murthy%2Foedtools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sr-murthy%2Foedtools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/sr-murthy%2Foedtools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sr-murthy","download_url":"https://codeload.github.com/sr-murthy/oedtools/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248347547,"owners_count":21088676,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["catastrophe-modelling","csv-validator","exposure","insurance","oed","reinsurance","risk","simplitium","validation"],"created_at":"2024-08-13T07:04:40.119Z","updated_at":"2025-04-11T05:32:10.742Z","avatar_url":"https://github.com/sr-murthy.png","language":"Python","funding_links":["https://github.com/sponsors/sr-murthy"],"categories":["Python"],"sub_categories":[],"readme":"[![LGTM Code Quality Grade: Python](https://img.shields.io/lgtm/grade/python/g/sr-murthy/oedtools.svg?logo=lgtm\u0026logoWidth=18)](https://lgtm.com/projects/g/sr-murthy/oedtools/context:python)\n[![codecov](https://codecov.io/gh/sr-murthy/oedtools/branch/master/graph/badge.svg)](https://codecov.io/gh/sr-murthy/oedtools)\n[![PyPI version](https://badge.fury.io/py/oedtools.svg)](https://badge.fury.io/py/oedtools)\n[![Build Status](https://travis-ci.com/sr-murthy/oedtools.svg?branch=master)](https://travis-ci.com/sr-murthy/oedtools)\n\n**NOTICE: Following changes in the way the OED format is managed, this project will no longer be maintained, as of Tuesday 19 May, 2020, and there will be no further public releases of the `oedtools` PyPI package. 
All current and earlier releases of \n`oedtools` will still be valid for OED versions 1.1.1 and earlier.**\n\n**Use of the source code is still subject to the terms of the license.**\n\n# oedtools\n\n`oedtools` is a (command-line) file validation, query and data sampling toolkit for the \u003ca href=\"https://github.com/Simplitium/OED\" target=\"_blank\"\u003eSimplitium Open Exposure Data (OED)\u003c/a\u003e (re)insurance exposure data format.\n\n**Note**: the repository and package are based on the current OED version 1.1.1 - this is stored in the \u003ca href=\"https://github.com/sr-murthy/oedtools/blob/master/oedtools/schema/schema_version.txt\" target=\"_blank\"\u003eschema version\u003c/a\u003e file.\n\nThe main user-level features currently include\n\n* **validating files** - validation (headers + data) of OED account (`acc`), location (`loc`), reinsurance info. (`reinsinfo`) and reinsurance scope (`reinsscope`) input CSV files\n* **querying schemas** - querying of columns in the various schemas based on properties such as headers (column names) or header substrings, column descriptions containing keywords, Python, SQL or Numpy data types, default values, and required and/or nonnull properties\n* **sampling columns** - sampling of column data, consistent with the column range or data type range or a specific column validation function\n\n(The query toolkit will be augmented in future releases with the ability to query the values profile, which currently can only be examined directly as a dict.)\n\nValidation, querying and sampling are all based on two types of interrelated but independent data structures built in to the package.\n\n* **file schemas** - separate JSON files for the \u003ca href=\"https://github.com/sr-murthy/oedtools/blob/master/oedtools/schema/acc_schema.json\" target=\"_blank\"\u003e acc.\u003c/a\u003e, \u003ca href=\"https://github.com/sr-murthy/oedtools/blob/master/oedtools/schema/loc_schema.json\" target=\"_blank\"\u003eloc.\u003c/a\u003e, 
\u003ca href=\"https://github.com/sr-murthy/oedtools/blob/master/oedtools/schema/reinsinfo_schema.json\" target=\"_blank\"\u003ereins. info.\u003c/a\u003e and \u003ca href=\"https://github.com/sr-murthy/oedtools/blob/master/oedtools/schema/reinsscope_schema.json\" target=\"_blank\"\u003ereins. scope\u003c/a\u003e files defining the properties of each column in each file\n* a **values profile** - a \u003ca href=\"https://github.com/sr-murthy/oedtools/blob/master/oedtools/schema/values.json\" target=\"_blank\"\u003eJSON profile\u003c/a\u003e of the data that the files can store, but independent of considerations of the column structure, including categories and subcategories of values, column headers and specific column ranges associated with the subcategories (if they exist), and column data validation and sampling methods (where available).\n\nThe file schemas define the column structure of each type of OED file and provide a \"file view\" of the OED data model, and the values\nprofile defines the properties of the data that occur in the columns and provides a \"data model view\" of the OED files.\n\n## Installation and Requirements\n\nInstallation is via `pip` (Python 3).\n\n    pip install oedtools\n\nThe package requires a Python \u003e=3.6 interpreter. It is best to install and use the package in a Python virtual environment.\n\n## Features\n\nThe command line interface is invoked via `oed` and provides three main command groups.\n\n* `validate` (`oed validate`) - for validating files (column headers + data), or only the headers in files\n* `query` (`oed query`) - for querying schema columns based on various schema properties\n* `sample` (`oed sample`) - for sampling column data\n\nThere is also a `version` command for getting the OED schema version the package uses (currently `1.1.1`), or the package version (currently `1.0.2`). 
The usage is\n\n    $ oed version\n    1.1.1\n\n    $ oed version --package\n    1.0.2\n\n### Validation\n\n#### Files (headers + data)\n\nFile validation is performed via `oed validate file`, and includes validation of the column headers and data.\n\n    usage: oed validate file [-h] -f INPUT_FILE_PATH -t SCHEMA_TYPE\n    \n    optional arguments:\n      -h, --help            show this help message and exit\n      -f INPUT_FILE_PATH, --input-file-path INPUT_FILE_PATH\n                            OED input file path\n      -t SCHEMA_TYPE, --schema-type SCHEMA_TYPE\n                            File schema type - \"loc\", \"acc\", \"reinsinfo\", or\n                            \"reinsscope\"\n\nHeaders and data are validated separately, and a combined status report is printed to the console, e.g.\n\n    (myvenv) $ oed validate file -t 'loc' -f /path/to/location.csv\n    /path/to/location.csv:11:40: Invalid value \"WWTC;WEC;BFR;OO1\" in \"LocPerilsCovered\" - check the column or data type range: OED error: E371 Out of range data found in column\n\n    /path/to/location.csv:18:493: Invalid value \"-25000\" in \"LocMinDed6All\" - check the column or data type range: OED error: E371 Out of range data found in column\n\n    /path/to/location.csv:1:870: \"SubArea\" is not a valid column in any OED schema: OED error: E303 Not a valid column in any OED schema\n\n    /path/to/location.csv:1:870: \"SubArea\" is an invalid column in the OED \"loc\" schema: OED error: E304 Not a valid column in the given OED schema\n\n    /path/to/location.csv:2:928: Invalid data type for value \"s\" in \"CondPriority\" - expected type \"\u003cclass 'int'\u003e\", found type \"\u003cclass 'str'\u003e\": OED error: E351 Invalid data type(s) in column\n\n    /path/to/location.csv:1:-1: \"LocCurrency\" is a required column in an OED \"loc\" file but is missing: OED error: E331 Missing required column in file\n\nIf there are no errors in the file no output will be produced.\n\n    (myvenv) $ oed 
validate file -t 'acc' -f /path/to/account.csv\n    (myvenv) $\n\nHeader-related errors currently include\n\n* **non-OED headers** - headers not currently defined in any OED schema\n* **incompatible OED headers** - (OED) headers in a file incompatible with the file schema\n* **required but missing** headers - headers which are mandatory in a given file schema but not present in an actual input file\n\nData-related errors currently include\n\n* **null values in non-null columns** - a non-null column is defined as a column which must not contain any null values\n* **column values with incompatible data types** - values with data types inconsistent with the column data type, as defined in the given schema, e.g. string values in an integer or floating point column\n* **out of range values** - values not in the defined range of a column (this can be either a specific column range defined in the values profile, or a range inferred from the column data type defined in the schema)\n\n**Note**: data validation (and sampling) is facilitated via the \u003ca href=\"https://github.com/sr-murthy/oedtools/blob/master/oedtools/schema/values.json\" target=\"_blank\"\u003evalues profile\u003c/a\u003e, which defines the categories and subcategories of data values that can occur in the various columns, independently of the schemas. The values profile defines, where applicable, the ranges of values associated with each subcategory and links these ranges to columns in the relevant schemas. It also defines, where applicable, methods for validation and sampling. Currently, the categories of data covered by the values profile include\n\n* **area codes**\n* **attachments**\n* **construction codes**\n* **country codes**\n* **coverage types**\n* **currencies**\n* **deductible codes**\n* **deductible types**\n* **deductibles**\n* **geocoding**\n* **limit codes**\n* **limit types**\n* **limits**\n* **location properties**\n* **occupancy types**\n* **peril codes**\n* **reins. 
percentages**\n* **reins. risk levels**\n* **reins. types**\n* **shares**\n* **TIVs**\n* **years**\n\nThis will be extended in future releases to cover all possible values.\n\n#### Headers\n\nThis works in a very similar way to file validation, except that it is only for validating the headers in a given file. The headers can be provided either by providing a file path, or as a comma-separated string in quotation marks, e.g.\n\n    (myvenv) $ oed validate headers -t 'loc' -f /path/to/location.csv\n    /path/to/location.csv:1:870: \"SubArea\" is not a valid column in any OED schema: OED error: E303 Not a valid column in any OED schema\n\n    /path/to/location.csv:1:870: \"SubArea\" is an invalid column in the OED \"loc\" schema: OED error: E304 Not a valid column in the given OED schema\n\n    /path/to/location.csv:1:-1: \"LocCurrency\" is a required column in an OED \"loc\" file but is missing: OED error: E331 Missing required column in file\n\n### Querying\n\nSchema columns can be queried using `oed query` - results are always printed to console as JSON, in ascending alphabetic order by (case insensitive) header.\n\n    usage: oed query [-h] [-t SCHEMA_TYPES] [-m COLUMN_HEADERS] [-d DESCRIPTIONS]\n                     [-r REQUIRED] [-n] [-e DEFAULTS] [-p PYTHON_DTYPES]\n                     [-s SQL_DTYPES] [-y NUMPY_DTYPES] [-a]\n\n    optional arguments:\n      -h, --help            show this help message and exit\n      -t SCHEMA_TYPES, --schema-types SCHEMA_TYPES\n                            List of file schema types; must be one of \"acc\",\n                            \"loc\", \"reinsinfo\", \"reinsscope\" - a comma-separated\n                            string enclosed in quotation marks\n      -m COLUMN_HEADERS, --column-headers COLUMN_HEADERS\n                            List of column headers or header substrings - a comma-\n                            separated string enclosed in quotation marks\n      -d DESCRIPTIONS, --descriptions DESCRIPTIONS\n       
                     List of column descriptions or description substrings\n                            - a comma-separated string enclosed in quotation marks\n      -r REQUIRED, --required REQUIRED\n                            Is the column required (R), conditionally required\n                            (CR) or optional (O)?\n      -n, --nonnull         Is the column required not to have any null values?\n      -e DEFAULTS, --defaults DEFAULTS\n                            List of default values - a comma-separated string\n                            enclosed in quotation marks\n      -p PYTHON_DTYPES, --python-dtypes PYTHON_DTYPES\n                            List of Python data types - only \"int\", \"float\", \"str\"\n                            are supported; a comma-separated string enclosed in\n                            quotation marks\n      -s SQL_DTYPES, --sql-dtypes SQL_DTYPES\n                            List of SQL data types - a comma-separated string\n                            enclosed in quotation marks\n      -y NUMPY_DTYPES, --numpy-dtypes NUMPY_DTYPES\n                            List of Numpy data types - a comma-separated string\n                            enclosed in quotation marks\n      -a, --headers-only    Only return the column headers\n\nHere are five queries that illustrate the possibilities of `oed query`.\n\n1. 
Display full column information for the `BuildingTIV` and `BITIV` columns only (header names are case insensitive in the query).\n\n        (myvenv) $ oed query -m 'buildingtiv, bitiv'\n        [\n            {\n                \"blank\": false,\n                \"column_range\": [\n                    0.0,\n                    3.4e+38\n                ],\n                \"column_sampling\": \"column range\",\n                \"column_validation\": \"column range\",\n                \"default\": null,\n                \"desc\": \"Business Interruption (BI) Total Insured Value\",\n                \"dtype_range\": [\n                    -3.4e+38,\n                    3.4e+38\n                ],\n                \"entity\": \"Loc\",\n                \"field_name\": \"BITIV\",\n                \"numpy_dtype\": \"float32\",\n                \"oed_db_field_name\": null,\n                \"oed_db_table\": \"Locations\",\n                \"py_dtype\": \"float\",\n                \"required\": \"R\",\n                \"secmod\": null,\n                \"sql_dtype\": \"real\"\n            },\n            {\n                \"blank\": false,\n                \"column_range\": [\n                    0.0,\n                    3.4e+38\n                ],\n                \"column_sampling\": \"column range\",\n                \"column_validation\": \"column range\",\n                \"default\": null,\n                \"desc\": \"Building Total Insured Value\",\n                \"dtype_range\": [\n                    -3.4e+38,\n                    3.4e+38\n                ],\n                \"entity\": \"Loc\",\n                \"field_name\": \"BuildingTIV\",\n                \"numpy_dtype\": \"float32\",\n                \"oed_db_field_name\": null,\n                \"oed_db_table\": \"Locations\",\n                \"py_dtype\": \"float\",\n                \"required\": \"R\",\n                \"secmod\": null,\n                \"sql_dtype\": \"real\"\n            }\n        
]\n\n    **Note**: the schema type (specified using option `-t`) isn't required if the columns you're looking for are unique.\n\n2. Display the headers only of all columns in the loc. file schema with the header substring `6all` and with the `int` or `float` (Python) data type.\n\n        (myvenv) $ oed query -t 'loc' -m '6all' -p 'int, float' --headers-only\n        [\n            \"LocDed6All (Loc)\",\n            \"LocDedCode6All (Loc)\",\n            \"LocDedType6All (Loc)\",\n            \"LocLimit6All (Loc)\",\n            \"LocLimitCode6All (Loc)\",\n            \"LocLimitType6All (Loc)\",\n            \"LocMaxDed6All (Loc)\",\n            \"LocMinDed6All (Loc)\"\n        ]\n\n    **Note 1**: as some OED column headers indicate coverage type at the tail end of the header (`1building`, `2other`, `3contents`, `4bi`, `5pd`, `6all`), the header substring option `-m` can be used, as above, to search for columns based on coverage type.\n\n    **Note 2**: The schema type is displayed in parentheses for clarity, as some columns like `LocNumber` and `AccNumber` can be present in different file types (`LocNumber` can occur in a ``loc`` or ``reinsscope`` file, and `AccNumber` can occur in a `loc` or `acc` or `reinsscope` file).\n\n3. Display the headers only of all required and non-null columns in the acc. file schema.\n\n        (myvenv) $ oed query -t 'acc' -r 'R' --nonnull --headers-only\n        [\n            \"AccCurrency (Acc)\",\n            \"AccNumber (Acc)\",\n            \"PolNumber (Acc)\",\n            \"PolPerilsCovered (Acc)\",\n            \"PortNumber (Acc)\"\n        ]\n\n4. Display the headers only of all required or conditionally required columns in the reins. info. 
file schema.\n\n        (myvenv) $ oed query -t 'reinsinfo' -r 'R,CR' --headers-only\n        [\n            \"InuringPriority (ReinsInfo)\",\n            \"PlacedPercent (ReinsInfo)\",\n            \"ReinsCurrency (ReinsInfo)\",\n            \"ReinsNumber (ReinsInfo)\",\n            \"ReinsPeril (ReinsInfo)\",\n            \"ReinsType (ReinsInfo)\"\n        ]\n\n5. Display the headers only of all columns in all the schemas whose descriptions contain the keyword \"percent\", i.e. we're looking here for all percentage-valued columns.\n\n        (myvenv) $ oed query -d 'percent' --headers-only\n        [\n            \"BrickVeneer (Loc)\",\n            \"BuildingExteriorOpening (Loc)\",\n            \"CededPercent (ReinsInfo, ReinsScope)\",\n            \"DeemedPercentPlaced (ReinsInfo)\",\n            \"LocParticipation (Loc)\",\n            \"PercentComplete (Loc)\",\n            \"PercentSprinklered (Loc)\",\n            \"PlacedPercent (ReinsInfo)\",\n            \"ScaleFactor (Acc)\",\n            \"SurgeLeakage (Loc)\",\n            \"TreatyShare (ReinsInfo)\"\n        ]\n\n### Sampling\n\nColumns can be sampled using `oed sample`.\n\n    (myvenv) $ oed sample --help\n    usage: oed sample [-h] -t SCHEMA_TYPE -m COLUMN_HEADER\n                              [-n SAMPLE_SIZE]\n\n    optional arguments:\n      -h, --help            show this help message and exit\n      -t SCHEMA_TYPE, --schema-type SCHEMA_TYPE\n                            List of file schema types; must be one of \"acc\",\n                            \"loc\", \"reinsinfo\", \"reinsscope\" - a comma-separated\n                            string enclosed in quotation marks\n      -m COLUMN_HEADER, --column-header COLUMN_HEADER\n                            Column header\n      -n SAMPLE_SIZE, --sample-size SAMPLE_SIZE\n                            Sample size\n\nHere are three examples.\n\n1. Sampling loc. 
peril code sequences\n\n        (myvenv) $ oed sample -t 'loc' -m 'locperil'\n        [\n            \"BBF;QEQ;WSS;ZIC\",\n            \"ORF;QEQ;QLS;QQ1\",\n            \"AA1;BB1;QEQ;ZST\",\n            \"BB1;MNT;QLS;ZIC\",\n            \"MTR;QSL;WTC;ZZ1\",\n            \"BSK;QSL;WTC;WW2\",\n            \"BSK;QEQ;QSL;WW2\",\n            \"MNT;QEQ;XX1;ZST\",\n            \"BFR;OO1;WEC;XX1\",\n            \"QQ1;WW1;XX1;ZIC\"\n        ]\n\n    **Note 1**: sample size can be specified using the `-n` option, which has the default value of `10`.\n\n    **Note 2**: Column sampling is based on the values profile - this describes properties of OED data and is organized by groups and subgroups. This means that sampling a column whose values fall in the same group in the values profile as that of another column will produce similar results, e.g. sampling `LocPeril` will produce identical results to sampling `AccPeril` or `ReinsPeril`, because they are all grouped under `peril codes` in the values profile.\n\n2. Sampling reins. info. currency codes.\n\n        (myvenv) $ oed sample -t 'reinsinfo' -m 'reinscurrency'\n        [\n            \"MOP\",\n            \"SUR\",\n            \"YER\",\n            \"HKD\",\n            \"ROL\",\n            \"JOD\",\n            \"RUR\",\n            \"GHS\",\n            \"MNT\",\n            \"BYB\"\n        ]\n\n3. Sampling loc. 
occupancy codes.\n\n        (myvenv) $ oed sample -t 'loc' -m 'occupancycode'\n        [\n            3643,\n            2696,\n            3753,\n            3743,\n            1126,\n            1382,\n            2608,\n            3951,\n            2392,\n            2163\n        ]\n\n## Docker version\n\nThe package can also be used in an (Ubuntu) Docker container, and a Dockerfile is available for building the image. To build the image, run this command (from the base of the repository):\n\n    $ docker build -f ./Dockerfile -t \u003cimage name\u003e .\n\nTo run the image in a container and enter the container in a Bash shell use this command:\n\n    $ docker run --name \u003ccontainer name\u003e -itd \u003cimage name\u003e \u0026\u0026 docker exec -it \u003ccontainer name\u003e bash\n\nThe OED tools package will be available via the `oed` binary.\n\n    root@b7a8467f92d4:/usr/local/data# oed\n    usage: oed [-h] {query,sample,validate,version} ...\n\n    Root command\n\n    positional arguments:\n      {query,sample,validate,version}\n        query               query\n        sample              sample\n        validate            validate\n        version             version\n\n    optional arguments:\n      -h, --help            show this help message and exit\n\n## Contributors\n\nDeveloper contributions are welcome, in the usual way - fork the repository; create a feature and/or fix branch off `master`; make, test and commit your changes to the branch; create a PR from that branch against this repository. Linting the code with PEP8 and/or Flake8 would be appreciated (ignoring E501). The test runner is `pytest`. Run all the tests (from the repo. 
root) with\n\n    $ pytest -v tests\n\nTo run a specific test module use\n\n    $ pytest -v tests/\u003ctest module name\u003e.py\n\nTo run a specific test class in a test module use\n\n    $ pytest -v tests/\u003ctest module name\u003e.py::\u003ctest class name\u003e\n\nTo run a specific test case in a test class in a test module use\n\n    $ pytest -v tests/\u003ctest module name\u003e.py::\u003ctest class name\u003e::\u003ctest case name\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsr-murthy%2Foedtools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsr-murthy%2Foedtools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsr-murthy%2Foedtools/lists"}