{"id":23125118,"url":"https://github.com/statisticsnorway/microdata-tools","last_synced_at":"2025-08-17T03:32:43.802Z","repository":{"id":176854772,"uuid":"652607419","full_name":"statisticsnorway/microdata-tools","owner":"statisticsnorway","description":"Tools for the microdata.no platform","archived":false,"fork":false,"pushed_at":"2025-08-12T08:35:59.000Z","size":922,"stargazers_count":3,"open_issues_count":0,"forks_count":2,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-08-12T10:26:08.232Z","etag":null,"topics":["backstage"],"latest_commit_sha":null,"homepage":"https://statisticsnorway.github.io/microdata-tools/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/statisticsnorway.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-06-12T12:25:03.000Z","updated_at":"2025-08-12T08:35:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"1fbad0b1-f2ab-484c-9e19-6c285f9775e3","html_url":"https://github.com/statisticsnorway/microdata-tools","commit_stats":null,"previous_names":["statisticsnorway/microdata-tools"],"tags_count":18,"template":false,"template_full_name":null,"purl":"pkg:github/statisticsnorway/microdata-tools","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statisticsnorway%2Fmicrodata-tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statisticsnorway%2Fmicrodata-tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statisticsnorway%2Fmicrodata-tools/re
leases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statisticsnorway%2Fmicrodata-tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/statisticsnorway","download_url":"https://codeload.github.com/statisticsnorway/microdata-tools/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/statisticsnorway%2Fmicrodata-tools/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270803014,"owners_count":24648682,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-17T02:00:09.016Z","response_time":129,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["backstage"],"created_at":"2024-12-17T08:12:36.522Z","updated_at":"2025-08-17T03:32:43.425Z","avatar_url":"https://github.com/statisticsnorway.png","language":"Python","readme":"# microdata-tools\nTools for the [microdata.no](https://www.microdata.no/) platform\n\n## Installation\n`microdata-tools` can be installed from PyPI using pip:\n```\npip install microdata-tools\n```\n\n## Usage\nOnce you have your metadata and data files ready to go, they should be named and stored like this:\n```\nmy-input-directory/\n    MY_DATASET_NAME/\n        MY_DATASET_NAME.csv\n        MY_DATASET_NAME.json\n```\nThe CSV file is optional in some cases.\n\n### Package dataset\nThe `package_dataset()` function will encrypt and 
package your dataset as a tar archive. The process is as follows:\n\n1. Generate the symmetric key for a dataset.\n2. Encrypt the dataset data (CSV) using the symmetric key and store the encrypted file as `\u003cDATASET_NAME\u003e.csv.encr`.\n3. Encrypt the symmetric key using the asymmetric RSA public key `microdata_public_key.pem` and store the encrypted file as `\u003cDATASET_NAME\u003e.symkey.encr`.\n4. Gather the encrypted CSV, the encrypted symmetric key and the metadata (JSON) file into a single tar file.\n\n### Unpackage dataset\nThe `unpackage_dataset()` function will untar and decrypt your dataset using the `microdata_private_key.pem`\nRSA private key.\n\nThe packaged file must be named `\u003cDATASET_NAME\u003e.tar`. Its contents should be as follows:\n\n- `\u003cDATASET_NAME\u003e.json`: Required metadata file.\n- `\u003cDATASET_NAME\u003e.csv.encr`: Optional encrypted dataset file.\n- `\u003cDATASET_NAME\u003e.symkey.encr`: Optional encrypted file containing the symmetric key used to decrypt the dataset file.\n
Required if the `.csv.encr` file is present.\n\nDecryption uses the RSA private key located in `RSA_KEY_DIR`.\n\nThe packaged file is then stored in `output_dir/archive/unpackaged` after a successful run, or in `output_dir/archive/failed` after an unsuccessful run.\n\n## Example\nA Python script that uses an RSA public key named `microdata_public_key.pem` to package a dataset:\n\n```py\nfrom pathlib import Path\nfrom microdata_tools import package_dataset\n\nRSA_KEYS_DIRECTORY = Path(\"tests/resources/rsa_keys\")\nDATASET_DIRECTORY = Path(\"tests/resources/input_package/DATASET_1\")\nOUTPUT_DIRECTORY = Path(\"tests/resources/output\")\n\npackage_dataset(\n    rsa_keys_dir=RSA_KEYS_DIRECTORY,\n    dataset_dir=DATASET_DIRECTORY,\n    output_dir=OUTPUT_DIRECTORY,\n)\n```\n\n### Validation\n\nOnce you have your metadata and data files ready to go, they should be named and stored like this:\n```\nmy-input-directory/\n    MY_DATASET_NAME/\n        MY_DATASET_NAME.csv\n        MY_DATASET_NAME.json\n```\nNote that file names (excluding the extension) may only contain uppercase letters A-Z, digits 0-9 and underscores.\n\nImport `microdata_tools` in your script and validate your files:\n```py\nfrom microdata_tools import validate_dataset\n\nvalidation_errors = validate_dataset(\n    \"MY_DATASET_NAME\",\n    input_directory=\"path/to/my-input-directory\"\n)\n\nif not validation_errors:\n    print(\"My dataset is valid\")\nelse:\n    print(\"Dataset is invalid :(\")\n    # You can print your errors like this:\n    for error in validation_errors:\n        print(error)\n```\n\nFor a more in-depth explanation of usage, see [the usage documentation](https://statisticsnorway.github.io/microdata-tools/usage.md).\n\n### Data format description\nIn microdata, a dataset consists of one data file and one metadata file.\n\nThe data file is a CSV file with values separated by semicolons.\n
A valid example would be:\n```csv\n000000000000001;123;2020-01-01;2020-12-31;\n000000000000002;123;2020-01-01;2020-12-31;\n000000000000003;123;2020-01-01;2020-12-31;\n000000000000004;123;2020-01-01;2020-12-31;\n```\nRead more about the data format and columns in [the documentation](https://statisticsnorway.github.io/microdata-tools/).\n\nThe metadata file should be in JSON format. The metadata requirements are best described by the [Pydantic model](/microdata_tools/validation/model/metadata.py), [the examples](/docs/examples), and [the metadata model](https://statisticsnorway.github.io/microdata-tools/metadata-model/).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstatisticsnorway%2Fmicrodata-tools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstatisticsnorway%2Fmicrodata-tools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstatisticsnorway%2Fmicrodata-tools/lists"}