{"id":44648551,"url":"https://github.com/fvaleye/metadata-guardian","last_synced_at":"2026-02-14T20:31:21.327Z","repository":{"id":38296818,"uuid":"416278425","full_name":"fvaleye/metadata-guardian","owner":"fvaleye","description":"Provide an easy way with Python to protect your data sources by searching its metadata. 🛡️","archived":false,"fork":false,"pushed_at":"2026-01-15T10:02:37.000Z","size":17465,"stargazers_count":18,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-01-15T15:44:41.470Z","etag":null,"topics":["data","dataengineering","dataset","datastructures","metadata","metadata-driven","metadata-extraction","metadata-information","metadata-management","metadata-parser","pii-detection"],"latest_commit_sha":null,"homepage":"https://fvaleye.github.io/metadata-guardian/python","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fvaleye.png","metadata":{"files":{"readme":"README.adoc","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-10-12T09:55:19.000Z","updated_at":"2026-01-15T10:02:40.000Z","dependencies_parsed_at":"2023-10-11T04:48:48.466Z","dependency_job_id":"89eb5416-f9f6-4840-a645-88991cd46b21","html_url":"https://github.com/fvaleye/metadata-guardian","commit_stats":{"total_commits":117,"total_committers":2,"mean_commits":58.5,"dds":0.3418803418803419,"last_synced_commit":"47e0a5d9fdd957b9a2321a91802ac2a8934e4b36"},"previous_names":[],"tags_count":13,
"template":false,"template_full_name":null,"purl":"pkg:github/fvaleye/metadata-guardian","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fvaleye%2Fmetadata-guardian","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fvaleye%2Fmetadata-guardian/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fvaleye%2Fmetadata-guardian/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fvaleye%2Fmetadata-guardian/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fvaleye","download_url":"https://codeload.github.com/fvaleye/metadata-guardian/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fvaleye%2Fmetadata-guardian/sbom","scorecard":{"id":415071,"data":{"date":"2025-08-11","repo":{"name":"github.com/fvaleye/metadata-guardian","commit":"9feed8a441f414595d03a674c9d4da78cb331d0c"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":4.4,"checks":[{"name":"Code-Review","score":0,"reason":"Found 0/7 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":3,"reason":"4 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 3","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous 
patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: jobLevel 'contents' permission set to 'write': .github/workflows/release.yml:86","Warn: no topLevel permission defined: .github/workflows/python_build.yml:1","Warn: no topLevel permission defined: .github/workflows/release.yml:1","Warn: no topLevel permission defined: .github/workflows/rust_build.yml:1"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no 
security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE.txt:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE.txt:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python_build.yml:17: update your 
workflow using https://app.stepsecurity.io/secureworkflow/fvaleye/metadata-guardian/python_build.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python_build.yml:20: update your workflow using https://app.stepsecurity.io/secureworkflow/fvaleye/metadata-guardian/python_build.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python_build.yml:48: update your workflow using https://app.stepsecurity.io/secureworkflow/fvaleye/metadata-guardian/python_build.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python_build.yml:51: update your workflow using https://app.stepsecurity.io/secureworkflow/fvaleye/metadata-guardian/python_build.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python_build.yml:89: update your workflow using https://app.stepsecurity.io/secureworkflow/fvaleye/metadata-guardian/python_build.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/python_build.yml:92: update your workflow using https://app.stepsecurity.io/secureworkflow/fvaleye/metadata-guardian/python_build.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:88: update your workflow using https://app.stepsecurity.io/secureworkflow/fvaleye/metadata-guardian/release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:16: update your workflow using https://app.stepsecurity.io/secureworkflow/fvaleye/metadata-guardian/release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:45: update your workflow using https://app.stepsecurity.io/secureworkflow/fvaleye/metadata-guardian/release.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/release.yml:48: update your workflow using 
https://app.stepsecurity.io/secureworkflow/fvaleye/metadata-guardian/release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/release.yml:61: update your workflow using https://app.stepsecurity.io/secureworkflow/fvaleye/metadata-guardian/release.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/release.yml:64: update your workflow using https://app.stepsecurity.io/secureworkflow/fvaleye/metadata-guardian/release.yml/main?enable=pin","Warn: third-party GitHubAction not pinned by hash: .github/workflows/release.yml:73: update your workflow using https://app.stepsecurity.io/secureworkflow/fvaleye/metadata-guardian/release.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/rust_build.yml:17: update your workflow using https://app.stepsecurity.io/secureworkflow/fvaleye/metadata-guardian/rust_build.yml/main?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/rust_build.yml:34: update your workflow using https://app.stepsecurity.io/secureworkflow/fvaleye/metadata-guardian/rust_build.yml/main?enable=pin","Warn: containerImage not pinned by hash: Dockerfile:1","Warn: downloadThenRun not pinned by hash: Dockerfile:4","Warn: pipCommand not pinned by hash: Dockerfile:10","Warn: downloadThenRun not pinned by hash: .github/workflows/python_build.yml:26","Warn: pipCommand not pinned by hash: .github/workflows/python_build.yml:32","Warn: downloadThenRun not pinned by hash: .github/workflows/python_build.yml:57","Warn: pipCommand not pinned by hash: .github/workflows/python_build.yml:66","Warn: pipCommand not pinned by hash: .github/workflows/python_build.yml:98","Warn: downloadThenRun not pinned by hash: .github/workflows/release.yml:92","Warn: pipCommand not pinned by hash: .github/workflows/release.yml:104","Warn: downloadThenRun not pinned by hash: .github/workflows/rust_build.yml:21","Warn: downloadThenRun not pinned by 
hash: .github/workflows/rust_build.yml:38","Info:   0 out of  12 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   3 third-party GitHubAction dependencies pinned","Info:   0 out of   1 containerImage dependencies pinned","Info:   0 out of   6 downloadThenRun dependencies pinned","Info:   0 out of   5 pipCommand dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"SAST","score":-1,"reason":"internal error: internal error: Client.Checks.ListCheckRunsForRef: error during graphqlHandler.setupCheckRuns: non-200 OK status code: 502 Bad Gateway body: \"\u003chtml\u003e\\r\\n\u003chead\u003e\u003ctitle\u003e502 Bad Gateway\u003c/title\u003e\u003c/head\u003e\\r\\n\u003cbody\u003e\\r\\n\u003ccenter\u003e\u003ch1\u003e502 Bad Gateway\u003c/h1\u003e\u003c/center\u003e\\r\\n\u003chr\u003e\u003ccenter\u003enginx\u003c/center\u003e\\r\\n\u003c/body\u003e\\r\\n\u003c/html\u003e\\r\\n\"","details":null,"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-18T23:37:02.599Z","repository_id":38296818,"created_at":"2025-08-18T23:37:02.599Z","updated_at":"2025-08-18T23:37:02.599Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29455350,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-14T15:52:44.973Z","status":"ssl_error","status_checked_at":"2026-02-14T15:52:11.208Z","response_time":53,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","dataengineering","dataset","datastructures","metadata","metadata-driven","metadata-extraction","metadata-information","metadata-management","metadata-parser","pii-detection"],"created_at":"2026-02-14T20:31:19.254Z","updated_at":"2026-02-14T20:31:21.322Z","avatar_url":"https://github.com/fvaleye.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"image::logo.png[Metadata Guardian logo]\nimage:https://github.com/fvaleye/metadata-guardian/actions/workflows/python_build.yml/badge.svg[![python-build, link=https://github.com/fvaleye/metadata-guardian/actions/workflows/python_build.yml]\nimage:https://github.com/fvaleye/metadata-guardian/actions/workflows/rust_build.yml/badge.svg[![rust-build, 
link=https://github.com/fvaleye/metadata-guardian/actions/workflows/rust_build.yml]\nimage:https://img.shields.io/badge/docs-python-blue.svg?style=flat-square[Docs,link=https://fvaleye.github.io/metadata-guardian/python]\nimage:https://img.shields.io/pypi/v/metadata_guardian.svg?style=flat-square[PyPI, link=https://pypi.org/project/metadata-guardian/]\n\n== 📌 Overview\nMetadata Guardian is a Python package that provides an easy way to protect your data sources by searching their metadata.\nIt scans the metadata against data rules to detect what you want to protect.\nThe multi-regex matching is implemented in Rust for speed.\n\nRead more in this https://medium.com/@florian.valeye/metadata-guardian-protect-your-data-by-searching-its-metadata-fe479c24f1b1[article].\n\n== 📦 Where to get it\n\n```sh\n# Install all the data sources\npip install 'metadata_guardian[all]'\n```\n\n```sh\n# Install one or more data sources from the list\npip install 'metadata_guardian[snowflake,avro,aws,gcp,deltalake,kafka_schema_registry,mysql]'\n```\n\n== 📜 Data Rules\nThe available data rules are here: *https://github.com/fvaleye/metadata-guardian/blob/main/python/metadata_guardian/rules/pii_rules.yaml[PII]* and *https://github.com/fvaleye/metadata-guardian/blob/main/python/metadata_guardian/rules/inclusion_rules.yaml[INCLUSION]*.\nYou can also define your own custom data rules to suit your needs.\n\n== 📊 Data Sources\n\n=== Local\n- Parquet\n- ORC\n- AVRO\n- AVRO Schema\n- Arrow\n\n=== External\n- AWS: Athena and Glue\n- Deltalake\n- GCP: BigQuery\n- Snowflake\n- MySQL\n- Kafka Schema Registry\n\n== 🔎 Usage\n\nWith available Data Rules:\n```python\nfrom metadata_guardian import (\n    AvailableCategory,\n    ColumnScanner,\n    DataRules,\n)\nfrom metadata_guardian.source import MySQLSource\n\nsource = MySQLSource(\n    user=\"root\",\n    password=\"12345678\",\n    host=\"localhost\",\n)\n\ndata_rules = 
DataRules.from_available_category(category=AvailableCategory.PII)\ncolumn_scanner = ColumnScanner(data_rules=data_rules)\n\nwith source:\n    report = column_scanner.scan_external(\n        source,\n        database_name=\"sequelmovie\",\n        include_comment=True,\n    )\n    report.to_console()\n```\n\nWith custom Data Rules:\n```python\nfrom metadata_guardian import (\n    ColumnScanner,\n    DataRule,\n    DataRules,\n)\nfrom metadata_guardian.source import MySQLSource\n\nsource = MySQLSource(\n    user=\"root\",\n    password=\"12345678\",\n    host=\"localhost\",\n)\n\ncategory = \"example\"\n# Use a raw string so that \\b is a regex word boundary, not a backspace character.\ndata_rule = DataRule(\n    rule_name=\"example_rule_name\",\n    regex_pattern=r\"\\b(test|example)\\b\",\n    documentation=\"example_test\",\n)\ndata_rules = DataRules.from_new_category(category=category, data_rules=[data_rule])\ncolumn_scanner = ColumnScanner(\n    data_rules=data_rules, progression_bar_disabled=False\n)\n\nwith source:\n    report = column_scanner.scan_external(\n        source,\n        database_name=\"sequelmovie\",\n        include_comment=True,\n    )\n    report.to_console()\n```\n\n== 🛡️ Licence\nhttps://raw.githubusercontent.com/fvaleye/metadata-guardian/main/LICENSE.txt[Apache License 2.0]\n\n== 📚 Documentation\nThe documentation is hosted here: https://fvaleye.github.io/metadata-guardian/python/","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffvaleye%2Fmetadata-guardian","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffvaleye%2Fmetadata-guardian","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffvaleye%2Fmetadata-guardian/lists"}