{"id":22017842,"url":"https://github.com/shoprunner/baleen","last_synced_at":"2025-05-07T03:10:02.739Z","repository":{"id":32998562,"uuid":"136413940","full_name":"ShopRunner/baleen","owner":"ShopRunner","description":"Kotlin DSL for validating data (JSON, XML, CSV, Avro)","archived":false,"fork":false,"pushed_at":"2023-07-24T04:06:06.000Z","size":717,"stargazers_count":17,"open_issues_count":22,"forks_count":5,"subscribers_count":49,"default_branch":"master","last_synced_at":"2025-05-07T03:09:55.381Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ShopRunner.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-06-07T02:57:43.000Z","updated_at":"2025-04-15T23:30:07.000Z","dependencies_parsed_at":"2023-01-14T22:59:01.442Z","dependency_job_id":null,"html_url":"https://github.com/ShopRunner/baleen","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShopRunner%2Fbaleen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShopRunner%2Fbaleen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShopRunner%2Fbaleen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShopRunner%2Fbaleen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ShopRunner","download_url":"https://codeload.github.com/ShopRunner/baleen/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com",
"kind":"github","repositories_count":252804219,"owners_count":21806771,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-30T05:08:23.317Z","updated_at":"2025-05-07T03:10:02.722Z","avatar_url":"https://github.com/ShopRunner.png","language":"Kotlin","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Maven Central](https://img.shields.io/maven-central/v/com.shoprunner/baleen.svg?label=Maven%20Central)](https://search.maven.org/search?q=g:%22com.shoprunner%22%20AND%20%22baleen%22)\n\n# Baleen\n\nBaleen is fluent Kotlin DSL for validating data (JSON, XML, CSV, Avro)\n\n## Features\n\n- [Validating JSON](./baleen-json-jackson)\n- [Validating CSV](./baleen-csv)\n- [Validating XML](./baleen-xml)\n- [Generate JSON Schema from Baleen data description](./baleen-jsonschema-generator)\n- [Generate Avro Schema from Baleen data description](./baleen-avro-generator)\n- [Generate XSD Schema from Baleen data description](./baleen-xsd-generator)\n- [Generate Kotlin data classes from Baleen schema](./baleen-poet)\n- [Generate Baleen data description from Kotlin data class](./baleen-kotlin)\n- [Generate Baleen data description from JSON Schema](./jsonschema-baleen-generator)\n- [Generate Baleen data description from AVRO Schema](./baleen-avro-generator)\n\n## Example Baleen Data Description\n\n```kotlin\nimport com.shoprunner.baleen.Baleen.describeAs\nimport com.shoprunner.baleen.ValidationError\nimport com.shoprunner.baleen.dataTrace\nimport com.shoprunner.baleen.types.StringType\n\nval departments = listOf(\"Mens\", \"Womens\", \"Boys\", \"Girls\", 
\"Kids\", \"Baby \u0026 Toddler\")\n\nval productDescription = \"Product\".describeAs {\n\n    \"sku\".type(StringType(min = 1, max = 500),\n          required = true)\n\n    \"brand_manufacturer\".type(StringType(min = 1, max = 500),\n          required = true)\n\n    \"department\".type(StringType(min = 0, max = 100))\n         .describeAs {\n             test(\"department is correct value\") { data -\u003e\n                 assertThat(data).hasAttribute(\"department\") {\n                     it.isOneOf(departments)\n                 }\n             }\n         }\n}\n\n// Get your data\nval data: Data = // get from file or database or whatever \n\n// Get Validation Results\nval validation: Validation = dataDesc.validate(data)\n\n// Each call on `isValid` and `results` will iterate over dataset again. \n// Warning: that for large datasets this will eat memory\nval cachedValidation: CachedValidation = validation.cache()\n\n// Check if any errors. True if no errors, false otherwise. \n// val isValid: Boolean = validation.isValid()\nval isValid: Boolean = cachedValidation.isValid() \n\n// Iterate over results. 
Each iteration over results will execute the entire flow again unless cached.\n// validation.results.forEach { }\ncachedValidation.results.forEach { }\n\n// Use `watch` to print summaries to the console every 1,000 results\ncachedValidation.results.watch().forEach { }\n\n// Summarize into a Validation object holding a list of ValidationSummary entries, with examples of errors included\n// val validationSummary: Validation = validation.createSummary()\nval validationSummary: CachedValidation = cachedValidation.createSummary()\nvalidationSummary.results.forEach { }\n\n// Print the results to various formats including Console, Logger, CSV, HTML, or Text\n// Look at baleen-printer-* sub-modules\nFile(\"validation.html\").writer().use {\n    HtmlPrinter(it).print(validationSummary.results)\n}\n```\n\n## Getting Help\n\nJoin the [Slack channel](https://join.slack.com/t/baleen-validation/signup).\n\n## Core Concepts\n\n- Tests are great\n\n  There are a lot of great libraries for testing code. We should use those same concepts for testing\n  data.\n\n- Performance and streaming are important\n\n  A data validation library should be able to handle large amounts of data quickly.\n\n- Invalid data is also important\n\n  Warnings and errors need to be treated as first-class objects.\n\n- Data Traces\n\n  Just as a stack trace is used to debug a code path, a data trace can be used to debug a\n  path through data.\n\n- Don't map data to Types too early.\n\n  Type-safe code is great, but if the data hasn't been sanitized then it isn't really typed.\n\n### Warnings\n\nSometimes you will want an attribute or type to produce a warning instead of an error. 
The `asWarnings()` method transforms the output\nfrom `ValidationError` to `ValidationWarning` for all nested tests run underneath that attribute/type.\n\n```kotlin\nimport com.shoprunner.baleen.Baleen.describeAs\nimport com.shoprunner.baleen.ValidationError\nimport com.shoprunner.baleen.dataTrace\nimport com.shoprunner.baleen.types.StringType\nimport com.shoprunner.baleen.types.asWarnings\n\n// Same department list as in the first example\nval departments = listOf(\"Mens\", \"Womens\", \"Boys\", \"Girls\", \"Kids\", \"Baby \u0026 Toddler\")\n\nval productDescription = \"Product\".describeAs {\n\n    // The asWarnings() method is on StringType. Min/max are warnings, but required is still an error.\n    \"sku\".type(StringType(min = 1, max = 500).asWarnings(), required = true)\n\n    // The asWarnings() method is on the attribute. Min/max and required are all warnings.\n    \"brand_manufacturer\".type(StringType(min = 1, max = 500), required = true).asWarnings()\n\n    // The asWarnings() method is on the attribute. The attribute's custom test will also be turned into a warning.\n    \"department\".type(StringType(min = 0, max = 100)).describeAs {\n        test(\"department is correct value\") { data -\u003e\n            assertThat(data).hasAttribute(\"department\") {\n                it.isOneOf(departments)\n            }\n        }\n    }.asWarnings()\n}\n```\n\n### Tagging\n\nBaleen lets you add tags to tests so that you can more easily identify, annotate, and filter your results.\nTagging is useful in a couple of cases. For example, you may have an identifier, such as a SKU, that you want attached to each\ntest so that you can group failed tests by that identifier. 
Another use case is assigning different\npriority levels to your tests so that you can highlight the most important errors.\n\n```kotlin\nval productDescription = \"Product\".describeAs {\n\n    // The tag() method is on StringType, and the dynamic tag pulls the value being validated.\n    \"sku\".type(StringType().tag(\"priority\", \"critical\").tag(\"sku\", withValue()))\n\n    // The tag() method is on the attribute, and the dynamic tag pulls an attribute value from sku.\n    \"brand_manufacturer\".type(StringType(), required = true)\n        .tag(\"priority\", \"low\")\n        .tag(\"sku\", withAttributeValue(\"sku\"))\n\n    // The tag() method is on the attribute, and a custom tag function is used that returns a String\n    \"department\".type(StringType(min = 0, max = 100))\n        .tag(\"priority\", \"high\")\n        .tag(\"sku\", withAttributeValue(\"sku\"))\n        .tag(\"gender\") { d -\u003e\n            when {\n                d is Data \u0026\u0026 d.containsKey(\"gender\") -\u003e\n                    when (d[\"gender\"]) {\n                        \"male\" -\u003e \"male\"\n                        \"mens\" -\u003e \"male\"\n                        \"female\" -\u003e \"female\"\n                        \"womens\" -\u003e \"female\"\n                        else -\u003e \"other\"\n                    }\n                else -\u003e \"none\"\n            }\n        }\n}\n// The tag is on the data description, and the dynamic tag pulls the value of the sku attribute from the data\n.tag(\"sku\", withAttributeValue(\"sku\"))\n```\n\nTagging can also be done at the data evaluation level. 
When writing tests, additional tags can be passed in using a Tagger function.\n\n```kotlin\n    \"department\".type(StringType(min = 0, max = 100)).describeAs {\n        test(\"department is correct value\", \"sku\" to withAttributeValue(\"sku\")) { data -\u003e\n            assertThat(data).hasAttribute(\"department\") {\n                it.isOneOf(departments)\n            }\n        }\n    }\n```\n\nSome Baleen validation libraries, such as the XML and JSON validators, use tags to add line and column numbers as they\nparse the original raw data. This helps identify errors in the raw data much more quickly.\n\n## Gotchas\n\n- Baleen does not treat an attribute that is not set and an attribute that is set to null as the same thing.\n\n## Similar Projects\n\n- [Clojure Spec](https://clojure.org/guides/spec)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshoprunner%2Fbaleen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshoprunner%2Fbaleen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshoprunner%2Fbaleen/lists"}