{"id":24977694,"url":"https://github.com/adzz/data_schema","last_synced_at":"2025-07-20T15:02:34.286Z","repository":{"id":37953478,"uuid":"428034854","full_name":"Adzz/data_schema","owner":"Adzz","description":"Declarative schemas for data transformations.","archived":false,"fork":false,"pushed_at":"2023-12-08T20:49:27.000Z","size":731,"stargazers_count":91,"open_issues_count":15,"forks_count":9,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-05-19T18:09:10.316Z","etag":null,"topics":["data","data-parsing","elixir","functional-programming","types","validation"],"latest_commit_sha":null,"homepage":"","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Adzz.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-11-14T20:33:42.000Z","updated_at":"2025-05-06T23:45:52.000Z","dependencies_parsed_at":"2023-12-08T21:31:59.201Z","dependency_job_id":"68efe6a5-42b6-4b5b-9a8f-9f795fa81be9","html_url":"https://github.com/Adzz/data_schema","commit_stats":{"total_commits":133,"total_committers":6,"mean_commits":"22.166666666666668","dds":"0.17293233082706772","last_synced_commit":"ad5e1b14f9bf07111b59c84d8f050cb15b0319df"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Adzz/data_schema","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Adzz%2Fdata_schema","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Adzz%2Fdata_schema/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/A
dzz%2Fdata_schema/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Adzz%2Fdata_schema/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Adzz","download_url":"https://codeload.github.com/Adzz/data_schema/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Adzz%2Fdata_schema/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266143941,"owners_count":23883069,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-parsing","elixir","functional-programming","types","validation"],"created_at":"2025-02-03T23:08:42.970Z","updated_at":"2025-07-20T15:02:34.218Z","avatar_url":"https://github.com/Adzz.png","language":"Elixir","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DataSchema\n\n[![Module Version](https://img.shields.io/hexpm/v/data_schema.svg)](https://hex.pm/packages/data_schema)\n[![Hex Docs](https://img.shields.io/badge/hex-docs-lightgreen.svg)](https://hexdocs.pm/data_schema/)\n[![Total Download](https://img.shields.io/hexpm/dt/data_schema.svg)](https://hex.pm/packages/data_schema)\n\n\u003c!-- README START --\u003e\nData schemas are declarative descriptions of how to create a struct from some input data. You can set up different schemas to handle different kinds of input data. 
By default we assume the incoming data is a map, but you can configure schemas to work with any arbitrary data input including XML and JSON.\n\nData Schemas really shine when working with API responses - converting the response into trusted internal data easily and efficiently.\n\nThis library has no dependencies.\n\n## Creating A Simple Schema\n\nLet's think of creating a struct as taking some source data and turning it into the desired struct. To do this we need to know at least three things:\n\n1. The keys of the desired struct\n2. The types of the values for each of the keys\n3. Where / how to get the data for each value from the source data.\n\nTurning the source data into the correct type defined by the schema will often require casting, so to cater for that the type definitions are casting functions. Let's look at a simple field example:\n\n```\nfield {:content, \"text\", \u0026cast_string/1}\n#       ^          ^                ^\n# struct field     |                |\n#     path to data in the source    |\n#                            casting function\n```\n\nThis says in the source data there will be a field called `\"text\"`. When creating a struct we should get the data under that field and pass it to `cast_string/1`. The result of that function will be put in the resultant struct under the key `:content`.\n\nThere are 5 kinds of struct fields we could want:\n\n1. `field`     - The value will be a casted value from the source data.\n2. `list_of`   - The value will be a list of casted values created from the source data.\n3. `has_one`   - The value will be created from a nested data schema (so will be a struct)\n4. `has_many`  - The value will be created by casting a list of values into a data schema.\n(You end up with a list of structs defined by the provided schema). Similar to `has_many` in Ecto.\n5. 
`aggregate` - The value will be a casted value formed from multiple bits of data in the source.\n\nAvailable options are:\n\n* `:optional?` - specifies whether or not the field in the struct should be included in\nthe `@enforce_keys` for the struct. By default all fields are required but you can mark\nthem as optional by setting this to `true`. This will also be checked when creating a\nstruct with `DataSchema.to_struct/2` returning an error if the required field is null.\n* `:empty_values` - allows you to define what values should be used as \"empty\" for a\ngiven field. If either the value returned from the data accessor or the casted value are\nequivalent to any element in this list, that field is deemed to be empty. Defaults to `[nil]`,\nmeaning nil is always considered \"empty\".\n* `:default` - specifies a 0 arity function that will be called to produce a default value for a field\nwhen casting. This function will only be called if a field is found to be empty AND optional.\nIf it's empty and not optional we will error.\n\nFor example:\n\n```elixir\ndefmodule Sandwich do\n  require DataSchema\n\n  DataSchema.data_schema([\n    field: {:type, \"the_type\", \u0026{:ok, String.upcase(\u00261)}, optional?: true, empty_values: [nil]},\n  ])\nend\n```\n\n  And:\n\n```elixir\ndefmodule Sandwich do\n  require DataSchema\n\n  DataSchema.data_schema([\n    field: {:list, \"list\", \u0026{:ok, \u00261}, optional?: true, empty_values: [[]]},\n  ])\nend\n```\n\n  And:\n\n```elixir\ndefmodule Sandwich do\n  require DataSchema\n\n  @options [optional?: true, empty_values: [nil], default: \u0026DateTime.utc_now/0]\n  DataSchema.data_schema([\n    field: {:created_at, \"inserted_at\", \u0026{:ok, \u00261}, @options},\n  ])\nend\n```\n\nTo see this better let's look at a very simple example. 
Assume our input data looks like this:\n\n```elixir\nsource_data = %{\n  \"content\" =\u003e \"This is a blog post\",\n  \"comments\" =\u003e [%{\"text\" =\u003e \"This is a comment\"},%{\"text\" =\u003e \"This is another comment\"}],\n  \"draft\" =\u003e %{\"content\" =\u003e \"This is a draft blog post\"},\n  \"date\" =\u003e \"2021-11-11\",\n  \"time\" =\u003e \"14:00:00\",\n  \"metadata\" =\u003e %{ \"rating\" =\u003e 0}\n}\n```\n\nAnd now let's assume the struct we wish to make is this one:\n\n```elixir\n%BlogPost{\n  content: \"This is a blog post\",\n  comments: [%Comment{text: \"This is a comment\"}, %Comment{text: \"This is another comment\"}],\n  draft: %DraftPost{content: \"This is a draft blog post\"},\n  post_datetime: ~N[2021-11-11 14:00:00]\n}\n```\n\nWe can describe the following schemas to enable this:\n\n```elixir\ndefmodule DraftPost do\n  import DataSchema, only: [data_schema: 1]\n\n  data_schema([\n    field: {:content, \"content\", \u0026{:ok, to_string(\u00261)}}\n  ])\nend\n\ndefmodule Comment do\n  import DataSchema, only: [data_schema: 1]\n\n  data_schema([\n    field: {:text, \"text\", \u0026{:ok, to_string(\u00261)}}\n  ])\nend\n\ndefmodule BlogPost do\n  import DataSchema, only: [data_schema: 1]\n\n  @date_time_fields [\n    field: {:date, \"date\", {Date, :from_iso8601, []}},\n    field: {:time, \"time\", \u0026Time.from_iso8601/1}\n  ]\n  data_schema([\n    field: {:content, \"content\", \u0026{:ok, to_string(\u00261)}},\n    has_many: {:comments, \"comments\", Comment},\n    has_one: {:draft, \"draft\", DraftPost},\n    aggregate: {:post_datetime, @date_time_fields, \u0026NaiveDateTime.new(\u00261.date, \u00261.time)},\n  ])\nend\n```\n\nThen to transform some input data into the desired struct we can call `DataSchema.to_struct/2`, which works recursively to transform the input data into the struct defined by the schema.\n\n```elixir\nsource_data = %{\n  \"content\" =\u003e \"This is a blog post\",\n  \"comments\" =\u003e [%{\"text\" 
=\u003e \"This is a comment\"},%{\"text\" =\u003e \"This is another comment\"}],\n  \"draft\" =\u003e %{\"content\" =\u003e \"This is a draft blog post\"},\n  \"date\" =\u003e \"2021-11-11\",\n  \"time\" =\u003e \"14:00:00\",\n  \"metadata\" =\u003e %{ \"rating\" =\u003e 0}\n}\n\nDataSchema.to_struct(source_data, BlogPost)\n# This will output the following:\n\n%BlogPost{\n  content: \"This is a blog post\",\n  comments: [%Comment{text: \"This is a comment\"}, %Comment{text: \"This is another comment\"}],\n  draft: %DraftPost{content: \"This is a draft blog post\"},\n  post_datetime: ~N[2021-11-11 14:00:00]\n}\n```\n\nSo imagine the input data came from an API response:\n\n```elixir\nwith {:ok, %{status: 200, response_body: body}} \u003c- Http.get(\"https://www.my_thing.example.com\"),\n     {:ok, decoded} \u003c- Jason.decode(body) do\n  # Pass the decoded response body, not the raw HTTP response, to the schema\n  DataSchema.to_struct(decoded, BlogPost)\nend\n```\n\n## Different Source Data Types\n\nAs we mentioned before we want to be able to handle multiple different kinds of source data in our schemas. For each type of source data we want to be able to specify how you access the data for each field type. We do that with a \"data accessor\" (a module that implements the `DataSchema.DataAccessBehaviour`), which we set with the `@data_accessor` module attribute on the schema. If you do not provide this module attribute, we default to `DataSchema.MapAccessor`. 
That means the above example is equivalent to doing the following:\n\n```elixir\ndefmodule DraftPost do\n  import DataSchema, only: [data_schema: 1]\n\n  @data_accessor DataSchema.MapAccessor\n  data_schema([\n    field: {:content, \"content\", \u0026{:ok, to_string(\u00261)}}\n  ])\nend\n\ndefmodule Comment do\n  import DataSchema, only: [data_schema: 1]\n\n  @data_accessor DataSchema.MapAccessor\n  data_schema([\n    field: {:text, \"text\", \u0026{:ok, to_string(\u00261)}}\n  ])\nend\n\ndefmodule BlogPost do\n  import DataSchema, only: [data_schema: 1]\n  @data_accessor DataSchema.MapAccessor\n\n  @date_time_fields [\n    field: {:date, \"date\", \u0026Date.from_iso8601/1},\n    field: {:time, \"time\", \u0026Time.from_iso8601/1}\n  ]\n  data_schema([\n    field: {:content, \"content\", \u0026{:ok, to_string(\u00261)}},\n    has_many: {:comments, \"comments\", Comment},\n    has_one: {:draft, \"draft\", DraftPost},\n    aggregate: {:post_datetime, @date_time_fields, \u0026NaiveDateTime.new(\u00261.date, \u00261.time)},\n  ])\nend\n```\nWhen creating the struct DataSchema will call the relevant function for the field we are creating, passing it the source data and the path to the value(s) in the source. 
Our `DataSchema.MapAccessor` looks like this:\n\n```elixir\ndefmodule DataSchema.MapAccessor do\n  @behaviour DataSchema.DataAccessBehaviour\n\n  @impl true\n  def field(data = %{}, field) do\n    Map.get(data, field)\n  end\n\n  @impl true\n  def list_of(data = %{}, field) do\n    Map.get(data, field)\n  end\n\n  @impl true\n  def has_one(data = %{}, field) do\n    Map.get(data, field)\n  end\n\n  @impl true\n  def has_many(data = %{}, field) do\n    Map.get(data, field)\n  end\nend\n```\n\nTo save repeating `@data_accessor DataSchema.MapAccessor` on all of your schemas you could use a `__using__` macro like so:\n\n```elixir\ndefmodule MapSchema do\n  defmacro __using__(_) do\n    quote do\n      import DataSchema, only: [data_schema: 1]\n      @data_accessor DataSchema.MapAccessor\n    end\n  end\nend\n```\nThen use it like so:\n\n```elixir\ndefmodule DraftPost do\n  use MapSchema\n\n  data_schema([\n    field: {:content, \"content\", \u0026{:ok, to_string(\u00261)}}\n  ])\nend\n```\n\nThis means should we want to change how we access data (say we wanted to use `Map.fetch!` instead of `Map.get`) we would only need to change the accessor used in one place - inside the `__using__` macro. 
It also gives you a handy place to provide other functions for the structs that get created, perhaps implementing a default Inspect protocol implementation for example:\n\n```elixir\ndefmodule MapSchema do\n  defmacro __using__(opts) do\n    skip_inspect_impl = Keyword.get(opts, :skip_inspect_impl, false)\n\n    quote bind_quoted: [skip_inspect_impl: skip_inspect_impl] do\n      import DataSchema, only: [data_schema: 1]\n      @data_accessor DataSchema.MapAccessor\n\n      unless skip_inspect_impl do\n        defimpl Inspect do\n          def inspect(struct, _opts) do\n            \"\u003c\" \u003c\u003e \"#{struct.__struct__}\" \u003c\u003e \"\u003e\"\n          end\n        end\n      end\n    end\n  end\nend\n```\n\nThis could help ensure you never log sensitive fields by requiring you to explicitly implement an inspect protocol for a struct in order to see the fields in it.\n\n### XML Schemas\n\nNow let's imagine instead that our source data was XML. What would it require to enable that? First a new Xpath data accessor:\n\n```elixir\ndefmodule XpathAccessor do\n  @behaviour DataSchema.DataAccessBehaviour\n  import SweetXml, only: [sigil_x: 2]\n\n  @impl true\n  def field(data, path) do\n    SweetXml.xpath(data, ~x\"#{path}\"s)\n  end\n\n  @impl true\n  def list_of(data, path) do\n    SweetXml.xpath(data, ~x\"#{path}\"l)\n  end\n\n  @impl true\n  def has_one(data, path) do\n    SweetXml.xpath(data, ~x\"#{path}\")\n  end\n\n  @impl true\n  def has_many(data, path) do\n    SweetXml.xpath(data, ~x\"#{path}\"l)\n  end\nend\n```\n\nAs we can see our accessor uses the library [Sweet XML](https://github.com/kbrw/sweet_xml) to access the XML. 
That means if we wanted to change the library later we would only need to alter this one module for all of our schemas to benefit from the change.\n\nOur source data looks like this:\n\n```elixir\nsource_data = \"\"\"\n\u003cBlog date=\"2021-11-11\" time=\"14:00:00\"\u003e\n  \u003cContent\u003eThis is a blog post\u003c/Content\u003e\n  \u003cComments\u003e\n    \u003cComment\u003eThis is a comment\u003c/Comment\u003e\n    \u003cComment\u003eThis is another comment\u003c/Comment\u003e\n  \u003c/Comments\u003e\n  \u003cDraft\u003e\n    \u003cContent\u003eThis is a draft blog post\u003c/Content\u003e\n  \u003c/Draft\u003e\n\u003c/Blog\u003e\n\"\"\"\n```\n\nLet's define our schemas like so:\n\n```elixir\ndefmodule DraftPost do\n  import DataSchema, only: [data_schema: 1]\n\n  @data_accessor XpathAccessor\n  data_schema([\n    field: {:content, \"./Content/text()\", \u0026{:ok, to_string(\u00261)}}\n  ])\nend\n\ndefmodule Comment do\n  import DataSchema, only: [data_schema: 1]\n\n  @data_accessor XpathAccessor\n  data_schema([\n    field: {:text, \"./text()\", \u0026{:ok, to_string(\u00261)}}\n  ])\nend\n\ndefmodule BlogPost do\n  import DataSchema, only: [data_schema: 1]\n\n  @data_accessor XpathAccessor\n  @datetime_fields [\n    field: {:date, \"/Blog/@date\", \u0026Date.from_iso8601/1},\n    field: {:time, \"/Blog/@time\", \u0026Time.from_iso8601/1},\n  ]\n  data_schema([\n    field: {:content, \"/Blog/Content/text()\", \u0026{:ok, to_string(\u00261)}},\n    has_many: {:comments, \"//Comment\", Comment},\n    has_one: {:draft, \"/Blog/Draft\", DraftPost},\n    aggregate: {:post_datetime, @datetime_fields, \u0026NaiveDateTime.new(\u00261.date, \u00261.time)},\n  ])\nend\n```\n\nAnd now we can transform:\n\n```elixir\nsource_data = \"\"\"\n\u003cBlog date=\"2021-11-11\" time=\"14:00:00\"\u003e\n  \u003cContent\u003eThis is a blog post\u003c/Content\u003e\n  \u003cComments\u003e\n    \u003cComment\u003eThis is a comment\u003c/Comment\u003e\n    \u003cComment\u003eThis 
is another comment\u003c/Comment\u003e\n  \u003c/Comments\u003e\n  \u003cDraft\u003e\n    \u003cContent\u003eThis is a draft blog post\u003c/Content\u003e\n  \u003c/Draft\u003e\n\u003c/Blog\u003e\n\"\"\"\n\nDataSchema.to_struct(source_data, BlogPost)\n\n# This will output:\n\n %BlogPost{\n   comments: [\n     %Comment{text: \"This is a comment\"},\n     %Comment{text: \"This is another comment\"}\n   ],\n   content: \"This is a blog post\",\n   draft: %DraftPost{content: \"This is a draft blog post\"},\n   post_datetime: ~N[2021-11-11 14:00:00]\n }\n```\n\u003c!-- README END --\u003e\n\n### JSONPath Schemas\n\nThis is left as an exercise for the reader, but hopefully you can see how you could extend this idea to allow for JSON data and JSONPaths pointing to the data in the schemas.\n\n### Guides\n\nSee the [docs](https://hexdocs.pm/data_schema/DataSchema.html) or the [guides in this repo](https://github.com/Adzz/data_schema/tree/main/guides) for more details.\n\n### Livebook\n\nThere are livebooks available under the `livebooks` folder in this repo. Find out more about livebook [here](https://github.com/livebook-dev/livebook).\n\nFor quick instructions, from the root of the repo you can run:\n\n```sh\nmix escript.install hex livebook\n\n# Start the Livebook server\nlivebook server\n```\n\nIf using asdf you should also `asdf reshim elixir`.\n\nThe above will start the livebook where you can navigate to the `livebooks` folder and load any of the interactive docs from there.\n\n### Contributing\n\n**NB** Set the `MIX_ENV` to `:docs` when publishing the package. 
This will ensure that modules inside `test/support` won't get their documentation published with the library (as they are included in the `:dev` env).\n\n```sh\nMIX_ENV=docs mix hex.publish\n```\n\nYou will also have to set that env if you want to run `mix docs`:\n\n```sh\nMIX_ENV=docs mix docs\n```\n\n## Installation\n\nThe package is [available in Hex](https://hex.pm/packages/data_schema) and can be installed by adding `data_schema` to your list of dependencies in `mix.exs`:\n\n```elixir\ndef deps do\n  [\n    {:data_schema, \"~\u003e 0.5.0\"}\n  ]\nend\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadzz%2Fdata_schema","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadzz%2Fdata_schema","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadzz%2Fdata_schema/lists"}