https://github.com/alchimystic/data-sampler
Utility library to generate data, based on Type classes and using Shapeless for dealing with Products
adt data-generator scala shapeless test
- Host: GitHub
- URL: https://github.com/alchimystic/data-sampler
- Owner: alchimystic
- License: mit
- Created: 2022-11-24T11:55:31.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2022-11-24T14:52:29.000Z (about 2 years ago)
- Last Synced: 2024-10-25T09:49:47.310Z (2 months ago)
- Topics: adt, data-generator, scala, shapeless, test
- Language: Scala
- Homepage:
- Size: 11.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# data-sampler
Utility library to generate data, based on Type classes and using Shapeless for dealing with Products
### Sampler[T]: a generator of instances of T
This small framework is based on Type classes, enabling ad-hoc polymorphism.
A type class is essentially a way to add functionality to a type without needing to change it.
**Sampler[T]** is our core type class, and it represents a generator of instances of type T.
While you can implement your own Sampler[T], the goal is to be able to derive as many Samplers as possible without writing any code.
In Scala, type classes rely on implicits for doing the heavy-duty work of looking up the correct implementations.
So with 3 or 4 imports you should be able to generate instances of most case classes in Scala.
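As a rough illustration of the type class idea described above, here is a minimal, self-contained sketch. It is not the library's actual code: the `sample` method name, the `apply` summoner and the concrete values are assumptions made for the example.

```scala
object TypeClassSketch {
  // Hypothetical shape of the type class; the method name `sample` is an assumption.
  trait Sampler[T] { def sample(): T }

  object Sampler {
    // Summoner: Sampler[Int] resolves the implicit instance for Int.
    def apply[T](implicit s: Sampler[T]): Sampler[T] = s

    // Instances placed in the companion object are found without any import.
    implicit val intSampler: Sampler[Int] =
      new Sampler[Int] { def sample(): Int = 42 }

    implicit val stringSampler: Sampler[String] =
      new Sampler[String] { def sample(): String = "sample" }

    // An instance built from another instance: the compiler chains them.
    implicit def optionSampler[A](implicit a: Sampler[A]): Sampler[Option[A]] =
      new Sampler[Option[A]] { def sample(): Option[A] = Some(a.sample()) }
  }

  // Int, String and Option gain a "sample" capability without being modified.
  val n: Int = Sampler[Int].sample()
  val maybeName: Option[String] = Sampler[Option[String]].sample()
}
```

The implicit lookup is what lets `Sampler[Option[String]]` be assembled from the `Option` and `String` instances without any code at the call site.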
### Algebraic Data Types
ADTs play an essential role in Functional Programming, including Scala.
The most important data types are **Sum** and **Product**.
A Sum data type represents an **OR** relation, so a sum T can be one of many other types. This is simple polymorphism and in Scala is done using sealed traits.
A Product data type represents an **AND** relation, so a product T has other types in it. This is a case class in Scala.
So in Scala we can say that any domain model will contain a finite set of data types representing a deep and wide combination of Sums and Products.
Regarding Products, the framework is able to derive a Sampler for a case class T(A, B) if it has Samplers for A and B.
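A derivation of this kind can be sketched with Shapeless 2.3.x (assumed to be on the classpath). The `Sampler` trait, its `sample` method and the `instance` helper below are hypothetical stand-ins, and the library's real implementation may differ.

```scala
import shapeless.{::, Generic, HList, HNil, Lazy}

object ProductDerivationSketch {
  // Hypothetical stand-in for the library's type class.
  trait Sampler[T] { def sample(): T }

  object Sampler {
    def apply[T](implicit s: Sampler[T]): Sampler[T] = s

    // Small helper to build instances from a by-name expression.
    def instance[T](f: => T): Sampler[T] = new Sampler[T] { def sample(): T = f }

    // Samplers for the field types (the "A" and "B" of the case class).
    implicit val intSampler: Sampler[Int] = instance(1)
    implicit val stringSampler: Sampler[String] = instance("string")

    // Base case: the empty product.
    implicit val hnilSampler: Sampler[HNil] = instance(HNil)

    // Inductive case: a sampler for the head plus a sampler for the tail.
    implicit def hconsSampler[H, T <: HList](implicit
        head: Lazy[Sampler[H]],
        tail: Sampler[T]
    ): Sampler[H :: T] = instance(head.value.sample() :: tail.sample())

    // Bridge: any type with a generic representation R that can be sampled.
    implicit def genericSampler[A, R](implicit
        gen: Generic.Aux[A, R],
        repr: Lazy[Sampler[R]]
    ): Sampler[A] = instance(gen.from(repr.value.sample()))
  }

  final case class Address(street: String, number: Int)
  final case class Person(name: String, age: Int, address: Address)

  // No Sampler[Person] or Sampler[Address] was written by hand.
  val person: Person = Sampler[Person].sample()
}
```

`Lazy` is used so that implicit resolution does not give up early on nested case classes such as `Person` containing `Address`.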
This is done using **Shapeless**, which gives us a way of reasoning about Products in a type-safe, recursive way (avoiding reflection, for example).
Regarding Sums, the framework is not able to automatically derive a Sampler, as it doesn't know which implementation of a sealed trait to pick.
We can, however, hint at which implementation to use in a given context.
### Scalar data types
ADTs allow us to define a rich and deep set of data types, but in the end this relies on a small set of well-known scalar types, each holding a single value.
Int, Double, Boolean and String are some examples of these scalar types; most of them are primitives.
Other types (like BigDecimal, Instant or LocalDateTime) are a bit more complex, but they still represent a single value.
At the core, the framework must provide Samplers for these types.
The samplers for all these core scalar types are available in trait **CoreSamplerSet**.
We have 2 object implementations of CoreSamplerSet:
* **FixedSamplers**: samplers that always provide the same value for their type
* **UniqueSamplers**: samplers that provide unique (as much as possible) values for their type
### Collection data types
Some built-in types are not scalar, but are still very important and widely used: Seq, List, Set, Option, Map.
These represent a container for other types, and in a broad sense we can call them _Collections_.
We have 2 objects with Samplers for collections:
* **EmptySamplers**: samplers that always provide the empty case of the collection (Nil, None, etc.)
* **NonEmptySamplers**: samplers that always provide a collection with one element
### Enumerated / enumerations
Enumerations are a special case of a scalar type. To bridge enumerations with Samplers, we have the type class **Enumerated[T]**.
Enumerated[T] is a type class which knows all the possible values of T.
It is more or less consensual that enumeratum provides a much better and more flexible implementation of enumerations than Scala's built-in enums.
For that reason, for now we can only derive Enumerated for any T extending enumeratum's EnumEntry.
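The bridge between enumerations and Samplers can be pictured with the sketch below. It deliberately avoids the enumeratum dependency and writes the instance by hand; the `values` member of `Enumerated`, the `sample` method and the choice of always returning the first value are assumptions made for illustration. With enumeratum, the list of values would typically come from the `values` member of the enum's companion object.

```scala
object EnumeratedSketch {
  // Hypothetical stand-ins for the library's type classes.
  trait Sampler[T] { def sample(): T }
  trait Enumerated[T] { def values: Seq[T] }

  // A plain sealed hierarchy playing the role of an enumeration.
  sealed trait Color
  case object Red extends Color
  case object Green extends Color
  case object Blue extends Color

  // Hand-written instance listing every possible value of Color.
  implicit val colorEnumerated: Enumerated[Color] =
    new Enumerated[Color] { val values: Seq[Color] = Seq(Red, Green, Blue) }

  // Any Enumerated[T] yields a Sampler[T]; this one always picks the first value
  // and assumes the enumeration has at least one member.
  implicit def enumeratedSampler[T](implicit e: Enumerated[T]): Sampler[T] =
    new Sampler[T] { def sample(): T = e.values.head }

  val color: Color = implicitly[Sampler[Color]].sample()
}
```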
Later we expect to add support for Scala enums.
If an Enumerated[T] can be provided, a Sampler[T] can be derived.
Any of the implementations of **CoreSamplerSet** can provide this.
### Deriving the appropriate Samplers
Automatically deriving the Samplers for Scala ADTs is just a matter of picking the right imports:
* always import Samplers._, needed for the basic Sampler machinery and for deriving samplers for Products (case classes)
* import FixedSamplers or UniqueSamplers, depending on the needs of each case
* import EmptySamplers or NonEmptySamplers, depending on how simplistic/empty or thorough/deep we need the data to be in each case
### How to use
The most natural use case is to generate test data.
It's a pain to write tests when we have case classes with many non-optional fields, especially when each test only cares about a limited set of those fields.
The usual approach is to define a set of constant values, and then use them to create more complex types.
At the end of the test we would compare the results with the constants we defined.
I have always used a different approach: I create the complex domain instances I need, and then the 'constants' are retrieved from them.
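The approach described above might look like the following sketch. The `Customer` class and the `greeting` function are hypothetical, and the sampled instance is written out by hand here; with the library it would come from a derived Sampler instead.

```scala
object TestDataSketch {
  // Hypothetical domain type with several mandatory fields.
  final case class Customer(id: Long, name: String, email: String, vip: Boolean)

  // Stand-in for a sampled instance (hand-written for this sketch).
  val sampled: Customer = Customer(1L, "string", "string", vip = false)

  // Hypothetical function under test.
  def greeting(c: Customer): String =
    if (c.vip) s"Welcome back, ${c.name}!" else s"Hello, ${c.name}"

  // Only the field relevant to this test case is changed.
  val vipCustomer: Customer = sampled.copy(vip = true)

  val result: String = greeting(vipCustomer)

  // The expected value is read from the instance, not from a separate constant.
  val expected: String = s"Welcome back, ${vipCustomer.name}!"
}
```

Because the expectation is built from `vipCustomer.name`, the test keeps passing whatever value the sampler happens to produce for that field.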
Some hints on how to get the best out of this framework:
* If it's important to have unique identifiers (Int, Long, String), import UniqueSamplers; otherwise FixedSamplers will do the job
* If our tests need to be thorough and use deeply filled data types (e.g. testing a Codec), we should import NonEmptySamplers; otherwise EmptySamplers will do
* Customize sampled data by changing it with copy(). I usually change only the fields which are relevant to each test case
* Use Samplers.select() to pick a sampler for a generic type (sealed trait / sum datatype). E.g. Samplers.select[Animal,Bird] will provide a bird whenever an animal is needed
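The first and last hints can be illustrated with a self-contained sketch. Everything below is a hypothetical analogue written for this example: `FixedIds` and `UniqueIds` mimic the idea behind FixedSamplers and UniqueSamplers, and the local `select` function only imitates what Samplers.select is described as doing. None of it is the library's real API.

```scala
import java.util.concurrent.atomic.AtomicLong

object HintsSketch {
  // Hypothetical stand-in for the library's type class.
  trait Sampler[T] { def sample(): T }
  def sampleOf[T](implicit s: Sampler[T]): T = s.sample()

  // Two interchangeable sets of scalar samplers; the import decides the behaviour.
  object FixedIds {
    implicit val longSampler: Sampler[Long] =
      new Sampler[Long] { def sample(): Long = 1L }
  }
  object UniqueIds {
    private val next = new AtomicLong(0L)
    implicit val longSampler: Sampler[Long] =
      new Sampler[Long] { def sample(): Long = next.incrementAndGet() }
  }

  // A sum type: no sampler can be chosen for Animal without a hint.
  sealed trait Animal
  final case class Bird(id: Long) extends Animal
  final case class Dog(id: Long) extends Animal

  implicit def birdSampler(implicit ids: Sampler[Long]): Sampler[Bird] =
    new Sampler[Bird] { def sample(): Bird = Bird(ids.sample()) }

  // Imitation of the hint: use the sampler of Impl wherever a Base is needed.
  def select[Base, Impl <: Base](implicit impl: Sampler[Impl]): Sampler[Base] =
    new Sampler[Base] { def sample(): Base = impl.sample() }

  val fixedAnimals: Seq[Animal] = {
    import FixedIds._
    implicit val animalSampler: Sampler[Animal] = select[Animal, Bird]
    Seq.fill(2)(sampleOf[Animal])
  }

  val uniqueAnimals: Seq[Animal] = {
    import UniqueIds._
    implicit val animalSampler: Sampler[Animal] = select[Animal, Bird]
    Seq.fill(2)(sampleOf[Animal])
  }
}
```

In this sketch both sequences contain only birds; the fixed variant repeats the same identifier, while the unique variant produces a new one on each call.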