https://github.com/alchimystic/data-sampler

Utility library to generate data, based on Type classes and using Shapeless for dealing with Products

Host: GitHub
URL: https://github.com/alchimystic/data-sampler
Owner: alchimystic
License: mit
Created: 2022-11-24T11:55:31.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2022-11-24T14:52:29.000Z (about 2 years ago)
Last Synced: 2024-10-25T09:49:47.310Z (2 months ago)
Topics: adt, data-generator, scala, shapeless, test
Language: Scala
Homepage:
Size: 11.7 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# data-sampler

Utility library to generate data, based on Type classes and using Shapeless for dealing with Products

### Sampler[T]: a generator of instances of T

This small framework is based on Type classes, enabling ad-hoc polymorphism.

A type class is a way to add functionality to a type without having to change it.

**Sampler[T]** is our core type class, and it represents a generator of instances of type T.

While you can implement your own Sampler[T], the goal is to be able to derive as many Samplers as possible without writing any code.

In Scala, type classes rely on implicits to do the heavy lifting of looking up the correct implementations.

So with 3 or 4 imports you should be able to generate instances of most case classes in Scala.
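To make the type class idea concrete, here is a minimal sketch. The member name `sample`, the `Sampler.apply` summoner, the `instance` helper and the values 42 and "sample" are assumptions for illustration, not necessarily the library's actual API. It shows the compiler looking up an implicit instance for types (Int, String) that were never modified.

```scala
// Hypothetical minimal shape of the type class: a generator of T values.
trait Sampler[T] {
  def sample: T
}

object Sampler {
  // Summoner: Sampler[Int] returns the implicit instance found by the compiler.
  def apply[T](implicit s: Sampler[T]): Sampler[T] = s

  // Builds an instance from a by-name expression, re-evaluated on every call.
  def instance[T](gen: => T): Sampler[T] = new Sampler[T] {
    def sample: T = gen
  }

  // Instances for existing types; Int and String themselves are not modified.
  implicit val intSampler: Sampler[Int] = instance(42)
  implicit val stringSampler: Sampler[String] = instance("sample")
}

// Generic code only asks for "some Sampler[T]"; the compiler supplies it.
def sampleTwice[T](implicit s: Sampler[T]): (T, T) = (s.sample, s.sample)

val n: Int = Sampler[Int].sample // 42
val pair: (String, String) = sampleTwice[String]
```

Placing instances in the companion object makes them visible without imports; the library instead lets you choose them through imports, which is what makes the behaviour configurable per test.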

### Algebraic Data Types

Algebraic data types (ADTs) play an essential role in functional programming, including in Scala.

The two most important kinds of algebraic data type are **Sum** and **Product**.

A Sum data type represents an **OR** relation: a sum T can be one of several other types. This is simple polymorphism, and in Scala it is modeled with sealed traits.

A Product data type represents an **AND** relation: a product T contains values of several other types. In Scala this is a case class.

So any Scala domain model consists of a finite set of data types that form a deep and wide combination of Sums and Products.
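A short sketch with hypothetical domain types (not part of the library): `Animal` is a Sum of `Bird` and `Dog`, and `Owner` is a Product that happens to contain a Sum.

```scala
// Sum: an Animal is a Bird OR a Dog.
sealed trait Animal
final case class Bird(name: String, canFly: Boolean) extends Animal
final case class Dog(name: String, age: Int) extends Animal

// Product: an Owner has a name AND a pet.
final case class Owner(name: String, pet: Animal)

// Because Animal is sealed, the compiler can check that this match covers every case.
def describe(animal: Animal): String = animal match {
  case Bird(name, true)  => s"$name can fly"
  case Bird(name, false) => s"$name cannot fly"
  case Dog(name, age)    => s"$name is $age years old"
}

val owner = Owner("Ana", Dog("Rex", 3))
```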

Regarding Products, the framework can derive a Sampler for a case class T(A, B) if it has Samplers for A and B.

This is done using **Shapeless**, which lets us reason about Products in a type-safe, recursive way (avoiding reflection, for example).
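The following sketch illustrates the recursive idea behind Product derivation without depending on Shapeless. It assumes Scala 2.13 and hand-rolls `HCons`/`HNil` and a `Gen` type class as simplified stand-ins for `shapeless.HList` and `shapeless.Generic`; the `Sampler` shape and all instance names are hypothetical. The library presumably relies on Shapeless to generate the `Gen`-like mapping that is written by hand here.

```scala
// Hypothetical minimal type class.
trait Sampler[T] { def sample: T }

// Simplified stand-ins for shapeless.HList (a typed list of field values).
sealed trait HList
final case class HCons[H, Tail <: HList](head: H, tail: Tail) extends HList
sealed trait HNil extends HList
case object HNil extends HNil

// Simplified stand-in for shapeless.Generic: rebuilds A from its field list R.
trait Gen[A, R] { def from(r: R): A }

object Sampler {
  def apply[T](implicit s: Sampler[T]): Sampler[T] = s
  def instance[T](gen: => T): Sampler[T] = new Sampler[T] { def sample: T = gen }

  // Scalar instances the derivation bottoms out on.
  implicit val intSampler: Sampler[Int] = instance(0)
  implicit val stringSampler: Sampler[String] = instance("text")

  // Base case: no fields left.
  implicit val hnilSampler: Sampler[HNil] = instance[HNil](HNil)

  // Inductive case: sample the first field, then recurse on the remaining ones.
  implicit def hconsSampler[H, Tail <: HList](implicit
      head: Sampler[H],
      tail: Sampler[Tail]
  ): Sampler[HCons[H, Tail]] = instance(HCons(head.sample, tail.sample))

  // Products: sample the field list, then convert it back into the case class.
  implicit def productSampler[A, R](implicit
      gen: Gen[A, R],
      fields: Sampler[R]
  ): Sampler[A] = instance(gen.from(fields.sample))
}

final case class User(id: Int, name: String)

object User {
  // Shapeless would generate this mapping with a macro; here it is written by hand.
  implicit val gen: Gen[User, HCons[Int, HCons[String, HNil]]] =
    new Gen[User, HCons[Int, HCons[String, HNil]]] {
      def from(r: HCons[Int, HCons[String, HNil]]): User = User(r.head, r.tail.head)
    }
}

// Sampler[User] was never written: it is assembled from Sampler[Int] and Sampler[String].
val user: User = Sampler[User].sample
```

The base case (`HNil`) and the inductive case (`HCons`) together give a Sampler for any field list whose element types all have Samplers, which is the "T(A, B) given A and B" rule stated above, and no reflection is involved.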

Regarding Sums, the framework is not able to automatically derive a Sampler as it doesn't know which implementation of a sealed trait to pick.

We can, however, hint which implementation to use in a given context.
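A possible shape for such a hint, sketched with a hypothetical `select[A, B <: A]` that reuses the Sampler of a chosen implementation B whenever the sum type A is requested. The signature and the `Animal`/`Bird`/`Fish` types are assumptions, not the library's confirmed API.

```scala
trait Sampler[T] { def sample: T }

object Sampler {
  def apply[T](implicit s: Sampler[T]): Sampler[T] = s
  def instance[T](gen: => T): Sampler[T] = new Sampler[T] { def sample: T = gen }

  // Hypothetical hint: sample the sum type A by sampling its implementation B.
  def select[A, B <: A](implicit sb: Sampler[B]): Sampler[A] = instance[A](sb.sample)
}

sealed trait Animal
final case class Bird(name: String) extends Animal
final case class Fish(name: String) extends Animal

implicit val birdSampler: Sampler[Bird] = Sampler.instance(Bird("Tweety"))
implicit val fishSampler: Sampler[Fish] = Sampler.instance(Fish("Nemo"))

// In this scope, whenever an Animal is needed, a Bird is produced.
implicit val animalSampler: Sampler[Animal] = Sampler.select[Animal, Bird]

val animal: Animal = Sampler[Animal].sample
```

Both Bird and Fish have Samplers, so the compiler could not pick one on its own; the explicit `select` resolves that choice for the current scope only.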

### Scalar data types

ADTs allow us to define a rich and deep set of data types, but in the end they rely on a small set of well-known scalar types, each holding a single value.

Int, Double, Boolean and String are some examples of these scalar types; most of them are primitives.

Other types (like BigDecimal, Instant or LocalDateTime) are a bit more complex, but they still represent a single value.

The framework must therefore provide Samplers for these types.

The samplers for all these core scalar types are available in trait **CoreSamplerSet**.

We have 2 object implementations of CoreSamplerSet:

* **FixedSamplers**: samplers that always provide the same value for their type

* **UniqueSamplers**: samplers that provide unique values (as far as possible) for their type
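The difference between the two flavours can be sketched as follows. `FixedSketch` and `UniqueSketch` are hypothetical stand-ins for FixedSamplers and UniqueSamplers, and the counter-based uniqueness strategy is an assumption; the real implementations may generate values differently.

```scala
import java.util.concurrent.atomic.AtomicInteger

trait Sampler[T] { def sample: T }

object Sampler {
  def instance[T](gen: => T): Sampler[T] = new Sampler[T] { def sample: T = gen }
}

// Stand-in for FixedSamplers: the same value every time.
object FixedSketch {
  implicit val intSampler: Sampler[Int] = Sampler.instance(1)
  implicit val stringSampler: Sampler[String] = Sampler.instance("fixed")
}

// Stand-in for UniqueSamplers: values drawn from a shared, thread-safe counter
// (unique until the Int counter overflows, hence "as far as possible").
object UniqueSketch {
  private val counter = new AtomicInteger(0)
  implicit val intSampler: Sampler[Int] = Sampler.instance(counter.incrementAndGet())
  implicit val stringSampler: Sampler[String] =
    Sampler.instance(s"value-${counter.incrementAndGet()}")
}

val fixedInts: List[Int] = List.fill(3)(FixedSketch.intSampler.sample)   // List(1, 1, 1)
val uniqueInts: List[Int] = List.fill(3)(UniqueSketch.intSampler.sample) // List(1, 2, 3)
```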

### Collection data types

Some built-in types are not scalar, but are still very important and widely used: Seq, List, Set, Option, Map.

These represent a container for other types, and in a broad sense we can call them _Collections_.

We have 2 objects with Samplers for collections:

* **EmptySamplers**: samplers that always provide the empty case of the collection (Nil, None, etc.)

* **NonEmptySamplers**: samplers that always provide a collection with one element
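A sketch of the two collection flavours for Option, List and Map, assuming a minimal `Sampler` type class with a single `sample` member; `EmptySketch` and `NonEmptySketch` are hypothetical stand-ins for EmptySamplers and NonEmptySamplers. The non-empty instances need a Sampler for the element type, while the empty ones do not.

```scala
trait Sampler[T] { def sample: T }

object Sampler {
  def instance[T](gen: => T): Sampler[T] = new Sampler[T] { def sample: T = gen }
  // An element sampler, found through the companion's implicit scope.
  implicit val intSampler: Sampler[Int] = instance(7)
}

// Stand-in for EmptySamplers: the empty case, no element sampler needed.
object EmptySketch {
  implicit def optionSampler[A]: Sampler[Option[A]] = Sampler.instance[Option[A]](None)
  implicit def listSampler[A]: Sampler[List[A]] = Sampler.instance[List[A]](Nil)
  implicit def mapSampler[K, V]: Sampler[Map[K, V]] = Sampler.instance[Map[K, V]](Map.empty)
}

// Stand-in for NonEmptySamplers: one element, built from the element's own Sampler.
object NonEmptySketch {
  implicit def optionSampler[A](implicit a: Sampler[A]): Sampler[Option[A]] =
    Sampler.instance[Option[A]](Some(a.sample))
  implicit def listSampler[A](implicit a: Sampler[A]): Sampler[List[A]] =
    Sampler.instance(List(a.sample))
  implicit def mapSampler[K, V](implicit k: Sampler[K], v: Sampler[V]): Sampler[Map[K, V]] =
    Sampler.instance(Map(k.sample -> v.sample))
}

def emptyOption: Option[Int] = { import EmptySketch._; implicitly[Sampler[Option[Int]]].sample }
def someOption: Option[Int] = { import NonEmptySketch._; implicitly[Sampler[Option[Int]]].sample }
def oneEntryMap: Map[Int, Int] = { import NonEmptySketch._; implicitly[Sampler[Map[Int, Int]]].sample }
```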

### Enumerated / enumerations

Enumerations are a special case of a scalar type. To bridge enumerations with Samplers, we have the type class **Enumerated[T]**.

The Enumerated[T] represents a type class which knows all the possible values of T.

It is fairly widely agreed that enumeratum provides a better and more flexible implementation of enumerations than Scala's built-in enums.

For that reason, Enumerated can currently only be derived for types T that extend enumeratum's EnumEntry.

Later we expect to add support for Scala enums.

If an Enumerated[T] can be provided, a Sampler[T] can be derived.

Any implementation of **CoreSamplerSet** (FixedSamplers or UniqueSamplers) can provide this derivation.
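A sketch of how a Sampler could be derived from an Enumerated instance. The `Enumerated` shape (a `values` sequence), the hand-written instance for a hypothetical `Color` enumeration and both derivation strategies (always the first value vs. cycling through values) are assumptions; for an enumeratum `Enum`, the values would presumably come from its `values` collection instead.

```scala
import java.util.concurrent.atomic.AtomicInteger

trait Sampler[T] { def sample: T }

object Sampler {
  def instance[T](gen: => T): Sampler[T] = new Sampler[T] { def sample: T = gen }
}

// Hypothetical shape of Enumerated[T]: it knows every possible value of T.
trait Enumerated[T] { def values: IndexedSeq[T] }

sealed trait Color
case object Red extends Color
case object Green extends Color
case object Blue extends Color

object Color {
  // Written by hand here; with enumeratum it could be built from the Enum's values.
  implicit val enumerated: Enumerated[Color] = new Enumerated[Color] {
    val values: IndexedSeq[Color] = IndexedSeq(Red, Green, Blue)
  }
}

// Fixed-style derivation: always the first value.
def fixedFromEnumerated[T](implicit e: Enumerated[T]): Sampler[T] =
  Sampler.instance(e.values.head)

// Unique-style derivation: cycle through the values, unique until they run out.
def cyclingFromEnumerated[T](implicit e: Enumerated[T]): Sampler[T] = {
  val next = new AtomicInteger(0)
  Sampler.instance(e.values(java.lang.Math.floorMod(next.getAndIncrement(), e.values.size)))
}

val first: Color = fixedFromEnumerated[Color].sample   // Red
val cycler: Sampler[Color] = cyclingFromEnumerated[Color]
val colors: List[Color] = List.fill(4)(cycler.sample)  // List(Red, Green, Blue, Red)
```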

### Deriving the appropriate Samplers

Automatically deriving Samplers for Scala ADTs is just a matter of picking the right imports:

* always import Samplers._, which provides the basic Sampler machinery and derives Samplers for Products (case classes)

* import FixedSamplers or UniqueSamplers, depending on the needs of each case

* import EmptySamplers or NonEmptySamplers, depending on whether we need simple/empty or thorough/deep data in each case
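The effect of choosing imports can be sketched with local objects. `Fixed`, `Unique`, `Empty` and `NonEmpty` are hypothetical miniatures of the library's objects, and `orderSampler` is written by hand where the library would derive it with Shapeless. The same product definition yields very different data depending on which instances are imported into scope.

```scala
import java.util.concurrent.atomic.AtomicInteger

trait Sampler[T] { def sample: T }

object Sampler {
  def instance[T](gen: => T): Sampler[T] = new Sampler[T] { def sample: T = gen }
}

// Scalar flavours (hypothetical miniatures of FixedSamplers / UniqueSamplers).
object Fixed {
  implicit val intSampler: Sampler[Int] = Sampler.instance(1)
}
object Unique {
  private val counter = new AtomicInteger(0)
  implicit val intSampler: Sampler[Int] = Sampler.instance(counter.incrementAndGet())
}

// Collection flavours (hypothetical miniatures of EmptySamplers / NonEmptySamplers).
object Empty {
  implicit def listSampler[A]: Sampler[List[A]] = Sampler.instance[List[A]](Nil)
}
object NonEmpty {
  implicit def listSampler[A](implicit a: Sampler[A]): Sampler[List[A]] =
    Sampler.instance(List(a.sample))
}

final case class Order(id: Int, items: List[Int])

// Written by hand here; like a derived Sampler, it only asks for field Samplers.
def orderSampler(implicit ids: Sampler[Int], items: Sampler[List[Int]]): Sampler[Order] =
  Sampler.instance(Order(ids.sample, items.sample))

// Simple data: the same id everywhere, empty collections.
val simple: Order = {
  import Fixed._, Empty._
  orderSampler.sample
}

// Thorough data: unique ids, one element per collection.
val thorough: List[Order] = {
  import Unique._, NonEmpty._
  List.fill(2)(orderSampler.sample)
}
```

Because implicits are resolved lexically, each block (or each test) can pick its own combination without affecting the others.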

### How to use

The most natural use case is to generate test data.

It's a pain to write tests when case classes have many non-optional fields, especially when each test only cares about a limited subset of those fields.

The usual approach is to define a set of constant values, and then use them to create more complex types.

At the end of the test, we compare the results with the constants we defined.

I have always used a different approach: I create the complex domain instances I need, and then retrieve the 'constants' from them.

Some hints on how to get the best out of this framework:

* If it's important to have unique identifiers (Int, Long, String), import UniqueSamplers; otherwise FixedSamplers will do the job

* If our test needs to be thorough and use deeply filled data types (e.g. when testing a Codec), we should import NonEmptySamplers; otherwise EmptySamplers will do

* Customize sampled data by creating modified copies with copy(). I usually change only the fields that are relevant to each test case

* Use Samplers.select() to pick a sampler for a generic type (sealed trait / sum data type). E.g. Samplers.select[Animal,Bird] will provide a Bird whenever an Animal is needed
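A sketch of the test style described in this section, using a hypothetical `Customer` type, a hand-written `customerSampler` in place of a derived one, and a hypothetical `greeting` function as the code under test. The test sets only the field it cares about via `copy()` and reads the expected name back from the sampled instance rather than from a separate constant.

```scala
trait Sampler[T] { def sample: T }

object Sampler {
  def apply[T](implicit s: Sampler[T]): Sampler[T] = s
  def instance[T](gen: => T): Sampler[T] = new Sampler[T] { def sample: T = gen }
}

final case class Customer(id: Int, name: String, email: String, vip: Boolean)

// Stand-in for a derived Sampler[Customer]: every field gets some valid value.
implicit val customerSampler: Sampler[Customer] =
  Sampler.instance(Customer(1, "name", "name@example.com", vip = false))

// Hypothetical code under test.
def greeting(c: Customer): String =
  if (c.vip) s"Welcome back, ${c.name}!" else s"Hello, ${c.name}"

// Sample a complete Customer, then change only the field this test cares about.
val customer: Customer = Sampler[Customer].sample.copy(vip = true)

// The expected value is taken from the instance, not from a separate constant.
val expected: String = s"Welcome back, ${customer.name}!"
val actual: String = greeting(customer)
```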