https://github.com/shuttle-hq/synth
The Declarative Data Generator
- Host: GitHub
- URL: https://github.com/shuttle-hq/synth
- Owner: shuttle-hq
- License: apache-2.0
- Created: 2020-08-09T08:51:34.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-02-25T08:05:47.000Z (9 months ago)
- Last Synced: 2024-06-12T22:28:40.794Z (5 months ago)
- Topics: data-generation, hacktoberfest, json, postgres, realistic-data, rust, synthetic-data, test-data-generator
- Language: Rust
- Homepage: https://www.getsynth.com/
- Size: 32.2 MB
- Stars: 1,351
- Watchers: 26
- Forks: 101
- Open Issues: 101
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
The Declarative Data Generator
------
Synth is a tool for generating realistic data using a declarative data model. Synth is database agnostic and can scale to millions of rows of data.
## Why Synth
Synth answers a simple question: there are so many ways to consume data, so why are there no frameworks for *generating* data?
Synth provides a robust, declarative framework for specifying constraint-based data generation, solving the following problems developers regularly face:
1. You're creating an app from scratch and have no way to populate your fresh schema with correct, realistic data.
2. You're doing integration testing / QA on **production** data, but you know it is bad practice, and you really should not be doing that.
3. You want to see how your system will scale if your database suddenly has 10x the amount of data.

Synth solves exactly these problems with a flexible declarative data model which you can version control in git, peer review, and automate.
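
As a rough illustration of the "automate" part, a CI job could wrap the CLI in a small script. The sketch below is not part of Synth. It assumes `synth` is installed and on your `PATH`, reuses the `synth generate <namespace> --size <n>` invocation shown under Examples, and assumes the JSON output maps each collection name to an array of rows, as in that example. The helper names (`build_generate_command`, `count_rows`, `generate`) are hypothetical.

```python
import json
import subprocess


def build_generate_command(namespace: str, size: int) -> list[str]:
    """Return the argv for `synth generate <namespace> --size <size>`."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    return ["synth", "generate", namespace, "--size", str(size)]


def count_rows(generated_json: str) -> dict[str, int]:
    """Map each collection name in the generated JSON to its row count.

    Assumes the top-level object maps collection names to arrays.
    """
    data = json.loads(generated_json)
    return {name: len(rows) for name, rows in data.items()}


def generate(namespace: str, size: int) -> dict[str, int]:
    """Run synth (assumed to be on PATH) and summarise what it produced."""
    result = subprocess.run(
        build_generate_command(namespace, size),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if synth exits with an error
    )
    return count_rows(result.stdout)
```

Calling `generate("my_app/", 2)` in a pipeline would then give you per-collection row counts to check before loading the data anywhere.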
## Key Features
The key features of Synth are:
- **Data as Code**: Data generation is described using a declarative configuration language allowing you to specify your entire data model as code.
- **Import from Existing Sources**: Synth can import data from existing sources and automatically create data models. Synth currently has Alpha support for [Postgres][postgres], [MySQL][mysql] and [MongoDB][mongo]!
- **Data Inference**: While ingesting data, Synth automatically works out the relations, distributions and types of the dataset.
- **Database Agnostic**: Synth supports semi-structured data and is database agnostic, playing nicely with SQL and NoSQL databases.
- **Semantic Data Types**: Synth uses the [fake-rs][fake-rs] crate to enable the generation of semantically rich data with support for types like names, addresses, credit card numbers, etc.

## Status
- [x] Alpha: We are testing `synth` with a closed set of users
- [x] Public Alpha: Anyone can [install `synth`][synth-install]. But go easy on us, there are a few kinks
- [ ] Public Beta: Stable enough for most non-enterprise use-cases
- [ ] Public: Production-ready

We are currently in Public Alpha. Watch "releases" of this repo to get notified of major updates.
## Installation & Getting Started
On Linux and macOS you can get started with the one-liner:
```bash
# Optional, set install path
$ export SYNTH_INSTALL_PATH=~/bin
$ curl -sSL https://getsynth.com/install | sh
```

For more installation options, check out the [docs](https://getsynth.com/docs/getting_started/installation).
## Examples
### Building a data model from scratch
To start generating data without having a source to import from, you need to add Synth schema files to a namespace directory. To get started, we'll create a namespace directory for our data model and call it `my_app`:
```bash
$ mkdir my_app
```

Next let's create a `users` collection using Synth's configuration language, and put it into `my_app/users.json`:
```json synth
{
    "type": "array",
    "length": {
        "type": "number",
        "constant": 1
    },
    "content": {
        "type": "object",
        "id": {
            "type": "number",
            "id": {}
        },
        "email": {
            "type": "string",
            "faker": {
                "generator": "safe_email"
            }
        },
        "joined_on": {
            "type": "date_time",
            "format": "%Y-%m-%d",
            "subtype": "naive_date",
            "begin": "2010-01-01",
            "end": "2020-01-01"
        }
    }
}
```

Finally, generate data using the `synth generate` command:
```bash
$ synth generate my_app/ --size 2 | jq
{
    "users": [
        {
            "email": "[email protected]",
            "id": 1,
            "joined_on": "2014-12-14"
        },
        {
            "email": "[email protected]",
            "id": 2,
            "joined_on": "2013-04-06"
        }
    ]
}
```

### Building a data model from an external database
If you have an existing database, Synth can automatically generate a data model by inspecting the database.
You can use the `synth import` command to automatically generate Synth schema files from your Postgres, MySQL or MongoDB database:
```bash
$ synth import tpch --from postgres://user:pass@localhost:5432/tpch
Building customer collection...
Building primary keys...
Building foreign keys...
Ingesting data for table customer... 10 rows done.
```

Finally, generate data into another instance of Postgres:
```bash
$ synth generate tpch --to postgres://user:pass@localhost:5433/tpch
```

## Why Rust
We decided to build Synth from the ground up in Rust. We love Rust, and given the scale of data we wanted `synth` to generate, it made sense as a first choice. The combination of memory safety, performance, expressiveness and a great community made it a no-brainer and we've never looked back!
## Get in touch
If you would like to learn more, or you would like support for your use case, feel free to open an issue on GitHub.
If your query is more sensitive, you can email [[email protected]](mailto:[email protected]) and we'll happily chat about your use case.
## About Us
The Synth project is backed by OpenQuery. We are a [YCombinator](https://www.ycombinator.com/) backed startup based in London, England. We are passionate about data privacy, developer productivity, and building great tools for software engineers.
## Contributing
First of all, we sincerely appreciate all contributions to Synth, large or small, so thank you.
See the [contributing](./CONTRIBUTING.md) section for details.
## License
Synth is source-available and licensed under the [Apache 2.0 License](https://github.com/getsynth/synth/blob/master/LICENSE).
## Contributors ✨
Thanks go to these wonderful people:
- Christos Hadjiaslanis
- Nodar Daneliya
- llogiq
- Dmitri Shkurski
- Damien Broka
- fretz12
- Tyler Bailey
- Júnior Bassani
- Daniel Hofstetter
- Dr Alexander Mikhalev
- s e

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!
[postgres]: https://www.postgresql.org/
[mysql]: https://www.mysql.com/
[mongo]: https://www.mongodb.com/
[fake-rs]: https://github.com/cksac/fake-rs
[synth-install]: https://getsynth.com/docs/getting_started/installation