https://github.com/ponyo877/dummy_data_generator
:zap: dummy data generator for development :zap:
- Host: GitHub
- URL: https://github.com/ponyo877/dummy_data_generator
- Owner: ponyo877
- Created: 2023-10-07T02:29:47.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-21T07:38:20.000Z (about 2 years ago)
- Last Synced: 2024-04-21T09:03:05.379Z (about 2 years ago)
- Topics: dummy-data-generator
- Language: Go
- Homepage:
- Size: 57.6 KB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 0
# Dummy Data Generator CLI
This CLI tool allows you to efficiently generate a large amount of dummy data in a database. It supports both PostgreSQL and MySQL and provides a flexible configuration file to specify which tables and columns to populate.
## Installation
To install the CLI tool, run the following command:
```bash
go install github.com/ponyo877/dummy_data_generator@latest
```
## Features
- Generate a substantial amount of dummy data in a database.
- Supports both PostgreSQL and MySQL.
- Customize data generation through a configuration file.
- Track progress with a visual progress bar.
## Configuration
| Field | Description |
|-------------|---------------------------------------------------------------------------|
| tablename | Name of the table where the data will be generated. |
| recordcount | Total number of records to be generated. |
| buffer | Buffer size for generating records (useful for optimizing performance). |
| columns | List of columns with their respective configurations. |
| columns[].name | Name of the column. |
| columns[].type | Data type of the column (e.g., number, varchar, timestamp). |
| columns[].rule | Generation rule for the column. |
| columns[].rule.type | Dummy rule type (e.g., unique, const, pattern, random). |
| columns[].rule.format | [type: unique only] Format of the generated value (e.g., UUID or ULID for varchar, NOW for timestamp). |
| columns[].rule.value | [type: const only] Constant value to assign. |
| columns[].rule.min | Start of the sequential value. |
| columns[].rule.max | [type: pattern only] End of the sequential value. |
| columns[].rule.min_time | [type: random (timestamp) only] Minimum value for the random timestamp. |
| columns[].rule.max_time | [type: random (timestamp) only] Maximum value for the random timestamp. |
| columns[].patterns[].value | [type: pattern only] Value to repeat. |
| columns[].patterns[].times | [type: pattern only] Number of times to repeat the value. |
### Example Rules
- `type: unique`: Generates unique values. Sequential numbers (default), the current timestamp (`format: NOW`), UUID, and ULID are supported.
- `type: const`: Assigns a constant value.
- `type: pattern`: Generates values based on the specified patterns. If you specify `[{value: A, times: 2}, {value: B, times: 1}]`, it creates the repeating sequence `[A,A,B,A,A,B,...]`. If you specify `{min: 1, max: 5}`, it creates the repeating sequence `[1,2,3,4,5,1,2,3,...]`.
- `type: random`: Generates random values between `min_time` and `max_time`. Only the timestamp data type is supported for now. If you specify `{min_time: '2024-01-01 00:00:00', max_time: '2024-03-31 23:59:59'}`, it yields random timestamps within that range, such as '2024-02-01 01:23:45', but never '2023-12-31 23:59:59' or '2024-04-01 00:00:00'.
#### Example
```yaml
tablename: sample_tbl
recordcount: 1000000
buffer: 1000
columns:
  # The 'id' column is a string in ULID format, ensuring all values are unique.
  - name: id
    type: varchar
    rule:
      type: unique
      format: ULID
  # The 'sex' column will contain the strings "male", "female", and "NA" in a 3:2:1 ratio.
  - name: sex
    type: varchar
    rule:
      type: pattern
    patterns:
      - value: male
        times: 3
      - value: female
        times: 2
      - value: NA
        times: 1
  # The 'created_at' column will have the fixed value "2024-01-01 00:00:00".
  - name: created_at
    type: timestamp
    rule:
      type: const
      value: '2024-01-01 00:00:00'
```
## Subcommands
| Subcommand | Description |
|------------------|--------------------------------------------------------------------------------------------------------------|
| dummy_data_generator cnt | Show the number of records in each target table. |
| dummy_data_generator gen | Generate dummy data. |
## Options
| Option | Description | Default Value |
|------------------|--------------------------------------------------------------------------------------------------------------|---------------|
| -c, --config | Configuration file for dummy data. You can provide multiple configuration files using wildcards (e.g., `-c "cfg_*.yaml"`) or by comma-separating them (e.g., `-c cfg_1.yaml,cfg_2.yaml`). | `config.yaml` |
| -d, --database | Name of the database to use. | `mydb` |
| -u, --dbuser | Database user name. | `root` |
| -e, --engine | Database engine to use. Supports `postgres` and `mysql`. | `postgres` |
| -h, --host | Database server host or socket directory. | `127.0.0.1` |
| -p, --password | Database password to use when connecting to the server. | `password` |
| -P, --port | Database server port. | `5432` |
## Usage Examples
- Example 1: Check the current number of records. (MySQL)
```bash
$ dummy_data_generator cnt -e mysql -h 127.0.0.1 -u root -P 5432 -p password -c sample_1.yaml,sample_2.yaml
+--------+-------+
| TABLE | COUNT |
+--------+-------+
| table1 | 0 |
| table2 | 0 |
+--------+-------+
```
- Example 2: Generate dummy data for the tables defined in the config files. (PostgreSQL, default values for every option except `-c`)
```bash
$ dummy_data_generator gen -c "sample_*.yaml"
table1: 534000 / 1000000 in progress [=====================>-------------------] 53 %
table2: 533000 / 1000000 in progress [=====================>-------------------] 53 %
table3: 10000 / 10000 done! [=========================================]
```