{"id":13462509,"url":"https://github.com/feliixx/mgodatagen","last_synced_at":"2025-04-05T14:08:11.394Z","repository":{"id":23815506,"uuid":"99916389","full_name":"feliixx/mgodatagen","owner":"feliixx","description":"Generate random data for MongoDB","archived":false,"fork":false,"pushed_at":"2024-04-14T17:05:35.000Z","size":2164,"stargazers_count":313,"open_issues_count":5,"forks_count":63,"subscribers_count":10,"default_branch":"master","last_synced_at":"2024-05-01T14:41:45.690Z","etag":null,"topics":["bson","database-population","mongodb","random","scalability","seeding","test"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/feliixx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-08-10T11:36:00.000Z","updated_at":"2024-05-07T13:49:11.827Z","dependencies_parsed_at":"2022-08-31T02:00:46.647Z","dependency_job_id":"e3e3a755-3675-49df-86c5-3556a0e8a582","html_url":"https://github.com/feliixx/mgodatagen","commit_stats":null,"previous_names":[],"tags_count":35,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feliixx%2Fmgodatagen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feliixx%2Fmgodatagen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feliixx%2Fmgodatagen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/feliixx%2Fmgodatagen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/felii
xx","download_url":"https://codeload.github.com/feliixx/mgodatagen/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247345854,"owners_count":20924102,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bson","database-population","mongodb","random","scalability","seeding","test"],"created_at":"2024-07-31T12:00:50.188Z","updated_at":"2025-04-05T14:08:11.364Z","avatar_url":"https://github.com/feliixx.png","language":"Go","funding_links":[],"categories":["Uncategorized","Go","Tools"],"sub_categories":["Uncategorized","Development"],"readme":"[![Go Report Card](https://goreportcard.com/badge/github.com/feliixx/mgodatagen)](https://goreportcard.com/report/github.com/feliixx/mgodatagen)\n[![codecov](https://codecov.io/gh/feliixx/mgodatagen/branch/master/graph/badge.svg)](https://codecov.io/gh/feliixx/mgodatagen)\n[![PkgGoDev](https://pkg.go.dev/badge/github.com/feliixx/mgodatagen/datagen)](https://pkg.go.dev/github.com/feliixx/mgodatagen/datagen)\n\n# mgodatagen\n\nA small CLI tool to quickly generate millions of pseudo-random BSON documents and insert them into a MongoDB instance. 
Quickly test new data structures or see how your application responds when your database grows!\n\nTry it online: [**mongoplayground.net**](https://mongoplayground.net/)\n\n## Features\n\n- Support all BSON types listed in [MongoDB BSON types](https://docs.mongodb.com/manual/reference/bson-types/)\n- Generate _real_ data using [gofakeit](https://github.com/brianvoe/gofakeit)\n- Create referenced fields across collections\n- Aggregate data across collections\n- Create sharded collections\n- Create collections in multiple databases\n- Output result as JSON\n\n![Demo](demo.gif)\n\n## Installation\n\nDownload the binary from the [release page](https://github.com/feliixx/mgodatagen/releases)\n\nor\n\nBuild from source:\n\nFirst, make sure that Go is installed on your machine (see [install go](https://golang.org/doc/install) for details). Then clone the repo and build it:\n\n```\ngit clone https://github.com/feliixx/mgodatagen.git\ncd mgodatagen\ngo build\n```\n\nIf you are using Go 1.17 or higher, you can simply use `go install`:\n\n```\ngo install github.com/feliixx/mgodatagen@latest\n```\n\n## Options\n\nSeveral options are available (use `mgodatagen --help` to print this):\n\n```\nUsage:\n  mgodatagen -f config_file.json\n\ntemplate:\n      --new=\u003cfilename\u003e                         create an empty configuration file\n\nconfiguration:\n  -f, --file=\u003cconfigfile\u003e                      JSON config file. 
This field is required\n\n  -a, --append                                 if present, append documents to the collection without\n                                               removing older documents or deleting the collection\n  -i, --indexonly                              if present, mgodatagen will just try to rebuild index\n  -x, --indexfirst                             if present, mgodatagen will create index before\n                                               inserting documents\n  -n, --numWorker=\u003cnb\u003e                         number of concurrent workers inserting documents\n                                               in database. Default is number of CPU\n  -b, --batchsize=\u003csize\u003e                       bulk insert batch size (default: 1000)\n  -s, --seed=\u003cseed\u003e                            specific seed to use. Passing the same seed garentees\n                                               the same output for evey run with the same config.\n                                               Has to be in [1, 18446744073709551615]\n  -o, --output=\u003coutput\u003e                        where documents should be written. Options are:\n                                               - mongodb (default)\n                                               - stdout\n                                               - filename\n      --prettyprint                            if present, indent the output. Only for stdout or file\n                                               output\n\nconnection infos:\n      --uri=\u003curi\u003e                              connection string URI. If present, takes precedence over all\n                                               other options. 
For detail on URI format, see\n                                               https://docs.mongodb.com/manual/reference/connection-string/\n  -h, --host=\u003chostname\u003e                        mongodb host to connect to (default: 127.0.0.1)\n      --port=\u003cport\u003e                            server port (default: 27017)\n  -u, --username=\u003cusername\u003e                    username for authentification\n  -p, --password=\u003cpassword\u003e                    password for authentification\n      --authenticationMechanism=\u003cmechanism\u003e    authentication mechanism\n                                               for now only PLAIN and MONGODB-X509 are supported\n      --tlsCertificateKeyFile=\u003cpath\u003e           PEM certificate/key file for TLS\n      --tlsCAFile=\u003cpath\u003e                       Certificate Authority file for TLS\n\ngeneral:\n      --help                                   show this help message\n  -v, --version                                print the tool version and exit\n  -q, --quiet                                  quieter output\n```\n\nOnly the configuration file is required ( **-f | --file flag**). 
A basic invocation of mgodatagen would be:\n\n```\n./mgodatagen -f config.json\n```\n\nIf no host/port is specified, mgodatagen tries to connect to **`mongodb://127.0.0.1:27017`**.\n\n# Configuration file\n\nThe config file is an array of JSON documents, where each document holds the configuration\nfor a collection to create.\n\nSee the **MongoDB documentation** for details on parameters:\n\n- shardConfig: [**shardCollection**](https://docs.mongodb.com/manual/reference/command/shardCollection/)\n- indexes: [**indexes**](https://docs.mongodb.com/manual/reference/method/db.collection.createIndex/)\n- collation: [**collation**](https://docs.mongodb.com/manual/reference/bson-type-comparison-order/#collation)\n\n```scala\n[\n  // first collection to create\n  {\n   // REQUIRED FIELDS\n   //\n   \"database\":     \u003cstring\u003e,    // required, database name\n   \"collection\":   \u003cstring\u003e,    // required, collection name\n   \"count\":        \u003cint\u003e,       // required, number of documents to insert in the collection\n   \"content\": {                 // required, the actual schema to generate documents\n     \"fieldName1\": \u003cgenerator\u003e, // optional, see Generator types below\n     \"fieldName2\": \u003cgenerator\u003e,\n     ...\n   },\n   // OPTIONAL FIELDS\n   //\n   // compression level (for WiredTiger engine only)\n   // possible values:\n   // - none\n   // - snappy\n   // - zlib\n   \"compressionLevel\": \u003cstring\u003e, // optional, default: snappy\n\n   // configuration for sharded collection\n   \"shardConfig\": {                          // optional\n      \"shardCollection\":  \u003cstring\u003e.\u003cstring\u003e, // required. 
\u003cdatabase\u003e.\u003ccollection\u003e\n      \"key\":              \u003cobject\u003e,          // required, shard key, eg: {\"_id\": \"hashed\"}\n      \"unique\":           \u003cboolean\u003e,         // optional, default: false\n      \"numInitialChunks\": \u003cint\u003e,             // optional\n\n      \"collation\": {                  // optional\n        \"locale\":          \u003cstring\u003e,  // required\n        \"caseLevel\":       \u003cboolean\u003e, // optional\n        \"caseFirst\":       \u003cstring\u003e,  // optional\n        \"strength\":        \u003cint\u003e,     // optional\n        \"numericOrdering\": \u003cboolean\u003e, // optional\n        \"alternate\":       \u003cstring\u003e,  // optional\n        \"maxVariable\":     \u003cstring\u003e,  // optional \n        \"backwards\":       \u003cboolean\u003e, // optional\n        \"normalization\":   \u003cstring\u003e   // optional\n      }\n   },\n\n   // list of index to build\n   \"indexes\": [                                // optional\n      {\n         \"name\":                    \u003cstring\u003e,  // required, index name\n         \"key\":                     \u003cobject\u003e,  // required, index key, eg: {\"name\": 1}\n         \"sparse\":                  \u003cboolean\u003e, // optional, default: false\n         \"unique\":                  \u003cboolean\u003e, // optional, default: false\n         \"bits\":                    \u003cint\u003e,     // optional, for 2d indexes only, default: 26\n         \"min\":                     \u003cdouble\u003e,  // optional, for 2d indexes only, default: -180.0\n         \"max\":                     \u003cdouble\u003e,  // optional, for 2d index only, default: 180.0\n         \"bucketSize\":              \u003cdouble\u003e,  // optional, for geoHaystack indexes only\n         \"expireAfterSeconds\":      \u003cint\u003e,     // optional, for TTL indexes only\n         \"weights\":                 \u003cstring\u003e,  // 
optional, for text indexes only\n         \"defaultLanguage\":         \u003cstring\u003e,  // optional, for text index only\n         \"languageOverride\":        \u003cstring\u003e,  // optional, for text index only\n         \"textIndexVersion\":        \u003cint\u003e,     // optional, for text index only\n         \"partialFilterExpression\": \u003cobject\u003e,  // optional\n         \"hidden\":                  \u003cboolean\u003e, // optional \n         \"storageEngine\":           \u003cobject\u003e,  // optional \n         \"wildcardProjection\":      \u003cobject\u003e,  // optional, for wildcard index only\n         \"2dsphereIndexVersion\":    \u003cint\u003e,     // optional, for 2dsphere index only\n\n         \"collation\": {                  // optional\n           \"locale\":          \u003cstring\u003e,  // required \n           \"caseLevel\":       \u003cboolean\u003e, // optional\n           \"caseFirst\":       \u003cstring\u003e,  // optional\n           \"strength\":        \u003cint\u003e,     // optional\n           \"numericOrdering\": \u003cboolean\u003e, // optional\n           \"alternate\":       \u003cstring\u003e,  // optional\n           \"maxVariable\":     \u003cstring\u003e,  // optional\n           \"backwards\":       \u003cboolean\u003e, // optional\n           \"normalization\":   \u003cstring\u003e   // optional\n         }\n      }\n   ]\n  },\n  // second collection to create\n  {\n    ...\n  }\n]\n```\n\n### Example\n\nA set of sample config files can be found in **[datagen/generators/testdata/](https://github.com/feliixx/mgodatagen/tree/master/datagen/generators/testdata)**. 
To use it,\nmake sure that you have a mongodb instance running (on 127.0.0.1:27017 for example)\nand run\n\n```\n./mgodatagen -f datagen/generators/testdata/ref.json\n```\n\nThis will insert 1000 random documents in collections `test` and `link` of database\n`mgodatagen_test` with the structure defined in the config file.\n\n# Generator types\n\nGenerators have a common structure:\n\n```scala\n\"fieldName\": {                  // required, field name in generated document\n  \"type\":             \u003cstring\u003e, // required, type of the field\n  \"typeParam\":        ...,      // specific parameters for this type\n\n  \"maxDistinctValue\": \u003cint\u003e,    // optional, maximum number of distinct values for this field\n  \"nullPercentage\":   \u003cint\u003e     // optional, int between 0 and 100. Percentage of documents\n                                // that will have this field\n}\n```\n\nList of `\u003cgenerator\u003e` types:\n\n- [string](#string)\n- [stringFromParts](#stringFromParts)\n- [int](#int)\n- [long](#long)\n- [double](#double)\n- [decimal](#decimal)\n- [autoincrement](#autoincrement)\n- [boolean](#boolean)\n- [objectId](#objectid)\n- [UUID](#uuid)\n- [binary](#binary)\n- [date](#date)\n- [coordinates (formerly position)](#coordinates)\n- [constant](#constant)\n- [enum (formerly fromArray)](#enum)\n- [reference](#reference)\n- [faker](#faker)\n- [array](#array)\n- [object](#object)\n- [countAggregator](#countAggregator)\n- [valueAggregator](#valueAggregator)\n- [boundAggregator](#boundAggregator)\n\n### String\n\nGenerates a random string of a certain length. 
The string is composed of characters from this list:\n`abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_`\n\n```scala\n\"fieldName\": {\n    \"type\":             \"string\", // required\n    \"minLength\":        \u003cint\u003e,    // optional, must be \u003e= 0\n    \"maxLength\":        \u003cint\u003e,    // optional, must be \u003e= minLength\n    \"unique\":           \u003cbool\u003e,   // optional, see details below\n    \"nullPercentage\":   \u003cint\u003e,    // optional\n    \"maxDistinctValue\": \u003cint\u003e     // optional\n}\n```\n\n#### Unique String\n\nIf `unique` is set to true, the field will only contain unique strings. Unique strings\nhave a **fixed length**: `minLength` is taken as the length of the string.\nThere are `64^x` possible unique strings of length `x`. This number has to\nbe greater than or equal to the number of documents you want to generate.\nFor example, if you want unique strings of length 3, there are `64 * 64 * 64 = 262144` possible\nstrings.\n\nThey will look like\n\n```\n\"aaa\",\n\"aab\",\n\"aac\",\n\"aad\",\n...\n```\n\n### StringFromParts\n\nGenerates a random string from several generators. `parts` generators can't have `unique` or `maxDistinctValue` attributes set. \n\n```scala\n\"fieldName\": {\n    \"type\":           \"stringFromParts\", // required\n    \"parts\": [                           // required. Can't be empty. 
An array \n      \u003cgenerator\u003e,                       // of generators of any basic type\n      \u003cgenerator\u003e\n      ...\n    ],\n    \"nullPercentage\": \u003cint\u003e              // optional\n}\n```\n\n**Example:**\n\nTo generate phone numbers like `'(555) 565-2431'`, you can combine several generators \nlike this: \n\n```scala\n\"phone\": {\n  \"type\": \"stringFromParts\",\n  \"parts\": [\n    {\n      \"type\": \"constant\",\n      \"constVal\": \"(555) \"\n    },\n    {\n      \"type\": \"int\",\n      \"min\": 100,\n      \"max\": 999\n    },\n    {\n      \"type\": \"constant\",\n      \"constVal\": \"-\"\n    },\n    {\n      \"type\": \"int\",\n      \"min\": 1000,\n      \"max\": 9999\n    }\n  ]\n}\n```\n\n### Int\n\nGenerates a random `int` within bounds.\n\n```scala\n\"fieldName\": {\n    \"type\":             \"int\", // required\n    \"min\":              \u003cint\u003e, // optional\n    \"max\":              \u003cint\u003e, // optional, must be \u003e= min\n    \"nullPercentage\":   \u003cint\u003e, // optional\n    \"maxDistinctValue\": \u003cint\u003e  // optional\n}\n```\n\n### Long\n\nGenerates a random `long` within bounds.\n\n```scala\n\"fieldName\": {\n    \"type\":             \"long\", // required\n    \"min\":              \u003clong\u003e, // optional\n    \"max\":              \u003clong\u003e, // optional, must be \u003e= min\n    \"nullPercentage\":   \u003cint\u003e,  // optional\n    \"maxDistinctValue\": \u003cint\u003e   // optional\n}\n```\n\n### Double\n\nGenerates a random `double` within bounds.\n\n```scala\n\"fieldName\": {\n    \"type\":             \"double\", // required\n    \"min\":              \u003cdouble\u003e, // optional\n    \"max\":              \u003cdouble\u003e, // optional, must be \u003e= min\n    \"nullPercentage\":   \u003cint\u003e,    // optional\n    \"maxDistinctValue\": \u003cint\u003e     // optional\n}\n```\n\n### Decimal\n\nGenerates a random 
`decimal128`.\n\n```scala\n\"fieldName\": {\n    \"type\":             \"decimal\", // required\n    \"nullPercentage\":   \u003cint\u003e,     // optional\n    \"maxDistinctValue\": \u003cint\u003e      // optional\n}\n```\n\n### Autoincrement\n\nGenerates an autoincremented value (type `\u003clong\u003e` or `\u003cint\u003e`).\n\n```scala\n\"fieldName\": {\n    \"type\":           \"autoincrement\", // required\n    \"autoType\":       \u003cstring\u003e,        // required, either \"int\" or \"long\"\n    \"start\":          \u003cint|long\u003e,      // optional, start value\n    \"nullPercentage\": \u003cint\u003e            // optional\n}\n```\n\n### Boolean\n\nGenerates a random `boolean`.\n\n```scala\n\"fieldName\": {\n    \"type\":             \"boolean\", // required\n    \"nullPercentage\":   \u003cint\u003e,     // optional\n    \"maxDistinctValue\": \u003cint\u003e      // optional\n}\n```\n\n### ObjectId\n\nGenerates a random `objectId`.\n\n```scala\n\"fieldName\": {\n    \"type\":             \"objectId\", // required\n    \"nullPercentage\":   \u003cint\u003e,      // optional\n    \"maxDistinctValue\": \u003cint\u003e       // optional\n}\n```\n\n### UUID\n\nGenerates a random UUID.\n\n```scala\n\"fieldName\": {\n    \"type\":           \"uuid\",   // required\n    \"format\":         \u003cstring\u003e, // optional, either \"string\" or \"binary\".\n                                // default is \"string\" \n    \"nullPercentage\": \u003cint\u003e     // optional\n}\n```\n\nIf `format` is `\"string\"`, the field will be a simple string like `\"f1b9b567-9b34-45af-9d9c-35f565d57716\"`.\n\nIf `format` is `\"binary\"`, the field will be stored as a [bson UUID](https://docs.mongodb.com/manual/reference/method/UUID/) like `UUID(\"f1b9b567-9b34-45af-9d9c-35f565d57716\")`.\n\n### Binary\n\nGenerates random binary data of length within bounds.\n\n```scala\n\"fieldName\": {\n    \"type\":             \"binary\", // required\n    \"minLength\":        
\u003cint\u003e,    // optional, must be \u003e= 0\n    \"maxLength\":        \u003cint\u003e,    // optional, must be \u003e= minLength\n    \"nullPercentage\":   \u003cint\u003e,    // optional\n    \"maxDistinctValue\": \u003cint\u003e     // optional\n}\n```\n\n### Date\n\nGenerates a random date (stored as [`ISODate`](https://docs.mongodb.com/manual/reference/method/Date/)).\n\n`startDate` and `endDate` are string representations of a date following RFC 3339:\n\n**format**: \"yyyy-MM-ddThh:mm:ss+00:00\"\n\n```scala\n\"fieldName\": {\n    \"type\":             \"date\",   // required\n    \"startDate\":        \u003cstring\u003e, // required\n    \"endDate\":          \u003cstring\u003e, // required, must be \u003e= startDate\n    \"nullPercentage\":   \u003cint\u003e,    // optional\n    \"maxDistinctValue\": \u003cint\u003e     // optional\n}\n```\n\n### Coordinates\n\nGenerates random [GeoJSON](https://docs.mongodb.com/manual/geospatial-queries/#std-label-geospatial-geojson) coordinates (a GPS position in WGS84 decimal degrees with the following format: `[ longitude, latitude ]`).\n\ne.g.: [40.741895, -73.989308]\n\n```scala\n\"fieldName\": {\n    \"type\":             \"coordinates\", // required\n    \"nullPercentage\":   \u003cint\u003e,         // optional\n    \"maxDistinctValue\": \u003cint\u003e          // optional\n}\n```\n\n### Constant\n\nAdds the same value to each document.\n\n```scala\n\"fieldName\": {\n    \"type\":           \"constant\", // required\n    \"constVal\":       \u003cobject\u003e,   // required, can be of any type including object and array\n                                  // eg: {\"k\": 1, \"v\": \"val\"}\n                                  // to set constant ObjectId eg: { \"$oid\": \"5a934e000102030405000001\" }\n    \"nullPercentage\": \u003cint\u003e       // optional\n}\n```\n\n### Enum\n\nPicks an object from an array as the value for the field. Currently, objects in the\narray have to be of the same type. 
By default, items are picked from the array \nin the order in which they appear.\n\n```scala\n\"fieldName\": {\n    \"type\":           \"enum\", // required\n    \"values\": [               // required. Can't be empty. An array of objects of\n      \u003cobject\u003e,               // any type, including object and array.\n      \u003cobject\u003e\n      ...\n    ], \n    \"randomOrder\":    \u003cbool\u003e, // optional. If set to true, objects will be picked \n                              // from the array in random order.\n    \"nullPercentage\": \u003cint\u003e   // optional\n\n}\n```\n\n### Reference\n\nUses the same list of values for fields in different collections.\n\n```scala\n\"fieldName\":{\n    \"type\":             \"reference\", // required\n    \"id\":               \u003cint\u003e,       // required, generator id used to link\n                                     // field between collections\n    \"refContent\":       \u003cgenerator\u003e, // required, generator to use to create the\n                                     // list of values \n    \"nullPercentage\":   \u003cint\u003e,       // optional\n    \"maxDistinctValue\": \u003cint\u003e        // optional\n}\n```\n\nGenerator in other collections:\n\n```scala\n\"fieldName\": {\n    \"type\":             \"reference\", // required\n    \"id\":               \u003cint\u003e,       // required, same id as previous generator\n    \"nullPercentage\":   \u003cint\u003e,       // optional\n    \"maxDistinctValue\": \u003cint\u003e        // optional\n}\n```\n\nIt can also be used to duplicate a field in a single collection (see [reference_same_collection.json](https://github.com/feliixx/mgodatagen/tree/master/datagen/testdata/reference_same_collection.json)).\n\n\n\n### Array\n\nGenerates a random array of BSON objects.\n\n```scala\n\"fieldName\": {\n    \"type\":             \"array\",     // required\n    \"arrayContent\":     \u003cgenerator\u003e, // required, generator used to create elements\n       
                              // to fill the array. Can be of any type\n    \"minLength\":        \u003cint\u003e,       // optional, must be \u003e= 0\n    \"maxLength\":        \u003cint\u003e,       // optional, must be \u003e= minLength\n    \"nullPercentage\":   \u003cint\u003e,       // optional\n    \"maxDistinctValue\": \u003cint\u003e        // optional\n}\n```\n\n### Object\n\nGenerates a random nested object.\n\n```scala\n\"fieldName\": {\n    \"type\":                \"object\",    // required\n    \"objectContent\": {                  // required, list of generators used to\n       \"nestedFieldName1\": \u003cgenerator\u003e, // generate the nested document\n       \"nestedFieldName2\": \u003cgenerator\u003e,\n       ...\n    },\n    \"nullPercentage\":      \u003cint\u003e,       // optional\n    \"maxDistinctValue\":    \u003cint\u003e        // optional\n}\n```\n\n\n### CountAggregator\n\nCounts documents from `\u003cdatabase\u003e.\u003ccollection\u003e` matching a specific query. 
To use a\nvariable of the document in the query, prefix it with \"$$\".\n\nThe query can't be empty or null.\n\n```scala\n\"fieldName\": {\n  \"type\":      \"countAggregator\", // required\n  \"database\":   \u003cstring\u003e,         // required, db to use to perform aggregation\n  \"collection\": \u003cstring\u003e,         // required, collection to use to perform aggregation\n  \"query\":      \u003cobject\u003e          // required, query that selects which documents to count in\n                                  // the collection\n}\n```\n\n**Example:**\n\nAssuming that the collection `first` contains:\n\n```scala\n{\"_id\": 1, \"field1\": 1, \"field2\": \"a\" }\n{\"_id\": 2, \"field1\": 1, \"field2\": \"b\" }\n{\"_id\": 3, \"field1\": 2, \"field2\": \"c\" }\n```\n\nand that the generator for collection `second` is:\n\n```scala\n{\n  \"database\": \"test\",\n  \"collection\": \"second\",\n  \"count\": 2,\n  \"content\": {\n    \"_id\": {\n      \"type\": \"autoincrement\",\n      \"autoType\": \"int\",\n      \"start\": 0\n    },\n    \"count\": {\n      \"type\": \"countAggregator\",\n      \"database\": \"test\",\n      \"collection\": \"first\",\n      \"query\": {\n        \"field1\": \"$$_id\"\n      }\n    }\n  }\n}\n```\n\nThe collection `second` will contain:\n\n```scala\n{\"_id\": 1, \"count\": 2}\n{\"_id\": 2, \"count\": 1}\n```\n\n### ValueAggregator\n\nGets distinct values for a specific field for documents from\n`\u003cdatabase\u003e.\u003ccollection\u003e` matching a specific query. 
To use a variable of\nthe document in the query, prefix it with \"$$\".\n\nThe query can't be empty or null.\n\n```scala\n\"fieldName\": {\n  \"type\":      \"valueAggregator\", // required\n  \"database\":   \u003cstring\u003e,         // required, db to use to perform aggregation\n  \"collection\": \u003cstring\u003e,         // required, collection to use to perform aggregation\n  \"key\":        \u003cstring\u003e,         // required, the field for which to return distinct values.\n  \"query\":      \u003cobject\u003e          // required, query that specifies the documents from which\n                                  // to retrieve the distinct values\n}\n```\n\n**Example**:\n\nAssuming that the collection `first` contains:\n\n```scala\n{\"_id\": 1, \"field1\": 1, \"field2\": \"a\" }\n{\"_id\": 2, \"field1\": 1, \"field2\": \"b\" }\n{\"_id\": 3, \"field1\": 2, \"field2\": \"c\" }\n```\n\nand that the generator for collection `second` is:\n\n```scala\n{\n  \"database\": \"test\",\n  \"collection\": \"second\",\n  \"count\": 2,\n  \"content\": {\n    \"_id\": {\n      \"type\": \"autoincrement\",\n      \"autoType\": \"int\",\n      \"start\": 0\n    },\n    \"values\": {\n      \"type\": \"valueAggregator\",\n      \"database\": \"test\",\n      \"collection\": \"first\",\n      \"key\": \"field2\",\n      \"query\": {\n        \"field1\": \"$$_id\"\n      }\n    }\n  }\n}\n```\n\nThe collection `second` will contain:\n\n```scala\n{\"_id\": 1, \"values\": [\"a\", \"b\"]}\n{\"_id\": 2, \"values\": [\"c\"]}\n```\n\n### BoundAggregator\n\nGets the lowest and highest value for a specific field of documents in\n`\u003cdatabase\u003e.\u003ccollection\u003e` matching a specific query. 
To use a variable of\nthe document in the query, prefix it with \"$$\".\n\nThe query can't be empty or null.\n\n```scala\n\"fieldName\": {\n  \"type\":      \"boundAggregator\", // required\n  \"database\":   \u003cstring\u003e,         // required, db to use to perform aggregation\n  \"collection\": \u003cstring\u003e,         // required, collection to use to perform aggregation\n  \"key\":        \u003cstring\u003e,         // required, the field for which to return the lowest and highest values.\n  \"query\":      \u003cobject\u003e          // required, query that specifies the documents from which\n                                  // to retrieve lower/higher value\n}\n```\n\n**Example**:\n\nAssuming that the collection `first` contains:\n\n```scala\n{\"_id\": 1, \"field1\": 1, \"field2\": 0 }\n{\"_id\": 2, \"field1\": 1, \"field2\": 10 }\n{\"_id\": 3, \"field1\": 2, \"field2\": 20 }\n{\"_id\": 4, \"field1\": 2, \"field2\": 30 }\n{\"_id\": 5, \"field1\": 2, \"field2\": 15 }\n{\"_id\": 6, \"field1\": 2, \"field2\": 200 }\n```\n\nand that the generator for collection `second` is:\n\n```scala\n{\n  \"database\": \"test\",\n  \"collection\": \"second\",\n  \"count\": 2,\n  \"content\": {\n    \"_id\": {\n      \"type\": \"autoincrement\",\n      \"autoType\": \"int\",\n      \"start\": 0\n    },\n    \"values\": {\n      \"type\": \"boundAggregator\",\n      \"database\": \"test\",\n      \"collection\": \"first\",\n      \"key\": \"field2\",\n      \"query\": {\n        \"field1\": \"$$_id\"\n      }\n    }\n  }\n}\n```\n\nThe collection `second` will contain:\n\n```scala\n{\"_id\": 1, \"values\": {\"m\": 0, \"M\": 10}}\n{\"_id\": 2, \"values\": {\"m\": 15, \"M\": 200}}\n```\n\nwhere `m` is the min value, and `M` the max value.\n\n### Faker\n\nGenerates 'real' data using the [gofakeit library](https://github.com/brianvoe/gofakeit).\n\n```scala\n\"fieldName\": {\n    \"type\":             \"faker\",  // required\n    \"method\":           \u003cstring\u003e, // 
required, faker method to use, for example: Name\n    \"nullPercentage\":   \u003cint\u003e,    // optional\n    \"maxDistinctValue\": \u003cint\u003e     // optional\n}\n```\n\nList of faker methods:\n\n```\n\"FirstName\"\n\"LastName\"\n\"Name\"\n\"NamePrefix\"\n\"NameSuffix\"\n\"Gender\"\n\"Phone\"\n\"PhoneFormatted\"\n\"Username\"\n\"Email\"\n\n\"BS\"\n\"BuzzWord\"\n\"Company\"\n\"CompanySuffix\"\n\"JobDescriptor\"\n\"JobLevel\"\n\"JobTitle\"\n\"Language\"\n\"LanguageAbbreviation\"\n\n\"CreditCardCvv\"\n\"CreditCardExp\"\n\"CreditCardType\"\n\"CurrencyLong\"\n\"CurrencyShort\"\n\n\"DomainName\"\n\"DomainSuffix\"\n\"HTTPMethod\"\n\"IPv4Address\"\n\"IPv6Address\"\n\"MacAddress\"\n\"FileMimeType\"\n\"SSN\"\n\"URL\"\n\"UserAgent\"\n\"SafariUserAgent\"\n\"OperaUserAgent\"\n\"ChromeUserAgent\"\n\"FileExtension\"\n\"FirefoxUserAgent\"\n\n\"TimeZone\"\n\"TimeZoneAbv\"\n\"TimeZoneFull\"\n\"Month\"\n\"WeekDay\"\n\n\"Word\"\n\"Question\"\n\"Quote\"\n\"Letter\"\n\"ProgrammingLanguage\"\n\"ProgrammingLanguageBest\"\n\"HexColor\"\n\"Color\"\n\"HipsterWord\"\n\"SafeColor\"\n\n\"Street\"\n\"StreetName\"\n\"StreetNumber\"\n\"StreetPrefix\"\n\"StreetSuffix\"\n\"City\"\n\"State\"\n\"StateAbr\"\n\"Zip\"\n\"Country\"\n\"CountryAbr\"\n\n\"Emoji\"\n\"EmojiAlias\"\n\"EmojiCategory\"\n\"EmojiDescription\"\n\"EmojiTag\"\n\n\"HackerAbbreviation\"\n\"HackerAdjective\"\n\"HackeringVerb\"\n\"HackerNoun\"\n\"HackerPhrase\"\n\"HackerVerb\"\n\n\"CarMaker\"\n\"CarModel\"\n\"CarTransmissionType\"\n\"CarFuelType\"\n\"CarType\"\n\n\"Animal\"\n\"AnimalType\"\n\"Cat\"\n\"Dog\"\n\"FarmAnimal\"\n\"PetName\"\n\n\"BeerAlcohol\"\n\"BeerBlg\"\n\"BeerHop\"\n\"BeerIbu\"\n\"BeerMalt\"\n\"BeerName\"\n\"BeerStyle\"\n\"BeerYeast\"\n```\n\nIf you're building large datasets (10M+ documents) you should avoid faker generators\nand use main or custom generators instead, as faker may be way 
slower.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeliixx%2Fmgodatagen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffeliixx%2Fmgodatagen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffeliixx%2Fmgodatagen/lists"}