{"id":13480883,"url":"https://github.com/variety/variety","last_synced_at":"2025-05-15T09:06:11.367Z","repository":{"id":2770918,"uuid":"3769679","full_name":"variety/variety","owner":"variety","description":"Variety: a MongoDB Schema Analyzer","archived":false,"fork":false,"pushed_at":"2024-08-17T10:34:43.000Z","size":277,"stargazers_count":1753,"open_issues_count":11,"forks_count":245,"subscribers_count":59,"default_branch":"main","last_synced_at":"2025-04-11T19:55:23.903Z","etag":null,"topics":["data-science","javascript","mongo","mongodb","nosql","nosql-analytics"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/variety.png","metadata":{"files":{"readme":"README.markdown","changelog":"CHANGELOG","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2012-03-19T23:03:02.000Z","updated_at":"2025-04-01T13:06:24.000Z","dependencies_parsed_at":"2022-07-22T00:46:55.864Z","dependency_job_id":"e8f4118f-d069-46a7-8d1f-d14ae2171525","html_url":"https://github.com/variety/variety","commit_stats":{"total_commits":199,"total_committers":24,"mean_commits":8.291666666666666,"dds":0.7638190954773869,"last_synced_commit":"7b45441ad5dbab1a2255adc8299260bdb4ab5861"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/variety%2Fvariety","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/variety%2Fvariety/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/variety%2Fvariety/releases","manifests_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/variety%2Fvariety/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/variety","download_url":"https://codeload.github.com/variety/variety/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254310515,"owners_count":22049469,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","javascript","mongo","mongodb","nosql","nosql-analytics"],"created_at":"2024-07-31T17:00:46.218Z","updated_at":"2025-05-15T09:06:11.350Z","avatar_url":"https://github.com/variety.png","language":"JavaScript","funding_links":[],"categories":["JavaScript","Tools","II. Databases, search engines, big data and machine learning"],"sub_categories":["Development","4. Client and drivers for databases"],"readme":"# Meet Variety, a Schema Analyzer for MongoDB ###\nThis lightweight tool helps you get a sense of your application's schema, as well as any outliers to that schema. It is particularly useful when you inherit a codebase with a data dump and want to quickly learn how the data is structured. It is also useful for finding rare keys.\n\n***\n\n_“I happen to slowly be falling in love with Variety! 
It is actually one of the most useful tools to get a sense for a messy/unknown data set, and I have put it in a few of our exercises at Zipfian Academy.”_\n\nJon Dinu\n_Co-founder of [Zipfian Academy](http://www.zipfianacademy.com/)_\n\n***\n\nAlso featured on the [official MongoDB blog](http://blog.mongodb.org/post/21923016898/meet-variety-a-schema-analyzer-for-mongodb).\n\n### An Easy Example ###\n\nWe'll make a collection:\n\n    db.users.insert({name: \"Tom\", bio: \"A nice guy.\", pets: [\"monkey\", \"fish\"], someWeirdLegacyKey: \"I like Ike!\"});\n    db.users.insert({name: \"Dick\", bio: \"I swordfight.\", birthday: new Date(\"1974/03/14\")});\n    db.users.insert({name: \"Harry\", pets: \"egret\", birthday: new Date(\"1984/03/14\")});\n    db.users.insert({name: \"Geneviève\", bio: \"Ça va?\"});\n    db.users.insert({name: \"Jim\", someBinData: new BinData(2,\"1234\")});\n\nSo, let's see what we've got here:\n\n    $ mongo test --eval \"var collection = 'users'\" variety.js\n\n    +------------------------------------------------------------------+\n    | key                | types              | occurrences | percents |\n    | ------------------ | ------------       | ----------- | -------- |\n    | _id                | ObjectId           |           5 |    100.0 |\n    | name               | String             |           5 |    100.0 |\n    | bio                | String             |           3 |     60.0 |\n    | birthday           | Date               |           2 |     40.0 |\n    | pets               | Array(1),String(1) |           2 |     40.0 |\n    | someBinData        | BinData-old        |           1 |     20.0 |\n    | someWeirdLegacyKey | String             |           1 |     20.0 |\n    +------------------------------------------------------------------+\n\n_(\"test\" is the database containing the collection we are analyzing.)_\n\nHmm. Looks like everybody has a \"name\" and \"_id\". 
Most, but not all, have a \"bio\".\n\nInterestingly, it looks like \"pets\" can be either an array or a string; here there is one of each. Will this cause any problems in the application, I wonder?\n\nSeems like the first document created has a weird legacy key; those damn fools who built the prototype didn't clean up after themselves. If there were a thousand such early documents, I might cross-reference the codebase to confirm the key is no longer used, and then delete it from all of them. That way it won't confuse any future developers.\n\nResults can be stored for future use in a varietyResults database (see the ```persistResults``` parameter).\n\n### See Progress When Analysis Takes a Long Time ###\n\nTailing the log is great for this. Mongo provides a \"percent complete\" measurement for you. These operations can take a long time on huge collections.\n\n### Analyze Only Recent Documents ###\n\nPerhaps you have a really large collection, and you can't wait a whole day for Variety's results.\n\nPerhaps you want to ignore a collection's oldest documents, and only see what the structure of the collection's documents has looked like lately.\n\nOne can apply a \"limit\" constraint, which analyzes only the newest documents in a collection ([unless sorting](https://github.com/variety/variety#analyze-documents-sorted-in-a-particular-order)), like so:\n\n\t$ mongo test --eval \"var collection = 'users', limit = 1\" variety.js\n\nLet's examine the results closely:\n\n    +----------------------------------------------------+\n    | key         | types       | occurrences | percents |\n    | ----------- | ----------- | ----------- | -------- |\n    | _id         | ObjectId    |           1 |    100.0 |\n    | name        | String      |           1 |    100.0 |\n    | someBinData | BinData-old |           1 |    100.0 |\n    +----------------------------------------------------+\n\nWe are only examining the last document here (\"limit = 1\"). 
It belongs to Jim, and only contains the _id, name and someBinData fields. So it makes sense these are the only three keys.\n\n### Analyze Documents to a Maximum Depth\n\nPerhaps you have a potentially very deep nested object structure, and you don't want to see more than a few levels deep in the analysis.\n\nOne can apply a \"maxDepth\" constraint, which limits the depth Variety will recursively search to find new objects.\n\n    db.users.insert({name:\"Walter\", someNestedObject:{a:{b:{c:{d:{e:1}}}}}});\n\nThe default will traverse all the way to the bottom of that structure:\n\n    $ mongo test --eval \"var collection = 'users'\" variety.js\n\n    +----------------------------------------------------------------+\n    | key                        | types    | occurrences | percents |\n    | -------------------------- | -------- | ----------- | -------- |\n    | _id                        | ObjectId |           1 |    100.0 |\n    | name                       | String   |           1 |    100.0 |\n    | someNestedObject           | Object   |           1 |    100.0 |\n    | someNestedObject.a         | Object   |           1 |    100.0 |\n    | someNestedObject.a.b       | Object   |           1 |    100.0 |\n    | someNestedObject.a.b.c     | Object   |           1 |    100.0 |\n    | someNestedObject.a.b.c.d   | Object   |           1 |    100.0 |\n    | someNestedObject.a.b.c.d.e | Number   |           1 |    100.0 |\n    +----------------------------------------------------------------+\n\n    $ mongo test --eval \"var collection = 'users', maxDepth = 3\" variety.js\n\n    +----------------------------------------------------------+\n    | key                  | types    | occurrences | percents |\n    | -------------------- | -------- | ----------- | -------- |\n    | _id                  | ObjectId |           1 |    100.0 |\n    | name                 | String   |           1 |    100.0 |\n    | someNestedObject     | Object   |           1 |    100.0 |\n    
| someNestedObject.a   | Object   |           1 |    100.0 |\n    | someNestedObject.a.b | Object   |           1 |    100.0 |\n    +----------------------------------------------------------+\n\nAs you can see, Variety only traversed three levels deep.\n\n### Analyze a Subset of Documents ###\n\nPerhaps you have a large collection, or you only care about some subset of the documents.\n\nOne can apply a \"query\" constraint, which takes a standard Mongo query object, to filter the set of documents required before analysis.\n\n    $ mongo test --eval \"var collection = 'users', query = {'caredAbout':true}\" variety.js\n\n### Analyze Documents Sorted In a Particular Order ###\n\nPerhaps you want to analyze a subset of documents sorted in an order other than creation order, say, for example, sorted by when documents were updated.\n\nOne can apply a \"sort\" constraint, which analyzes documents in the specified order like so:\n\n    $ mongo test --eval \"var collection = 'users', sort = { updated_at : -1 }\" variety.js\n\n### Include Last Value ###\n\nYou can add ```lastValue``` property to show values of the last document.\n\n    $ mongo test --eval \"var collection = 'orders', lastValue = true\" variety.js\n    \n    +--------------------------------------------------------------------------------------------+\n    | key             | types        | occurrences | percents | lastValue                        |\n    | --------------- | ------------ | ----------- | -------- | -------------------------------- |\n    | _id             | ObjectId     |           1 |    100.0 | 5a834b76f4d3fa6e578a67f6         |\n    | age             | Number       |           1 |    100.0 |                          38.2569 |\n    | animals         | Array        |           1 |    100.0 | [Array]                          |\n    | animals.XX.type | String       |           1 |    100.0 | dog                              |\n    | balance         | NumberLong   |           1 |    100.0 |      
           1236458945684846 |\n    | date            | Date         |           1 |    100.0 |                    1513539969000 |\n    | fn              | Object       |           1 |    100.0 | [Object]                         |\n    | fn.code         | String       |           1 |    100.0 | function (x, y){ return x + y; } |\n    | name            | String       |           1 |    100.0 | John                             |\n    | nil             | null         |           1 |    100.0 | [null]                           |\n    | uid             | BinData-UUID |           1 |    100.0 | 3b241101e2bb42558caf4136c566a962 |\n    +--------------------------------------------------------------------------------------------+\n\nIf used without ```sort```, it fetches the values of the last document in natural order.\nDates are converted to timestamps, ObjectIds to strings, and binary data to hex. Other types are shown in square brackets.\n\n### Render Output As JSON For Easy Ingestion and Parsing ###\n\nVariety supports two different output formats:\n\n- ASCII: nicely formatted tables (as in this README)\n- JSON: valid JSON results for subsequent processing in other tools (see also [quiet option](#quiet-option))\n\nThe default format is ```ascii```. You can select the format with the ```outputFormat``` property passed to Variety. Valid values are ```ascii``` and ```json```.\n\n    $ mongo test --quiet --eval \"var collection = 'users', outputFormat='json'\" variety.js\n\n#### Quiet Option ####\nBoth MongoDB and Variety output some additional information to standard output. If you want to remove this info, you can pass the ```--quiet``` option to the ```mongo``` executable.\nVariety can also read that option and mute unnecessary output. This is useful in connection with ```outputFormat=json```. 
You would then receive only JSON, without any other characters around it.\n\n    $ mongo test --quiet --eval \"var collection = 'users', sort = { updated_at : -1 }\" variety.js\n\n#### Log Keys and Types As They Arrive Option ####\nSometimes you want to see the keys and types come in as it happens.  Maybe you have a large dataset and want accurate results, but you also are impatient and want to see something now.  Or maybe you have a large mangled dataset with crazy keys (that probably shouldn't be keys) and Variety is going out of memory.  This option will show you the keys and types as they come in and help you identify problems with your dataset without needing the Variety script to finish.  \n\n    $ mongo test --eval \"var collection = 'users', sort = { updated_at : -1 }, logKeysContinuously = true\" variety.js\n\n#### Exclude Subkeys ####\nSometimes you inherit a database full of junk.  Maybe the previous developer put data in the database keys, which causes Variety to go out of memory when run.  After you've run the `logKeysContinuously` to figure out which subkeys may be a problem, you can use this option to run Variety without those subkeys.  
\n\n    db.users.insert({name:\"Walter\", someNestedObject:{a:{b:{c:{d:{e:1}}}}}, otherNestedObject:{a:{b:{c:{d:{e:1}}}}}});\n\n    $ mongo test --eval \"var collection = 'users', sort = { updated_at : -1 }, excludeSubkeys = [ 'someNestedObject.a.b' ]\" variety.js\n\n    +-----------------------------------------------------------------+\n    | key                         | types    | occurrences | percents |\n    | --------------------------- | -------- | ----------- | -------- |\n    | _id                         | ObjectId |           1 |    100.0 |\n    | name                        | String   |           1 |    100.0 |\n    | someNestedObject            | Object   |           1 |    100.0 |\n    | someNestedObject.a          | Object   |           1 |    100.0 |\n    | someNestedObject.a.b        | Object   |           1 |    100.0 |\n    | otherNestedObject           | Object   |           1 |    100.0 |\n    | otherNestedObject.a         | Object   |           1 |    100.0 |\n    | otherNestedObject.a.b       | Object   |           1 |    100.0 |\n    | otherNestedObject.a.b.c     | Object   |           1 |    100.0 |\n    | otherNestedObject.a.b.c.d   | Object   |           1 |    100.0 |\n    | otherNestedObject.a.b.c.d.e | Number   |           1 |    100.0 |\n    +-----------------------------------------------------------------+\n\n#### Secondary Reads ####\nAnalyzing a large collection on a busy replica set primary could take a lot longer than if you read from a secondary. To do so, we have to tell MongoDB it's okay to perform secondary reads\nby setting the ```slaveOk``` property to ```true```:\n\n    $ mongo secondary.replicaset.member:31337/somedb --eval \"var collection = 'users', slaveOk = true\" variety.js\n\n### Save Results in MongoDB For Future Use ###\nBy default, Variety prints results only to standard output and does not store them in MongoDB itself. 
If you want to persist them automatically in MongoDB for later usage, you can set the parameter ```persistResults```.\nVariety then stores result documents in the database ```varietyResults```, and the collection name is derived from the source collection's name.\nIf the source collection's name is ```users```, Variety will store results in the collection ```usersKeys``` under the ```varietyResults``` database.\n\n    $ mongo test --quiet --eval \"var collection = 'users', persistResults=true\" variety.js\n\nTo persist to an alternate MongoDB database, you may specify the following parameters:\n\n  * `resultsDatabase` - The database to store Variety results in. Accepts either a database name or a `host[:port]/database` URL.\n  * `resultsCollection` - Collection to store Variety results in. **WARNING:** This collection is dropped before results are inserted.\n  * `resultsUser` - MongoDB username for results database\n  * `resultsPass` - MongoDB password for results database\n\n```\n$ mongo test --quiet --eval \"var collection = 'users', persistResults=true, resultsDatabase='db.example.com/variety'\" variety.js\n```\n\n### Reserved Keys ###\nVariety expects keys to be well formed, not having any '.'s in them (mongo 2.4 allows dots in certain cases).  Also, Variety uses the pseudo keys 'XX' and keys corresponding to the regex 'XX\\d+XX.*' for use with arrays.  If these conflict with keys in your database, you can change the string XX in these patterns to whatever you like using the `arrayEscape` parameter.  \n\n    $ mongo test --quiet --eval \"var collection = 'users', arrayEscape = 'YY'\" variety.js\n\n### Command Line Interface\nVariety itself is command line friendly, as shown in the examples above.\nBut if you are an NPM and Node.js user, you may prefer the\n[variety-cli](https://github.com/variety/variety-cli) project. 
It simplifies usage of\nVariety and removes all the complexity of passing variables in the ```--eval``` argument and\nproviding a path to the variety.js library.\n\nExample of simplified command-line usage:\n```\nvariety test/users --outputFormat='json' --quiet\n```\nFor more details see the [documentation of the variety-cli project](https://github.com/variety/variety-cli).\n\n##### \"But my dad told me MongoDB is a schemaless database!\" #####\n\nFirst of all, your father is a great guy. Moving on...\n\nA Mongo collection does not enforce a predefined schema like a relational database table. Still, documents in real-world collections nearly always have large sections for which the format of the data is the same. In other words, there is a schema to the majority of collections; it's just enforced by the _application_, rather than by the database system. And this schema is allowed to be a bit fuzzy, in the same way that a given table column might not be required in all rows, but to a much greater degree of flexibility. So we examine what percent of documents in the collection contain a key, and we get a feel for, among other things, how crucial that key is to the proper functioning of the application.\n\n##### Dependencies #####\n\nAbsolutely none, except MongoDB. Written in 100% JavaScript. _(mongod's \"noscripting\" must not be set to true, and 'strict mode' must be disabled.)_\n\n##### Development, Hacking #####\nThis project is NPM based and provides standard NPM functionality. 
As an additional (not required) dependency, [Docker](https://www.docker.com/) can be installed to test against different MongoDB versions.\n\nTo install all dev dependencies, run the usual:\n```\nnpm install\n```\n\nBy default, tests expect MongoDB available on ```localhost:27017``` and can be executed by calling:\n\n```\nnpm test\n```\n\nIf you have Docker installed and don't want to test against your own MongoDB instance,\nyou can execute tests against dockerized MongoDB:\n\n```\nMONGODB_VERSION=3.2 npm run test:docker\n```\nThe script downloads one of the [official MongoDB images](https://hub.docker.com/_/mongo/) (based on your provided version),\nstarts the database, executes the test suite against it (inside the container) and stops the DB.\n\n#### Reporting Issues / Contributing ####\n\nPlease report any bugs and feature requests on the GitHub issue tracker. I will read all reports!\n\nI accept pull requests from forks. Very grateful to accept contributions from folks.\n\n#### Core Maintainers ####\n\n* Tomáš Dvořák ([personal website](http://www.tomas-dvorak.cz/))\n* Eve Freeman ([Twitter](https://twitter.com/wefreema))\n* James Cropcho (original creator of Variety) ([Twitter](https://twitter.com/Cropcho))\n\n#### Special Thanks ####\n\nAdditional special thanks to Gaëtan Voyer-Perraul ([@gatesvp](https://twitter.com/#!/@gatesvp)) and Kristina Chodorow ([@kchodorow](https://twitter.com/#!/kchodorow)) for answering other people's questions about how to do this on Stack Overflow, thereby providing me with the initial seed of code which grew into this tool.\n\nMany thanks also to Kyle Banker ([@Hwaet](https://twitter.com/#!/hwaet)) for writing an unusually good book on MongoDB, which has taught me everything I know about it so far.\n\n#### Tools Which Use Variety (Open Source) ####\n\nKnow of one? Built one? Let us know!\n\n##### Stay Safe #####\n\nI have every reason to believe this tool will **not** corrupt your data or harm your computer. 
But if I were you, I would not use it in a production environment.\n\n\nReleased by Maypop Inc, © 2012–2023, under the [MIT License](http://www.opensource.org/licenses/MIT).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvariety%2Fvariety","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvariety%2Fvariety","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvariety%2Fvariety/lists"}