# pymongo-schema
A schema analyser for MongoDB, written in Python.

This tool allows you to **extract your application's schema, directly from your MongoDB data**. It comes with **powerful schema manipulation and export functionalities**.

It is particularly useful when you inherit a data dump and want to quickly learn how the data is structured.

pymongo-schema lets you map your MongoDB data model to a relational (SQL) data model. This greatly helps to configure [mongo-connector-postgresql](https://github.com/Hopwork/mongo-connector-postgresql), a tool that synchronizes data from MongoDB to a target PostgreSQL database.

It also helps you **compare different versions of your data model**.

This tool is inspired by [variety](https://github.com/variety/variety), with the following enhancements:

- extraction of the **hierarchical structure** of the schema
- versatile output options: json, yaml, tsv, markdown or html
- **finer-grained types**, e.g. INTEGER and DOUBLE rather than NUMBER
- **filtering** of the output schema, using a `namespace` as defined by [mongo-connector](https://github.com/mongodb-labs/mongo-connector/wiki/Configuration-Options#configure-namespaces)
- **mapping to a relational schema**
- **comparison** of successive schemas

[![Build Status](https://travis-ci.org/pajachiet/pymongo-schema.svg?branch=master)](https://travis-ci.org/pajachiet/pymongo-schema)
[![Coverage Status](https://coveralls.io/repos/github/pajachiet/pymongo-schema/badge.svg?branch=master)](https://coveralls.io/github/pajachiet/pymongo-schema?branch=master)


# Install

You can install the latest stable version from PyPI:
```shell
pip install --upgrade pymongo-schema
```

Or directly from GitHub:
```shell
pip install --upgrade git+https://github.com/pajachiet/pymongo-schema
```

# Usage

## Command line

```shell
python -m pymongo_schema -h
usage: [-h] [--quiet] {extract,transform,tosql,compare} ...

commands:
  {extract,transform,tosql,compare}
    extract             Extract schema from a MongoDB instance
    transform           Transform a json schema to another format, potentially
                        filtering or changing columns outputs
    tosql               Create a mapping from mongo schema to relational
                        schema (json input and output)
    compare             Compare two schemas

optional arguments:
  -h, --help            show this help message and exit
  --quiet               Remove logging on standard output

Usage:
    python -m pymongo_schema extract -h
    usage:  [-h] [-f [FORMATS [FORMATS ...]]] [-o OUTPUT] [--port PORT] [--host HOST]
                 [-d [DATABASES [DATABASES ...]]] [-c [COLLECTIONS [COLLECTIONS ...]]]
                 [--columns COLUMNS [COLUMNS ...]] [--size SIZE] [--without-counts]

    python -m pymongo_schema transform -h
    usage: [-h] [-f [FORMATS [FORMATS ...]]] [-o OUTPUT] [--category CATEGORY] [-n FILTER]
                [--columns COLUMNS [COLUMNS ...]] [--without-counts] [input]

    python -m pymongo_schema tosql -h
    usage: [-h] [-f [FORMATS [FORMATS ...]]] [--columns COLUMNS [COLUMNS ...]]
                [--without-counts] [-o OUTPUT] [input]

    python -m pymongo_schema compare -h
    usage: [-h] [-f [FORMATS [FORMATS ...]]] [-o OUTPUT] [input]
                [--columns COLUMNS [COLUMNS ...]] [--without-counts]
                [--detailed_diff] prev_schema [new_schema]
```

To display the full usage, with option descriptions, run:
```shell
python -m pymongo_schema <command> -h
```

## Python package

pymongo_schema modules can also be imported and used directly in Python code:

```python
from pymongo_schema.compare import compare_schemas_bases
from pymongo_schema.export import transform_data_to_file
from pymongo_schema.extract import extract_pymongo_client_schema
from pymongo_schema.filter import filter_mongo_schema_namespaces
from pymongo_schema.tosql import mongo_schema_to_mapping
```

For more details, refer to the module and function docstrings.

# Examples

First, let's populate a `users` collection in the `test` database from the mongo shell:


    db.users.insert({name: "Tom", bio: "A nice guy.", pets: ["monkey", "fish"], someWeirdLegacyKey: "I like Ike!"})
    db.users.insert({name: "Dick", bio: "I swordfight.", birthday: new Date("1974/03/14")})
    db.users.insert({name: "Harry", pets: "egret", birthday: new Date("1984/03/14"), location:{country:"France", city: "Lyon"}})
    db.users.insert({name: "Geneviève", bio: "Ça va?", location:{country:"France", city: "Nantes"}})
    db.users.insert({name: "MadJacques", location:{country:"France", city: "Paris"}})

## Bash API examples
### Easy examples

Extract the schema of this database in json format, on standard output:

    $ python -m pymongo_schema extract --database test
    === Start MongoDB schema analysis
    Extract schema of database test
    ...collection users
       scanned 5 documents out of 5 (100.00 %)
    --- MongoDB schema analysis took 0.00 s
    === Write output

    {"test": {
        "users": {
            "object": {"_id": {"prop_in_object": 1.0, "count": 5, "type": "oid", "types_count": {"oid": 5}},
                       "pets": {"array_types_count": {"string": 2}, "prop_in_object": 0.4, "count": 2, "array_type": "string", "type": "ARRAY", "types_count": {"string": 1, "ARRAY": 1}},
                       "birthday": {"prop_in_object": 0.4, "count": 2, "type": "date", "types_count": {"date": 2}},
                       "name": {"prop_in_object": 1.0, "count": 5, "type": "string", "types_count": {"string": 5}},
                       "bio": {"prop_in_object": 0.6, "count": 3, "type": "string", "types_count": {"string": 3}},
                       "someWeirdLegacyKey": {"prop_in_object": 0.2, "count": 1, "type": "string", "types_count": {"string": 1}},
                       "location": {"object": {"country": {"prop_in_object": 1.0, "count": 3, "type": "string", "types_count": {"string": 3}},
                                               "city": {"prop_in_object": 1.0, "count": 3, "type": "string", "types_count": {"string": 3}}},
                                    "types_count": {"OBJECT": 3}, "prop_in_object": 0.6, "type": "OBJECT", "count": 3}},
            "count": 5}}}

Extract the same schema in md format:

    $ python -m pymongo_schema extract --database test --format md
    === Start MongoDB schema analysis
    Extract schema of database test
    ...collection users
       scanned 5 documents out of 5 (100.00 %)
    --- MongoDB schema analysis took 0.00 s
    === Write output

    ### Database: test
    #### Collection: users
    |Field_compact_name     |Field_name             |Count     |Percentage     |Types_count                           |
    |-----------------------|-----------------------|----------|---------------|--------------------------------------|
    |_id                    |_id                    |5         |100.0          |oid : 5                               |
    |name                   |name                   |5         |100.0          |string : 5                            |
    |bio                    |bio                    |3         |60.0           |string : 3                            |
    |location               |location               |3         |60.0           |OBJECT : 3                            |
    | . city                |city                   |3         |100.0          |string : 3                            |
    | . country             |country                |3         |100.0          |string : 3                            |
    |birthday               |birthday               |2         |40.0           |date : 2                              |
    |pets                   |pets                   |2         |40.0           |ARRAY(string : 2) : 1, string : 1     |
    |someWeirdLegacyKey     |someWeirdLegacyKey     |1         |20.0           |string : 1                            |

Map this schema to a relational schema:

    $ python -m pymongo_schema extract --database test | python -m pymongo_schema tosql
    === Start MongoDB schema analysis
    Extract schema of database test
    ...collection users
       scanned 5 documents out of 5 (100.00 %)
    --- MongoDB schema analysis took 0.00 s
    === Write output
    === Generate mapping from mongo to sql
    === Write output

    {"test":
     {"users":
          {"_id": {"type": "TEXT", "dest": "_id"},
           "pets": {"valueField": "pets", "fk": "id_users", "type": "_ARRAY_OF_SCALARS", "dest": "users__pets"},
           "location.city": {"type": "TEXT", "dest": "location__city"},
           "name": {"type": "TEXT", "dest": "name"},
           "someWeirdLegacyKey": {"type": "TEXT", "dest": "someWeirdLegacyKey"},
           "pk": "_id",
           "bio": {"type": "TEXT", "dest": "bio"},
           "birthday": {"type": "TIMESTAMP", "dest": "birthday"},
           "location.country": {"type": "TEXT", "dest": "location__country"}},
      "users__pets": {"id_users": {"type": "TEXT"},
                      "pets": {"type": "TEXT", "dest": "pets"},
                      "pk": "_id_postgres"}}}

### Other examples

**extract:** Extract the schema of collections `test_collection_1` and `test_collection_2` from `test_db` and write it into the `mongo_schema.html` and `mongo_schema.json` files:
```shell
python -m pymongo_schema extract --databases test_db --collections test_collection_1 test_collection_2 --output mongo_schema --format html json
```
**extract:** Extract the schema of collection `test_collection_1`, scanning only 1000 random documents, and write it into a `mongo_schema.html` file:
```shell
python -m pymongo_schema extract --collections test_collection_1 --size 1000 --output mongo_schema --format html
```
**transform:** Filter the extracted schema (`mongo_schema.json`) using the `namespace.json` file and write the output into the `mongo_schema_filtered.html`, `mongo_schema_filtered.csv` and `mongo_schema_filtered.json` files:
```shell
python -m pymongo_schema transform mongo_schema.json --filter namespace.json --output mongo_schema_filtered --format html csv json
```
**tosql:** Create a mapping file based on `mongo_schema_filtered.json`:
```shell
python -m pymongo_schema tosql mongo_schema_filtered.json --output mapping.json
```

## Python API examples

Extract the schemas of all collections and all databases in a MongoDB instance:

```python
import pymongo
from pymongo_schema.extract import extract_pymongo_client_schema

with pymongo.MongoClient() as client:
    schema = extract_pymongo_client_schema(client)
```
Arguments can be specified to extract only some databases and some collections. See the code documentation for more details.

Filter the extracted schema with a `namespace`:
```python
import json
from pymongo_schema.filter import filter_mongo_schema_namespaces

# assuming a namespace is defined in a file named namespace.json
with open("namespace.json") as f:
    namespace = json.load(f)

schema_filtered = filter_mongo_schema_namespaces(schema, namespace)
```

Save `schema_filtered` (the same call works with the unfiltered `schema`) to files in json and md formats in a `docs` directory:
```python
from pymongo_schema.export import transform_data_to_file

transform_data_to_file(schema_filtered, ['json', 'md'], output='docs/schema_filtered')
```

Compare `schema_filtered` (or `schema`) to another schema, for example a previous version:
```python
import json
from pymongo_schema.compare import compare_schemas_bases

# assuming a previously extracted schema was saved to old_schema_filtered.json
with open("old_schema_filtered.json") as f:
    old_schema_filtered = json.load(f)

differences = compare_schemas_bases(old_schema_filtered, schema_filtered)
```

Save the differences to files in json and md formats in a `docs` directory:
```python
from pymongo_schema.export import transform_data_to_file

transform_data_to_file(differences, ['json', 'md'], output='docs/diff', category='diff')
```

Transform `schema_filtered` to a relational mapping:
```python
from pymongo_schema.tosql import mongo_schema_to_mapping

mapping = mongo_schema_to_mapping(schema_filtered)
```

Save the mapping to files in json and md formats in a `docs` directory:
```python
from pymongo_schema.export import transform_data_to_file

transform_data_to_file(mapping, ['json', 'md'], output='docs/mapping', category='mapping')
```

# Schema

We define a 'schema' as a dictionary describing the structure of a MongoDB component, which can be a MongoDB instance, a database, a collection, an object or a field.

Schemas are hierarchically nested, with the following structure:

```python
# mongo_schema : A MongoDB instance contains databases
{
    "database_name_1": {}, # database_schema
    "database_name_2": # A database contains collections
    {
        "collection_name_1": {}, # collection_schema
        "collection_name_2": # A collection maintains a 'count' and contains 1 object
        {
            "count" : int,
            "object":  # object_schema : An object contains fields
             {
                "field_name_1" : {}, # field_schema
                "field_name_2": # A field maintains 'types_count' information
                                # An optional 'array_types_count' field maintains 'types_count' information for values encountered in arrays
                                # An 'OBJECT' or 'ARRAY(OBJECT)' field recursively contains 1 'object'
                {
                    'count': int,
                    'prop_in_object': float,
                    'type': 'type_str',
                    'types_count': {  # count for each encountered type
                        'type_str' : 13,
                        'Null' : 3
                    },
                    'array_type': 'type_str',
                    'array_types_count': {  # (optional) count for each type encountered in arrays
                        'type_str' : 7,
                        'Null' : 3
                    },
                    'object': {}, # (optional) object_schema
                }
            }
        }
    }
}
```

# Contributing - Limitations - TODO
The code base should be easy to read and improve upon. Contributions are welcome.

## Mixed types handling
pymongo-schema handles mixed types by looking for the lowest common parent type in the following tree.

<img src="https://raw.githubusercontent.com/pajachiet/pymongo-schema/master/type_tree.png" alt="type_tree" width="700"/>

If a field contains both arrays and scalars, it is considered an array. Its 'array_type' is the common parent type of the scalar types and the array element types encountered in this field.

TODO

- Improve the mapping from Python types to names (TYPE_TO_STR dict)
    - see documentation: [bson-types](https://docs.mongodb.com/manual/reference/bson-types/), [spec](http://bsonspec.org/spec.html)

- Check a mongo schema for compatibility with an SQL mapping
- Handle incompatibilities

## Support for Python 3

- Fix encoding issues when exporting manually added non-ASCII characters

## Diff between schemas

A way to compare the schema dictionaries and highlight the differences.


## Test whether a mongo schema can be mapped to SQL

- test for the presence of mongo types in the mapping
- look for mixes of lists and scalars, which are currently not supported by mongo-connector-postgresql
- look for the presence of an '_id'

=> This may be done directly in the mongo-connector-postgresql doc_manager.


## Adding fields in json/yaml outputs

- for example, to add comments


## Other options to sort text outputs

- Sorting is currently based on counts, then alphabetical order.


## Tackle bigger databases
This code has only been used on a relatively small MongoDB database, on which it was faster than Variety.

To tackle bigger databases, it would certainly be useful to implement the following Variety features:

- Analyze subsets of documents, the most recent documents, or documents up to a maximum depth.

## Tests
The codebase is still under development. It should not be trusted blindly.

## Devcontainer

This project contains a devcontainer definition.

Open the project in VS Code, press `[CTRL]` + `[SHIFT]` + `[P]` > `Reopen in container`, and it will launch the dev container with all dependencies installed.
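## Sketch: computing field statistics

The `count`, `prop_in_object` and `types_count` values in the extract output of the Easy examples can be reproduced by hand for the top-level fields of the five `users` documents. What follows is a simplified illustration, not pymongo-schema's own implementation. `type_name` and `summarize_fields` are hypothetical helpers, and the type names only approximate the library's. `_id` is left out because plain dicts carry no ObjectId. Nested objects and array contents are not descended into.

```python
from collections import Counter
from datetime import datetime


def type_name(value):
    """Hypothetical, simplified type naming (pymongo-schema's TYPE_TO_STR is richer)."""
    if isinstance(value, list):
        return "ARRAY"
    if isinstance(value, dict):
        return "OBJECT"
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, datetime):
        return "date"
    if value is None:
        return "null"
    return type(value).__name__


def summarize_fields(docs):
    """Count, proportion and per-type counts for top-level fields only."""
    counts = Counter()
    types = {}
    for doc in docs:
        for field, value in doc.items():
            counts[field] += 1
            types.setdefault(field, Counter())[type_name(value)] += 1
    total = len(docs)
    return {
        field: {
            "count": n,
            "prop_in_object": round(n / total, 4),
            "types_count": dict(types[field]),
        }
        for field, n in counts.items()
    }


users = [
    {"name": "Tom", "bio": "A nice guy.", "pets": ["monkey", "fish"], "someWeirdLegacyKey": "I like Ike!"},
    {"name": "Dick", "bio": "I swordfight.", "birthday": datetime(1974, 3, 14)},
    {"name": "Harry", "pets": "egret", "birthday": datetime(1984, 3, 14),
     "location": {"country": "France", "city": "Lyon"}},
    {"name": "Geneviève", "bio": "Ça va?", "location": {"country": "France", "city": "Nantes"}},
    {"name": "MadJacques", "location": {"country": "France", "city": "Paris"}},
]
summary = summarize_fields(users)
```

For example, `summary["bio"]` comes out as `{"count": 3, "prop_in_object": 0.6, "types_count": {"string": 3}}`, in line with the `bio` entry of the extract output. `pets` mixes one array and one string, which is why its `types_count` has both an `ARRAY` and a `string` entry.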
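## Sketch: flattening a schema into table rows

The md output in the Easy examples lists nested fields directly under their parent. It orders fields by decreasing count, then alphabetically, which is the ordering mentioned under *Other options to sort text outputs*. The sketch below assumes an `object_schema` shaped as in the Schema section, trimmed here to the keys it needs. A hypothetical `schema_rows` helper can flatten such a schema into the same rows. This is a sketch, not the library's export code.

```python
def schema_rows(object_schema, depth=0):
    """Flatten an object_schema into (compact_name, name, count, percentage) rows.

    Fields are sorted by decreasing count, then alphabetically; a nested 'object'
    is listed right after its parent, with one ' . ' prefix per nesting level.
    """
    rows = []
    ordered = sorted(object_schema.items(), key=lambda item: (-item[1]["count"], item[0]))
    for name, field in ordered:
        compact = " . " * depth + name
        rows.append((compact, name, field["count"], round(100 * field["prop_in_object"], 1)))
        if "object" in field:
            rows.extend(schema_rows(field["object"], depth + 1))
    return rows


# The 'users' object_schema from the example, reduced to count and prop_in_object.
users_schema = {
    "_id": {"count": 5, "prop_in_object": 1.0},
    "name": {"count": 5, "prop_in_object": 1.0},
    "bio": {"count": 3, "prop_in_object": 0.6},
    "location": {"count": 3, "prop_in_object": 0.6, "object": {
        "country": {"count": 3, "prop_in_object": 1.0},
        "city": {"count": 3, "prop_in_object": 1.0},
    }},
    "birthday": {"count": 2, "prop_in_object": 0.4},
    "pets": {"count": 2, "prop_in_object": 0.4},
    "someWeirdLegacyKey": {"count": 1, "prop_in_object": 0.2},
}
rows = schema_rows(users_schema)
```

The first column of `rows` follows the same order as the `Field_compact_name` column of the md table.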
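## Sketch: naming relational columns

The tosql output in the Easy examples follows visible naming patterns. Nested fields become columns joined with `__`, so `location.city` becomes `location__city`. An array of scalars such as `pets` goes to a child table `users__pets`, linked by an `id_users` foreign key. The sketch below reproduces only these patterns for the collection-level entries. `to_mapping` and `SQL_TYPES` are hypothetical, and the type map covers just the three types seen in the example. Arrays of objects and the child table's own definition are not handled.

```python
# Hypothetical type map limited to the types that appear in the example output.
SQL_TYPES = {"oid": "TEXT", "string": "TEXT", "date": "TIMESTAMP"}


def to_mapping(collection, object_schema, prefix=""):
    """Build collection-level mapping entries using the example's naming patterns."""
    mapping = {}
    for name, field in object_schema.items():
        path = prefix + name
        if field["type"] == "OBJECT":
            # Nested object: recurse and build dotted keys such as 'location.city'.
            mapping.update(to_mapping(collection, field["object"], path + "."))
        elif field["type"] == "ARRAY":
            # Array of scalars: values go to a child table linked by a foreign key.
            mapping[path] = {
                "valueField": name,
                "fk": "id_" + collection,
                "type": "_ARRAY_OF_SCALARS",
                "dest": collection + "__" + path.replace(".", "__"),
            }
        else:
            mapping[path] = {"type": SQL_TYPES[field["type"]], "dest": path.replace(".", "__")}
    if not prefix:
        mapping["pk"] = "_id"  # the example uses _id as the primary key
    return mapping


users_types = {
    "_id": {"type": "oid"},
    "name": {"type": "string"},
    "birthday": {"type": "date"},
    "pets": {"type": "ARRAY", "array_type": "string"},
    "location": {"type": "OBJECT", "object": {
        "city": {"type": "string"},
        "country": {"type": "string"},
    }},
}
mapping = to_mapping("users", users_types)
```

On this input, the `location.city`, `pets`, `birthday` and `pk` entries match the corresponding entries of the `users` mapping shown in the example.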
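## Sketch: lowest common parent type

The lowest-common-parent rule from *Mixed types handling* can be sketched with a child-to-parent dictionary. The tree below is a hypothetical, simplified stand-in for the one in `type_tree.png`. Type names such as `general_scalar` and `mixed` are assumptions, so results will not always match pymongo-schema's. The hypothetical `field_type` helper applies the rule that a field holding any array is treated as an array. It reproduces the `pets` entry of the example.

```python
# Hypothetical child -> parent links; the real tree is the one shown in type_tree.png.
PARENT = {
    "integer": "number",
    "float": "number",
    "number": "general_scalar",
    "string": "general_scalar",
    "boolean": "general_scalar",
    "date": "general_scalar",
    "oid": "general_scalar",
    "general_scalar": "mixed",
    "OBJECT": "mixed",
}


def ancestors(type_name):
    """Return [type_name, parent, grandparent, ...] up to the root."""
    chain = [type_name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def common_parent(type_names):
    """Lowest common ancestor of the given types, or None when there are none."""
    names = list(type_names)
    if not names:
        return None
    candidates = ancestors(names[0])  # ordered from most to least specific
    for other in names[1:]:
        shared = set(ancestors(other))
        candidates = [t for t in candidates if t in shared]
    return candidates[0] if candidates else "mixed"


def field_type(types_count, array_types_count=None):
    """Any ARRAY makes the field an ARRAY whose array_type covers scalars and array values."""
    if "ARRAY" not in types_count:
        return common_parent(types_count), None
    scalars = [t for t in types_count if t != "ARRAY"]
    return "ARRAY", common_parent(scalars + list(array_types_count or {}))
```

For instance, `field_type({"string": 1, "ARRAY": 1}, {"string": 2})` gives `("ARRAY", "string")`, the `type` and `array_type` reported for `pets`. Mixing `integer` and `float` gives `number` in this simplified tree.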