{"id":25019283,"url":"https://github.com/knightchaser/mongodbshardingexample","last_synced_at":"2025-10-09T21:04:26.459Z","repository":{"id":275552644,"uuid":"926384913","full_name":"KnightChaser/MongoDBShardingExample","owner":"KnightChaser","description":null,"archived":false,"fork":false,"pushed_at":"2025-02-03T08:29:26.000Z","size":6,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-30T09:41:24.000Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KnightChaser.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-03T06:38:19.000Z","updated_at":"2025-02-03T08:29:28.000Z","dependencies_parsed_at":null,"dependency_job_id":"c235c6eb-0699-4103-886f-4d21df450d69","html_url":"https://github.com/KnightChaser/MongoDBShardingExample","commit_stats":null,"previous_names":["knightchaser/mongodbshardingexample"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/KnightChaser/MongoDBShardingExample","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KnightChaser%2FMongoDBShardingExample","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KnightChaser%2FMongoDBShardingExample/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KnightChaser%2FMongoDBShardingExample/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KnightChaser%2FMongoDBShardingExampl
e/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KnightChaser","download_url":"https://codeload.github.com/KnightChaser/MongoDBShardingExample/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KnightChaser%2FMongoDBShardingExample/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279002071,"owners_count":26083285,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-05T11:39:50.690Z","updated_at":"2025-10-09T21:04:26.454Z","avatar_url":"https://github.com/KnightChaser.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MongoDBShardingExample\n\nAn example of MongoDB sharding using Docker and Python.\n\n\u003e **Prerequisites:**  \n\u003e - **Docker** and **docker-compose** must be installed and properly configured.  \n\u003e - **MongoDB** (via Docker images) must be accessible.  \n\u003e - In Python, install the required packages listed in `requirements.txt`.\n\n---\n\n## Overview\n\nThis project demonstrates how to set up a sharded MongoDB cluster using Docker containers and how to interact with the sharded data via Python. 
In this example, the cluster consists of:\n\n- **Config Servers:** A three-member replica set (`cfgrs`) for cluster metadata.\n- **Shards:** Three distinct shards, each implemented as a replica set with two members.  \n  - **shard1rs:** Two nodes (mainly used for system collections, such as `config.system.sessions`).\n  - **shard2rs** and **shard3rs:** Two nodes each, used for storing application data in the `exampleDB.files` collection.\n- **Mongos Router:** A single mongos process that routes queries to the appropriate shard(s).\n\n---\n\n## Procedure\n\n### 1. Boot Up the Docker Containers\n\nRun the following command in the directory containing your `docker-compose.yml` file:\n\n```sh\ndocker-compose up -d\n```\n\nThis command starts all services (config servers, shard replica sets, and the mongos router) in detached mode.\n\n---\n\n### 2. Initialize the Config Server Replica Set (`cfgrs`)\n\nOpen a shell into one of the config server containers (e.g., `configs1`):\n\n```sh\ndocker exec -it configs1 mongosh --port 27017\n```\n\nThen, initialize the replica set by executing:\n\n```javascript\nrs.initiate({\n  _id: \"cfgrs\",\n  configsvr: true,\n  members: [\n    { _id: 0, host: \"configs1:27017\" },\n    { _id: 1, host: \"configs2:27017\" },\n    { _id: 2, host: \"configs3:27017\" }\n  ]\n})\n```\n\nWait until the replica set initialization completes successfully.\n\n---\n\n### 3. 
Initialize the Shard Replica Sets\n\n#### For Shard 1 (`shard1rs`):\n\nConnect to one of the shard1 containers:\n\n```sh\ndocker exec -it shard1s1 mongosh --port 27017\n```\n\nInitialize the shard replica set:\n\n```javascript\nrs.initiate({\n  _id: \"shard1rs\",\n  members: [\n    { _id: 0, host: \"shard1s1:27017\" },\n    { _id: 1, host: \"shard1s2:27017\" }\n  ]\n})\n```\n\n#### For Shard 2 (`shard2rs`):\n\nConnect to one of the shard2 containers:\n\n```sh\ndocker exec -it shard2s1 mongosh --port 27017\n```\n\nInitialize the replica set:\n\n```javascript\nrs.initiate({\n  _id: \"shard2rs\",\n  members: [\n    { _id: 0, host: \"shard2s1:27017\" },\n    { _id: 1, host: \"shard2s2:27017\" }\n  ]\n})\n```\n\n#### For Shard 3 (`shard3rs`):\n\nConnect to one of the shard3 containers:\n\n```sh\ndocker exec -it shard3s1 mongosh --port 27017\n```\n\nInitialize the replica set:\n\n```javascript\nrs.initiate({\n  _id: \"shard3rs\",\n  members: [\n    { _id: 0, host: \"shard3s1:27017\" },\n    { _id: 1, host: \"shard3s2:27017\" }\n  ]\n})\n```\n\nWait until each shard replica set is fully initialized.\n\n---\n\n### 4. Add the Shards to the Cluster via Mongos\n\nFirst, connect to the mongos container:\n\n```sh\ndocker exec -it mongos mongosh --port 27017\n```\n\nThen, add each shard to the cluster:\n\n```javascript\nsh.addShard(\"shard1rs/shard1s1:27017,shard1s2:27017\")\nsh.addShard(\"shard2rs/shard2s1:27017,shard2s2:27017\")\nsh.addShard(\"shard3rs/shard3s1:27017,shard3s2:27017\")\n```\n\nTo verify that the shards were added correctly, run:\n\n```javascript\nsh.status()\n```\n\nYou should see all three shards listed along with their respective configuration details.\n\n---\n\n### 5. 
Enable Sharding on the Application Database\n\nIn the mongos shell, execute the following commands to enable sharding on your target database and shard the collection:\n\n```javascript\n// Enable sharding on the 'exampleDB' database.\nsh.enableSharding(\"exampleDB\")\n\n// Shard the 'files' collection on the 'filename' field.\nsh.shardCollection(\"exampleDB.files\", { filename: 1 })\n```\n\nMongoDB now partitions the `exampleDB.files` collection into chunks based on the `filename` shard key. As the collection grows, the balancer migrates data among the available shards to keep the distribution even.\n\n---\n\n## Note\n\nAfter running the Python program (`main.py`) repeatedly (it inserts documents and performs queries), the output of `sh.status()` in the mongos shell may look similar to the following excerpt:\n\n```\ndatabases\n[\n  {\n    database: { _id: 'config', primary: 'config', partitioned: true },\n    collections: {\n      'config.system.sessions': {\n        shardKey: { _id: 1 },\n        unique: false,\n        balancing: true,\n        chunkMetadata: [ { shard: 'shard1rs', nChunks: 1 } ],\n        chunks: [\n          { min: { _id: MinKey() }, max: { _id: MaxKey() }, 'on shard': 'shard1rs', 'last modified': Timestamp({ t: 1, i: 0 }) }\n        ],\n        tags: []\n      }\n    }\n  },\n  {\n    database: {\n      _id: 'exampleDB',\n      primary: 'shard3rs',\n      version: {\n        uuid: UUID('8884af4d-21e0-47c5-984c-35a5d3b46611'),\n        timestamp: Timestamp({ t: 1738569282, i: 2 }),\n        lastMod: 1\n      }\n    },\n    collections: {\n      'exampleDB.files': {\n        shardKey: { filename: 1 },\n        unique: false,\n        balancing: true,\n        chunkMetadata: [\n          { shard: 'shard2rs', nChunks: 1 },\n          { shard: 'shard3rs', nChunks: 1 }\n        ],\n        chunks: [\n          { min: { filename: MinKey() }, max: { filename: 
'53aae26ed6e52b77447c0246957de2f490f16aefb6032993e92ab96c0983188fe2821509b5acc9af19cbb4c37f356df5e027ca89b86cb4bffa8dc3900260526402237c080a64413201f32eb333ca0d53cb22ddbcf74ca0e876b0db97692b1e485d1ee74939632096d9e911efd5ba06f985fa1c3534fab9a3e111531f096714d2.jpg' }, 'on shard': 'shard2rs', 'last modified': Timestamp({ t: 2, i: 0 }) },\n          { min: { filename: '53aae26ed6e52b77447c0246957de2f490f16aefb6032993e92ab96c0983188fe2821509b5acc9af19cbb4c37f356df5e027ca89b86cb4bffa8dc3900260526402237c080a64413201f32eb333ca0d53cb22ddbcf74ca0e876b0db97692b1e485d1ee74939632096d9e911efd5ba06f985fa1c3534fab9a3e111531f096714d2.jpg' }, max: { filename: MaxKey() }, 'on shard': 'shard3rs', 'last modified': Timestamp({ t: 2, i: 1 }) }\n        ],\n        tags: []\n      }\n    }\n  }\n]\n```\n\n**Explanation:**\n\n- **Data Distribution:**  \n  - The `exampleDB.files` collection is split into two chunks. One chunk resides on **shard2rs** and the other on **shard3rs**.  \n  - The `config.system.sessions` collection is stored on **shard1rs**. MongoDB automatically places system collections (like session data) on one of the shards.\n\n- **Primary Shard for `exampleDB`:**  \n  The primary shard for `exampleDB` is reported as **shard3rs**. This primary designation affects where unsharded collections are created and can influence initial chunk allocation for sharded collections.\n\n- **Autosplit \u0026 Balancer:**  \n  Autosplit is enabled, and the balancer is working (or has worked) to distribute data as chunks grow or are split, although in this example, most application data ended up in shard2rs and shard3rs.\n\n---\n\n## Summary\n\nThis documentation explains how to set up a Docker-based sharded MongoDB cluster and how to interact with it using Python. The steps include:\n\n1. Booting the containers with `docker-compose`.\n2. Initializing the config server replica set (`cfgrs`).\n3. Initializing three shard replica sets (`shard1rs`, `shard2rs`, `shard3rs`) with two nodes each.\n4. 
Adding the shards to the cluster via a mongos router.\n5. Enabling sharding on the `exampleDB` database and sharding the `files` collection on the `filename` field.\n\nThe `sh.status()` output above shows how data is distributed across the shards: system collections are on **shard1rs**, and application data in **exampleDB.files** is split between **shard2rs** and **shard3rs**.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fknightchaser%2Fmongodbshardingexample","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fknightchaser%2Fmongodbshardingexample","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fknightchaser%2Fmongodbshardingexample/lists"}