# Integer Id generation at scale using Kubernetes

As you may know, **integer IDs** offer several advantages:

- **Smaller Storage** – `4-8 bytes` vs. `16 bytes` for UUIDs.
- **Faster Indexing & Lookups** – Improves database performance.
- **Better Readability** – Easier to read, debug, and reference.
- **Efficient Joins** – Speeds up foreign key relationships.
- **Auto-incrementing** – Maintains order and predictability.

#### **Scaling Challenge**
Generating integer IDs at scale can be tricky. Twitter solved this with **Snowflake**, a **64-bit time-sortable ID** system with a theoretical capacity of **~4 billion IDs/sec** using up to **1024 workers (32 workers in each of 32 data centers)**. However, this setup is overkill for most cases, where **1–100 million IDs/sec** is more than enough.
#### Approach
In this project, we’ll run a **Go service (optionally Python/FastAPI)** on Kubernetes, where **data centers** are replaced by **nodes** and **workers** by **pods/replicas**. For example, a **32-node cluster** with **32 pods per node** reaches **1024 pods**, matching Twitter’s Snowflake scale.

**Estimated theoretical maximum ID generation rates** (each replica can issue up to 4,096 IDs per millisecond):

- **1024 replicas → ~4.19B IDs/sec**
- **32 replicas → ~131M IDs/sec**
- **4 replicas → ~16.4M IDs/sec**

Just **deploy the service on Kubernetes** and scale replicas as needed.

**Note**
- **Never exceed 1024 replicas**, because Snowflake uses 10 bits for worker IDs (0–1023).
- We use a **StatefulSet**, not a **Deployment**, because **StatefulSets** guarantee stable, ordinal pod names (e.g., `pod-0`, `pod-1`).
These **ordinal numbers** act as each pod’s **unique worker ID** in the Snowflake algorithm, ensuring no ID collisions across pods.
- Throughout this project we use a **k3s** cluster for local testing, as it provides a lightweight Kubernetes environment in which to set up and validate the entire configuration.
## Prerequisites
Before using this project, it's recommended to have knowledge of:
- **Basic Python and Go Environment Setup**: Understanding how to set up and run Python and Go projects.
- **Twitter Snowflake Algorithm**: How unique integer IDs are generated.
- **Docker**: Understanding containerization.
- **Kubernetes (k3s)**: For deployment and scaling.
## Project Structure
Below is the structure of the project, along with a description of what each file and folder represents.
```txt
./
├── id-generator/      # ID service implemented in Python/FastAPI
├── id-generator-go/   # ID service implemented in Go
├── kube/              # Directory containing all k8s config files
│   ├── ingress.yaml
│   ├── namespace.yaml
│   ├── service.yaml
│   └── statefulset.yaml
├── testing/           # Directory containing files for testing the service
│   ├── database.py        # Defines a SQLite DB that temporarily stores IDs during load tests
│   ├── generated_ids.db   # SQLite DB
│   ├── load_test.py       # Load-tests the service
│   ├── service_test.py    # Verifies service health
│   └── snowflake_test.py  # Demonstrates the Snowflake ID generation logic
├── commands.md        # All project-related commands
└── readme.md          # Project overview and setup instructions
```
## ID Generation Service - Quick Overview

#### Environment Variables
- **Node name, Pod UID, and Pod name** are injected by Kubernetes into the containers (configured in `statefulset.yaml`).
- **Node name** and **Pod UID** are optional and used only for debugging.
#### Machine ID Extraction

For Snowflake ID generation to work correctly, **each worker must have a unique ID**. In our setup, this is achieved through a Kubernetes **StatefulSet** with **ordinal pod naming** (e.g., `id-generator-0` to `id-generator-1023`).

Kubernetes guarantees **unique pod names**, allowing us to use the pod's ordinal number as the **instance/machine/worker ID**. This ID is then used to **initialize the Snowflake generator** and serve incoming requests reliably.
```python
import re

from snowflake import SnowflakeGenerator  # provided by the `snowflake-id` package

# POD_NAME (e.g. "id-generator-3", injected by Kubernetes) and EPOCH are defined elsewhere in the service

# Extract the machine ID from the pod name (the first run of digits, i.e. the StatefulSet ordinal)
MACHINE_ID = int(re.search(r"\d+", POD_NAME).group())

# Initialize the Snowflake ID generator
integer_id_generator = SnowflakeGenerator(instance=MACHINE_ID, epoch=EPOCH)
```
#### Endpoints
On each request, the Snowflake generator produces a unique ID.
```python
from fastapi import FastAPI

app = FastAPI()  # the service's FastAPI application


@app.get("/generate-id")
def generate_id_integer():
    """Generate a Snowflake-based integer ID."""
    # integer_id_generator is the SnowflakeGenerator created at startup
    return {"id": next(integer_id_generator)}
```

#### ⚠️ Collisions
- **Cross-pod collisions cannot occur**, because each pod has a unique **worker ID** (derived from the pod’s ordinal name).
- **Same-pod collisions are possible** in the **Python** implementation because the `snowflake-id` package is **not thread-safe**. Under high concurrency, two threads within one pod could generate the same ID (though this is rare in Python due to the GIL).

In contrast, the **Go package** [`github.com/bwmarrin/snowflake`](https://github.com/bwmarrin/snowflake) is **thread-safe** and well suited to multithreaded environments.

👉 **Recommendation:**

- Use **Python** for testing and learning.
- Use **Go** for production or high-concurrency scenarios.
## Setup Process

### **1. Clone the Repository**
```bash
git clone <repo-url>
cd integer-id-generation-at-scale-using-kubernetes
```
### **2. Set Up the Environment**
The service is implemented in both **Python** and **Go**.
You can choose which version to run based on your preference.

**Performance Note:**  
Local testing shows the **Go implementation is approximately 2x faster** than the Python version.
#### **Working with Python (FastAPI)**
If you choose the **Python/FastAPI** service:
1. **Navigate** to the Python service directory.
2. **Set up a virtual environment** for isolated dependencies.
3. **Install required packages** from `requirements.txt`.
4. **Run the service** locally.
5. **Test the service** via:
    - Swagger docs: [http://localhost:8000/docs](http://localhost:8000/docs)
    - Running the test script: `testing/service_test.py`
```bash
cd ./id-generator

# Create and activate a virtual environment
python3.11 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the service locally
uvicorn id-generator.main:app --reload --host "0.0.0.0" --port 8000
```
#### **Working with Go**
If you choose the **Go** service:
1. **Ensure Go is installed** on your machine.
2. **Navigate** to the Go service directory.
3. **Install dependencies** using `go mod tidy`.
4. **Run the service** locally for quick testing.
5. **Build the binary** for production use and run it.
6. **Test the service** by running `testing/service_test.py`.
```bash
cd ./id-generator-go

# Install dependencies
go mod tidy

# Test the Go service either by running it directly or by building the binary and running that
# Run the service locally
go run main.go

# Build and run the binary
go build -o id-generator
./id-generator
```

**Testing:**  
You can verify both versions by running `testing/service_test.py`. For the Python service, you can also visit the **Swagger documentation** at:  
[http://localhost:8000/docs](http://localhost:8000/docs)

### **3. Build and Push Docker Images**
```bash
# Decide which service to use: Go or Python (this example uses Go)

# Build the Docker image using the Go service
docker build -t id-generator ./id-generator-go/

# Re-tag the image for the local registry
docker tag id-generator localhost:5001/id-generator

# Run a local Docker registry (if not already running)
docker run -d -p 5001:5000 --name local-registry registry:2

# Push the image to the local registry
docker push localhost:5001/id-generator
```

**Notes:**

- Replace `./id-generator-go/` with `./id-generator/` if you are using the Python service.
- The local registry allows Kubernetes to pull images without external dependencies.
### **4. Install k3s**
```bash
curl -sfL https://get.k3s.io | sh -
```

**Why k3s?**
- Lightweight Kubernetes distribution for local and edge deployments.
- Simple to install with minimal resource requirements.
### **5. Verify k3s Installation**
```bash
sudo kubectl get nodes
```

**Expected Output:**  
You should see the node in a **"Ready"** state, indicating that k3s is successfully installed and running.
```txt
NAME        STATUS   ROLES                  AGE     VERSION
your-node   Ready    control-plane,master   5m      v1.xx.x+k3s
```
### **6. Configure k3s to Use Local Registry**

Edit `/etc/rancher/k3s/registries.yaml` and add:

```yaml
mirrors:
  "localhost:5001":
    endpoint:
      - "http://localhost:5001"
```

Restart k3s:
```bash
sudo systemctl restart k3s
```
### **7. Deploy Services to k3s**
```bash
cd ./kube
sudo kubectl apply -f namespace.yaml
sudo kubectl apply -f statefulset.yaml
sudo kubectl apply -f service.yaml
sudo kubectl apply -f ingress.yaml
```

### **8. Monitoring & Scaling**
```bash
# View the pods in the cluster
sudo kubectl get pods -n id-system

# Scale the number of pods/replicas in the StatefulSet
sudo kubectl scale statefulset id-generator --replicas=2 -n id-system

# View the logs
sudo kubectl logs statefulset/id-generator -n id-system --all-containers
```
### **9. Access the Services**
You can access the **ID Generation Service** at `http://localhost:80` (equivalently, `http://localhost/`).

**For FastAPI (Python) Service:**  
If you deployed the FastAPI version, the interactive API documentation is available at `http://localhost:80/docs` (or `http://localhost/docs`).

### **10. Cleanup**
```bash
# Remove the k8s resources
cd ./kube
sudo kubectl delete -f statefulset.yaml
sudo kubectl delete -f service.yaml
sudo kubectl delete -f ingress.yaml
sudo kubectl delete -f namespace.yaml

# Verify the resources are removed
sudo kubectl get all -n id-system

# Stop and remove the local Docker registry
docker stop local-registry
docker rm local-registry

# Remove images
docker image rm localhost:5001/id-generator
docker image rm id-generator
```
**Note:** For a complete list of project-related commands, refer to the **`commands.md`** file.
## Rate Estimations
- The **theoretical ID generation rate** of the **Twitter Snowflake** algorithm is approximately **4.19 billion IDs per second** (1024 workers × 4,096 IDs per millisecond).
- With **1024 replicas** (the maximum number of machine IDs), the service reaches the same theoretical ceiling of **~4.19 billion IDs per second**.
- A **32-replica deployment** yields up to **~131 million IDs per second**, which is sufficient for most use cases.
- The 41-bit millisecond timestamp keeps the ID generator functional for about **69 years** from the chosen epoch.

In local tests on a 32 GB RAM Intel i7 machine with 4 pods, Python reached ~4,000 IDs/sec and Go ~9,000 IDs/sec.
Because the service and the load tests ran on the same machine, resource contention likely held these numbers down, so production performance may differ.
### ⚠️ **Practical Considerations:**

> While theoretical rates are impressive, **real-world performance may vary** due to several factors:

- **Network Latency & Routing Overhead:**  
    Communication delays between clients, load balancers, and service instances can reduce the effective generation rate.

- **Resource Contention:**  
    Deploying multiple pods on the **same node** leads to **CPU and memory sharing**, potentially reducing performance under heavy loads.

- **Scaling Across Nodes:**  
    Distributing pods across **multiple nodes** with **fewer but adequately resourced pods per node** improves stability and overall throughput.

**Recommendation:**  
To achieve higher throughput and stable performance:

- **Scale horizontally** across multiple nodes.
- **Avoid overcrowding** a single node with too many pods.
- Use **resource limits** in Kubernetes to prevent excessive resource contention.

## Conclusion

This project demonstrates how to deploy a Snowflake-based ID generation service at scale using Kubernetes (k3s) and a StatefulSet for stable, unique worker IDs. For most real-world production needs, the Go version is recommended due to its thread safety and better performance.
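
The ⚠️ Collisions section attributes same-pod collisions to the Python generator not being thread-safe. To illustrate what thread safety involves, here is a minimal sketch that assumes the classic Snowflake layout: a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit sequence. `LockedSnowflake`, `decompose`, and the example epoch are hypothetical names for illustration only. They are not the `snowflake-id` package's API or the code used in this repository.

```python
import threading
import time

# Classic Snowflake layout (assumed here):
# 1 unused sign bit | 41-bit timestamp (ms) | 10-bit worker ID | 12-bit sequence
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1         # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1        # 4095
WORKER_SHIFT = SEQUENCE_BITS                   # worker ID sits above the sequence
TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS  # timestamp sits above both


class LockedSnowflake:
    """Hypothetical thread-safe Snowflake generator (illustrative sketch)."""

    def __init__(self, worker_id: int, epoch_ms: int) -> None:
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError(f"worker_id must be in 0..{MAX_WORKER_ID}")
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def _now_ms(self) -> int:
        # Milliseconds elapsed since the custom epoch.
        return time.time_ns() // 1_000_000 - self.epoch_ms

    def next_id(self) -> int:
        # The lock makes "read state, bump sequence, write state" atomic,
        # so two threads sharing one generator never receive the same ID.
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                # Reusing an older timestamp could repeat IDs, so refuse.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4,096 sequence values used this millisecond:
                    # busy-wait until the clock ticks over.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                (now << TIMESTAMP_SHIFT)
                | (self.worker_id << WORKER_SHIFT)
                | self._sequence
            )


def decompose(snowflake_id: int) -> tuple[int, int, int]:
    """Split an ID into (ms since epoch, worker ID, sequence)."""
    return (
        snowflake_id >> TIMESTAMP_SHIFT,
        (snowflake_id >> WORKER_SHIFT) & MAX_WORKER_ID,
        snowflake_id & MAX_SEQUENCE,
    )


if __name__ == "__main__":
    # Hypothetical epoch (1_700_000_000_000 ms, November 2023) and worker ID 3,
    # as a pod named id-generator-3 would use.
    gen = LockedSnowflake(worker_id=3, epoch_ms=1_700_000_000_000)
    new_id = gen.next_id()
    print(new_id, decompose(new_id))
```

In this sketch, a pod would create one generator at startup and share it across request handlers. A single lock is the simplest correct design. It serialises ID generation within a pod, but that cost is likely small compared with per-request HTTP overhead. Raising on a backwards clock step is one conservative choice. Waiting for the clock to catch up is a common alternative.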
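
The figures in the Rate Estimations section follow directly from the Snowflake bit layout. The layout allows 4,096 sequence values per worker per millisecond, at most 1,024 workers, and a 41-bit millisecond timestamp. The short sketch below reproduces that arithmetic. The helper names `max_ids_per_second` and `lifetime_years` are illustrative and not part of the repository. The results are theoretical ceilings, far above the HTTP-bound throughput measured locally.

```python
# Theoretical Snowflake throughput and lifetime, derived from the bit layout alone.
SEQUENCE_BITS = 12    # 4,096 IDs per worker per millisecond
WORKER_BITS = 10      # at most 1,024 workers (replicas)
TIMESTAMP_BITS = 41   # milliseconds since the chosen epoch
MS_PER_SECOND = 1_000
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1_000


def max_ids_per_second(replicas: int) -> int:
    """Upper bound: every replica issues every sequence value every millisecond."""
    if not 1 <= replicas <= (1 << WORKER_BITS):
        raise ValueError("the 10-bit worker field allows 1 to 1024 replicas")
    return replicas * (1 << SEQUENCE_BITS) * MS_PER_SECOND


def lifetime_years() -> float:
    """How long the 41-bit timestamp lasts before it overflows."""
    return (1 << TIMESTAMP_BITS) / MS_PER_YEAR


if __name__ == "__main__":
    for n in (4, 32, 1024):
        print(f"{n:>4} replicas -> {max_ids_per_second(n):,} IDs/sec")
    # -> 16,384,000 / 131,072,000 / 4,194,304,000 IDs/sec
    print(f"41-bit timestamp lasts ~{lifetime_years():.1f} years")  # ~69.7
```

The code computes exact values, which the prose rounds. The lifetime calculation assumes a 365.25-day year.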