{"id":23435279,"url":"https://github.com/erikpelli/bigmetric","last_synced_at":"2026-04-11T00:08:51.824Z","repository":{"id":137824250,"uuid":"574671251","full_name":"ErikPelli/BigMetric","owner":"ErikPelli","description":"Scalable system to collect data from multiple temperature sensors using Spring Boot","archived":false,"fork":false,"pushed_at":"2022-12-17T23:13:54.000Z","size":649,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-15T10:28:32.664Z","etag":null,"topics":["cassandra","cluster","docker","grafana","java","kafka","microservices","spring"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ErikPelli.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-05T20:33:25.000Z","updated_at":"2023-01-24T12:50:29.000Z","dependencies_parsed_at":"2023-04-06T09:01:03.465Z","dependency_job_id":null,"html_url":"https://github.com/ErikPelli/BigMetric","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ErikPelli%2FBigMetric","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ErikPelli%2FBigMetric/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ErikPelli%2FBigMetric/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ErikPelli%2FBigMetric/manifests","owner_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/owners/ErikPelli","download_url":"https://codeload.github.com/ErikPelli/BigMetric/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248077007,"owners_count":21043876,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cassandra","cluster","docker","grafana","java","kafka","microservices","spring"],"created_at":"2024-12-23T12:50:11.604Z","updated_at":"2025-12-30T23:05:51.553Z","avatar_url":"https://github.com/ErikPelli.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# BigMetric\n\n![Dashboard](/../images/dashboard.jpg?raw=true \"Dashboard\")\n\n## Introduction\nBigMetric is a Proof of Concept (PoC) to test a scalable network of temperature sensors located worldwide.\nIn theory, if implemented correctly, it should be able to handle up to 1 million devices,\neach sending its temperature in °C roughly every 10 seconds (about 100,000 messages per second overall).\n\n![BigMetric schema](/../images/bigmetric-schema.png?raw=true \"BigMetric schema\")\n\n## Deployment\nIn my PoC the infrastructure deployment is basic: everything runs in Docker containers on a single local\nmachine, so it cannot scale much.\n\n### Scalability\n#### On Premise\nA well-designed on-premise deployment offers many opportunities to scale with Kubernetes as the sensor\nnetwork grows.\nSince the Spring microservice instances are independent of one another, Kubernetes can simply run more\nof them, and each one consumes input data from the Apache Kafka cluster (within a consumer group, each\nKafka partition is read by only one instance, so useful parallelism is bounded by the number of partitions).\nLikewise, more Cassandra and Kafka nodes can be added to their respective clusters, as 
the increase in throughput\nis nearly linear with the number of nodes (horizontal scaling).\n\n#### Cloud\nThis PoC could be migrated to cloud provider services, which make it easy to scale automatically\nbased on load.\nOne option is a managed Kubernetes service such as GKE to deploy the Java Spring microservices;\nanother is a serverless solution, if it's cheaper, so that code runs only when there is data to process.\nAlso, instead of a Cassandra cluster, a similar managed database such as Google Cloud Bigtable\nor Amazon DynamoDB could be used, and Google Cloud Pub/Sub could replace Apache Kafka.\n\n## Design choices\n### Database\nThe data to be stored is essentially time series data, so the database must handle this kind of\nworkload efficiently.\nIn addition, as per the initial specifications, the application must be scalable and handle up to 1 million\nsensors simultaneously.\n\nApache Cassandra is a good compromise between these two requirements: if the cluster capacity is saturated,\nadding another node increases performance ([this DataStax guide](https://docs.datastax.com/en/tutorials/Time_Series.pdf)\nexplains how to design a time series data model).\n\n### Data Processing\nA middleware layer is needed to receive the data collected from the sensors, filter it, convert it\nto a format compatible with the database, and save it.\nA filter on temperature values in °C, configurable through environment variables, discards outliers\nthat are likely to come from incorrect readings.\nThe chosen approach runs multiple instances of a microservice written in Java Spring (other\nlanguages/frameworks would work as well) and splits the data between them to increase throughput.\n\n### Data Transfer\nThere are several ways to transfer data from the sensors to the service that processes\nthem, 
including RPC, a REST API, or a message queue.\nThis PoC uses Apache Kafka as the message queue, which fits well because it is asynchronous, handles\nlarge amounts of data, and can increase its capacity by adding more nodes.\n\nAn alternative would be to use a load balancer with a REST API to connect the sensors to the microservice that\nprocesses the data, but this would be more complex to manage in the long run and does not scale as well as a\nmessage queue, which decouples the data producer from the consumer.\n\nBecause there is a delay between the moment a sensor sends the data and the moment the processing service saves it,\nthe timestamp should be included directly in the message sent to the queue, so the time at which the temperature\nwas actually collected is tracked accurately.\n\n## Build \u0026 Run\nBuild the Java (Spring Boot) microservice using Gradle:\n```\n./gradlew build\n```\n\nMake sure no other copies of the app are running (list containers with `docker ps` and remove them with `docker rm -f \u003cids\u003e`).\n\nStart the application stack:\n```\ndocker compose up\n```\n\n## Data Visualization\nGrafana is used to display graphs of how sensor data changes over time.\nIt reads data directly from the Cassandra cluster through CQL (Cassandra Query Language)\nqueries written for this project, over the selected time range.\n\nTo access the Grafana UI:\n- Open `localhost:3001` in the browser\n- Username \u0026 password: `admin` / `admin`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferikpelli%2Fbigmetric","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ferikpelli%2Fbigmetric","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferikpelli%2Fbigmetric/lists"}