{"id":15163330,"url":"https://github.com/daleonpz/iot_cloud_infrastructure","last_synced_at":"2026-02-03T16:37:39.174Z","repository":{"id":250465216,"uuid":"833196336","full_name":"daleonpz/iot_cloud_infrastructure","owner":"daleonpz","description":"A Docker Compose-based IoT data pipeline for local development, featuring MQTT, MinIO, Cassandra, FastAPI, and Airflow for easy testing and expansion.","archived":false,"fork":false,"pushed_at":"2024-12-28T15:21:47.000Z","size":384,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-07T04:28:17.053Z","etag":null,"topics":["airflow","data-pipeline","docker-compose","iot","iot-application","iot-platform","mqtt","s3"],"latest_commit_sha":null,"homepage":"","language":"Dockerfile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/daleonpz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-24T14:42:28.000Z","updated_at":"2025-03-05T07:37:22.000Z","dependencies_parsed_at":"2024-09-13T19:16:38.026Z","dependency_job_id":"ccda0970-28ba-4cca-9f96-86520d648629","html_url":"https://github.com/daleonpz/iot_cloud_infrastructure","commit_stats":{"total_commits":6,"total_committers":2,"mean_commits":3.0,"dds":0.5,"last_synced_commit":"1974694a53006492fa7f463355d7f59c2ce2251b"},"previous_names":["daleonpz/iot_cloud_test","daleonpz/iot_cloud_infrastructure"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/daleonpz/iot_cloud_infrastructure","repository_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/daleonpz%2Fiot_cloud_infrastructure","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daleonpz%2Fiot_cloud_infrastructure/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daleonpz%2Fiot_cloud_infrastructure/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daleonpz%2Fiot_cloud_infrastructure/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/daleonpz","download_url":"https://codeload.github.com/daleonpz/iot_cloud_infrastructure/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daleonpz%2Fiot_cloud_infrastructure/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29049348,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-03T15:43:47.601Z","status":"ssl_error","status_checked_at":"2026-02-03T15:43:46.709Z","response_time":96,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","data-pipeline","docker-compose","iot","iot-application","iot-platform","mqtt","s3"],"created_at":"2024-09-27T02:23:23.104Z","updated_at":"2026-02-03T16:37:39.145Z","avatar_url":"https://github.com/daleonpz.png","language":"Dockerfile","funding_links":[],"categories":[],"sub_categories":[],"readme":"# IoT \"Cloud\" Data Pipeline\nThis repository helps you understand the basic components 
needed to build a data pipeline for IoT data and how they work together. Use this setup to test individual components or see how they function as a complete system. You can also expand this setup to create a more complex pipeline and deploy it to cloud platforms like AWS, Azure, or Google Cloud.\n\nI chose Docker Compose for local deployment to focus on understanding the components and their interactions without the complexity of cloud providers. This approach also makes it easy to share the setup and run it on any machine with minimal effort.\n\nThe pipeline and infrastructure include:\n\n- MQTT Broker\n- MQTT Agent/Application\n- Data Lake (MinIO)\n- Database (Cassandra)\n- REST API (FastAPI)\n- Orchestration (Airflow)\n- Transformation (ELT)\n\nThe components are connected as follows:\n\n![Architecture](images/arch.png)\n\n1. The **MQTT Broker** is the entry point for the data. It receives data from the IoT devices and publishes it to a topic.\n2. The **MQTT Agent** subscribes to the topic and writes the received messages to the **Data Lake**.\n3. The **Data Lake** stores raw data and acts as the source for the **Transformation** component.\n4. The **Transformation** component reads raw data from the **Data Lake**, processes it, and writes the result to the **Database**. 
**Airflow** is used to orchestrate the workflow.\n\nOnce you have clean data in the database, you can use it for analytics, machine learning, or other applications.\n\n## Table of Contents\n\n[Prerequisites](#prerequisites)  \n[Component Testing](#component-testing)\n- [MQTT Broker](#mqtt-broker)\n- [Data Lake (MinIO)](#data-lake-minio)\n- [Database (Cassandra)](#database-cassandra)\n- [REST API (FastAPI)](#rest-api-fastapi)\n- [Transformation (ELT)](#transformation-elt)\n\n[Integration Testing](#integration-testing)\n- [MQTT with Data Lake](#mqtt-with-data-lake)\n- [Airflow Workflow](#airflow-workflow)\n\n[Stopping and Cleaning Up](#stopping-and-cleaning-up)  \n\n[Good to know](#good-to-know)\n\n## Prerequisites\n\n- [Docker](https://docs.docker.com/get-docker/)\n- [Docker Compose](https://docs.docker.com/compose/install/)\n- Clone this repository\n\n```sh\ngit clone https://github.com/daleonpz/iot_cloud_test.git\ncd iot_cloud_test\n```\n\n## Component Testing\n### MQTT Broker\n\n1. Build and Run\n\n```sh\ncd mqtt\ndocker build -t my-broker .\ndocker run -d --name my-broker -p 1883:1883 my-broker\n```\n\n2. Test MQTT Broker\n\n- Subscribe to Topic\n\n```sh\ndocker exec -it my-broker mosquitto_sub -h localhost -t test\n```\n\n3. Publish to Topic\n\nIn another terminal:\n\n```sh\ndocker exec -it my-broker mosquitto_pub -h localhost -t test -m \"hello\"\n```\n\n### Data Lake (MinIO)\n\n1. Build and Run\n\n```sh\ncd datalake\ndocker build -t my-datalake .\ndocker run -d --name my-datalake -p 9000:9000 -e \"MINIO_ACCESS_KEY=minio\" -e \"MINIO_SECRET_KEY=minio123\" my-datalake server /data --console-address \":9001\"\n```\n\n2. Access Data Lake\n\nOpen http://localhost:9000 in your browser.\n\n- Access Key: minio\n- Secret Key: minio123\n\nIf not accessible via localhost, use the container's IP address:\n\n```sh\ndocker logs my-datalake\n```\n\n### Database (Cassandra)\n\n1. 
Build and Run\n\n```sh\ncd database\ndocker build -t my-db .\ndocker run -d --name my-db -p 9042:9042 my-db\n```\n\n2. Test Cassandra with cqlsh\n\n```sh\ndocker exec -it my-db cqlsh localhost\n```\n\n3. Run the following commands in cqlsh:\n\n```sql\n    CREATE KEYSPACE iot WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };\n    USE iot;\n    CREATE TABLE measurements (id UUID PRIMARY KEY, temperature float, battery_level float);\n    INSERT INTO measurements (id, temperature, battery_level) VALUES (uuid(), 25.0, 50.0);\n    SELECT * FROM measurements;\n```\n\n### REST API (FastAPI)\n\n1. Build and Run\n\n```sh\ncd restapi\ndocker build -t api .\ndocker run -d --name api -p 8000:8000 --link my-db:my-db api\n```\n\n2. Test API\n\nFor debugging:\n\n```sh\ndocker run -it --name api -p 8000:8000 --link my-db:my-db api bash\n```\n\n- Send Data to the Database\n\n```sh\ncurl -X POST \"http://localhost:8000/data/{id}\" -H \"accept: application/json\" -d '{\"temperature\": 25.0, \"battery_level\": 50.0}'\n```\n\n- Get Data from the Database\n\n```sh\ncurl -X GET \"http://localhost:8000/data/{id}\" -H \"accept: application/json\"\n```\n\n### Transformation (ELT)\n\n1. Build and Run\n\n```sh\ndocker-compose -f docker-compose.yml.etl_test up --build\n```\n\n2. Verify Data\n\n```sh\ndocker exec -it my-db cqlsh localhost\n```\n\n3. Run the following commands in cqlsh:\n\n```sql\nUSE iot;\nSELECT * FROM measurements;\n```\n\n## Integration Testing\n### MQTT with Data Lake\n    \n1. Build and Run\n\n```sh\ndocker-compose -f docker-compose.yml.mqtt_app_test up --build\n```\n\n2. Publish Test Data\n\n```sh\ncd mqtt/\npython mqtt_publisher_test.py\n```\n\n### Airflow Workflow\n\n1. Build and Run\n\n```sh\ndocker-compose -f docker-compose.yml up --build\n```\n\n2. Publish Test Data\n\n```sh\ncd mqtt/\npython mqtt_publisher_test.py\n```\n\n3. 
Access Airflow\n\nLog in to http://localhost:8080 with:\n\n- Username: airflow\n- Password: airflow\n\nTrigger the DAG:\n\n- Click on \"transform_data\" under the \"DAG\" tab.\n- Click \"Trigger DAG\" or the \"Play\" button.\n\n4. Verify Data\n\n```sh\ndocker exec -it my-db cqlsh localhost\n```\n\nRun the following commands in cqlsh:\n\n```sql\nUSE iot;\nSELECT * FROM measurements;\n```\n\n## Stopping and Cleaning Up\n\n1. Remove All Containers\n\n```sh\n./tools/delete_containers.sh\n```\n\n2. Delete All Images\n\n```sh\n./tools/delete_docker_images.sh\n```\n\n## Good to know\n- There is a `.env` file in the root directory that sets environment variables for the services. You can modify this file as needed.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaleonpz%2Fiot_cloud_infrastructure","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdaleonpz%2Fiot_cloud_infrastructure","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaleonpz%2Fiot_cloud_infrastructure/lists"}