{"id":29682100,"url":"https://github.com/goamegah/flowstate","last_synced_at":"2026-05-06T18:36:48.075Z","repository":{"id":281741856,"uuid":"946135598","full_name":"goamegah/flowstate","owner":"goamegah","description":"End-To-End Real-time Road Traffic Monitoring Spark Structured Streaming solution","archived":false,"fork":false,"pushed_at":"2025-07-22T20:21:09.000Z","size":54444,"stargazers_count":1,"open_issues_count":3,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-22T20:37:30.042Z","etag":null,"topics":["airflow","aws","aws-kinesis","dags","kafka","postgresql","realtime","scala","spark-core","spark-sql","spark-streaming","streaming"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/goamegah.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-10T17:02:49.000Z","updated_at":"2025-07-22T20:21:13.000Z","dependencies_parsed_at":"2025-03-10T23:25:56.612Z","dependency_job_id":"c0596dff-068a-4491-a8a0-92c098e9258e","html_url":"https://github.com/goamegah/flowstate","commit_stats":null,"previous_names":["goamegah/realtime-traffic-monitor","goamegah/flowtrack","goamegah/flowstate"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/goamegah/flowstate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/goamegah%2Fflowstate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/goamegah%2Fflowstate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/goamegah%2Fflowstate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/goamegah%2Fflowstate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/goamegah","download_url":"https://codeload.github.com/goamegah/flowstate/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/goamegah%2Fflowstate/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266604009,"owners_count":23954725,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-23T02:00:09.312Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","aws","aws-kinesis","dags","kafka","postgresql","realtime","scala","spark-core","spark-sql","spark-streaming","streaming"],"created_at":"2025-07-23T02:08:00.940Z","updated_at":"2026-05-06T18:36:48.062Z","avatar_url":"https://github.com/goamegah.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eNear Real-time Road Traffic Monitoring Solution\u003c/h1\u003e\n\nFlowState is a near real-time road traffic monitoring solution that leverages Apache Spark, Apache Airflow, and Docker to process and analyze traffic data. 
The project is designed to handle large volumes of data efficiently, providing insights into traffic patterns and conditions.\nThis solution is built to be scalable and robust, making it suitable for real-world applications in traffic management and urban planning.\n\nData is collected from the [Rennes Metropole API](https://data.rennesmetropole.fr/explore/dataset/etat-du-trafic-en-temps-reel/information/), which provides real-time traffic data. The solution processes this data to extract meaningful insights, such as traffic flow and congestion levels, and stores the results in a structured format for further analysis.\n\nFor more details, see the [Documentation](docs/index.md).\n\nHere is the reference architecture of the project:\n![Reference Architecture](./assets/arch.png)\n\n\n## Table of Contents\n- [Technology Stack](#technology-stack)\n- [Prerequisites](#prerequisites)\n- [Setup](#setup)\n\n### Technology Stack\n\n- **Stream Processing**: Apache Spark 4.0.0\n- **Orchestration**: Apache Airflow 2.6.0\n- **Database**: PostgreSQL 13\n- **UI Framework**: Streamlit\n- **Build Tool**: SBT with Assembly plugin\n- **Containerization**: Docker \u0026 Docker Compose\n- **Language**: Scala 2.13.16\n\n### Prerequisites\nBefore you begin, ensure you have the following software installed:\n\n- **Docker**: [Install Docker](https://docs.docker.com/engine/install/)\n\n### Setup\n\n1. **Clone the repository**:\n```bash\ngit clone git@github.com:goamegah/flowstate.git\ncd flowstate\n```\n\n2. **Rename the `dotenv.txt` file to `.env`**:\n```bash\nmv dotenv.txt .env\n```\n\n3. **Create 4 folders**:\n```bash\nmkdir -p shared/data/transient # for intermediate data loading\nmkdir -p shared/data/raw # for raw data loaded from the transient folder\nmkdir -p shared/checkpoint # used by Spark for checkpointing\nmkdir -p shared/jars # for the Spark application JAR file\n```\n\n`shared` is a bind mount attached to the Docker containers so that you can inspect the raw files from your IDE.\n\n4. **Run Docker Compose**:\n```bash\ndocker compose up -d\n```\n\n5. **Open the Airflow web UI**:\n```bash\nhttp://localhost:8080\n```\n\nYou will need to create a connection to the API with the following parameters:\n- **Conn Id**: traffic_api\n- **Conn Type**: HTTP\n- **Host**: https://data.rennesmetropole.fr/\n\n![alt text](assets/airflow_admin_connections.png)\n\n![alt text](assets/airflow_admin_connections_api.png)\n\nThis connection is used to check the API availability.\n\nAfter setting up the connection, you will see the following 3 DAGs, which you can run one after another:\n- **pl_load_flowstate_raw_files**\n\n![alt text](assets/pl_load_raw_file.jpeg)\nThis DAG loads the raw data from the Rennes Metropole API into the raw folder. It is scheduled to run every minute.\n\n- **pl_run_flowstate_mainapp_dag**\n\n![alt text](assets/pl_run_main_app.jpeg)\nThis DAG runs the main application, which processes the raw data and stores the results in the data warehouse. It is not scheduled to run automatically; trigger it manually from the Airflow UI.\n\n- **[Optional] pl_clean_up_flowstate_folders_dag**\n\n![clean up pipeline](assets/pl_clean_up.jpeg)\nThis DAG cleans up the data in the raw, transient, and checkpoint folders.\n\n6. **Check the results in the Streamlit web UI**:\n```bash\nhttp://localhost:8501\n```\n\n![alt text](assets/flowtrack_history.jpeg)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoamegah%2Fflowstate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoamegah%2Fflowstate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoamegah%2Fflowstate/lists"}