{"id":22725637,"url":"https://github.com/livetocode/vehicle-fleet-poc","last_synced_at":"2025-03-29T23:42:52.050Z","repository":{"id":251433899,"uuid":"829690316","full_name":"livetocode/vehicle-fleet-poc","owner":"livetocode","description":null,"archived":false,"fork":false,"pushed_at":"2025-02-17T01:20:52.000Z","size":1295,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-17T02:26:00.679Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/livetocode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-16T23:10:00.000Z","updated_at":"2025-02-17T01:20:56.000Z","dependencies_parsed_at":"2024-08-03T00:50:49.867Z","dependency_job_id":"0bac1a97-1432-4fda-affa-9ddcbabc50d1","html_url":"https://github.com/livetocode/vehicle-fleet-poc","commit_stats":null,"previous_names":["livetocode/vehicle-fleet-poc"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livetocode%2Fvehicle-fleet-poc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livetocode%2Fvehicle-fleet-poc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livetocode%2Fvehicle-fleet-poc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livetocode%2Fvehicle-fleet-poc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/livetoco
de","download_url":"https://codeload.github.com/livetocode/vehicle-fleet-poc/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246258862,"owners_count":20748573,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-10T16:13:25.413Z","updated_at":"2025-03-29T23:42:52.029Z","avatar_url":"https://github.com/livetocode.png","language":"TypeScript","readme":"# Purpose\n\nThis is a sample project for exploring different ways of persisting geospatial events for long-term storage.\n\nIt is also an excuse to test various technologies for handling a massive amount of data, including different programming languages as well as cloud services.\n\n# Functional requirements\n\n- An organization manages a fleet of vehicles that report their current GPS position every 5 seconds.\n- The vehicles remain in the area of a specific city and are typically assigned to a district of the city.\n- The data should be archived for 7 years.\n- The data should be queryable in order to find which cars were moving in a specific period, within a specific area of the city.\n\n# Estimates\n\nExecute the `estimate.py` script in the [event-estimator](./event-estimator/) folder to compute the estimates.\n\n# Architecture\n\n## Messages\n\n```\n+-------------------------+------------------+----------------------------+----------------+-----------------+------------------------------------------+\n| Topic                   | Consumer Group   | Event Type                 | Produced by    | Consumed by     | 
Usage                                    |\n+-------------------------+------------------+----------------------------+----------------+-----------------+------------------------------------------+\n| generation              | generators       | start-generation           | viewer         | generator       | The viewer will initiate a generation    |\n+-------------------------+------------------+----------------------------+----------------+-----------------+------------------------------------------+\n| generation.broadcast    |                  | stop-generation            | viewer         | generator       | The viewer will cancel any active        |\n|                         |                  |                            |                |                 | generation before starting a new one.    |\n+-------------------------+------------------+----------------------------+----------------+-----------------+------------------------------------------+\n| generation.agent.*      |generation-agents | generate-partition         | generator      | generator       | The generator will partition its work    |\n|                         |                  |                            |                |                 | among its agents.                        |\n+-------------------------+------------------+----------------------------+----------------+-----------------+------------------------------------------+\n| requests.collector      |requests-collector| clear-vehicles-data        | generator      | collector       | The generator will ask the collector to  |\n|                         |                  |                            |                |                 | clear all the files before starting the  |\n|                         |                  |                            |                |                 | generation.                              
|\n+-------------------------+------------------+----------------------------+----------------+-----------------+------------------------------------------+\n| inbox.generator.*       | generators       | generate-partition-stats   | generator      | generator       | The agents report their generation stats |\n+-------------------------+------------------+----------------------------+----------------+-----------------+------------------------------------------+\n| inbox.generator.*       | generators       | clear-vehicles-data-result | collector      | generator       | The collector will report on the         |\n|                         |                  |                            |                |                 | completion of the destruction.           |\n+-------------------------+------------------+----------------------------+----------------+-----------------+------------------------------------------+\n| commands.move.*         |                  | move                       | generator      | collector       | The collector will aggregate the move    |\n|                         |                  |                            |                |                 | commands and persist them as a chunk.    |\n|                         |                  |                            |                +-----------------+------------------------------------------+\n|                         |                  |                            |                | viewer          | The viewer will display the move of      |\n|                         |                  |                            |                |                 | each vehicle.                            
|\n+-------------------------+------------------+----------------------------+----------------+-----------------+------------------------------------------+\n| commands.flush.*        |                  | flush                      | generator      | collector       | At the end of the generation, a flush    |\n|                         |                  |                            |                |                 | command is sent to force the collectors  |\n|                         |                  |                            |                |                 | to write the accumulated data.           |\n+-------------------------+------------------+----------------------------+----------------+-----------------+------------------------------------------+\n| events                  |                  | aggregate-period-created   | collector      | viewer          | Every time a chunk of data is persisted  |\n|                         |                  |                            |                |                 | by the collector, some stats on the chunk|\n|                         |                  |                            |                |                 | will be sent to the viewer.              |\n+-------------------------+------------------+----------------------------+----------------+-----------------+------------------------------------------+\n| events                  |                  | vehicle-generation-started | generator      | viewer          | Clear the map in the viewer when a new   |\n|                         |                  |                            |                |                 | generation begins.                       
|\n+-------------------------+------------------+----------------------------+----------------+-----------------+------------------------------------------+\n| query.vehicles          | vehicle-finder   | vehicle-query              | viewer         | finder          | A client is querying the persisted data. |\n|                         |                  |                            |                |                 | The finder will filter the chunks based  |\n|                         |                  |                            |                |                 | on the time period and the geohashes of  |\n|                         |                  |                            |                |                 | the polygon filter.                      |\n+-------------------------+------------------+----------------------------+----------------+-----------------+------------------------------------------+\n|query.vehicles.partitions| vehicle-finder-  | vehicle-query-partition    | finder         | finder          | The finder is delegating the file        |\n|                         | partitions       |                            |                |                 | processing to the cluster of finders.    
|\n|                         |                  |                            |                |                 | The response will be sent using the      |\n|                         |                  |                            |                |                 | vehicle-query-partition-result-stats evt |\n+-------------------------+------------------+----------------------------+----------------+-----------------+------------------------------------------+\n| inbox.viewer.\u003cUID\u003e      |                  | vehicle-query-result       | finder         | viewer          | While parsing the chunks, the finder     |\n|                         |                  |                            |                |                 | will send all the move commands that     |\n|                         |                  |                            |                |                 | match the criteria. The viewer will then |\n|                         |                  |                            |                |                 | be able to replay them.                  |\n+-------------------------+------------------+----------------------------+----------------+-----------------+------------------------------------------+\n| inbox.viewer.\u003cUID\u003e      |                  | vehicle-query-result-stats | finder         | viewer          | Once the query is complete, the finder   |\n|                         |                  |                            |                |                 | will send some stats to the viewer, to   |\n|                         |                  |                            |                |                 | measure the performance and processing   |\n|                         |                  |                            |                |                 | that was required.                       
|\n+-------------------------+------------------+----------------------------+----------------+-----------------+------------------------------------------+\n| inbox.finder.\u003cUID\u003e      |                  | vehicle-query-             | finder         | finder          | This is a partial result sent back to    |\n|                         |                  | partition-result-stats     |                |                 | the finder that delegated the            |\n|                         |                  |                            |                |                 | processing.                              |\n+-------------------------+------------------+----------------------------+----------------+-----------------+------------------------------------------+\n```\n\nNote that when a consumer uses a \"consumer group\" name, each message is handled only once, by a single member of the group.\nThis is a classic work queue with competing consumers, as opposed to the default pub/sub behavior where every subscriber receives the message.\n\n### Collecting move commands\n\n```mermaid\nsequenceDiagram\n    viewer -\u003e\u003e+ generator: topic:generation/type:start-generation\n    activate generator-agent\n    generator -\u003e\u003e generator-agent: topic:generation.agent.*/type:generate-partition\n    loop generate move events\n        activate collector\n        generator-agent -\u003e\u003e collector: topic:commands.move.*/type:move\n        loop aggregate events\n            collector -\u003e\u003e collector: buffer, then flush\n            collector --\u003e\u003e viewer: topic:events/type:aggregate-period-created\n        end\n        deactivate collector\n    end\n    generator-agent -\u003e\u003e generator: topic:inbox.generator.*/type:generate-partition-stats\n    deactivate generator-agent\n    note right of generator: force a flush at the end of the generation\n    generator -\u003e\u003e+ collector: topic:commands.flush.*/type:flush\n    collector --\u003e\u003e- viewer: 
topic:events/type:aggregate-period-created\n    generator -\u003e\u003e- viewer: topic:inbox.viewer.*/type:generation-stats\n```\n\n### Querying move commands\n\n#### Serialized processing\n\n```mermaid\nsequenceDiagram\n    viewer -\u003e\u003e+ finder: topic:query.vehicles/type:vehicle-query\n    loop read event files\n        loop for each matching position\n            finder -\u003e\u003e viewer: topic:inbox.viewer.*/type:vehicle-query-result\n        end\n    end\n    finder -\u003e\u003e- viewer: topic:inbox.viewer.*/type:vehicle-query-result-stats\n```\n\n#### Parallel processing\n\n```mermaid\nsequenceDiagram\n    viewer -\u003e\u003e+ finder: topic:query.vehicles/type:vehicle-query\n    par for each matching event file\n        finder -\u003e\u003e+ finder-agent: topic:query.vehicles.partitions/type:vehicle-query-partition\n        loop for each matching position\n            finder-agent -\u003e\u003e viewer: topic:inbox.viewer.*/type:vehicle-query-result\n        end\n        finder-agent -\u003e\u003e- finder: topic:inbox.finder.*/type:vehicle-query-partition-result-stats\n    end\n    finder -\u003e\u003e- viewer: topic:inbox.viewer.*/type:vehicle-query-result-stats\n```\n\n## Local dev\n\n### Features\n\n- web UI for viewing realtime data or query results\n- distributed services using a message broker\n- durable storage of events\n- distributed event generation with multiple instances for higher throughput\n- multiple data formats (parquet, csv, json, arrow)\n- multiple storage providers (filesystem, S3...)\n- flexible aggregation of events\n\n    - max window capacity\n    - concurrent time windows\n    - multiple data partitioning strategies (geohash, vehicle id...)\n\n- flexible search\n\n    - serialized or parallelized\n    - record limit\n    - timeout\n    - ttl\n    - time filters (time range)\n    - geoloc filters (polygons) using GeoJSON\n    - data filters (vehicle type)\n\n\n### Diagram\n\n\n```\n\n                            +------------+\n 
     Event Generator-------| Event Hub  |------Event Viewer\n                            | (NATS)     |\n                            +------------+\n                              |      |\n                        +-----+      +-----+\n                        |                  |\n                  Event collector       Event finder\n                   |    |                  |\n        +----------+  writes             reads\n        |               |                  |\n+-------------+      +-----------------------+\n| Event Store |      | Event Aggregate Store |\n| (In memory) |      | (File system or S3)   |\n+-------------+      +-----------------------+\n\n```\n\n### Technologies:\n\n- Event Hub: [NATS](https://nats.io/)\n- Event Generator: Typescript NodeJS\n- Event Viewer: Nuxt server, Typescript, PixiJS\n- Event Collector: Typescript NodeJS, https://pola.rs/, Parquet format, S3, geohash\n- Event Store: In memory, DuckDB\n- Event finder: Typescript NodeJS, geohash, turf\n\n## Cloud\n\n### Azure\n\nInspired by https://learn.microsoft.com/en-us/azure/stream-analytics/event-hubs-parquet-capture-tutorial\n\n#### Diagram\n\n```\n                                           +-----\u003e Event Viewer\n                                           |         + NuxtJS\nEvent Generator ==\u003e Event Hub -------------+         + PowerBI\n                     + Azure Event Hubs    |\n                                           +----\u003e Event Collector -------\u003e Event Aggregate Store \u003c------ Event finder\n                                                  + Azure Stream Analytics      + Azure Blob                 + Azure Synapse\n```\n\n#### Event aggregation\n\nSee https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-capture-overview\nand https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-capture-enable-through-portal\nand https://learn.microsoft.com/en-us/azure/stream-analytics/event-hubs-parquet-capture-tutorial\n\n#### Querying\n\n##### Create 
a new Azure Synapse Analytics resource\n\nYou should create a Blob storage account with the following attributes:\n- Azure Blob Storage or Azure Data Lake Storage Gen 2\n- Enable hierarchical namespace = true\n\nThen you can create your new Azure Synapse Analytics resource and use the previously created Storage Account.\n\n##### Configure the external data source \n\nSee also the tutorial that demonstrates all the steps: https://www.youtube.com/watch?v=WMSF_ScBKDY\n\n- Go to your Azure Synapse portal.\n- Select the \"Develop\" section.\n- click the \"+\" button and select \"SQL script\"\n- name it \"configure_db\"\n- paste the following script and execute it:\n\n```sql\nCREATE DATABASE vehicles;\nGO;\n\nUSE vehicles;\n\n-- You should generate a new password\nCREATE MASTER KEY ENCRYPTION BY PASSWORD = '\u003cREDACTED\u003e';\n\nCREATE DATABASE SCOPED CREDENTIAL VehicleDataCredential\nWITH\n    IDENTITY = 'SHARED ACCESS SIGNATURE',\n    -- you should copy the SAS token configured for your Storage Account, in the \"Shared access signature\"\n    SECRET = 'sv=2022-11-02\u0026ss=b\u0026srt=co\u0026sp=rlf\u0026se=2024-12-31T22:42:44Z\u0026st=2024-12-15T14:42:44Z\u0026spr=https\u0026sig=\u003cREDACTED\u003e';\n\ncreate external data source VehicleDataEvents with ( \n    -- you should replace morganvehicledata with the name of your Storage Account.\n    location = 'wasbs://events@morganvehicledata.blob.core.windows.net',\n    CREDENTIAL = VehicleDataCredential  \n);\n\nGO;\n```\n\n##### Create queries\n\nThe strategy is to use the right BULK filter in order to only select the files containing the data, based on the time range and geohash list.\n\n###### Filter data and get stats\n\n- Go to your Azure Synapse portal.\n- Select the \"Develop\" section.\n- click the \"+\" button and select \"SQL script\"\n- name it \"events_per_file\"\n- paste the following script and execute it:\n\n```sql\nuse vehicles;\n\nSELECT ev.filename() as filename, COUNT(*) as event_count\nFROM  \n   
 OPENROWSET(\n        BULK '2024-01-01-*-*-*-*.parquet',\n        DATA_SOURCE = 'VehicleDataEvents',\n        FORMAT='PARQUET'\n    ) AS ev\nWHERE\n    ev.filepath(1) in ('05', '06', '07')\n    AND ev.filepath(3) in ('f257v', 'f25k6', 'f25se', 'f25ss')\n    AND ev.timestamp \u003e= '2024-01-01T05:05:00'\n    AND ev.timestamp \u003c '2024-01-01T07:05:00'\nGROUP BY\n  ev.filename()\nORDER BY\n  1\n```\n\n###### Filter data and get events\n\n- Go to your Azure Synapse portal.\n- Select the \"Develop\" section.\n- click the \"+\" button and select \"SQL script\"\n- name it \"get_events\"\n- paste the following script and execute it:\n\n```sql\nuse vehicles;\n\nSELECT ev.*\nFROM  \n    OPENROWSET(\n        BULK '2024-01-01-*-*-*-*.parquet',\n        DATA_SOURCE = 'VehicleDataEvents',\n        FORMAT='PARQUET'\n    ) AS ev\nWHERE\n    ev.filepath(1) in ('05', '06', '07')\n    AND ev.filepath(3) in ('f257v', 'f25k6', 'f25se', 'f25ss')\n    AND ev.timestamp \u003e= '2024-01-01T05:05:00'\n    AND ev.timestamp \u003c '2024-01-01T07:05:00'\nORDER BY\n  ev.timestamp\n```\n\n##### References\n\nhttps://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-parquet-files\nhttps://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-specific-files\nhttps://learn.microsoft.com/en-us/azure/synapse-analytics/sql/tutorial-data-analyst\n\n##### TODO\n\nUse a service principal and/or workload identities instead of SAS tokens.\n\n#### Technologies:\n\n- Event Hub: [Azure Event Hubs](https://azure.microsoft.com/fr-fr/products/event-hubs/)\n- Event Generator: Typescript NodeJS\n- Event Viewer: Nuxt server, Typescript, PixiJS\n- Event Collector: [Azure Stream Analytics](https://azure.microsoft.com/fr-fr/products/stream-analytics/)\n- Event Store: Azure Event Hubs\n- Event Finder: [Azure Synapse Analytics](https://azure.microsoft.com/fr-fr/products/synapse-analytics/)\n\n### AWS\n\n#### Diagram\n\n```\n                                           +-----\u003e Event Viewer\n        
                                   |         + NuxtJS\nEvent Generator ==\u003e Event Hub -------------+         \n                     + Amazon EventBridge  |\n                                           +----\u003e Event Collector -------\u003e Event Aggregate Store \u003c------ Event finder\n                                                  + Amazon Data Firehose       + AWS S3                     + Amazon Athena\n```\n\n\n#### Technologies:\n\n- Event Hub: [Amazon EventBridge](https://aws.amazon.com/eventbridge/)\n- Event Generator: Typescript NodeJS\n- Event Viewer: Nuxt server, Typescript, PixiJS\n- Event Collector: [Amazon Data Firehose](https://aws.amazon.com/firehose/)\n- Event Store: Amazon EventBridge\n- Event Finder: [Amazon Athena](https://aws.amazon.com/athena/)\n\n## File formats\n\n- JSON\n- CSV\n- Arrow\n- Parquet\n\n# Roadmap\n\n## Misc\n\nCreate a script that would compute the cost projections for the data that will be aggregated, stored, and queried.\n\n## Development\n\n### NodeJS\n\n#### Events\n\n- Use https://github.com/cloudevents/spec\n- Support auto-reconnect when the server is not available\n- Support multiple messaging systems such as RabbitMQ, AWS SQS or Azure Event Hubs\n- Support event persistence\n- Support retry in message handling\n\n#### Generator\n\n\n#### Collector\n\n- Use long-term storage to persist events until they can be aggregated (instead of just using memory)\n- Implement consistent hashing for partitioning geohashes across the instances\n- Rewrite in Rust for speedup\n\n#### Viewer\n\n##### Web\n\n- Use a mapping widget to display the vehicles?\n- Try other Web frameworks?\n\n##### Terminal\n\nTODO: add a CLI that would listen to stats events and display aggregated stats.\n\n### Python\n\nTODO\n\n### Rust\n\nTODO\n\n# Local development\n\nNote that you can also use Docker for local development if you prefer. 
Then jump to the next \"Deployment/Docker\" section.\n\n## Scripts\n\n### Requirements\n\nPython 3 is required for running some scripts.\n\n### install\n\n`python3 -m venv .venv`\n`.venv/bin/pip3 install -r scripts/requirements.txt`\n\n## NodeJS\n\n### Requirements\n\nNodeJS LTS should be installed.\n\nNATS server should be running. See [README](./event-hub/README.md) for the instructions.\n\n### install\n\n`bash scripts/nodejs/cleaup.sh`\n\n`bash scripts/nodejs/install.sh`\n\n### run the web viewer (optional)\n\nYou can use the dev mode with hot reload, but it is much slower than the production build when there are a lot of vehicles:\n\n```shell\ncd ./event/viewer/web/nuxt\nopen http://localhost:3000/\nnpm run dev\n```\n\nOr you can build a production release and run it (this is much more performant):\n```shell\ncd ./event/viewer/web/nuxt\nnpm run build\nopen http://localhost:3000/\nnpm run preview\n```\n\n### generate events\n\n`.venv/bin/python3 scripts/nodejs/start.py`\n\n# Deployment\n\n## Requirements\n\n- docker\n- docker-compose\n- kubectl\n- nodejs\n\n## Docker \n\n### build\n\n`bash scripts/nodejs/docker-build.sh`\n\n### run\n\nThe following script will start the stack with `docker-compose`. 
Note that we run a single instance of each component in this mode.\n\n`bash scripts/nodejs/docker-run.sh`\n\n## Kubernetes\n\n```shell\ncd deployment/kubernetes\nnpm i\nnpm run synth\n```\n\nSelect the right Kubernetes cluster (either with the KUBECONFIG env var, or with a kubectl context).\n\n```shell\nkubectl apply -f dist/*\n```\n\nWait for everything to be created.\n\nBrowse to http://vehicle-fleet-viewer.kube.lab.ile.montreal.qc.ca/\n\nIf you want to clean up the Kubernetes resources:\n```shell\nkubectl delete -f dist/0001-vehicles.k8s.yaml\nkubectl delete -f dist/0000-vehicles-ns.k8s.yaml\n```\n\n# References\n\nhttps://medium.com/@igorvgorbenko/geospatial-data-analysis-in-clickhouse-polygons-geohashes-and-h3-indexing-2f55ff100fbe#:~:text=H3%20Indexing,-H3%20indexing\u0026text=Similar%20to%20geohashes%2C%20a%20longer,occupy%20a%20fixed%208%20bytes.\n\nhttps://medium.com/data-engineering-chariot/aggregating-files-in-your-data-lake-part-1-ed115b95425c\nhttps://docs.aws.amazon.com/firehose/latest/dev/dynamic-partitioning.html\nhttps://deepak6446.medium.com/why-did-we-move-from-mongodb-to-athena-with-parquet-297b61ddf299\n\nhttps://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-real-time-fraud-detection\nhttps://www.red-gate.com/simple-talk/cloud/azure/query-blob-storage-sql-using-azure-synapse/\n\nhttps://hivekit.io/blog/how-weve-saved-98-percent-in-cloud-costs-by-writing-our-own-database/\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flivetocode%2Fvehicle-fleet-poc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flivetocode%2Fvehicle-fleet-poc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flivetocode%2Fvehicle-fleet-poc/lists"}