{"id":22169803,"url":"https://github.com/akopdev/smart-weather-station","last_synced_at":"2025-03-24T17:24:01.281Z","repository":{"id":256290523,"uuid":"854829615","full_name":"akopdev/smart-weather-station","owner":"akopdev","description":null,"archived":false,"fork":false,"pushed_at":"2024-10-18T02:51:17.000Z","size":501,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-11-24T18:08:01.778Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/akopdev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-09T21:05:29.000Z","updated_at":"2024-10-18T02:51:20.000Z","dependencies_parsed_at":"2024-09-10T02:51:51.676Z","dependency_job_id":"89517ecb-e1e7-4b26-abc3-1fa29e857f5e","html_url":"https://github.com/akopdev/smart-weather-station","commit_stats":null,"previous_names":["akopdev/smart-weather-station"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akopdev%2Fsmart-weather-station","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akopdev%2Fsmart-weather-station/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akopdev%2Fsmart-weather-station/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akopdev%2Fsmart-weather-station/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/akopd
ev","download_url":"https://codeload.github.com/akopdev/smart-weather-station/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245316280,"owners_count":20595408,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-02T06:36:00.166Z","updated_at":"2025-03-24T17:24:01.251Z","avatar_url":"https://github.com/akopdev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"HomeKit Weather Station\n=======================\n\nThis is an exploratory project to discover the possibilities of machine learning \non a tiny microcontroller. The goal is to create a weather station that can predict\nthe weather based on the data it collects. \n\nThe final device should have Apple HomeKit integration to enable home automation and\nvoice control.\n\n\u003e **Note:** This project is in an active development phase and is intended to be \n\u003e used as a learning diary for myself.\n\n## Working with the garage door up\n\nThis project is built with a \"working with the garage door up\" mindset, inspired by Andy Matuschak's \n[notes](https://notes.andymatuschak.org/About_these_notes?stackedNotes=z21cgR9K3UcQ5a7yPsj2RUim3oM2TzdBByZu). 
\n\nI'm inviting you to see the work before it's finished, so you can follow the progress as it emerges.\n\nThis is \"anti-marketing\" because marketing is about promoting a product in the best possible light, \nwhereas working with the garage door up exposes unpolished work and is more realistic.\n\nFeel free to reach out to me with any questions or suggestions, or open an issue.\n\n## Project log\n\n### Model training\n\n- [X] Download dataset from WorldWeatherOnline\n    - Sample of 10 years \n    - Hourly frequency\n    - Temperature, humidity, rain\n- [X] Cleaning up the dataset\n\t- Convert rain data into a binary value (Yes/No)\n\t- Balance the dataset by undersampling the majority class\n\t- Scale the input features with Z-score\n- [X] Training the model with TensorFlow \n\t- Split the dataset into train, validation, and test datasets\n\t- Create a model with the Keras API\n\t- Analyze the accuracy and loss after each training epoch\n- [X] Evaluating the model's effectiveness \n\t- Visualize a confusion matrix\n\t- Calculate Recall, Precision, and F-score performance metrics\n- [X] Quantizing\n    - Convert the model to TFLite\n    - On-device deployment\n- [X] Code refactoring\n    - Group functions into classes\n    - Separate business logic of the app from the interface\n    - Build a pipeline for continuous training\n\n### On-device deployment\n\n- [ ] Collecting data with a DHT22 sensor\n- [ ] On-device inference\n\n## Prepare dataset\n\nTo make training the model easier and reproducible, I wrote a script\nto download the dataset from Open Meteo through their API. 
\n\nThe script supports multiple parameters, like location and date range.\nTo make it more useful, I added a feature to save the data to different formats,\nlike CSV, JSON, and column text, primarily used for debugging.\n\nThe script is pretty straightforward. I decided not to use an asynchronous approach:\nsince the script runs in a single thread and makes only a single request, it \nshould be fast enough.\n\nFor both user input and API response validation, I used the `pydantic` library, with\nsome custom validators, primarily for processing multiple values from the user.\n\nTo make a prediction for the rain, we need to convert data that comes as a float\ninto a binary value. I decided to use a simple threshold: if the value is greater\nthan 0.1, it's raining, otherwise it's not.\n\nThe temperature and humidity features have different numerical ranges, and so \ncontribute differently during training, leading to a bias. I need to rescale them using \nthe `z-score` technique to ensure that each input feature contributes equally during training.\n\n## Data cleaning and preparation\n\nAlthough the data we have is already quite clean, we still need to do some\npost-processing to make it more suitable for training the model. For instance, \nrain data is represented as a float, but we need to convert it into a binary\nvalue.\n\nI decided to use a simple threshold: if the value is greater than 0.1, it's\nraining, otherwise it's not. Maybe it makes sense to think about more cases,\nlike drizzle, but for now, I'll keep it simple.\n\nSince the dataset is unbalanced, I decided to undersample the majority class (no rain \nin my case). Without this step, the model will fail to learn the patterns of the minority class,\nand the `recall` and `precision` metrics will be very poor.\n\nI used the basic `z-score` technique to scale the temperature and humidity features,\nto ensure that each input feature contributes equally during training. 
To validate the\nresults, \n\n\n## Training the model\n\nI split the dataset into train, validation, and test subsets. I used a binary classification\nmodel with one fully connected layer with 12 neurons followed by a ReLU activation function,\none dropout layer with a 0.2 rate, and an output layer with a single neuron and a sigmoid activation\nfunction.\n\nUse `make train` to train the model on the downloaded dataset. \n\n## Evaluating the model\n\nI calculated the confusion matrix and common performance metrics: \n\n- `accuracy`: the ratio of correct predictions to the total number of predictions\n- `recall`: how many of the actual positive cases we were able to predict (higher is better)\n- `precision`: how many of the predicted positive cases were actually positive (higher is better)\n- `f1-score`: the harmonic mean of recall and precision, which helps to evaluate both at the same time (higher is better)\n\n## Quantizing\n\nOnce the model is trained and evaluated, I need to compress it to allow inference on tiny devices. \nI exported the model to the Keras format, and then used TFLite to convert it to a FlatBuffer, applying\n8-bit quantization to reduce the size. The final binary was converted to a C byte array to be used on\nthe ESP32 microcontroller.\n\nFor the hex dump, I used the `xxd` command, available on macOS. On Linux you can install it with `sudo apt install xxd`.\nSee the `Makefile` for more details on which options to use.\n\n## Reminders to myself\n\n- [ ] Python packages, especially ones as complicated as TensorFlow, can be a nightmare to install. I spent\nhours trying to reproduce the Jupyter notebook environment both remotely and locally, and each time I ran\ninto a different issue. 
The best start for a new ML project is to create a robust environment with\nDocker and Python dependencies pinned to specific versions.\n- [X] I can't find any benefit in wrapping the code into CLI tools; fitting it into a data pipeline\nframework might be much more practical.\n- [X] During the work on the last part of model training, I noticed an interesting relation between the first step \n(dataset preparation) and the next steps (training, quantizing). I should consider converting each step\ninto a separate class, and use basic OOP principles to pass shared entities between steps, reducing\ndependencies on global variables or non-trivial function executions between logically separated parts of the program.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakopdev%2Fsmart-weather-station","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fakopdev%2Fsmart-weather-station","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakopdev%2Fsmart-weather-station/lists"}