https://github.com/e-breuninger/data2day-2022
Public repository containing the materials and slides for Breuninger's data2day 2022 presentation.
- Host: GitHub
- URL: https://github.com/e-breuninger/data2day-2022
- Owner: e-breuninger
- Created: 2022-09-19T06:31:16.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-09-20T13:34:33.000Z (over 3 years ago)
- Last Synced: 2025-06-30T05:06:09.553Z (12 months ago)
- Topics: breuninger, conference, data2day
- Language: Jupyter Notebook
- Homepage: https://www.e-breuninger.de/de/karriere/
- Size: 7.88 MB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# data2day-2022
Public repository containing the material for the 2022 data2day conference.
More on the program: [From PoCs to Large Scale ML Operationalization Covering the End-to-End Pipeline](https://www.data2day.de/veranstaltung-15085-0-from-pocs-to-large-scale-ml-operationalization-covering-the-end-to-end-pipeline.html).
## Developers
This repository is owned and maintained by the E-Breuninger Developer Team.
For any feedback or inquiries related to this repository, you can contact [Olivier Bénard](https://github.com/olivierbenard) (Data Software Engineer).
## Dependencies
The dependencies are managed via `poetry`. We recommend using this tool and integrating it into your workflow.
If you prefer not to, the required packages are also listed in the `requirements.txt` file.
**Note:** You may have to switch your Python version. We recommend using `pyenv` as a Python version manager, which can be installed via `brew install pyenv`.
## Quick Start
To install all the dependencies and quickly get your hands dirty:
1. Create a `settings.toml` file based on the following template:
```toml
[default]
LOG_LEVEL = "DEBUG"
LATITUDE = ""
LONGITUDE = ""
APP_PATH = "/absolute/path/to/the/local/repository/"
```
2. Create a `.secrets.toml` file based on the following template (you can leave the default value if you have no key):
```toml
[default]
google_map_api_key = ""
```
3. Install all the dependencies in the virtual environment via `poetry`:

       poetry install

4. You are ready to go and can start `jupyter notebook`:

       make notebook

The only thing left to do is to navigate through `notebooks/` and play with the notebooks.

**Bonus:** If you want to publish changes, you first need to install pre-commit:

    make pre-commit-install

This helps ensure that the code you push meets the repository's code standards and that the GitHub CI/CD pipeline succeeds, i.e. that your code will be accepted.
**Notes:**
- If you do not have poetry already, install it via `brew install poetry`.
- The Google Maps API key is used to display the weather stations on Google Maps. You do not need it, however: if you have no key (or no valid one), the developer mode is activated by default. It offers fewer features but also does the job.
## Architecture
- The `data2day_2022/` folder contains the reusable parts of the code, such as the `sql` queries and the `helpers` package.
- The `datasets/` folder contains the template you have to fill in to make the forecast.
- The `notebooks/` folder contains a couple of Jupyter notebooks holding the main logic of the code.
- The `results/` folder contains the results generated by the notebooks.
- The `slides/` folder contains the anonymised presentation in `.pdf` format.
- The `tests/` folder contains a couple of unit tests for our code.
- The `.pre-commit-config.yaml` file contains a couple of checks to be executed at commit time, before the code can be pushed.
- The `Makefile` contains a series of frequently used commands, e.g. `make check` or `make notebook`.
- The `.secrets.toml` and `settings.toml` files are configuration files containing the variables used in the code.
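As a quick sanity check of the layout described above, the following sketch lists which of these files and folders are missing from a local checkout. The function `missing_entries` is a hypothetical helper written for this example and is not shipped with the repository.

```python
from pathlib import Path

# Entries taken from the architecture list above; adjust if the layout changes.
EXPECTED_ENTRIES = (
    "data2day_2022",
    "datasets",
    "notebooks",
    "results",
    "slides",
    "tests",
    "Makefile",
    ".pre-commit-config.yaml",
    "settings.toml",
    ".secrets.toml",
)


def missing_entries(repo_root: Path) -> list[str]:
    """Return the expected files and folders that are absent from repo_root."""
    return [name for name in EXPECTED_ENTRIES if not (repo_root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries(Path.cwd())
    print("Missing:", ", ".join(missing) if missing else "nothing")
```

On a fresh clone, `settings.toml` and `.secrets.toml` are expected to show up as missing until you create them as described in the Quick Start.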
## Running your own forecast
* You can parametrise the series you want to predict using the `datasets/customer_frequentation.csv` file. Fill it with your own data, respecting the following template:
|date|quantity|
|---|---|
| | |
* Rainfall data for Stuttgart in 2018 has been retrieved and collected in the `results/weather_prpc.csv` file. You can, however, query the initial tables on BigQuery using `notebooks/weather_data_on_biqguery.ipynb`. Results are saved under the `results/` folder.
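Before running the notebooks on your own data, it can help to check that the file follows the `date`/`quantity` template. The sketch below does so with the standard library only. It assumes a comma-separated file with a header row, ISO dates (`YYYY-MM-DD`) and numeric quantities; these formats are assumptions, since the template does not specify them, and `load_series` is a hypothetical helper rather than a function from this repository.

```python
import csv
from datetime import date
from pathlib import Path


def load_series(path: Path) -> list[tuple[date, float]]:
    """Read a date/quantity CSV and return its rows sorted by date.

    Assumes ISO dates (YYYY-MM-DD) and numeric quantities.
    """
    rows = []
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or set(reader.fieldnames) != {"date", "quantity"}:
            raise ValueError(
                f"expected columns 'date' and 'quantity', got {reader.fieldnames}"
            )
        for row in reader:
            try:
                rows.append((date.fromisoformat(row["date"]), float(row["quantity"])))
            except (TypeError, ValueError) as error:
                # reader.line_num is the line of the file that was just read
                raise ValueError(f"line {reader.line_num}: {error}") from error
    return sorted(rows)


if __name__ == "__main__":
    series = load_series(Path("datasets/customer_frequentation.csv"))
    print(f"{len(series)} rows loaded")
```

Failing early on a malformed row is a deliberate choice here: a clear error message pointing at the offending line is usually easier to act on than an error raised later inside a notebook.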
## Troubleshooting
This section is empty so far. Should you encounter any issue not covered in the current documentation, please contact us.