https://github.com/e-breuninger/data2day-2022

Public repository containing the materials and slides for Breuninger's data2day 2022 presentation.

Host: GitHub
URL: https://github.com/e-breuninger/data2day-2022
Owner: e-breuninger
Created: 2022-09-19T06:31:16.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2022-09-20T13:34:33.000Z (over 3 years ago)
Last Synced: 2025-06-30T05:06:09.553Z (12 months ago)
Topics: breuninger, conference, data2day
Language: Jupyter Notebook
Homepage: https://www.e-breuninger.de/de/karriere/
Size: 7.88 MB
Stars: 1
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# data2day-2022

Public repository containing the material for the 2022 data2day conference.

More on the program: [From PoCs to Large Scale ML Operationalization Covering the End-to-End Pipeline](https://www.data2day.de/veranstaltung-15085-0-from-pocs-to-large-scale-ml-operationalization-covering-the-end-to-end-pipeline.html).

## Developers

This repository is owned and maintained by the E-Breuninger Developer Team.

For any feedback or inquiries related to this repository, contact [Olivier Bénard](https://github.com/olivierbenard) (Data Software Engineer).

## Dependencies

The dependencies are managed via `poetry`. We recommend using this tool and integrating it into your workflow.
If you prefer not to, the necessary requirements are also listed in the `requirements.txt` file.

**Note:** You might have to switch your Python version. We recommend using `pyenv` as a Python version manager, which can be installed via `brew install pyenv`.

## Quick Start

To install all the dependencies and rapidly start getting your hands dirty:

1. Create a `settings.toml` file based on the following template:
```toml
[default]
LOG_LEVEL = "DEBUG"
LATITUDE = ""
LONGITUDE = ""
APP_PATH = "/absolute/path/to/the/local/repository/"
```
2. Create a `.secrets.toml` file based on the following template (you can leave the default value if you have no key):
```toml
[default]
google_map_api_key = ""
```
3. Install all the dependencies in the virtual environment via `poetry`:

    poetry install

4. You are ready to go and can start `jupyter notebook`:

    make notebook

The only thing left to do is to navigate through `notebooks/` and play with the notebooks.

**Bonus:** If you want to publish changes, you first need to install the pre-commit hooks:

    make pre-commit-install

This helps ensure that the code you push meets the repository's development standards and passes the GitHub CI/CD pipeline, i.e. that your code will be accepted.

**Notes:**
- If you do not have `poetry` already, install it via `brew install poetry`.
- The Google Map API key is used to display the weather stations on Google Maps. You do not need it, however: without a key (or with an invalid one), the developer mode is activated by default. It offers fewer features but also does the job.
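
The two configuration files from the Quick Start can also be inspected outside the notebooks. The README does not say which library loads them, so the following is only a minimal sketch using the standard `tomllib` parser. The helper names (`load_default_table`, `load_config`, `uses_developer_mode`) are hypothetical, and so is the rule that an empty or missing `google_map_api_key` means developer mode, which mirrors the note above but may differ from what the repository code actually checks.

```python
"""Sketch: read settings.toml and .secrets.toml and guess the map mode."""
from pathlib import Path

try:
    import tomllib  # standard library from Python 3.11 onwards
except ModuleNotFoundError:  # older interpreters need `pip install tomli`
    import tomli as tomllib


def load_default_table(path: Path) -> dict:
    """Return the [default] table of a TOML file, or {} if the file is absent."""
    if not path.exists():
        return {}
    with path.open("rb") as handle:
        return tomllib.load(handle).get("default", {})


def load_config(repo_root: Path) -> dict:
    """Merge settings.toml and .secrets.toml; secrets win on key clashes."""
    config = load_default_table(repo_root / "settings.toml")
    config.update(load_default_table(repo_root / ".secrets.toml"))
    return config


def uses_developer_mode(config: dict) -> bool:
    """Hypothetical rule: an empty or missing API key means developer mode."""
    return not str(config.get("google_map_api_key", "")).strip()
```

Calling `uses_developer_mode(load_config(Path.cwd()))` from the repository root would then tell you whether a key has been set. Whether a non-empty key is also *valid* can only be determined by the Google API itself.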

## Architecture

- The `data2day_2022/` folder contains the reusable parts of the code, such as the `sql` queries and the `helpers` package.
- The `datasets/` folder contains the template you have to fill in to make the forecast.
- The `notebooks/` folder contains the Jupyter notebooks that hold the main logic of the code.
- The `results/` folder contains the results generated by the notebooks.
- The `slides/` folder contains the anonymised presentation in `.pdf` format.
- The `tests/` folder contains a couple of unit tests for the code.
- The `.pre-commit-config.yaml` file defines the checks executed at commit time, before the code can be pushed.
- The `Makefile` contains a series of recurring commands, e.g. `make check` or `make notebook`.
- The `.secrets.toml` and `settings.toml` files are configuration files containing the variables used in the code.

## Running your own forecast

* You can parametrise the series you want to predict using the `datasets/customer_frequentation.csv` file. Fill it with your own data, respecting the following template:

|date|quantity|
|---|---|
| | |

* Rainfall data for Stuttgart in 2018 has already been retrieved and stored in the `results/weather_prpc.csv` file. You can, however, query the initial tables on BigQuery using `notebooks/weather_data_on_biqguery.ipynb`. The results will be written to the `results/` folder.
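
Before running the notebooks, it can help to check that your CSV matches the `date`/`quantity` template. The sketch below is not part of the repository. The function name `read_series` is hypothetical, the ISO date format (`YYYY-MM-DD`) is an assumption because the template does not specify one, and the sample rows are invented for illustration only.

```python
"""Sketch: validate CSV content that follows the date/quantity template."""
import csv
from datetime import date, datetime
from typing import Iterable, List, Tuple

EXPECTED_COLUMNS = ["date", "quantity"]
DATE_FORMAT = "%Y-%m-%d"  # assumption: the README does not specify a format


def read_series(lines: Iterable[str]) -> List[Tuple[date, float]]:
    """Parse CSV lines into (date, quantity) pairs.

    Raises ValueError if the header differs from the template or if a
    date or quantity cannot be parsed.
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(
            f"expected columns {EXPECTED_COLUMNS}, got {reader.fieldnames}"
        )
    series = []
    for row in reader:
        day = datetime.strptime(row["date"], DATE_FORMAT).date()
        series.append((day, float(row["quantity"])))
    return series


# Invented sample rows, only to show the expected shape of the file.
sample = ["date,quantity", "2018-01-01,120", "2018-01-02,95"]
print(read_series(sample))
```

To check a real file, pass an open file handle instead of the sample list, e.g. `open("datasets/customer_frequentation.csv", newline="")`.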

## Troubleshooting

This section is empty so far. Should you encounter any issue not covered in the current documentation, please contact us.
