Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mohamedbsh/test-dlt-yato-avocado
https://github.com/mohamedbsh/test-dlt-yato-avocado
dlt france lawyers plotly yato
Last synced: 19 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/mohamedbsh/test-dlt-yato-avocado
- Owner: MohamedBsh
- License: mit
- Created: 2024-10-09T21:30:50.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-10-10T06:05:52.000Z (2 months ago)
- Last Synced: 2024-11-29T20:54:47.624Z (23 days ago)
- Topics: dlt, france, lawyers, plotly, yato
- Language: Python
- Homepage:
- Size: 289 KB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# dlt x yato 🥑🇫🇷
![Evolution of the number of lawyers in France](lawyers_evolution_visualization.png)
## Overview
This project analyzes and visualizes the evolution of the number of lawyers in France over time. Using public data, I created a data pipeline to process, analyze, and visualize this information.
The main objective is to provide insights into the growth and distribution of the legal profession in France.
## Data SourceThe project utilizes data from the French government's open data platform. The dataset is accessed directly from [link](https://static.data.gouv.fr/resources/evolution-du-nombre-davocats-en-france-par-barreau/20240403-143707/nombre-par-barreau.csv).
This CSV file contains detailed information about the number of lawyers per bar association in France over time. The data is regularly updated, ensuring that our analysis reflects the most current trends in the French legal profession.
## Stack
Our data pipeline leverages two powerful tools for efficient data processing and transformation:
### dlt (data load tool)
[dlt](https://github.com/dlt-hub/dlt) is an open-source library that simplifies the process of extracting, normalizing, and loading data. Key features include:
- Automated schema inference and evolution
- Built-in data verification and error handling
- Support for various data sources and destinationsIn this project, dlt is used for efficient data extraction from the CSV source and loading into our data processing pipeline.
### yato (yet another transformation orchestrator)
[yato](https://github.com/Bl3f/yato) is a lightweight SQL transformation orchestrator designed to work seamlessly with DuckDB. Its main advantages are:
- Efficient execution of SQL queries in the correct order
- Easy management of dependencies between transformationsWe use yato, the smallest DuckDB SQL orchestrator on Earth, to orchestrate our SQL transformations on the extracted data. Yato works seamlessly with DuckDB, ensuring a clean and well-structured dataset for analysis.
The combination of dlt and yato, leveraging the power of DuckDB, creates a flexible, maintainable, and easy-to-understand data pipeline that forms the backbone of our analysis. This setup allows us to efficiently process and transform data using SQL queries within the DuckDB environment.
## Installation
To set up the project environment:
1. Clone the repository:
```
git clone https://github.com/MohamedBsh/test-dlt-yato-avocado.git
cd test-dlt-yato-avocado
```2. Create and activate a virtual environment:
```
python -m venv venv
source venv/bin/activate
```3. Install the required dependencies:
```
pip install -r requirements.txt
```## Usage
To run the data pipeline and generate visualizations:
1. Ensure your virtual environment is activated.
2. Execute the main script:
```
python app/pipeline.py
```3. Clean the database
```
python app/data_explorer.py
Choose 'clean'
```4. Explore the database
```
python app/data_explorer.py
Choose 'explore'
```5. Visualize the data
```
python generate_plot.py
```