https://github.com/trungbac11/self-serving-customer-metrics
duckdb pyyaml sql
Last synced: 7 months ago
- Host: GitHub
- URL: https://github.com/trungbac11/self-serving-customer-metrics
- Owner: trungbac11
- Created: 2025-11-14T19:24:41.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-11-14T19:49:07.000Z (7 months ago)
- Last Synced: 2025-11-14T21:26:14.160Z (7 months ago)
- Topics: duckdb, pyyaml, sql
- Language: Python
- Homepage:
- Size: 17.2 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# SELF-SERVING-CUSTOMER-METRICS
**Context**: The data analysts on our team currently depend on data engineers to build a data pipeline for each new customer metric. This creates a bottleneck and slows down analysis.
**Goal**: Build a simple prototype of a self-serve metrics system that removes this dependency.
**Objective**: A lightweight prototype that lets data analysts define customer metrics in YAML + SQL, with automatic materialization as tables in DuckDB.
## Setup instructions
### Prerequisites
- Python 3.12+
- Linux (the setup commands below assume a Linux shell)
### Installation Steps
1. **Clone the project:** `git clone https://github.com/trungbac11/self-serving-customer-metrics.git`, then enter the directory with `cd self-serving-customer-metrics`
2. **Create a virtual environment:**
`python -m venv venv`
3. **Activate it:**
`source venv/bin/activate`
4. **Install dependencies:**
`pip install --upgrade pip`
`pip install -r requirements.txt`
5. **Initialize the DuckDB database:**
`python src/setup_database.py`
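After installation, a quick sanity check can confirm the interpreter version and that the dependencies import. The script below is a minimal sketch, not part of the repository; it assumes the dependencies are importable as `duckdb` and `yaml` (PyYAML's import name), and the helper names `python_ok` and `missing_modules` are hypothetical.

```python
import importlib.util
import sys

# The README requires Python 3.12+; this constant mirrors that requirement.
MIN_PYTHON = (3, 12)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names: duckdb, and yaml for PyYAML.
    print("Python version OK:", python_ok())
    print("Missing modules:", missing_modules(["duckdb", "yaml"]) or "none")
```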
## Development guidelines
### Project Structure
``` text
project/
├── data/ # Source CSV files
├── metrics/ # Metric definitions (YAML)
│ ├── lifetime_revenue.yaml
│ ├── avg_order_revenue.yaml
│ └── customer_geography.yaml
├── src/ # Python scripts
│ ├── validate_yaml.py # YAML validation
│ ├── run_metrics.py # Metric execution
│ ├── setup_database.py # DB initialization
│ └── clean_database.py # DB cleanup
├── requirements.txt # Dependencies
└── README.md # Documentation
```
### Script Execution Guide
#### Database Initialization:
`python src/setup_database.py`
#### Metric Definition Validation:
`python src/validate_yaml.py`
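Validation could be sketched as follows, assuming each metric file is a YAML mapping with `name`, `description`, and `sql` keys (a hypothetical schema; check the files in `metrics/` for the real keys). It covers the invalid-syntax and missing-field cases and, because each metric name becomes a table name, also checks that `name` is a plain SQL identifier. `validate_metric_text` is a hypothetical helper.

```python
import re

import yaml

# Hypothetical schema: the real metric files may use different keys.
REQUIRED_FIELDS = ("name", "description", "sql")
# The metric name becomes a table name, so keep it a plain SQL identifier.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_metric_text(text: str) -> list[str]:
    """Return a list of error messages for one metric definition (empty if valid)."""
    try:
        metric = yaml.safe_load(text)
    except yaml.YAMLError as exc:
        return [f"invalid YAML syntax: {exc}"]
    if not isinstance(metric, dict):
        return ["top level must be a mapping of field names to values"]

    errors = []
    for field in REQUIRED_FIELDS:
        value = metric.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"missing or empty required field: {field}")

    name = metric.get("name")
    if isinstance(name, str) and name.strip() and not NAME_PATTERN.fullmatch(name):
        errors.append(f"name is not a valid table identifier: {name}")
    return errors
```

To check every definition, loop over `sorted(Path("metrics").glob("*.yaml"))` and call `validate_metric_text(path.read_text())` on each file, reporting any non-empty result.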
#### Metric Calculation and Storage:
`python src/run_metrics.py`
#### Database Cleanup:
`python src/clean_database.py`
### System Testing
#### Validation Test Cases
- Test with missing required fields
- Test with invalid YAML syntax
- Test with malformed SQL queries
- Test with valid complete metric definitions
#### Error Handling
- All scripts wrap their work in try/except blocks for error handling
- Validation prints clear error messages for debugging
- The execution script (`run_metrics.py`) continues with the remaining metrics if one fails
#### Maintenance
- Regular dependency updates: `pip install -r requirements.txt --upgrade`
- Database cleanup when needed: `python src/clean_database.py`
- Back up important data before major changes
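A minimal backup sketch, assuming the database lives in a single DuckDB file (the file name and the `backups/` directory are hypothetical). Close every connection to the database first so the file is in a consistent state; DuckDB's `EXPORT DATABASE` statement is an alternative that writes the schema and data to a directory.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_database(db_path, backup_dir="backups") -> Path:
    """Copy the database file into backup_dir with a timestamp suffix; return the copy's path."""
    src = Path(db_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    # copy2 copies the contents and also preserves metadata such as the modification time.
    shutil.copy2(src, dest)
    return dest
```

Two backups taken within the same second would get the same name, so the later one overwrites the earlier; that is acceptable for occasional manual backups.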
## How to add a new metric
**Step 1: Create YAML Metric Definition**
- Create a new `.yaml` file in the `metrics/` directory, following the structure of the existing definitions (e.g. `metrics/lifetime_revenue.yaml`)
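Creating a definition file could be scripted as below. This is a sketch: the keys `name`, `description`, and `sql` are assumptions, so mirror whatever keys the existing files in `metrics/` actually use. The hypothetical helper `write_metric` refuses to overwrite an existing definition.

```python
from pathlib import Path

import yaml


def write_metric(name: str, description: str, sql: str, metrics_dir="metrics") -> Path:
    """Write a metric definition to <metrics_dir>/<name>.yaml and return its path.

    The keys used here (name, description, sql) are an illustrative assumption.
    """
    path = Path(metrics_dir) / f"{name}.yaml"
    if path.exists():
        raise FileExistsError(f"metric already defined: {path}")
    path.parent.mkdir(parents=True, exist_ok=True)
    definition = {"name": name, "description": description, "sql": sql}
    # sort_keys=False keeps the keys in the order written above.
    path.write_text(yaml.safe_dump(definition, sort_keys=False), encoding="utf-8")
    return path
```

For example, `write_metric("order_count", "Number of orders per customer", "SELECT customer_id, COUNT(*) AS order_count FROM orders GROUP BY customer_id")` would create `metrics/order_count.yaml`; the `orders` table and its columns are hypothetical.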
**Step 2: Validate the Metric**
- Run validation to check for errors: `python src/validate_yaml.py`
**Step 3: Execute the Metric**
- Run the metric to create the table in DuckDB: `python src/run_metrics.py`
**This will:**
- Read all YAML files from `metrics/`
- Execute each SQL query against DuckDB
- Create or replace a table named after each metric