https://github.com/planet-a-ventures/dlt-source-personio
DLT (www.github.com/dlt-hub/dlt) source for Personio (www.personio.com)
data-engineering data-load-tool data-loading dlt dlthub personio python
- Host: GitHub
- URL: https://github.com/planet-a-ventures/dlt-source-personio
- Owner: planet-a-ventures
- License: mit
- Created: 2025-03-11T11:47:00.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-14T00:53:09.000Z (about 1 year ago)
- Last Synced: 2025-03-14T01:43:53.204Z (about 1 year ago)
- Topics: data-engineering, data-load-tool, data-loading, dlt, dlthub, personio, python
- Language: Python
- Homepage:
- Size: 2.93 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
---
description: dlt source for personio.com
keywords: [personio API, personio.com]
---
# dlt-source-personio
[PyPI package](https://pypi.org/project/dlt-source-personio/)
[DLT](https://dlthub.com/) source for [personio](https://www.personio.com/).
Currently loads the following data:
| Table | Contains | Spec version |
| -- | -- | -- |
| `persons` | Items of the `Person` model with all properties | `V2` |
| `persons_custom_attributes` | All defined custom attributes. The table is pivoted, so each custom attribute becomes a column in the table. | `V2` |
| `employments` | Items of the `Employment` model with all properties | `V2` |
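To illustrate what "pivoted" means for `persons_custom_attributes`, here is a minimal sketch in plain Python. The input shape (a list of dicts with `id` and `value` keys), the `person_id` column, and the function name are assumptions made for illustration; they are not taken from this package's code or from the Personio spec.

```py
def pivot_custom_attributes(person_id, custom_attributes):
    """Turn a list of attribute entries into one wide row.

    Hypothetical input shape: [{"id": "<attribute id>", "value": <any>}, ...]
    """
    row = {"person_id": person_id}
    for attribute in custom_attributes:
        # Each attribute id becomes its own column in the resulting row.
        row[attribute["id"]] = attribute.get("value")
    return row


example_row = pivot_custom_attributes(
    "42",
    [{"id": "shirt_size", "value": "M"}, {"id": "laptop", "value": None}],
)
print(example_row)
```

With this layout, a newly defined custom attribute shows up as a new column rather than as additional rows.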
## Why are you not using the `dlt-hub/verified-sources` personio source / Differences
The [official verified source](https://github.com/dlt-hub/verified-sources/tree/master/sources/personio)
has a few drawbacks:
- It is based on the Personio API V1, not the current V2.
- Using the verified source copies the current state of the
  `dlt-hub/verified-sources` repository into your project.
  Once you change that copy, it is effectively a fork,
  which makes later updates hard.
- The verified source validates nothing beyond checking that dates are
  correct, so the shape of the data is not guaranteed and the schema
  may change unexpectedly.
This data source uses an (unofficial) OpenAPI spec, which is transformed
into Pydantic 2 models.
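As a rough illustration of why validating against Pydantic 2 models keeps the data shape stable, the sketch below validates raw records before they would be loaded. `PersonSketch` and its fields are hypothetical and much smaller than the generated models in this package; the example assumes `pydantic>=2` is installed.

```py
from datetime import date
from typing import Optional

from pydantic import BaseModel, ValidationError


class PersonSketch(BaseModel):
    """Hypothetical, simplified stand-in for a generated model."""

    id: str
    first_name: str
    last_name: str
    hired_on: Optional[date] = None


def invalid_fields(record):
    """Return the names of fields that fail validation (empty list if valid)."""
    try:
        PersonSketch.model_validate(record)
    except ValidationError as exc:
        return [str(error["loc"][0]) for error in exc.errors()]
    return []


valid = PersonSketch.model_validate(
    {"id": "1", "first_name": "Ada", "last_name": "Lovelace", "hired_on": "2024-01-15"}
)
problems = invalid_fields({"id": "2", "first_name": "Grace"})
print(valid.hired_on, problems)
```

A record with a missing or mistyped field is rejected up front instead of silently changing the destination schema.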
This source does not currently support delta updates
(the verified source does), and it does not yet cover all of the data
the verified source loads (absences, etc.). Contributions are welcome!
## Usage
Create a `.dlt/secrets.toml` with your Personio API client ID and client secret:
```toml
personio_client_id = "papi-..."
personio_client_secret = "papi-..."
```
Then run the default source:
```py
import dlt
from dlt_source_personio import source as personio_source

# Credentials are picked up from `.dlt/secrets.toml`.
pipeline = dlt.pipeline(
    pipeline_name="personio_pipeline",
    destination="duckdb",
    dev_mode=True,
)
personio_data = personio_source()
pipeline.run(personio_data)
```
## Development
This project is using [devenv](https://devenv.sh/).
Commands:
| Command | What does it do? |
| -- | -- |
| `generate-model` | generates the personio Pydantic model from the current spec file, applies patches, etc. |
| `update-spec` | Pulls in the latest `master#HEAD` of [personio/api-docs](https://github.com/personio/api-docs) |
| `validate-spec` | Validates the local (unofficial) Personio V2 spec |
| `refresh-model` | Runs both commands above, then adds the result to git and commits the changes. |
| `format` | Formats & lints all code |
| `sample-pipeline-run` | Runs the sample pipeline. By default `dev_mode=True`, which fetches resources with a limit of 1 page |
| `sample-pipeline-show` | Starts the streamlit-based dlt hub |
### Run the sample
```sh
PERSONIO_CLIENT_ID=[...] \
PERSONIO_CLIENT_SECRET=[...] \
sample-pipeline-run
```
Alternatively, create a `.dlt/secrets.toml`
(excluded from git) with the following content:
```toml
personio_client_id = "papi-..."
personio_client_secret = "papi-..."
```