https://github.com/monarch-initiative/koza
Data transformation framework for LinkML data models
etl knowledge-graph koza linkml monarchinitiative obofoundry ontology
Last synced: 18 days ago
- Host: GitHub
- URL: https://github.com/monarch-initiative/koza
- Owner: monarch-initiative
- License: bsd-3-clause
- Created: 2020-12-17T16:25:47.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2025-02-28T19:05:05.000Z (about 2 months ago)
- Last Synced: 2025-03-30T02:03:43.919Z (25 days ago)
- Topics: etl, knowledge-graph, koza, linkml, monarchinitiative, obofoundry, ontology
- Language: Python
- Homepage: https://koza.monarchinitiative.org/
- Size: 4.48 MB
- Stars: 50
- Watchers: 22
- Forks: 5
- Open Issues: 43
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Citation: CITATION.cff
# Koza - a data transformation framework

[**Documentation**](https://koza.monarchinitiative.org/)
_Disclaimer_: Koza is in beta - we are looking for testers!
## Overview
- Transform csv, json, yaml, jsonl, and xml files into a target csv, json, or jsonl format based on your dataclass model
- Output data in the [KGX format](https://github.com/biolink/kgx/blob/master/specification/kgx-format.md#kgx-format-as-tsv)
- Write data transforms in semi-declarative Python
- Configure source files, expected columns/json properties and path filters, field filters, and metadata in yaml
- Create or import mapping files to be used in ingests (e.g., ID mappings, type mappings)
- Create and use translation tables to map between source and target vocabularies

## Installation
Koza is available on PyPI and can be installed with pip or pipx:
```bash
pip install koza  # or: pipx install koza
```

## Usage
**NOTE: As of version 0.2.0, there is a new method for getting your ingest's `KozaApp` instance. Please see the [updated documentation](https://koza.monarchinitiative.org/Usage/configuring_ingests/#transform-code) for details.**
See the [Koza documentation](https://koza.monarchinitiative.org/) for usage information.
### Try the Examples
#### Validate
Give Koza a local or remote csv file and get some basic information about it (headers, number of rows).
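The check reports the header row and the number of data rows. As a rough illustration of what that involves (not Koza's implementation), here is a minimal standard-library sketch for a local, optionally gzipped, delimited file. `summarize_delimited` is a hypothetical helper, and remote URLs are left out for brevity.

```python
import csv
import gzip
from typing import List, Tuple


def summarize_delimited(path: str, delimiter: str = "\t") -> Tuple[List[str], int]:
    """Return (header, data_row_count) for a local delimited file.

    Hypothetical helper for illustration; this is not Koza's API.
    Files ending in ".gz" are decompressed on the fly.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, [])  # the first row is treated as the header
        row_count = sum(1 for row in reader if row)  # blank lines are not counted
    return header, row_count
```

For instance, `summarize_delimited("examples/data/string.tsv", delimiter=" ")` (run from a local checkout) mirrors the `--delimiter ' '` option in the Koza command for the same example file: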
```bash
koza validate \
--file https://raw.githubusercontent.com/monarch-initiative/koza/main/examples/data/string.tsv \
--delimiter ' '
```

Sending a json or jsonl formatted file will confirm whether the file is valid json or jsonl.
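For jsonl, "valid" means every line parses as a standalone JSON value, while a json file must parse as a single document. The sketch below shows both checks using only the standard library, assuming (as with the example files) that gzip compression is signalled by a `.gz` suffix. `is_valid_json` and `first_invalid_jsonl_line` are hypothetical helpers for illustration, not part of Koza.

```python
import gzip
import json
from typing import IO, Optional


def _open_text(path: str) -> IO[str]:
    # Transparently handle gzip-compressed files such as *.jsonl.gz.
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def is_valid_json(path: str) -> bool:
    """Hypothetical helper: True if the whole file parses as one JSON document."""
    try:
        with _open_text(path) as handle:
            json.load(handle)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return False
    return True


def first_invalid_jsonl_line(path: str) -> Optional[int]:
    """Hypothetical helper: 1-based number of the first non-JSON line, or None if all lines parse."""
    with _open_text(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # lenient choice for this sketch: tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError:
                return line_no
    return None
```

Skipping blank lines is a lenient choice made only for this sketch. The Koza commands for the bundled example files are: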
```bash
koza validate \
--file ./examples/data/ZFIN_PHENOTYPE_0.jsonl.gz \
--format jsonl
```

```bash
koza validate \
--file ./examples/data/ddpheno.json.gz \
--format json
```

#### Transform
Run the example ingest "string/protein-links-detailed", or its declarative counterpart in `examples/string-declarative/`:
```bash
koza transform \
--source examples/string/protein-links-detailed.yaml \
--global-table examples/translation_table.yaml

koza transform \
--source examples/string-declarative/protein-links-detailed.yaml \
--global-table examples/translation_table.yaml
```

**Note**: Koza expects the directory structure used in the example above, with the source config file and transform code in the same directory:
```
.
├── ...
│ ├── your_source
│ │ ├── your_ingest.yaml
│ │ └── your_ingest.py
│ └── some_translation_table.yaml
└── ...
```
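A small pre-flight check can catch a misplaced transform module before an ingest is run. This is a sketch under the assumption, suggested by the tree above, that the transform code shares the config's file stem (`your_ingest.yaml` next to `your_ingest.py`). `find_transform_code` is a hypothetical helper, not a Koza function.

```python
from pathlib import Path
from typing import Optional


def find_transform_code(source_config: str) -> Optional[Path]:
    """Return the transform module expected next to a source config, or None if missing.

    Hypothetical helper: assumes your_ingest.yaml and your_ingest.py live in
    the same directory and share a file stem, as in the layout above.
    """
    config_path = Path(source_config)
    if config_path.suffix not in {".yaml", ".yml"}:
        raise ValueError(f"expected a YAML source config, got {config_path.name}")
    # Same directory, same stem, .py suffix.
    candidate = config_path.with_suffix(".py")
    return candidate if candidate.is_file() else None
```

Returning `None` rather than raising lets a caller collect every misconfigured source in one pass before reporting.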