https://github.com/e2fyi/kfx

Extensions to kubeflow pipeline sdk.
kfx kubeflow kubeflow-pipeline-task kubeflow-pipeline-ui kubeflow-pipelines python visualizations
Host: GitHub
URL: https://github.com/e2fyi/kfx
Owner: e2fyi
Created: 2020-02-28T10:04:54.000Z (almost 5 years ago)
Default Branch: master
Last Pushed: 2021-06-09T05:50:55.000Z (over 3 years ago)
Last Synced: 2024-10-14T09:32:05.369Z (4 months ago)
Topics: kfx, kubeflow, kubeflow-pipeline-task, kubeflow-pipeline-ui, kubeflow-pipelines, python, visualizations
Language: Python
Size: 135 KB
Stars: 5
Watchers: 4
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md

README

# kfx

[![PyPI version](https://badge.fury.io/py/kfx.svg)](https://badge.fury.io/py/kfx)

[![Build Status](https://travis-ci.com/e2fyi/kfx.svg?branch=master)](https://travis-ci.com/e2fyi/kfx)

[![Coverage Status](https://coveralls.io/repos/github/e2fyi/kfx/badge.svg?branch=master)](https://coveralls.io/github/e2fyi/kfx?branch=master)

[![Documentation Status](https://readthedocs.org/projects/kfx/badge/?version=latest)](https://kfx.readthedocs.io/en/latest/?badge=latest)

[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

[![Downloads](https://pepy.tech/badge/kfx/month)](https://pepy.tech/project/kfx/month)

`kfx` is a Python package with the namespace `kfx`. It currently provides the following sub-packages:

- `kfx.lib.dsl` - Extensions to the kubeflow pipeline dsl.

- `kfx.lib.vis` - Data models and helpers for generating the `mlpipeline-metrics.json` and `mlpipeline-ui-metadata.json` files required to render visualizations in the kubeflow pipeline UI. See also https://www.kubeflow.org/docs/pipelines/sdk/pipelines-metrics/ and https://www.kubeflow.org/docs/pipelines/sdk/output-viewer/

> - Documentation: [https://kfx.readthedocs.io](https://kfx.readthedocs.io).

> - Repo: [https://github.com/e2fyi/kfx](https://github.com/e2fyi/kfx)

> ### NOTE this is currently alpha
>
> Breaking changes are likely. Feel free to open a feature request.
>
> ### Known issues
>
> - `kfx.vis.vega.vega_web_app` and `KfpArtifact` do not work well together (see example) because of CORS: the web app is hosted inside an iframe, which prevents it from accessing the `ml-pipeline-ui` API server.
> - `kfx.vis.vega.vega_web_app` is only supported in the latest kubeflow pipeline UI (as inline storage is only supported after `0.2.5`).

### Changelog

Refer to [CHANGELOG.md](./CHANGELOG.md).

## Quick start

Installation

```bash
pip install kfx
```

## Usage

Example: Using `ContainerOpTransform` to configure the internal k8s properties of kubeflow pipeline tasks.

> `kfx.dsl.ContainerOpTransform` is a helper to modify the internal k8s properties
> (e.g. resources, environment variables, etc.) of kubeflow pipeline tasks.

```python
import kfp.components
import kfp.dsl
import kfx.dsl

transforms = (
    kfx.dsl.ContainerOpTransform()
    .set_resources(cpu="500m", memory=("1G", "4G"))
    .set_image_pull_policy("Always")
    .set_env_vars({"ENV": "production"})
    .set_env_var_from_secret("AWS_ACCESS_KEY", secret_name="aws", secret_key="access_key")
    .set_annotations({"iam.amazonaws.com/role": "some-arn"})
)


@kfp.components.func_to_container_op
def echo(text: str) -> str:
    print(text)
    return text


@kfp.dsl.pipeline(name="demo")
def pipeline(text: str):
    op1 = echo(text)
    # the format string has two placeholders, so it needs two values
    op2 = echo("%s-%s" % (text, text))

    # you can apply the transform on op1 only
    # op1.apply(transforms)

    # or apply on all ops in the pipeline
    kfp.dsl.get_pipeline_conf().add_op_transformer(transforms)
```
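
In the example above, `transforms` is passed both to `op1.apply(...)` and to `add_op_transformer(...)`, which suggests it behaves as a callable that receives a task and returns it. The sketch below illustrates that chainable-builder pattern in plain Python. `FakeOp` and `SketchTransform` are hypothetical stand-ins written for this illustration; they are not the `kfx` or `kfp` implementation.

```python
from typing import Callable, Dict, List


class FakeOp:
    """Hypothetical stand-in for a pipeline task (not a real kfp class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.env: Dict[str, str] = {}
        self.annotations: Dict[str, str] = {}

    def apply(self, fn: Callable[["FakeOp"], "FakeOp"]) -> "FakeOp":
        # Call fn with this op and return whatever fn hands back.
        return fn(self)


class SketchTransform:
    """Hypothetical chainable builder that records changes to replay on an op."""

    def __init__(self) -> None:
        self._steps: List[Callable[[FakeOp], None]] = []

    def set_env_vars(self, env: Dict[str, str]) -> "SketchTransform":
        self._steps.append(lambda op: op.env.update(env))
        return self  # returning self is what makes chaining possible

    def set_annotations(self, annotations: Dict[str, str]) -> "SketchTransform":
        self._steps.append(lambda op: op.annotations.update(annotations))
        return self

    def __call__(self, op: FakeOp) -> FakeOp:
        # Replay every recorded step, then hand the op back to the caller.
        for step in self._steps:
            step(op)
        return op


transform = (
    SketchTransform()
    .set_env_vars({"ENV": "production"})
    .set_annotations({"iam.amazonaws.com/role": "some-arn"})
)
op = FakeOp("echo").apply(transform)
print(op.env, op.annotations)
```

Because the builder only records steps until it is called, the same object can be reused on one task or on every task in a pipeline.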

Example: Using `ArtifactLocationHelper` and `KfpArtifact` to determine the uri of your data artifact generated by the kubeflow pipeline task.

> `kfx.dsl.ArtifactLocationHelper` is a helper to modify the kubeflow pipeline task
> so that you can use `kfx.dsl.KfpArtifact` to represent the artifact generated
> inside the task.

```python
import kfp.components
import kfp.dsl
import kfx.dsl

# creates the helper that has the argo configs (tells you how artifacts will be stored)
# see https://github.com/argoproj/argo/blob/master/docs/workflow-controller-configmap.yaml
helper = kfx.dsl.ArtifactLocationHelper(
    scheme="minio", bucket="mlpipeline", key_prefix="artifacts/"
)


@kfp.components.func_to_container_op
def test_op(
    mlpipeline_ui_metadata: kfp.components.OutputTextFile(str),
    markdown_data_file: kfp.components.OutputTextFile(str),
):
    "A test kubeflow pipeline task."
    import json

    import kfx.dsl
    import kfx.vis
    import kfx.vis.vega

    # Vega-Lite spec for a simple bar chart with embedded data
    spec = {
        "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
        "description": "A simple bar chart",
        "data": {
            "values": [
                {"a": "A", "b": 28},
                {"a": "B", "b": 55},
                {"a": "C", "b": 43},
                {"a": "D", "b": 91},
                {"a": "E", "b": 81},
                {"a": "F", "b": 53},
                {"a": "G", "b": 19},
                {"a": "H", "b": 87},
                {"a": "I", "b": 52},
            ]
        },
        "mark": "bar",
        "encoding": {
            "x": {"field": "a", "type": "ordinal"},
            "y": {"field": "b", "type": "quantitative"},
        },
    }

    # write the markdown to the `markdown-data` artifact
    markdown_data_file.write("### hello world")

    # creates an ui metadata object
    ui_metadata = kfx.vis.kfp_ui_metadata(
        # Describes the vis to generate in the kubeflow pipeline UI.
        [
            # markdown vis from a markdown artifact.
            # `KfpArtifact` provides the reference to data artifact created
            # inside this task
            kfx.vis.markdown(kfx.dsl.KfpArtifact("markdown_data_file")),
            # a vega web app from the vega data artifact.
            kfx.vis.vega.vega_web_app(spec),
        ]
    )

    # writes the ui metadata object as the `mlpipeline-ui-metadata` artifact
    mlpipeline_ui_metadata.write(kfx.vis.asjson(ui_metadata))

    # prints the uri to the markdown artifact
    print(ui_metadata.outputs[0].source)


@kfp.dsl.pipeline()
def test_pipeline():
    "A test kubeflow pipeline"
    op: kfp.dsl.ContainerOp = test_op()

    # modify kfp operator with artifact location metadata through env vars
    op.apply(helper.set_envs())
```
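
To make the idea of an artifact uri concrete, the sketch below composes one from the same `scheme`, `bucket`, and `key_prefix` values. It assumes, purely for illustration, a key layout of `<key_prefix>/<workflow name>/<pod name>/<artifact name>.tgz`. That layout, the function name `sketch_artifact_uri`, and the sample workflow and pod names are assumptions, not values taken from `kfx`; the real layout depends on your argo artifact repository configuration.

```python
def sketch_artifact_uri(
    scheme: str,
    bucket: str,
    key_prefix: str,
    workflow_name: str,
    pod_name: str,
    artifact_name: str,
) -> str:
    """Build a uri under an assumed <prefix>/<workflow>/<pod>/<name>.tgz layout."""
    # Drop stray slashes so the parts join with exactly one "/" between them.
    parts = [key_prefix.strip("/"), workflow_name, pod_name, artifact_name + ".tgz"]
    key = "/".join(part for part in parts if part)
    return "%s://%s/%s" % (scheme, bucket, key)


uri = sketch_artifact_uri(
    scheme="minio",
    bucket="mlpipeline",
    key_prefix="artifacts/",
    workflow_name="demo-abc12",  # hypothetical workflow name
    pod_name="demo-abc12-1234567",  # hypothetical pod name
    artifact_name="markdown-data-file",
)
print(uri)
```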

Example: Using `pydantic` data models to generate `mlpipeline-metrics.json` and `mlpipeline-ui-metadata.json`.
(See also https://www.kubeflow.org/docs/pipelines/sdk/output-viewer/ and
https://www.kubeflow.org/docs/pipelines/sdk/pipelines-metrics/).

> `kfx.vis` has helper functions (with corresponding type hints) to describe and
> create `mlpipeline-metrics.json` and `mlpipeline-ui-metadata.json` files
> (required by kubeflow pipeline UI to render any metrics or visualizations).

```python
import functools

import kfp.components

# install kfx
kfx_component = functools.partial(kfp.components.func_to_container_op, packages_to_install=["kfx"])


@kfx_component
def some_op(
    # mlpipeline_metrics is a path - i.e. open(mlpipeline_metrics, "w")
    mlpipeline_metrics: kfp.components.OutputPath(str),
    # mlpipeline_ui_metadata is a FileLike obj - i.e. mlpipeline_ui_metadata.write("something")
    mlpipeline_ui_metadata: kfp.components.OutputTextFile(str),
):
    "kfp operator that provides metrics and metadata for visualizations."

    # import inside kfp task
    import kfx.vis
    import kfx.vis.vega

    # output metrics to mlpipeline_metrics path
    kfx.vis.kfp_metrics([
        # render as percent
        kfx.vis.kfp_metric("recall-score", 0.9, percent=True),
        # override metric format with custom value
        kfx.vis.kfp_metric(name="precision-score", value=0.8, metric_format="PERCENTAGE"),
        # render raw score
        kfx.vis.kfp_metric("raw-score", 123.45),
    ]).write_to(mlpipeline_metrics)

    # output visualization metadata to mlpipeline_ui_metadata obj
    kfx.vis.kfp_ui_metadata(
        [
            # creates a confusion matrix vis
            kfx.vis.confusion_matrix(
                source="gs://your_project/your_bucket/your_cm_file",
                labels=["True", "False"],
            ),
            # creates a markdown with inline source
            kfx.vis.markdown(
                "# Inline Markdown: [A link](https://www.kubeflow.org/)",
                storage="inline",
            ),
            # creates a markdown with a remote source
            kfx.vis.markdown(
                "gs://your_project/your_bucket/your_markdown_file",
            ),
            # creates a ROC curve with a remote source
            kfx.vis.roc(
                "gs://your_project/your_bucket/your_roc_file",
            ),
            # creates a Table with a remote source
            kfx.vis.table(
                "gs://your_project/your_bucket/your_csv_file",
                header=["col1", "col2"],
            ),
            # creates a tensorboard viewer
            kfx.vis.tensorboard(
                "gs://your_project/your_bucket/logs/*",
            ),
            # creates a custom web app from a remote html file
            kfx.vis.web_app(
                "gs://your_project/your_bucket/your_html_file",
            ),
            # creates a Vega-Lite vis as a web app
            kfx.vis.vega.vega_web_app(spec={
                "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
                "description": "A simple bar chart with embedded data.",
                "data": {
                    "values": [
                        {"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43},
                        {"a": "D", "b": 91}, {"a": "E", "b": 81}, {"a": "F", "b": 53},
                        {"a": "G", "b": 19}, {"a": "H", "b": 87}, {"a": "I", "b": 52}
                    ]
                },
                "mark": "bar",
                "encoding": {
                    "x": {"field": "a", "type": "ordinal"},
                    "y": {"field": "b", "type": "quantitative"}
                }
            })
        ]
    ).write_to(mlpipeline_ui_metadata)
```
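
For orientation, the two files these helpers target are plain JSON. The sketch below writes minimal versions by hand with the standard library, using the field names described in the Kubeflow Pipelines documentation linked above (`metrics`, `numberValue`, `format` and `outputs`, `type`, `storage`, `source`). Treat it as an approximation of the file shape rather than the exact output of `kfx.vis`.

```python
import json

# Shape of mlpipeline-metrics.json: a list of named numeric metrics.
metrics = {
    "metrics": [
        {"name": "recall-score", "numberValue": 0.9, "format": "PERCENTAGE"},
        {"name": "raw-score", "numberValue": 123.45, "format": "RAW"},
    ]
}

# Shape of mlpipeline-ui-metadata.json: a list of viewer descriptions.
ui_metadata = {
    "outputs": [
        {"type": "markdown", "storage": "inline", "source": "# Inline Markdown"},
        {"type": "tensorboard", "source": "gs://your_project/your_bucket/logs/*"},
    ]
}

# Serialize both documents the way they would be written to the output files.
metrics_json = json.dumps(metrics, indent=2)
ui_metadata_json = json.dumps(ui_metadata, indent=2)
print(metrics_json)
print(ui_metadata_json)
```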

## Developer guide

This project uses:

- isort: to manage import order

- pylint: to manage general coding best practices

- flake8: to manage code complexity and coding best practices

- black: to manage formats and styles

- pydocstyle: to manage docstring style/format

- pytest/coverage: to manage unit tests and code coverage

- bandit: to find common security issues

- pyenv: to manage dev env: python version (3.6)

- pipenv: to manage dev env: python packages

The convention for unit tests is to suffix the file name with `_test` and colocate it with the actual python module, i.e. a file ending in `_test.py`.

The version of the package is read from `version.txt`, so please update it with the appropriate semantic version (major -> breaking changes, minor -> new features, patch -> bug fix, postfix -> pre-release/post-release).
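
As an illustration of the versioning rule, the sketch below bumps one part of a `major.minor.patch` string. The helper name `bump_version` is hypothetical and not part of this project's tooling, and it ignores any pre-release or post-release postfix.

```python
def bump_version(version: str, part: str) -> str:
    """Return `version` with the given part ("major", "minor" or "patch") incremented."""
    major, minor, patch = (int(piece) for piece in version.strip().split("."))
    if part == "major":  # breaking change: reset minor and patch
        return "%d.0.0" % (major + 1)
    if part == "minor":  # new feature: reset patch
        return "%d.%d.0" % (major, minor + 1)
    if part == "patch":  # bug fix
        return "%d.%d.%d" % (major, minor, patch + 1)
    raise ValueError("unknown part: %r" % part)


print(bump_version("0.1.2", "minor"))
```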

### `Makefile`:

```bash
# autoformat code with docformatter, isort, and black
make format

# check style, formats, and code complexity
make check

# check style, formats, code complexity, and run unit tests
make test

# test everything including building the package and checking the sdist
make test-all

# run unit tests only
make test-only

# generate and update the requirements.txt and requirements-dev.txt
make requirements

# generate the docs with sphinx and autoapi extension
make docs

# generate distributions
make dists

# publish to pypi with twine (twine must be configured)
make publish
```