{"id":19140818,"url":"https://github.com/codait/flight-delay-notebooks","last_synced_at":"2025-05-06T23:17:23.485Z","repository":{"id":42225807,"uuid":"312364399","full_name":"CODAIT/flight-delay-notebooks","owner":"CODAIT","description":"Analyzing flight delay and weather data using Elyra, IBM Data Asset Exchange, Kubeflow Pipelines and KFServing","archived":false,"fork":false,"pushed_at":"2022-10-12T22:15:46.000Z","size":1751,"stargazers_count":15,"open_issues_count":1,"forks_count":6,"subscribers_count":16,"default_branch":"main","last_synced_at":"2025-05-06T23:17:17.569Z","etag":null,"topics":["codait","data-science","elyra","jupyter","jupyter-notebook","jupyterlab","kfserving","kubeflow-pipelines","machine-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CODAIT.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-11-12T18:36:56.000Z","updated_at":"2024-05-21T19:21:30.000Z","dependencies_parsed_at":"2023-01-19T23:35:04.678Z","dependency_job_id":null,"html_url":"https://github.com/CODAIT/flight-delay-notebooks","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CODAIT%2Fflight-delay-notebooks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CODAIT%2Fflight-delay-notebooks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CODAIT%2Fflight-delay-notebooks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CODAIT%2Fflight-delay-no
tebooks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CODAIT","download_url":"https://codeload.github.com/CODAIT/flight-delay-notebooks/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252782835,"owners_count":21803410,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["codait","data-science","elyra","jupyter","jupyter-notebook","jupyterlab","kfserving","kubeflow-pipelines","machine-learning"],"created_at":"2024-11-09T07:18:54.294Z","updated_at":"2025-05-06T23:17:23.467Z","avatar_url":"https://github.com/CODAIT.png","language":"Jupyter Notebook","readme":"# Analyzing flight delay and weather data using Elyra, Kubeflow Pipelines and KFServing\n\nThis repository contains a set of Python scripts and Jupyter notebooks that analyze and predict flight delays. The datasets are hosted on the [IBM Developer Data Asset Exchange](https://ibm.biz/data-exchange).\n\nWe use [Elyra](https://github.com/elyra-ai/elyra) to create a pipeline that can be executed locally or using a [Kubeflow Pipelines](https://www.kubeflow.org/docs/pipelines/overview/pipelines-overview/) runtime. 
This pipeline:\n\n* Loads the datasets\n* Pre-processes the datasets\n* Performs data merging and feature extraction\n* Analyzes and visualizes the processed dataset\n* Trains and evaluates machine learning models for predicting delayed flights, using features about flights as well as related weather features\n* _Optionally_ deploys the trained model to KFServing\n\n![Flight Delays Pipeline](docs/source/images/flight-delays-pipeline.png)\n\n### Configuring the local development environment\n\nIt's highly recommended to create a dedicated and consistent Python environment for running the notebooks in this repository:\n\n1. Install [Anaconda](https://docs.anaconda.com/anaconda/install/)\n   or [Miniconda](https://docs.conda.io/en/latest/miniconda.html)\n1. Navigate to your local copy of this repository.\n1. Create an Anaconda environment from the `yaml` file in the repository:\n    ```console\n    $ conda env create -f flight-delays-env.yaml\n    ```\n1. Activate the new environment:\n    ```console\n    $ conda activate flight-delays-env\n    ```\n1. If running JupyterLab and Elyra for the first time, build the extensions:\n    ```console\n    $ jupyter lab build\n    ```\n1. Launch JupyterLab:\n    ```console\n    $ jupyter lab\n    ```\n\n### Configuring a Kubeflow Pipeline runtime\n\n[Elyra's Notebook pipeline visual editor](https://elyra.readthedocs.io/en/latest/getting_started/overview.html#notebook-pipelines-visual-editor)\ncurrently supports running these pipelines in a Kubeflow Pipeline runtime.  
If required, these are\n[the steps to install a local deployment of KFP](https://elyra.readthedocs.io/en/latest/recipes/deploying-kubeflow-locally-for-dev.html).\n\nAfter installing your Kubeflow Pipeline runtime, use the command below (updating the placeholder values) to configure the new\nKFP runtime with Elyra.\n\n```bash\nelyra-metadata install runtimes --replace=true \\\n       --schema_name=kfp \\\n       --name=kfp_runtime \\\n       --display_name=\"Kubeflow Pipeline Runtime\" \\\n       --api_endpoint=http://[host]:[api port]/pipeline \\\n       --cos_endpoint=http://[host]:[cos port] \\\n       --cos_username=[cos username] \\\n       --cos_password=[cos password] \\\n       --cos_bucket=flights\n```\n\n**Note:** The cloud object storage endpoint above assumes a local Minio object storage, but other cloud-based object storage services could be configured and used in this scenario.\n\nIf using the default Minio storage - following the local Kubeflow installation instructions above - the arguments should be `--cos_endpoint=http://minio-service:9000`, `--cos_username=minio`, `--cos_password=minio123`. The API endpoint for local Kubeflow Pipelines would then be `--api_endpoint=http://127.0.0.1:31380/pipeline`.\n\n**Don't forget to set up port-forwarding for the KFP ML Pipelines API service and the Minio service as per the above instructions.**\n\n## Elyra Notebook pipelines\n\nElyra provides a visual editor for building Notebook-based AI pipelines, simplifying the conversion of \nmultiple notebooks into batch jobs or workflows. 
By leveraging cloud-based resources to run\nexperiments faster, data scientists, machine learning engineers, and AI developers become more productive\nand can focus their time on applying their technical skills.\n\n![Notebook pipeline](https://raw.githubusercontent.com/elyra-ai/community/master/resources/blog-announcement/elyra-pipelines.gif)\n\n### Running the Elyra pipeline\n\nThe Elyra pipeline `flight_delays.pipeline`, which is located in the `pipelines` directory, can be run by clicking\non the `play` button as shown in the image above. The `submit` dialog will request two inputs from the user: a name \nfor the pipeline and a runtime to use while executing the pipeline.\n\nThe list of available runtimes comes from the registered Kubeflow Pipelines runtimes documented above and includes a `Run in-place locally` option for local execution.\n\n#### Local execution\n\nIf running locally, the notebooks are executed and updated in-place. You can track the progress in the terminal screen where you ran `jupyter lab`. The downloaded and processed datasets will be available locally in `notebooks/data` in this case.\n\n#### Kubeflow Pipelines execution\n\nAfter submitting the pipeline to Kubeflow Pipelines, Elyra will show a dialog with a direct link to where the experiment is being executed on Kubeflow Pipelines.\n\nThe user can access the pipelines, and respective experiment runs, via the `api_endpoint` of the Kubeflow Pipelines\nruntime (e.g. 
`http://[host]:[port]/pipeline`)\n\n![Pipeline experiment run](docs/source/images/kfp-experiment.png)\n\nThe output from the executed experiments is then available in the associated `object storage`,\nand the executed notebooks are available as native `.ipynb` notebooks and also in `html` format\nto facilitate the visualization and sharing of the results.\n\n![Pipeline experiment results in object storage](docs/source/images/object-storage-results.png)\n\n\n### Running the Elyra pipeline with model deployment to KFServing\n\nPlease follow the [instructions](kfserving.md) for running the pipeline `flight_delays_with_deployment.pipeline`, which adds a node at the end of the pipeline for deploying the model to [KFServing](https://www.kubeflow.org/docs/components/serving/kfserving/).\n\n### References\n\nFind more project details on [Elyra's GitHub](https://github.com/elyra-ai/elyra) or by watching the\n[Elyra demo](https://www.youtube.com/watch?v=Nj0yga6T4U8).","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodait%2Fflight-delay-notebooks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodait%2Fflight-delay-notebooks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodait%2Fflight-delay-notebooks/lists"}