{"id":13651949,"url":"https://github.com/michaelosthege/fairflow","last_synced_at":"2025-04-11T17:32:04.335Z","repository":{"id":57428216,"uuid":"90883508","full_name":"michaelosthege/fairflow","owner":"michaelosthege","description":"Functional Airflow DAG definitions.","archived":false,"fork":false,"pushed_at":"2017-07-04T14:25:26.000Z","size":40,"stargazers_count":38,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-10-12T09:15:37.943Z","etag":null,"topics":["airflow","apache-airflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/michaelosthege.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-05-10T16:05:19.000Z","updated_at":"2022-03-03T23:51:43.000Z","dependencies_parsed_at":"2022-09-02T18:30:19.083Z","dependency_job_id":null,"html_url":"https://github.com/michaelosthege/fairflow","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaelosthege%2Ffairflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaelosthege%2Ffairflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaelosthege%2Ffairflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaelosthege%2Ffairflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/michaelosthege","download_url":"https://codeload.github.com/michaelosthege/fairflow/tar.gz/refs/heads/master","host":{"name":"GitHu
b","url":"https://github.com","kind":"github","repositories_count":248449875,"owners_count":21105581,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","apache-airflow"],"created_at":"2024-08-02T02:00:53.672Z","updated_at":"2025-04-11T17:32:03.935Z","avatar_url":"https://github.com/michaelosthege.png","language":"Python","funding_links":[],"categories":["Libraries, Hooks, Utilities"],"sub_categories":[],"readme":"# fairflow\nPythonically functional DAG-definitions.\n\n## Why would you want to use `fairflow`?\nDAGs are often made up of tasks that are functionally separated, for example ETL jobs, data analysis and reporting. If you are writing a new reporting task, should you really worry about dependencies in the ETL jobs?\n\nAlso, they are usually built from upstream to downstream, which makes it hard to share substructures. By building them functionally you can start thinking downstream-upstream and also share substructures across modules.\n\nRelated: [Streamlined (Functional) Airflow in the Wiki](https://cwiki.apache.org/confluence/display/AIRFLOW/Streamlined+%28Functional%29+Airflow)\n\n## How does it work?\nIn pure `airflow` you would construct a DAG by instantiating a bunch of `Operators` and then setting their relationships. However, they require a `DAG` instance for instantiation! 
(It can only be inferred by one level.)\n\nIn `fairflow` you construct the DAG from a bunch of `FOperators` that only instantiate the required `Operators` when you call the last `FOperator` on a `DAG` instance.\n\nThe result is a `DAG` definition that is __exactly the same as the one you had before__, but now you can re-use the `FOperator` definitions and import them from other packages.\n\n## Show me the code!\nThe core of `fairflow` is the tiny abstract base class `FOperator` that you subclass to make functional airflow operator definitions. It takes care of instantiating your airflow operators and setting their dependencies.\n\nThe following `Compare` `FOperator`, for example, will create a `PythonOperator` that executes a callable and uses `xcom_pull` to get the return values of upstream tasks. To make it work with an arbitrary number of upstream model tasks, we can feed a list of `FOperator` instances to its constructor.\n\n```python\nclass Compare(fairflow.FOperator):\n    \"\"\"A task that compares the LinearModel with the PolynomialModel. Returns: pandas.DataFrame\"\"\"\n    def __init__(self, fops_models, id=None):\n        self.fops_models = fops_models\n        return super().__init__(id)\n\n    @staticmethod\n    def compare(**context):\n        \"\"\"Accumulates the results of upstream tasks into a DataFrame\"\"\"\n        task_ids = fairflow.utils.get_param(\"model_taskids\", context)\t\t\t\t# get the task ids of the upstream tasks\n        comparison = pandas.DataFrame(columns=[\"modelname\", \"result\"])\n        for task_id in task_ids:\n            modelresult = context[\"ti\"].xcom_pull(task_id)\t\t\t\t\t# pull the return value of the upstream task\n            comparison.loc[len(comparison)] = task_id, modelresult\t\t\t# append a new row instead of overwriting the same one\n        return comparison\n\n    def call(self, dag):\n        \"\"\"Instantiate upstream tasks, this task and set dependencies. 
Returns: task\"\"\"\n        model_tasks = [\t\t\t\t\t# instantiate tasks for running the different models\n            f(dag)                      # by calling their FOperators on the current `dag`\n            for f in self.fops_models\t# notice that we do not know about the models' upstream dependencies!\n        ]\n        t = python_operator.PythonOperator(\n            task_id=self.__class__.__name__,\n            python_callable=self.compare,\n            provide_context=True,\n            templates_dict={\n                \"model_taskids\": [mt.task_id for mt in model_tasks]\n            },\n            dag=dag\n        )\n        t.set_upstream(model_tasks)\n        return t\n```\n\nIn your DAG definition file, you create an instance of the task you want to get done.\n\n```python\nf_linear = LinearModel()\nf_poly = PolynomialModel(degree=3)\nf_compare = Compare([f_linear, f_poly])\n```\n\nThen you create a `DAG` like you would usually do and call your task on the `DAG`:\n\n```python\ndag = airflow.DAG('model-comparison',\n    default_args=default_args,\n    schedule_interval=None\n)\n\nt_compare = f_compare(dag)\n```\n\nAnd that's how you functionally define a workflow.\n\nDid you notice that in our DAG definition, we did not explicitly instantiate the `Dataset` task? The `LinearModel.call` or `PolynomialModel.call` methods did that on their own. 
So we do not need to care about the models' dependencies and can focus on comparing them.\n\n## Testing\nThe repository comes with an example DAG (`example_models`) and another one (`test_fairflow`) that runs some tests.\n\n\u003cfigure\u003e\n    \u003cimg src=\"example_models.png\"\u003e\n    \u003cfigcaption\u003e\u003cb\u003eThe example-DAG:\u003c/b\u003e Both models depend on the same dataset and are compared.\u003c/figcaption\u003e\n\u003c/figure\u003e\n\n\u003cfigure\u003e\n    \u003cimg src=\"test_dag.png\"\u003e\n    \u003cfigcaption\u003e\u003cb\u003eThe test-DAG:\u003c/b\u003e The leftmost tasks return JSON-dumped integers (1,2,3) and the ones in the middle xcom_pull those return values to apply some operations.\u003c/figcaption\u003e\n\u003c/figure\u003e\n\nAfter activating your virtual environment with `airflow[mysql]` installed, you can `cd` to the repository and run the following scripts:\n\n```bash\nbash run_airflow-setup.sh\nbash run_webserver.sh\nbash run_scheduler.sh\n```\n\nThey will use the repository folder as `AIRFLOW_HOME`.\n\n\n## FAQ\n__What if two `FOperator` classes have the same upstream dependencies?__\n\nThe shared upstream task will only be instantiated once, because the `FOperator.call` method caches all tasks in a dictionary by their `task_id`.\nThis means that Bob and Charlie can independently depend on Alice, and Daniel can still merge Bob's and Charlie's work into the same DAG by simply importing their `FOperator` definitions.\n\n__How is the resulting DAG different from the one I have right now?__\n\n`fairflow` only matters at DAG-definition time, and the resulting DAG is identical to the one you get by instantiating all tasks in the same file.\n\n__How do I get it?__\n\n```bash\npip install fairflow\n```\n\n__What if ...?__\n\nOpen an issue and let's have a discussion about 
it!\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichaelosthege%2Ffairflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmichaelosthege%2Ffairflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichaelosthege%2Ffairflow/lists"}