{"id":27443886,"url":"https://github.com/josephmachado/e2e_datapipeline_test","last_synced_at":"2025-04-15T02:58:05.034Z","repository":{"id":115180622,"uuid":"429805128","full_name":"josephmachado/e2e_datapipeline_test","owner":"josephmachado","description":"Example repo to create end to end tests for data pipeline.","archived":false,"fork":false,"pushed_at":"2024-06-14T18:14:51.000Z","size":2168,"stargazers_count":23,"open_issues_count":0,"forks_count":9,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-15T02:58:01.447Z","etag":null,"topics":["aws","dataengineering","moto","pytest","python3","testing"],"latest_commit_sha":null,"homepage":"https://www.startdataengineering.com/post/setting-up-e2e-tests/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/josephmachado.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-11-19T13:21:15.000Z","updated_at":"2025-03-30T09:37:53.000Z","dependencies_parsed_at":null,"dependency_job_id":"f8668514-0c9d-49e5-88ca-14f46f82f6c6","html_url":"https://github.com/josephmachado/e2e_datapipeline_test","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/josephmachado%2Fe2e_datapipeline_test","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/josephmachado%2Fe2e_datapipeline_test/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/josephmachado%2Fe2e_datapipeline_test/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/josephmachado%2Fe2e_datapipeline_test/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/josephmachado","download_url":"https://codeload.github.com/josephmachado/e2e_datapipeline_test/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248997095,"owners_count":21195797,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","dataengineering","moto","pytest","python3","testing"],"created_at":"2025-04-15T02:58:04.472Z","updated_at":"2025-04-15T02:58:05.028Z","avatar_url":"https://github.com/josephmachado.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n* [End to End data pipeline test](#end-to-end-data-pipeline-test)\n    * [Architecture](#architecture)\n    * [Run on codespaces](#run-on-codespaces)\n    * [Prerequisites \u0026 Setup](#prerequisites--setup)\n    * [Run tests](#run-tests)\n    * [Tear down](#tear-down)\n\n## End to End data pipeline test\n\nCode for the post [Setting up end-to-end tests for cloud data pipelines](https://www.startdataengineering.com/post/setting-up-e2e-tests/)\n\n### Architecture\n\nThis is what our data pipeline architecture looks like.\n\n![Architecture](/assets/images/arch.png)\n\nFor our local setup, we will use:\n\n1. An open-source SFTP server\n2. A Moto server to mock S3 and Lambda\n3. Postgres as a substitute for AWS Redshift\n\n![Local Architecture](/assets/images/arch-lcl.png)\n\n### Run on codespaces\n\nYou can run this data pipeline using GitHub Codespaces. Follow the instructions below.\n\n1. Create a codespace by going to the **[e2e_datapipeline_test](https://github.com/josephmachado/e2e_datapipeline_test)** repository, cloning (or forking) it, and then clicking the `Create codespace on main` button.\n2. Wait for the codespace to start and to automatically install the libraries in `requirements.txt`. Then, in the terminal, run `make up \u0026\u0026 export PYTHONPATH=${PYTHONPATH}:./src`.\n3. Wait for the above command to complete.\n4. Run the end-to-end test for the event pipeline with the `pytest` command, and clean up the code with the `make ci` command.\n\n**NOTE**: The screenshots below show the general process for starting a codespace; follow the instructions above for this project.\n\n![codespace start](./assets/images/cs1.png)\n![codespace make up](./assets/images/cs2.png)\n\n**Note**: Make sure to stop your codespace when you are done, since free usage is limited; see the [Codespaces pricing docs](https://github.com/features/codespaces#pricing).\n\n### Prerequisites \u0026 Setup\n\nTo run this project locally, you will need:\n\n1. [Docker](https://docs.docker.com/engine/install/)\n2. [Python 3.6 or above](https://www.python.org/downloads/)\n\nClone the repository, create a virtual environment, install the dependencies, spin up the containers, and set the Python path as shown below.\n\n```bash\ngit clone https://github.com/josephmachado/e2e_datapipeline_test.git\ncd e2e_datapipeline_test\npython -m venv ./env\nsource env/bin/activate # use the virtual environment\npip install -r requirements.txt\nmake up # spins up the SFTP, Moto server, and warehouse Docker containers\nexport PYTHONPATH=${PYTHONPATH}:./src # add ./src to the path so imports resolve\n```\n\n### Run tests\n\nWe can run our tests using `pytest`.\n\n```bash\npytest # runs all tests under the ./test folder\n```\n\nTo clean up the code, run:\n\n```bash\nmake ci\n```\n\n### Tear down\n\n```bash\nmake down # spins down the docker containers\ndeactivate # stop using the virtual environment\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjosephmachado%2Fe2e_datapipeline_test","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjosephmachado%2Fe2e_datapipeline_test","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjosephmachado%2Fe2e_datapipeline_test/lists"}