{"id":24692945,"url":"https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest","last_synced_at":"2025-10-28T10:06:44.911Z","repository":{"id":217347245,"uuid":"743648684","full_name":"ivanildobarauna-dev/data-pipeline-sync-ingest","owner":"ivanildobarauna-dev","description":"ETL Process for Currency Quotes Data\" project is a complete solution dedicated to extracting, transforming and loading (ETL) currency quote data. This project uses several advanced techniques and architectures to ensure the efficiency and robustness of the ETL process.","archived":false,"fork":false,"pushed_at":"2025-08-26T15:08:05.000Z","size":7192,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-25T03:33:24.091Z","etag":null,"topics":["business-intelligence","data-analysis","data-analytics","data-engineering","data-pipeline","data-visualization","etl-pipeline","python"],"latest_commit_sha":null,"homepage":"https://ivanildobarauna-dev.github.io/data-pipeline-sync-ingest/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ivanildobarauna-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/funding.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["IvanildoBarauna"]}},"created_at":"2024-01-15T17:27:03.000Z","updated_at":"2025-09-12T19:57:23.000Z","dependencies_parsed_at":"2024-10-23T02:18:09.340Z","dependency_job_id":"aa3d481d-6eed-4d4e-b08c-9544dd44cf7c
","html_url":"https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest","commit_stats":null,"previous_names":["ivanildobarauna/etl-awesome-api","ivdatahub/data-consumer-api","ivanildobarauna-dev/data-consumer-api","ivanildobarauna-dev/data-pipeline-sync-ingest"],"tags_count":27,"template":false,"template_full_name":null,"purl":"pkg:github/ivanildobarauna-dev/data-pipeline-sync-ingest","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivanildobarauna-dev%2Fdata-pipeline-sync-ingest","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivanildobarauna-dev%2Fdata-pipeline-sync-ingest/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivanildobarauna-dev%2Fdata-pipeline-sync-ingest/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivanildobarauna-dev%2Fdata-pipeline-sync-ingest/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ivanildobarauna-dev","download_url":"https://codeload.github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ivanildobarauna-dev%2Fdata-pipeline-sync-ingest/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281418472,"owners_count":26497805,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-28T02:00:06.022Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tory_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["business-intelligence","data-analysis","data-analytics","data-engineering","data-pipeline","data-visualization","etl-pipeline","python"],"created_at":"2025-01-26T20:17:53.229Z","updated_at":"2025-10-28T10:06:44.884Z","avatar_url":"https://github.com/ivanildobarauna-dev.png","language":"Python","funding_links":["https://github.com/sponsors/IvanildoBarauna"],"categories":[],"sub_categories":[],"readme":"# Data Consumer API Project: ETL Process for Currency Quotes Data\n\nThe \"ETL Process for Currency Quotes Data\" project is a complete solution for extracting, transforming and loading (ETL) currency quote data. It combines an MVC architecture, parallel processing and queue-based messaging to keep the ETL process efficient and robust.\n\n[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ivanildobarauna-dev/data-pipeline-sync-ingest)\n\n![Project Status](https://img.shields.io/badge/status-done-brightgreen?style=for-the-badge\u0026logo=github)\n![License](https://img.shields.io/badge/license-MIT-blue?style=for-the-badge\u0026logo=mit)\n![GitHub release (latest by date)](https://img.shields.io/github/v/release/ivanildobarauna-dev/data-pipeline-sync-ingest?style=for-the-badge\u0026logo=github)\n![Python 
Version](https://img.shields.io/badge/python-3.10-blue?style=for-the-badge\u0026logo=python)\n\n![Black](https://img.shields.io/badge/code%20style-black-000000.svg?style=for-the-badge\u0026logo=python)\n![pylint](https://img.shields.io/badge/pylint-10.00-green?style=for-the-badge\u0026logo=python)\n\n[![CI-CD](https://img.shields.io/github/actions/workflow/status/ivanildobarauna-dev/data-pipeline-sync-ingest/CI-CD.yaml?\u0026style=for-the-badge\u0026logo=githubactions\u0026cacheSeconds=60\u0026label=CI-CD)](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/actions/workflows/CI-CD.yml)\n[![DOCKER-DEPLOY](https://img.shields.io/github/actions/workflow/status/ivanildobarauna-dev/data-pipeline-sync-ingest/deploy-image.yml?\u0026style=for-the-badge\u0026logo=githubactions\u0026cacheSeconds=60\u0026label=DOCKER-DEPLOY)](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/actions/workflows/deploy-image.yml)\n\n[![Codecov](https://img.shields.io/codecov/c/github/ivanildobarauna-dev/data-pipeline-sync-ingest?style=for-the-badge\u0026logo=codecov)](https://app.codecov.io/gh/ivanildobarauna-dev/data-pipeline-sync-ingest)\n\n## Code Coverage KPI Graph\n\n[![codecov](https://codecov.io/gh/ivanildobarauna-dev/data-pipeline-sync-ingest/graphs/sunburst.svg?token=GEGNHFM6PS)](https://codecov.io/gh/ivanildobarauna-dev/data-pipeline-sync-ingest)\n\n## Project Stack\n\n\u003cimg src=\"https://github.com/devicons/devicon/blob/master/icons/python/python-original.svg\" Alt=\"Python\" width=\"50\" height=\"50\"\u003e \u003cimg src=\"https://github.com/devicons/devicon/blob/master/icons/docker/docker-original.svg\" Alt=\"Docker\" width=\"50\" height=\"50\"\u003e \u003cimg src=\"https://github.com/devicons/devicon/blob/master/icons/poetry/poetry-original.svg\" Alt=\"Poetry\" width=\"50\" height=\"50\"\u003e \u003cimg src=\"https://github.com/devicons/devicon/blob/master/icons/pandas/pandas-original.svg\" Alt=\"Pandas\" width=\"50\" height=\"50\"\u003e \u003cimg 
src=\"https://github.com/devicons/devicon/blob/master/icons/jupyter/jupyter-original.svg\" Alt=\"Jupyter\" width=\"50\" height=\"50\"\u003e \u003cimg src=\"https://github.com/devicons/devicon/blob/master/icons/matplotlib/matplotlib-original.svg\" Alt=\"Matplotlib\" width=\"50\" height=\"50\"\u003e \u003cimg src=\"https://github.com/devicons/devicon/blob/master/icons/githubactions/githubactions-original.svg\" Alt=\"GitHub Actions\" width=\"50\" height=\"50\"\u003e\n\n## Project description\n\nThe \"ETL Process for Currency Quotes Data\" project is a complete solution for extracting, transforming and loading (ETL) currency quote data. It combines an MVC architecture, parallel processing and queue-based messaging to keep the ETL process efficient and robust.\n\n## Contributing\n\nSee the following docs:\n\n- [Contributing Guide](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/blob/main/CONTRIBUTING.md)\n- [Code Of Conduct](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/blob/main/CODE_OF_CONDUCT.md)\n\n## Project Highlights:\n\n- MVC Architecture: Implementation of the Model-View-Controller (MVC) architecture, separating business logic, user interface and data manipulation for better organization and code maintenance.\n\n- Comprehensive Testing: Development of tests to ensure the quality and robustness of the code at various stages of the ETL process.\n\n- Parallelism in Models: Use of parallelism in the data transformation and loading stages, increasing efficiency and reducing processing time.\n\n- Fire-and-Forget Messaging: Use of messaging (`queue.Queue`) in the fire-and-forget model to manage files generated between the transformation and loading stages, ensuring a continuous and efficient data flow.\n\n- Parameter Validation: Validation of request parameters against the data source itself, ensuring the integrity and accuracy of the information processed.\n\n- Configuration Management: Use of a configuration module to manage endpoints, 
retry times and number of attempts, providing flexibility and ease of adjustment.\n\n- Common Module: Implementation of a common module for code reuse across the project, promoting consistency and reducing redundancies.\n\n- Dynamic Views: Generation of views with index.html using nbconvert, based on consolidated data from a Jupyter Notebook that integrates the generated files into a single dataset for exploration and analysis.\n\n## ETL Process:\n\n- Extraction: A single request is made to a specific endpoint to obtain quotes for multiple currencies.\n- Transformation: The request response is processed, separating each currency quote and storing it in an individual file in Parquet format, facilitating data organization and retrieval.\n- Load: Individual Parquet files are consolidated into a single dataset using a Jupyter Notebook, allowing for comprehensive analysis of the currency quotes.\n\nIn summary, this project offers a robust and efficient solution for collecting, processing and analyzing currency quote data, using the MVC architecture and parallelism to optimize each step of the ETL process.\n\n \u003cdetails\u003e\n \u003csummary\u003eRepository structure\u003c/summary\u003e\n\n- [`data/`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/tree/main/data): Stores raw data in Parquet format.\n  - ETH-EUR-1713658884.parquet: Example: Raw data for ETH-EUR quotes. 
The file name is the symbol followed by the extraction Unix timestamp.\n- [`notebooks/`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/tree/main/notebooks): Contains the `data_explorer.ipynb` notebook for data exploration.\n- [`etl/`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/tree/main/etl): Contains the project's source code.\n  - [`run.py`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/blob/main/etl/run.py): Entrypoint of the application.\n- [`common/`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/tree/main/etl/common): Library for code reuse and standardization.\n  - [`utils/`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/tree/main/etl/utils)\n    - [`logs.py`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/blob/main/etl/utils/logs.py): Package for log management.\n  - [`common.py`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/blob/main/etl/utils/common.py): Package for common code tasks like output directory retrieval or default timestamp.\n  - [`logs/`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/tree/main/etl/common/logs): For storing debug logs.\n- [`controller/`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/tree/main/etl/controller)\n  - [`pipeline.py`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/blob/main/etl/controller/pipeline.py): Receives data extraction requests and orchestrates the ETL models.\n- [`models/`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/tree/main/etl/models):\n  - [`extract/`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/tree/main/etl/models/extract)\n    - [`api_data_extractor.py`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/blob/main/etl/models/extract/api_data_extractor.py): Receives the parameters from the controller, sends the request and returns the response as JSON.\n  - 
[`transform/`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/tree/main/etl/models/transform)\n    - [`publisher.py`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/blob/main/etl/models/transform/publisher.py): Receives the JSON from the extractor, separates the dictionary by currency and publishes each of them to a queue to be processed individually.\n  - [`load/`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/tree/main/etl/models/load)\n    - [`parquet_loader.py`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/blob/main/etl/models/load/parquet_loader.py): In a separate thread, receives each dictionary that the transformer publishes to the queue and generates .parquet files in the default directory.\n- [`views/`](https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest/tree/main/etl/views): For storing data analysis and visualization.\n\n\u003c/details\u003e\n\n \u003cdetails\u003e\n \u003csummary\u003eHow to run the application locally\u003c/summary\u003e\n\n## Step by Step\n\nEnsure Python 3.10 or higher and Poetry are installed on your machine.\n\n- Clone the repository:\n\n```sh\n$ git clone https://github.com/ivanildobarauna-dev/data-pipeline-sync-ingest.git\n```\n\n- Go to the project directory:\n\n```sh\n$ cd data-pipeline-sync-ingest\n```\n\n- Install the dependencies and run the project:\n\n```sh\n$ poetry install \u0026\u0026 poetry run python etl/run.py\n```\n\nLearn more about [`poetry`](https://python-poetry.org/).\n\n\u003c/details\u003e\n\n## ETL and Data Analysis Results:\n\nThe complete data analysis is available as a Jupyter Notebook deployed to [GitHub 
Pages](https://ivanildobarauna-dev.github.io/data-pipeline-sync-ingest/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fivanildobarauna-dev%2Fdata-pipeline-sync-ingest","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fivanildobarauna-dev%2Fdata-pipeline-sync-ingest","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fivanildobarauna-dev%2Fdata-pipeline-sync-ingest/lists"}