{"id":15107059,"url":"https://github.com/rafaeljurkfitz/etl-excel","last_synced_at":"2026-01-31T05:01:56.459Z","repository":{"id":234308681,"uuid":"788626446","full_name":"rafaeljurkfitz/etl-excel","owner":"rafaeljurkfitz","description":"A study case of develop a simple etl project to convert excel files into a single one.","archived":false,"fork":false,"pushed_at":"2024-12-04T17:01:31.000Z","size":1019,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-17T01:32:28.200Z","etag":null,"topics":["ci-cd","data-engineering","data-science","etl-pipeline","excel","lvgalvao","mkdocs","pep8","poetry","precommit-hooks","pyenv-virtualenv","pytest","taskipy"],"latest_commit_sha":null,"homepage":"https://rafaeljurkfitz.github.io/etl-excel/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rafaeljurkfitz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-04-18T19:24:24.000Z","updated_at":"2024-12-04T17:01:36.000Z","dependencies_parsed_at":"2024-04-18T20:40:46.574Z","dependency_job_id":"fa63678e-7f1f-469c-9390-a952387e37d0","html_url":"https://github.com/rafaeljurkfitz/etl-excel","commit_stats":{"total_commits":28,"total_committers":2,"mean_commits":14.0,"dds":0.0714285714285714,"last_synced_commit":"f49534b745dff03070460c03f1eb3c2f9860fb73"},"previous_names":["rafaeljurkfitz/etl-excel"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/rafaeljurkfitz/etl-excel","repository_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/rafaeljurkfitz%2Fetl-excel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rafaeljurkfitz%2Fetl-excel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rafaeljurkfitz%2Fetl-excel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rafaeljurkfitz%2Fetl-excel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rafaeljurkfitz","download_url":"https://codeload.github.com/rafaeljurkfitz/etl-excel/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rafaeljurkfitz%2Fetl-excel/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28929862,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-31T04:05:25.756Z","status":"ssl_error","status_checked_at":"2026-01-31T04:02:35.005Z","response_time":128,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ci-cd","data-engineering","data-science","etl-pipeline","excel","lvgalvao","mkdocs","pep8","poetry","precommit-hooks","pyenv-virtualenv","pytest","taskipy"],"created_at":"2024-09-25T21:04:08.415Z","updated_at":"2026-01-31T05:01:56.441Z","avatar_url":"https://github.com/rafaeljurkfitz.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ETL Excel\n\n![Flow](docs/static/fluxo.png)\n\n## About 
the Project 🗃️\n\nThis repository aims to serve as a portfolio. The goal is to demonstrate the benefits of software development best practices in the data field and provide a standardized structure for starting data engineering, data science, and data analysis projects.\n\n**The main focus is on best practices, automation, testing, and documentation.**\n\n### Requirements 🚧\n\nThere are two things to set up before starting any Python project:\n\n- Python version management.\n- Package and virtual environment management.\n\n#### Pyenv 🔖\n\n```Pyenv``` allows you to manage **multiple Python versions on the same system**, ensuring you can use the correct version for each project.\n\n#### Poetry 📦\n\n```Poetry``` is a tool for managing **dependencies**, **virtual environments**, and Python project packaging.\n\n**Advantages of Poetry:**\n\n- Centralized management in the ```pyproject.toml``` file.\n- Automatic creation of isolated virtual environments.\n- Simplified installation flow.\n\n**Poetry automatically uses the Python version configured locally in the project via Pyenv to ensure seamless integration between the tools.**\n\n### Dependencies ➕\n\n#### Project Dependencies 🔧\n\nThese are the essential dependencies required for the project to run. 
They include libraries for processing and handling Excel files.\n\n- ```pandas```: Library for data analysis and manipulation.\n- ```openpyxl```: Library for reading and writing Excel files.\n\n#### Development Dependencies 💻\n\nThese dependencies are needed during project development, such as tools for code formatting, linting, and task automation.\n\n- ```taskipy```: For automating tasks like running scripts and tests.\n- ```pre-commit```: For configuring pre-commit hooks to ensure the code adheres to project conventions.\n- ```pip-audit```: For auditing dependencies and checking for vulnerabilities.\n- ```pydocstyle```: To check code documentation style.\n- ```blue```: Code formatter similar to Black.\n- ```isort```: For consistently organizing imports.\n- ```loguru```: For logging.\n\n#### Testing Dependencies 🧪\n\nThese dependencies are required for running the project tests, such as the testing framework and its plugins.\n\n- ```pytest```: Framework for writing and running automated tests.\n\n#### Documentation Dependencies 📚\n\nThese dependencies are used to generate and serve the project documentation. They include tools for building documentation sites and generating dynamic content.\n\n- ```mkdocstrings-python```: For rendering Python docstrings in documentation generated by MkDocs.\n- ```pygments```: For syntax highlighting in the documentation.\n- ```pymdown-extensions```: Extensions for MkDocs, enabling advanced Markdown usage.\n- ```mkdocs-bootstrap386```: Bootstrap theme for MkDocs.\n- ```mkdocs-material```: Material theme for MkDocs.\n- ```mkdocs```: Tool for creating documentation websites using Markdown.\n\n### Installation and Configuration\n\n1. Clone the repository:\n\n    ```bash\n    git clone https://github.com/rafaeljurkfitz/etl-excel.git\n    cd etl-excel\n    ```\n\n2. Set up the correct Python version using `pyenv`:\n\n    ```bash\n    pyenv install 3.12.0\n    pyenv local 3.12.0\n    ```\n\n3. 
Configure Poetry for Python version 3.12.0 and activate the virtual environment:\n\n    ```bash\n    poetry env use 3.12.0\n    poetry shell\n    ```\n\n4. Install the project dependencies:\n\n    ```bash\n    poetry install\n    ```\n\n5. Run the tests to ensure everything is correct and working:\n\n    ```bash\n    task test\n    ```\n\n6. Run the command to view the project documentation:\n\n    ```bash\n    task doc\n    ```\n\n7. Start the pipeline execution by running the command to initiate the ETL:\n\n    ```bash\n    task run\n    ```\n\n8. Check the ```data/output``` folder path to ensure the generated file is correct.\n\n## Contact\n\nFor questions, suggestions, or feedback:\n\n- **Rafael Jurkfitz** - [rjurkfitz@gmail.com](mailto:rjurkfitz@gmail.com)\n\n## License\n\nThis project is licensed under the MIT License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frafaeljurkfitz%2Fetl-excel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frafaeljurkfitz%2Fetl-excel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frafaeljurkfitz%2Fetl-excel/lists"}