{"id":15914348,"url":"https://github.com/ggeop/multiple-fields-management","last_synced_at":"2026-05-01T21:33:06.579Z","repository":{"id":102404072,"uuid":"540400349","full_name":"ggeop/Multiple-Fields-Management","owner":"ggeop","description":"Fields management from/to different data sources. :bulb:","archived":false,"fork":false,"pushed_at":"2022-09-23T11:08:01.000Z","size":4,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-08T17:33:46.802Z","etag":null,"topics":["data","data-engineering","data-organization","data-retrieval","data-science","pandas","python"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ggeop.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-23T11:04:39.000Z","updated_at":"2022-09-23T11:22:33.000Z","dependencies_parsed_at":null,"dependency_job_id":"afbef29e-b020-4e54-a803-f0b2b3866b81","html_url":"https://github.com/ggeop/Multiple-Fields-Management","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ggeop%2FMultiple-Fields-Management","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ggeop%2FMultiple-Fields-Management/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ggeop%2FMultiple-Fields-Management/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ggeo
p%2FMultiple-Fields-Management/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ggeop","download_url":"https://codeload.github.com/ggeop/Multiple-Fields-Management/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246930024,"owners_count":20856524,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-engineering","data-organization","data-retrieval","data-science","pandas","python"],"created_at":"2024-10-06T17:01:51.458Z","updated_at":"2025-10-25T04:42:48.077Z","avatar_url":"https://github.com/ggeop.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"This repo presents an idea for managing fields from/to different data sources.\n\n# Requirements :triangular_ruler:\n* Python 3.8.x (Tested in Python 3.8.5)\n* Python Pandas\n\n# Target Audience :loudspeaker:\n* Data Engineers\n* Data Scientists\n* Data Analysts\n* Software Engineers\n\n# Problem :confused:\nA very common headache when processing data from multiple sources is the management of column names.\nIn a nutshell, a few of the problems are:\n\n* **Lack of naming convention**: Column names do not follow the same convention, or any convention at all.\n* **Inefficient refactoring**: Hardcoded strings are all over the project. 
When refactoring, it's possible to forget to update something.\n* **Column duplications**: Column names are not defined in the same place, so it's possible that users will use similar column names for the same thing.\n\n# Solution :ok_hand:\nOrganize all the fields (column names) in a central place by using a Python object (class).\nThe benefit of this strategy is the elimination of the above problems and much more (e.g. adding field logic)!\nA Python class gives the ability to use methods and also to bind other metadata to fields (e.g. type).\n\n## An example :yum:\nBelow is a simple example; you can find more code examples in the `examples.ipynb` notebook.\n\nLet's say that we have a pandas dataframe with one column.\n\n```{python}\nimport pandas as pd\ninput_df = pd.DataFrame({'dummy_column_1': [1,2,3,4]})\n```\nOutput:\n```\n+----+----------------+\n|    | dummy_column_1 |\n|----+----------------|\n|  0 |              1 |\n|  1 |              2 |\n|  2 |              3 |\n|  3 |              4 |\n+----+----------------+\n```\n\nNow, we want to create a new column `dummy_column_2`. The first step is to create a new entry in the `Field` class. 
For example:\n```{python}\ndummy_field_2 = field(input_name='dummy_column_2', exported_name='Dummy Column 2')\n```\n\nThen we can use the new field in our dataframe.\n\n```{python}\nimport Field\ninput_df[Fields.dummy_field_2.input_name] = 2\n```\nOutput:\n```\n+----+----------------+------------------+\n|    | dummy_column_1 |   dummy_column_2 |\n|----+----------------+------------------|\n|  0 |              1 |                2 |\n|  1 |              2 |                2 |\n|  2 |              3 |                2 |\n|  3 |              4 |                2 |\n+----+----------------+------------------+\n```\n\nFinally, once the processing steps are finished, we can rename the fields before exporting the dataframe by leveraging the field functions.\n\n\n```{python}\nfield_renames = Field.get_renames()\ndf = input_df.rename(columns={column: field_renames[column] for column in input_df.columns})\n```\nOutput:\n```\n+----+----------------+------------------+\n|    |   Dummy Column |   Dummy Column 2 | \u003c---- New column names\n|----+----------------+------------------|\n|  0 |              1 |                2 |\n|  1 |              2 |                2 |\n|  2 |              3 |                2 |\n|  3 |              4 |                2 |\n+----+----------------+------------------+\n```\n\n----\n\n# Contributing\n* Pull Requests (PRs) are welcome ☺️!\n\n# Thanks!\nThanks for your time! This repo is just an idea; the above logic can be expanded to change the way you handle multiple data sources! Cheers :beers:","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fggeop%2Fmultiple-fields-management","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fggeop%2Fmultiple-fields-management","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fggeop%2Fmultiple-fields-management/lists"}