{"id":18750441,"url":"https://github.com/os-climate/osc-ingest-tools","last_synced_at":"2025-04-12T23:32:06.348Z","repository":{"id":37675374,"uuid":"422267834","full_name":"os-climate/osc-ingest-tools","owner":"os-climate","description":"python tools to assist with standardized data ingestion workflows","archived":false,"fork":false,"pushed_at":"2025-04-07T18:07:20.000Z","size":336,"stargazers_count":7,"open_issues_count":13,"forks_count":10,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-07T19:24:31.222Z","etag":null,"topics":["pandas","python","sql","sqlalchemy","trino","trinodb"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/os-climate.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-28T16:00:06.000Z","updated_at":"2025-03-05T18:13:18.000Z","dependencies_parsed_at":"2023-02-19T20:35:19.321Z","dependency_job_id":"7b57a535-3587-4dc2-a19f-44b091133a4d","html_url":"https://github.com/os-climate/osc-ingest-tools","commit_stats":{"total_commits":51,"total_committers":5,"mean_commits":10.2,"dds":"0.23529411764705888","last_synced_commit":"848546ad221248b0125d87f4a0aae5823377ac58"},"previous_names":[],"tags_count":25,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/os-climate%2Fosc-ingest-tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/os-climate%2Fosc-ingest-tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/os-climate%2Fosc-ingest-tools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/os-climate%2Fosc-ingest-tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/os-climate","download_url":"https://codeload.github.com/os-climate/osc-ingest-tools/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248647257,"owners_count":21139081,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pandas","python","sql","sqlalchemy","trino","trinodb"],"created_at":"2024-11-07T17:11:52.040Z","updated_at":"2025-04-12T23:32:05.849Z","avatar_url":"https://github.com/os-climate.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# osc-ingest-tools\n\npython tools to assist with standardized data ingestion workflows\n\n## Installation, Usage, and Release Management\n\n### Install from PyPi\n\n```console\npip install osc-ingest-tools\n```\n\n### Examples\n\n```python\n\u003e\u003e\u003e from osc_ingest_trino import *\n\n\u003e\u003e\u003e import pandas as pd\n\n\u003e\u003e\u003e data = [['tom', 10], ['nick', 15], ['juli', 14]]\n\n\u003e\u003e\u003e df = pd.DataFrame(data, columns = ['First Name', 'Age In Years']).convert_dtypes()\n\n\u003e\u003e\u003e df\n  First Name  Age In Years\n0        tom            10\n1       nick            15\n2       juli            14\n\n\u003e\u003e\u003e enforce_sql_column_names(df)\n  first_name  age_in_years\n0        tom            10\n1       nick            15\n2       juli            
14\n\n\u003e\u003e\u003e enforce_sql_column_names(df, inplace=True)\n\n\u003e\u003e\u003e df\n  first_name  age_in_years\n0        tom            10\n1       nick            15\n2       juli            14\n\n\u003e\u003e\u003e df.info(verbose=True)\n\u003cclass 'pandas.core.frame.DataFrame'\u003e\nRangeIndex: 3 entries, 0 to 2\nData columns (total 2 columns):\n #   Column        Non-Null Count  Dtype\n---  ------        --------------  -----\n 0   first_name    3 non-null      string\n 1   age_in_years  3 non-null      Int64\ndtypes: Int64(1), string(1)\nmemory usage: 179.0 bytes\n\n\u003e\u003e\u003e p = create_table_schema_pairs(df)\n\n\u003e\u003e\u003e print(p)\n    first_name varchar,\n    age_in_years bigint\n\n\u003e\u003e\u003e\n```\n\n#### Adding custom type mappings to `create_table_schema_pairs`\n\n```python\n\u003e\u003e\u003e df = pd.DataFrame(data, columns = ['First Name', 'Age In Years'])\n\n\u003e\u003e\u003e enforce_sql_column_names(df, inplace=True)\n\n\u003e\u003e\u003e df.info(verbose=True)\n\u003cclass 'pandas.core.frame.DataFrame'\u003e\nRangeIndex: 3 entries, 0 to 2\nData columns (total 2 columns):\n #   Column        Non-Null Count  Dtype\n---  ------        --------------  -----\n 0   first_name    3 non-null      object\n 1   age_in_years  3 non-null      int64\ndtypes: int64(1), object(1)\nmemory usage: 176.0+ bytes\n\n\u003e\u003e\u003e p = create_table_schema_pairs(df, typemap={'object':'varchar'})\n\n\u003e\u003e\u003e print(p)\n    first_name varchar,\n    age_in_years bigint\n\n\u003e\u003e\u003e\n```\n\n### Development\n\nPatches may be contributed via pull requests to\n\u003chttps://github.com/os-climate/osc-ingest-tools\u003e.\n\nAll changes must pass the automated test suite, along with various static\nchecks.\n\n[Black](https://black.readthedocs.io/) code style and\n[isort](https://pycqa.github.io/isort/) import ordering are enforced.\n\nEnabling automatic formatting via [pre-commit](https://pre-commit.com/) 
is\nrecommended:\n\n```console\npip install black isort pre-commit\npre-commit install\n```\n\nTo ensure compliance with static check tools, developers may wish to run:\n\n```console\npip install black isort\n# auto-sort imports\nisort .\n# auto-format code\nblack .\n```\n\nCode can then be tested using tox:\n\n```console\n# run static checks and tests\ntox\n# run only tests\ntox -e py3\n# run only static checks\ntox -e static\n# run tests and produce a code coverage report\ntox -e cov\n```\n\n### Releasing\n\nTo release a new version of this library, authorized developers should:\n\n- Prepare a signed release commit updating `version` in setup.py\n- Tag the commit using [Semantic Versioning](https://semver.org/spec/v2.0.0.html)\n  prepended with \"v\"\n- Push the tag\n\nE.g.,\n\n```console\ngit commit -sm \"Release v0.3.4\"\ngit tag v0.3.4\ngit push --follow-tags\n```\n\nA GitHub workflow will then automatically release the version to PyPI.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fos-climate%2Fosc-ingest-tools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fos-climate%2Fosc-ingest-tools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fos-climate%2Fosc-ingest-tools/lists"}