https://github.com/os-climate/osc-ingest-tools
python tools to assist with standardized data ingestion workflows
https://github.com/os-climate/osc-ingest-tools
pandas python sql sqlalchemy trino trinodb
Last synced: 6 months ago
JSON representation
python tools to assist with standardized data ingestion workflows
- Host: GitHub
- URL: https://github.com/os-climate/osc-ingest-tools
- Owner: os-climate
- License: apache-2.0
- Created: 2021-10-28T16:00:06.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2025-04-07T18:07:20.000Z (7 months ago)
- Last Synced: 2025-04-07T19:24:31.222Z (7 months ago)
- Topics: pandas, python, sql, sqlalchemy, trino, trinodb
- Language: Python
- Homepage:
- Size: 328 KB
- Stars: 7
- Watchers: 2
- Forks: 10
- Open Issues: 13
-
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
README
# osc-ingest-tools
python tools to assist with standardized data ingestion workflows
## Installation, Usage, and Release Management
### Install from PyPi
```console
pip install osc-ingest-tools
```### Examples
```python
>>> from osc_ingest_trino import *>>> import pandas as pd
>>> data = [['tom', 10], ['nick', 15], ['juli', 14]]
>>> df = pd.DataFrame(data, columns = ['First Name', 'Age In Years']).convert_dtypes()
>>> df
First Name Age In Years
0 tom 10
1 nick 15
2 juli 14>>> enforce_sql_column_names(df)
first_name age_in_years
0 tom 10
1 nick 15
2 juli 14>>> enforce_sql_column_names(df, inplace=True)
>>> df
first_name age_in_years
0 tom 10
1 nick 15
2 juli 14>>> df.info(verbose=True)
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 first_name 3 non-null string
1 age_in_years 3 non-null Int64
dtypes: Int64(1), string(1)
memory usage: 179.0 bytes>>> p = create_table_schema_pairs(df)
>>> print(p)
first_name varchar,
age_in_years bigint>>>
```#### Adding custom type mappings to `create_table_schema_pairs`
```python
>>> df = pd.DataFrame(data, columns = ['First Name', 'Age In Years'])>>> enforce_sql_column_names(df, inplace=True)
>>> df.info(verbose=True)
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 first_name 3 non-null object
1 age_in_years 3 non-null int64
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes>>> p = create_table_schema_pairs(df, typemap={'object':'varchar'})
>>> print(p)
first_name varchar,
age_in_years bigint>>>
```### Development
Patches may be contributed via pull requests to
.All changes must pass the automated test suite, along with various static
checks.[Black](https://black.readthedocs.io/) code style and
[isort](https://pycqa.github.io/isort/) import ordering are enforced.Enabling automatic formatting via [pre-commit](https://pre-commit.com/) is
recommended:```console
pip install black isort pre-commit
pre-commit install
```To ensure compliance with static check tools, developers may wish to run;
```console
pip install black isort
# auto-sort imports
isort .
# auto-format code
black .
```Code can then be tested using tox:
```console
=======
# run static checks and tests
tox
# run only tests
tox -e py3
# run only static checks
tox -e static
# run tests and produce a code coverage report
tox -e cov
```### Releasing
To release a new version of this library, authorized developers should;
- Prepare a signed release commit updating `version` in setup.py
- Tag the commit using [Semantic Versioning](https://semver.org/spec/v2.0.0.html)
prepended with "v"
- Push the tagE.g.,
```console
git commit -sm "Release v0.3.4"
git tag v0.3.4
git push --follow-tags
```A Github workflow will then automatically release the version to PyPI.