https://github.com/mikethoun/redshift-auto-schema
Redshift Auto Schema is a Python library that takes a delimited flat file or parquet file as input, parses it, and provides a variety of functions that allow for the creation and validation of tables within Amazon Redshift.
- Host: GitHub
- URL: https://github.com/mikethoun/redshift-auto-schema
- Owner: mikethoun
- License: apache-2.0
- Created: 2019-06-24T18:36:47.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2021-06-21T16:35:44.000Z (about 5 years ago)
- Last Synced: 2025-08-28T17:53:14.726Z (10 months ago)
- Topics: python3, redshift
- Language: Python
- Size: 33.2 KB
- Stars: 29
- Watchers: 3
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Redshift Auto Schema
Redshift Auto Schema is a Python library that takes a delimited flat file or parquet file as input, parses it, and provides a variety of functions that allow for the creation and validation of tables within Amazon Redshift. For each field, the appropriate Redshift data type is inferred from the contents of the file.
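To make the idea of inferring a type from file contents concrete, here is a minimal sketch. It is not the library's implementation: the function name `infer_redshift_type`, the order of the checks, the single `%Y-%m-%d` date format, and the `VARCHAR(256)` fallback are all assumptions chosen for illustration.

```python
from datetime import datetime


def infer_redshift_type(values):
    """Guess a Redshift column type from raw string values (hypothetical helper)."""
    # Ignore missing and blank entries; they carry no type information.
    non_empty = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not non_empty:
        return "VARCHAR(256)"  # assumed fallback when there is nothing to infer from

    def all_parse(parser):
        # True only if every value is accepted by the given parser.
        for v in non_empty:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all(v.lower() in ("true", "false") for v in non_empty):
        return "BOOLEAN"
    if all_parse(int):
        biggest = max(abs(int(v)) for v in non_empty)
        if biggest <= 2**31 - 1:
            return "INTEGER"
        if biggest <= 2**63 - 1:
            return "BIGINT"
        # Integers too large for BIGINT fall through to VARCHAR below.
    elif all_parse(float):
        return "DOUBLE PRECISION"
    elif all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "DATE"

    # Redshift VARCHAR lengths are measured in bytes, so size by UTF-8 length.
    longest = max(len(v.encode("utf-8")) for v in non_empty)
    return f"VARCHAR({longest})"


print(infer_redshift_type(["1", "2", "3"]))
print(infer_redshift_type(["2021-06-21", "2019-06-24"]))
print(infer_redshift_type(["abc", "de"]))
```

The checks run from the narrowest type to the widest, so a column only becomes `VARCHAR` when no stricter type fits every value.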
## Installation
Use the package manager [pip](https://pip.pypa.io/en/stable/) to install Redshift Auto Schema.
```bash
pip install redshift-auto-schema
```
## Usage
```python
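from redshift_auto_schema import RedshiftAutoSchema
import psycopg2 as pg

# Supply your cluster's connection parameters here (host, port, dbname, user, password).
redshift_conn = pg.connect()

# Describe the source file and the target schema and table.
new_table = RedshiftAutoSchema(file='sample_file.parquet',
                               schema='test_schema',
                               table='test_table',
                               conn=redshift_conn)

# Create the table only if it does not already exist.
if not new_table.check_table_existence():
    ddl = new_table.generate_table_ddl()

    # The connection context manager commits the transaction on success.
    with redshift_conn as conn:
        with conn.cursor() as cur:
            cur.execute(ddl)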
```
## Methods
|NAME|DESCRIPTION|
|---|---|
|**get_column_list**|Returns column list based on header of file.|
|**check_schema_existence**|Checks Redshift for the existence of a schema.|
|**check_table_existence**|Checks Redshift for the existence of a table.|
|**generate_schema_ddl**|Returns a SQL statement that creates a Redshift schema.|
|**generate_schema_permissions**|Returns a SQL statement that grants schema usage to the default group.|
|**generate_table_ddl**|Returns a SQL statement that creates a Redshift table.|
|**generate_column_ddl**|Returns SQL statement(s) that add missing column(s) to a Redshift table.|
|**generate_table_permissions**|Returns a SQL statement that grants table read access to the default group.|
|**evaluate_table_ddl_diffs**|Returns a dataframe containing differences between generated and existing table DDL.|
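
The existence checks and DDL generators in the table can be combined to bootstrap a missing schema and table in one pass. The sketch below is an illustration only: `plan_bootstrap_statements` is a hypothetical helper, and it assumes that each method can be called without arguments (as `check_table_existence` and `generate_table_ddl` are in the Usage example) and that each generator returns a single SQL string. Check the library source before relying on either assumption.

```python
def plan_bootstrap_statements(auto_schema):
    """Collect, in run order, the SQL needed to create a missing schema and table.

    `auto_schema` is any object exposing the methods named in the table above.
    Calling them with no arguments is an assumption, not a documented signature.
    """
    statements = []

    # Create the schema and grant usage on it only if it is missing.
    if not auto_schema.check_schema_existence():
        statements.append(auto_schema.generate_schema_ddl())
        statements.append(auto_schema.generate_schema_permissions())

    # Create the table and grant read access on it only if it is missing.
    if not auto_schema.check_table_existence():
        statements.append(auto_schema.generate_table_ddl())
        statements.append(auto_schema.generate_table_permissions())

    return statements
```

The returned statements could then be executed one by one with a cursor, in the same way the Usage example executes the table DDL. Keeping the planning step separate from execution makes it possible to review or log the SQL before anything runs against the cluster.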
## Contributing
Pull requests are welcome.
## License
[Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0)