{"id":19233123,"url":"https://github.com/portfoliome/postpy","last_synced_at":"2025-06-28T19:39:20.771Z","repository":{"id":57454315,"uuid":"77242213","full_name":"portfoliome/postpy","owner":"portfoliome","description":"Postgresql utilities for ETL and data analysis","archived":false,"fork":false,"pushed_at":"2017-10-24T14:27:49.000Z","size":63,"stargazers_count":24,"open_issues_count":0,"forks_count":3,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-05-31T22:35:24.281Z","etag":null,"topics":["database","etl","postgres","postgresql","python-3","streaming"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/portfoliome.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-12-23T17:57:05.000Z","updated_at":"2023-04-01T19:59:23.000Z","dependencies_parsed_at":"2022-08-29T05:30:13.387Z","dependency_job_id":null,"html_url":"https://github.com/portfoliome/postpy","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/portfoliome/postpy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/portfoliome%2Fpostpy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/portfoliome%2Fpostpy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/portfoliome%2Fpostpy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/portfoliome%2Fpostpy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/portfoliome","download_url":"https://codeload.github.com/portfoliome/postpy/tar.gz/ref
s/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/portfoliome%2Fpostpy/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262485513,"owners_count":23318636,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","etl","postgres","postgresql","python-3","streaming"],"created_at":"2024-11-09T16:09:06.864Z","updated_at":"2025-06-28T19:39:20.737Z","avatar_url":"https://github.com/portfoliome.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.org/portfoliome/postpy.svg?branch=master)](https://travis-ci.com/portfoliome/postpy)\n[![codecov.io](http://codecov.io/github/portfoliome/postpy/coverage.svg?branch=master)](http://codecov.io/github/portfoliome/postpy?branch=master)\n[![Code Health](https://landscape.io/github/portfoliome/postpy/master/landscape.svg?style=flat)](https://landscape.io/github/portfoliome/postpy/master)\n\n# postpy\nPostgresql utilities for ETL and data analysis.\n\n# Purpose\npostpy focuses on tasks that typically arise in ETL processes and data analysis. Generally, these situations arise when third-party data providers supply a default schema and handle data migration. The benefits over sqlalchemy are dml statements that accept iterable sequences, and upsert statements, which sqlalchemy did not support before version 1.1. 
While the library protects against SQL injection, the ddl compiler functions do not check for things like reserved keywords.\n\n# Example Usage\n\nLet's say a third-party provider has given you a JSON schema file whose entries refer to different zipped data files.\n\nMocking out a single file load might look something like:\n\n```python\nimport csv\n\nfrom foil.fileio import DelimitedReader\nfrom foil.parsers import parse_str, parse_int, passthrough\n\nfrom postpy import dml\n\nENCODING = 'utf-8'\n\nclass DataDialect(csv.Dialect):\n    delimiter = '|'\n    quotechar = '\"'\n    lineterminator = '\\r\\n'\n    doublequote = False\n    quoting = csv.QUOTE_NONE\n\ndialect = DataDialect()\n\n# Gathering table/file attributes\n# zip_path, filename, and conn (an open database connection) are assumed to be defined elsewhere\n\ntablename = 'my_table'\nfields = DelimitedReader.from_zipfile(zip_path, filename, encoding=ENCODING,\n                                      dialect=dialect, fields=[], converters=[]).header\nfield_parsers = [parse_str, parse_int, passthrough, parse_int]  # would get through reflection or JSON file\n\n# Load one file and insert its records\nreader = DelimitedReader.from_zipfile(zip_path, filename, encoding=ENCODING,\n                                      dialect=dialect, fields=fields, converters=field_parsers)\n\n# Insert records by loading only 10,000 records/file lines into memory each iteration\ndml.insert_many(conn, tablename, fields, records=reader, chunksize=10000)\n```\n\nSince each process is very lightweight, each loader can reside on a micro-instance. Queues like RabbitMQ or SNS/SQS can be set up to handle message notifications between processes.\n\nInstead of worrying about async/threads, each micro-instance can handle a single table load and pass off a message upon completion.\n\n# Potential Near-term Plans\nThe ddl compilers may be converted to sqlalchemy compilers to allow for greater flexibility in constraint definitions without adding to the code maintenance burden. 
Python 3.6's f-strings may be incorporated into the ddl compilers, breaking 3.5 compatibility.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fportfoliome%2Fpostpy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fportfoliome%2Fpostpy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fportfoliome%2Fpostpy/lists"}