https://github.com/vincentlaucsb/pgreaper

A Python library for loading data from various formats into PostgreSQL databases.

Host: GitHub
URL: https://github.com/vincentlaucsb/pgreaper
Owner: vincentlaucsb
License: mit
Created: 2017-04-09T08:00:34.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2017-10-18T23:05:22.000Z (about 7 years ago)
Last Synced: 2024-09-23T09:03:52.437Z (3 months ago)
Topics: convert-data, csv-converter, html-parser, postgresql, python, sql, sql-database, sql-table, sqlite3-database
Language: Python
Homepage:
Size: 35.6 MB
Stars: 12
Watchers: 3
Forks: 1
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

# pgreaper
[![Build Status](https://travis-ci.org/vincentlaucsb/pgreaper.svg?branch=master)](https://travis-ci.org/vincentlaucsb/pgreaper)
[![Coverage Status](https://coveralls.io/repos/github/vincentlaucsb/pgreaper/badge.svg?branch=master)](https://coveralls.io/github/vincentlaucsb/pgreaper?branch=master)

PGReaper is an easy, Pythonic way to upload data from CSV, JSON, HTML, and
SQLite sources to PostgreSQL databases. Internally, it uses the fast `COPY`
streaming protocol, but wraps it in a way that makes it more flexible, robust, and
easier to use. Features include:
* Automatic schema inference
* Ability to parse and normalize `<table>`s in HTML
* Ability to flatten and/or extract nested keys from JSON before copying
* Ability to copy files in `.zip` archives without decompressing them
* Ability to copy over networks
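To make the JSON-flattening and `COPY` features above more concrete, here is a minimal sketch of the general technique using only the standard library. It is not PGReaper's implementation or API: `flatten`, `quote_ident`, `copy_statement`, and `to_copy_buffer` are hypothetical helper names, and joining nested keys with a dot is an assumption. The resulting in-memory buffer is the kind of file-like object that psycopg2's `cursor.copy_expert(sql, file)` can stream to the server.

```python
import csv
import io
import json


def flatten(record, parent_key="", sep="."):
    """Recursively flatten nested dicts, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in record.items():
        new_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


def quote_ident(name):
    """Double-quote a column name so keys like "user.name" are valid SQL."""
    return '"' + name.replace('"', '""') + '"'


def copy_statement(table, columns):
    """Build a COPY ... FROM STDIN statement for CSV input (table used as-is)."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return "COPY " + table + " (" + cols + ") FROM STDIN WITH (FORMAT csv)"


def to_copy_buffer(records, columns):
    """Flatten each record and write it as one CSV row in an in-memory buffer."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for record in records:
        flat = flatten(record)
        row = []
        for col in columns:
            value = flat.get(col)  # missing keys become None, written as an empty field
            if isinstance(value, list):
                value = json.dumps(value)  # lists are kept as JSON text
            row.append(value)
        writer.writerow(row)
    buf.seek(0)  # rewind so the reader (e.g. copy_expert) starts at the top
    return buf
```

For example, the records `{"id": 1, "user": {"name": "a", "tags": ["x"]}}` and `{"id": 2, "user": {"name": "b"}}` with columns `["id", "user.name", "user.tags"]` produce the rows `1,a,"[""x""]"` and `2,b,`. With an open psycopg2 connection, something like `conn.cursor().copy_expert(copy_statement("users", columns), buf)` would then send them in a single `COPY`.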

Last but not least, PGReaper ships with its own `Table` data structure
and can also copy `pandas` DataFrames. These allow programmatic creation
or updating of SQL tables without the verbosity associated with traditional ORMs.
Furthermore, when uploading Python data structures, PGReaper automatically infers the schema,
including `JSONB` columns (from dicts or lists) and `timestamp` columns (from datetime objects).
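The schema inference described above can be illustrated with a small sketch. This is a hypothetical stand-in, not PGReaper's own logic: `infer_type`, `infer_column_type`, `infer_schema`, and `create_table_sql` are made-up names, and the exact type mapping (including widening mixed int/float columns to `double precision` and any other mix to `text`) is an assumption.

```python
import datetime


def infer_type(value):
    """Map a single Python value to a PostgreSQL type name (assumed mapping)."""
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double precision"
    if isinstance(value, datetime.datetime):
        return "timestamp"
    if isinstance(value, (dict, list)):
        return "jsonb"
    return "text"


def infer_column_type(values):
    """Combine per-value types for one column, ignoring None."""
    types = {infer_type(v) for v in values if v is not None}
    if not types:
        return "text"  # all-NULL column: fall back to text
    if len(types) == 1:
        return types.pop()
    if types == {"bigint", "double precision"}:
        return "double precision"  # widen mixed numeric columns
    return "text"  # any other mix is stored as text


def infer_schema(rows):
    """Infer {column: type} from a list of dicts, keeping first-seen column order."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return {col: infer_column_type(row.get(col) for row in rows) for col in columns}


def create_table_sql(table, schema):
    """Render a CREATE TABLE statement with double-quoted column names."""
    cols = ", ".join(
        '"' + name.replace('"', '""') + '" ' + pg_type for name, pg_type in schema.items()
    )
    return "CREATE TABLE " + table + " (" + cols + ")"
```

Checking `bool` before `int` matters because `isinstance(True, int)` is `True` in Python; without that ordering, boolean columns would be inferred as `bigint`.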

## Benchmarks
Speed is one of `pgreaper`'s main design goals. A list of benchmarks may be found under the `benchmarks` subdirectory.

## Installation
I've been using PGReaper heavily for my own projects, such as Twitter and web scraping, but I have only recently started polishing the documentation and API for public consumption. PGReaper will be released on PyPI once I feel it is mature enough. If you would like to use it now, clone this repository and run the following command from the cloned directory:

```
pip install .
```

Currently, you may need Cython to build this project.

### Dependencies
PGReaper has only a few dependencies: Python 3.5+, `psycopg2`, and, of course, PostgreSQL (use 9.3+ for JSON features).

## Documentation
Full documentation (or at least that's the goal!) is [available here](http://vincela.com/pgreaper/).
As a user of software, I frequently get frustrated with inconsistent, inaccurate,
or unclear documentation. However, as a writer of software, I also realize that
writing documentation isn't always fun, and what is obvious to you isn't always
obvious to others. If you see something that needs
improvement, feel free to submit an issue or pull request.