https://github.com/patrickdavies100/pipeline38

An application to automate the creation and execution of SQL queries.

data pandas-dataframe pipeline postgresql psycopg2 sqlalchemy


Host: GitHub
URL: https://github.com/patrickdavies100/pipeline38
Owner: PatrickDavies100
License: gpl-3.0
Created: 2024-10-23T10:44:14.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-10-31T20:21:21.000Z (over 1 year ago)
Last Synced: 2025-02-09T08:34:52.697Z (over 1 year ago)
Topics: data, pandas-dataframe, pipeline, postgresql, psycopg2, sqlalchemy
Language: Python
Homepage:
Size: 39.1 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Pipeline 38

This is a follow-up to Pipeline 37. The aim is to create a serialisation format for a data pipeline. There is one key change from Pipeline 37:

Data will be manipulated using PostgreSQL commands rather than in a pandas DataFrame.

**Technologies used:**
- PostgreSQL 17.0
- Python 3.13.0
- Pandas
- SQLAlchemy 2.0.36
- psycopg2 2.9.10
- pgAdmin 4
- PyCharm

**Objectives:**
1. Create tools for automated data processing, including cleaning and transformation.
2. Generate a working serialisation format of a pipeline.
3. Improve performance on large datasets by using PostgreSQL queries.

**Goal:**
Improve my workflow for large datasets so that I can produce useful analyses for Tableau.

**Architecture**
The project has a few simple elements. A connection to a PostgreSQL database is configured through LocalSettings (this file is not on GitHub). The user enters a command, its arguments are passed to the relevant function in SQLFunctions, and the query is constructed there and passed back to 'Connection' to be executed. Commands include both changes to the data being examined and the creation of new tables. Every time a command executes successfully, a row is added to a DataFrame called Query DF, which records the completed instructions.
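
The command flow described above could look roughly like the sketch below. It is an illustration under assumptions, not the project's actual code: the builder functions, the `COMMANDS` table, and `run_command` are hypothetical names, an in-memory SQLite database stands in for the PostgreSQL connection so the example runs without a server, and a plain list of dicts stands in for Query DF (`pandas.DataFrame(query_log)` would turn it into a DataFrame).

```python
import sqlite3

# Hypothetical query builders, standing in for functions in SQLFunctions.
# Identifiers are interpolated directly for brevity; with psycopg2, composing
# them via psycopg2.sql.Identifier would be the safer choice.
def build_delete_nulls(table, column):
    """Build a query that removes rows where `column` is NULL."""
    return f'DELETE FROM "{table}" WHERE "{column}" IS NULL'

def build_copy_table(new_table, source_table):
    """Build a query that creates a new table from an existing one."""
    return f'CREATE TABLE "{new_table}" AS SELECT * FROM "{source_table}"'

# Maps a user command name to the function that constructs its query.
COMMANDS = {
    "delete_nulls": build_delete_nulls,
    "copy_table": build_copy_table,
}

def run_command(conn, query_log, name, *args):
    """Construct the query, execute it, and log it only if execution succeeds."""
    query = COMMANDS[name](*args)
    conn.execute(query)  # raises on failure, so the log line below is skipped
    conn.commit()
    query_log.append({
        "step": len(query_log) + 1,
        "command": name,
        "args": list(args),
        "query": query,
    })
    return query

# Demo data (assumed for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10.0), (2, None), (3, 5.5)])

query_log = []  # stand-in for Query DF
run_command(conn, query_log, "delete_nulls", "sales", "amount")
run_command(conn, query_log, "copy_table", "sales_clean", "sales")
```

Because the log entry is appended only after `conn.execute` returns, a failed command leaves no trace in the record, which matches the "successfully executed" condition above.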

This DataFrame is a record of the data processing. It can be saved, loaded, or exported so that the user can automate the same steps for another file.
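
One possible way to save, load, and replay such a record is sketched here. The JSON layout, the `version` field, and the function names are assumptions made for illustration; the README does not specify the serialisation format. As a further assumption, SQLite stands in for PostgreSQL and a list of dicts stands in for the recorded DataFrame.

```python
import json
import sqlite3

def dump_pipeline(query_log):
    """Serialise the recorded steps to a JSON string."""
    return json.dumps({"version": 1, "steps": query_log}, indent=2)

def load_pipeline(text):
    """Parse a serialised pipeline and return its list of steps."""
    doc = json.loads(text)
    if doc.get("version") != 1:
        raise ValueError("unsupported pipeline version")
    return doc["steps"]

def replay_pipeline(conn, steps):
    """Execute each recorded query in order against a new dataset."""
    for step in steps:
        conn.execute(step["query"])
    conn.commit()
    return len(steps)

# A recorded pipeline (assumed example): drop NULL names, then trim whitespace.
recorded = [
    {"step": 1, "query": "DELETE FROM people WHERE name IS NULL"},
    {"step": 2, "query": "UPDATE people SET name = TRIM(name)"},
]
saved = dump_pipeline(recorded)

# Later, apply the same steps to another dataset with the same table layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people VALUES (?)", [(" Ann ",), (None,), ("Bob",)])

steps = load_pipeline(saved)
replayed = replay_pipeline(conn, steps)
names = conn.execute("SELECT name FROM people ORDER BY name").fetchall()
```

Storing the finished query text keeps replay simple; storing the command name and arguments instead would let the queries be rebuilt for a table with a different name.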

A second DataFrame (Derived DF) stores the results of user commands, i.e. derived values that are not added to the original dataset. This lets the user build a table of derived data and perform operations on it directly.
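
The idea of keeping derived values apart from the source table might be sketched as follows. The `derive` helper, the `label` field, and the sample table are hypothetical; SQLite again stands in for PostgreSQL, and a list of dicts stands in for Derived DF.

```python
import sqlite3

def derive(conn, derived, label, query):
    """Run a read-only query and append its rows to the derived store."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]  # column names of the result
    for row in cur.fetchall():
        derived.append({"label": label, **dict(zip(columns, row))})

# Demo data (assumed for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 10), ("north", 5), ("south", 7)],
)

derived = []  # stand-in for Derived DF
derive(
    conn,
    derived,
    "total_by_region",
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region ORDER BY region",
)
```

The aggregate lands only in `derived`; the `orders` table keeps its original three rows.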
