{"id":17965889,"url":"https://github.com/patrickdavies100/pipeline38","last_synced_at":"2026-04-30T00:08:05.034Z","repository":{"id":259452591,"uuid":"877248223","full_name":"PatrickDavies100/Pipeline38","owner":"PatrickDavies100","description":"An application to automate the creation and execution of SQL queries. ","archived":false,"fork":false,"pushed_at":"2024-10-31T20:21:21.000Z","size":40,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-09T08:34:52.697Z","etag":null,"topics":["data","pandas-dataframe","pipeline","postgresql","psycopg2","sqlalchemy"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PatrickDavies100.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-23T10:44:14.000Z","updated_at":"2024-10-31T20:21:26.000Z","dependencies_parsed_at":"2024-10-25T15:48:20.691Z","dependency_job_id":"e12fd312-c196-4772-b4aa-d8c0fc1cd663","html_url":"https://github.com/PatrickDavies100/Pipeline38","commit_stats":{"total_commits":14,"total_committers":1,"mean_commits":14.0,"dds":0.0,"last_synced_commit":"90cc1b499e745f9e0d53e7f7523b22c00d45f911"},"previous_names":["patrickdavies100/pipeline38"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PatrickDavies100%2FPipeline38","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PatrickDavies100%2FPipeline38/tags","releases_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/PatrickDavies100%2FPipeline38/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PatrickDavies100%2FPipeline38/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PatrickDavies100","download_url":"https://codeload.github.com/PatrickDavies100/Pipeline38/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247071224,"owners_count":20878639,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","pandas-dataframe","pipeline","postgresql","psycopg2","sqlalchemy"],"created_at":"2024-10-29T13:05:54.290Z","updated_at":"2026-04-30T00:08:04.971Z","avatar_url":"https://github.com/PatrickDavies100.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pipeline 38\n\nThis is a follow-up to Pipeline 37. The aim is to create a serialisation format for a data pipeline. There is one key change from Pipeline 37: \u003cbr /\u003e\n\nData will be manipulated using PostgreSQL commands rather than in a Pandas DataFrame. \u003cbr /\u003e\n\n**Technologies used:**\nPostgreSQL 17.0, \u003cbr /\u003e\nPython 3.13.0, \u003cbr /\u003e\nPandas, \u003cbr /\u003e\nSQLAlchemy 2.0.36, \u003cbr /\u003e\npsycopg2 2.9.10, \u003cbr /\u003e\n\n\npgAdmin 4 \u003cbr /\u003e\nPyCharm \u003cbr /\u003e\n\n**Objectives:**\n 1. Create tools for automated data processing, including cleaning and transformation.\n 2. The application can generate a working serialisation format of a pipeline.\n 3. 
Improve performance for large datasets by using PostgreSQL queries.\n\n**Goal:**\nImprove my workflow for large datasets to create useful analysis for Tableau.\n\n**Architecture**\nThe basic structure of this project has a few simple elements. There is a connection to a PostgreSQL database that uses LocalSettings (this file is not on GitHub). The user enters commands, the args are passed to the relevant function in SQLFunctions, and the query is constructed there and passed back to 'Connection' to be executed. These commands include both changes to the data being examined and the creation of new tables. Every time a command is successfully executed, a row is also added to a dataframe called \u003cins\u003eQuery DF\u003c/ins\u003e that records the completed instructions.\n\nThis dataframe is a record of the data processing. It can then be saved, loaded, or exported so that the user can automate the steps for another file.\n\nThere is a second dataframe (\u003cins\u003eDerived DF\u003c/ins\u003e) that stores the results of user commands, i.e. derived values that are not added to the original dataset. In this way the user can create a table of derived data and perform different operations on it directly.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpatrickdavies100%2Fpipeline38","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpatrickdavies100%2Fpipeline38","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpatrickdavies100%2Fpipeline38/lists"}