{"id":30369395,"url":"https://github.com/dsacms/npd_plainerflow","last_synced_at":"2025-08-20T02:15:36.351Z","repository":{"id":306020532,"uuid":"1024487173","full_name":"DSACMS/npd_plainerflow","owner":"DSACMS","description":"Plain.. no plainer than that.. data pipelines","archived":false,"fork":false,"pushed_at":"2025-08-09T21:05:10.000Z","size":189,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-09T22:14:01.090Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"ftrotter/plainerflow","license":"cc0-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DSACMS.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-22T19:22:32.000Z","updated_at":"2025-08-09T20:32:45.000Z","dependencies_parsed_at":"2025-07-28T21:37:41.842Z","dependency_job_id":"0f497508-ef7d-400e-b9e7-e866759b19d3","html_url":"https://github.com/DSACMS/npd_plainerflow","commit_stats":null,"previous_names":["dsacms/ndh_plainerflow","dsacms/npd_plainerflow"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/DSACMS/npd_plainerflow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DSACMS%2Fnpd_plainerflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DSACMS%2Fnpd_plainerflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DSACMS%2Fnpd_plainerflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DSACMS%2Fnpd_pla
inerflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DSACMS","download_url":"https://codeload.github.com/DSACMS/npd_plainerflow/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DSACMS%2Fnpd_plainerflow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271252993,"owners_count":24726918,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-20T02:00:09.606Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-20T02:15:32.915Z","updated_at":"2025-08-20T02:15:36.342Z","avatar_url":"https://github.com/DSACMS.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# plainerflow\n\nA Python package for plain flow operations with SQLAlchemy integration.\n\n## Installation\n\n### From PyPI (recommended)\n\n```bash\npip install plainerflow\n```\n\n### Manual Installation\n\nIf pip is not available in your environment, you can use the package by adding it to your Python path:\n\n```python\nimport sys\nsys.path.insert(0, \"/path/to/the/right/plainerflow/subdirectory\")\nimport plainerflow\n```\n\n## Using PlainerFlow\n\nPlainerFlow provides a set of components designed to work together for data transformation pipelines. 
See the complete working example in [`pipeline_example.py`](pipeline_example.py) which demonstrates importing CSV data, transforming it using SQL, and validating the results.\n\n### Complete Pipeline Example\n\nThe [`pipeline_example.py`](pipeline_example.py) script shows a full data transformation pipeline that:\n\n1. **Connects to a database** using CredentialFinder (with PostgreSQL testcontainer fallback)\n2. **Defines table references** using DBTable for customers, orders, and derived tables\n3. **Loads CSV data** from the `readme_example_data/` directory\n4. **Transforms data** using SQL queries organized in a FrostDict\n5. **Validates results** using InLaw test classes\n6. **Displays sample output** showing the transformation results\n\n**To run the example:**\n\n```bash\n# Install dependencies\npip install plainerflow pandas great-expectations testcontainers\n\n# Run the complete pipeline\npython pipeline_example.py\n```\n\n**Expected output:**\n```\n=== PlainerFlow Pipeline Example Program ===\nStep 1: Connecting to database...\n✅ Connected to PostgreSQL test container\n\nStep 2: Defining table references...\nWill create tables: public.customers, public.orders, public.customer_summary\n\nStep 3: Defining data loading SQL...\n\nStep 4: Defining complete SQL pipeline...\n\nStep 5: Executing complete SQL pipeline...\n===== EXECUTING SQL LOOP =====\ncreate_customers_DBTable: DROP TABLE IF EXISTS public.customers CASCADE;...\ncreate_orders_DBTable: DROP TABLE IF EXISTS public.orders CASCADE;...\nload_customers_data: INSERT INTO public.customers...\nload_orders_data: INSERT INTO public.orders...\ncreate_customer_summary: CREATE TABLE IF NOT EXISTS public.customer_summary AS...\ncreate_order_metrics: CREATE TABLE IF NOT EXISTS order_metrics AS...\ncustomer_summary_sample: SELECT * FROM public.customer_summary LIMIT 3\norder_metrics_report: SELECT * FROM order_metrics ORDER BY order_count DESC\n===== SQL LOOP COMPLETE =====\n\nStep 6: Defining validation 
tests...\n\nStep 7: Running data validation tests...\n===== IN-LAW TESTS =====\nRunning: Customer summary should have same number of rows as customers\nPASS\nRunning: Customer summary should have no null names  \nPASS\nRunning: Active customers with orders should have positive total_spent\nPASS\nSummary: 3 passed\n\n✅ Pipeline completed successfully!\n   - Validation results: 3 passed, 0 failed\n🎉 All validation tests passed!\n```\n\n### Key Components Explained\n\n#### 1. CredentialFinder - Automatic Database Connection\n```python\nfrom plainerflow import CredentialFinder\n\n# Detect the current environment and return a database connection for it\nengine = CredentialFinder.detect_config(verbose=True)\n# Supports: Spark/Databricks, Google Colab, .env files, SQLite fallback\n```\n\n#### 2. DBTable - Database Table References\n```python\nfrom plainerflow import DBTable\n\n# Define table references before the tables exist\ncustomers_table = DBTable(database='analytics', table='customers')\norders_table = DBTable(database='analytics', table='orders')\n\n# Create child tables by appending a suffix to the parent table name\nbackup_table = customers_table.make_child('backup')  # analytics.customers_backup\n\n# Use in SQL queries via f-strings\nsql = f\"SELECT * FROM {customers_table} WHERE status = 'active'\"\n```\n\n#### 3. FrostDict - Immutable SQL Configuration\n```python\nfrom plainerflow import FrostDict\n\n# Placeholder table name, for illustration only\ntable_name = 'analytics.customers'\n\n# Create a frozen dictionary of SQL templates\nsql_queries = FrostDict({\n    'create_table': f\"CREATE TABLE {table_name} AS SELECT ...\",\n    'update_data': f\"UPDATE {table_name} SET ...\"\n})\n\n# New keys can be added, but an existing key cannot be reassigned\nsql_queries['new_query'] = \"SELECT 1\"  # Works - new key\nsql_queries['create_table'] = \"SELECT 2\"  # Raises FrozenKeyError\n```\n\n#### 4. SQLoopcicle - SQL Execution Loop\n```python\nfrom plainerflow import SQLoopcicle\n\n# Execute the SQL statements in order\nSQLoopcicle.run_sql_loop(sql_queries, engine)\n\n# Dry-run mode: print the statements without executing them\nSQLoopcicle.run_sql_loop(sql_queries, engine, is_just_print=True)\n```\n\n#### 5. 
InLaw - Data Validation Framework\n```python\n# Create validation test classes\nclass MyDataTest(InLaw):\n    title = \"Descriptive test name\"\n    \n    @staticmethod\n    def run(engine):\n        sql = \"SELECT COUNT(*) as row_count FROM my_table\"\n        gdf = InLaw.to_gx_dataframe(sql, engine)\n        \n        result = gdf.expect_column_values_to_be_between(\n            column=\"row_count\", min_value=1, max_value=1000\n        )\n        \n        return True if result.success else f\"Row count out of range: {gdf.iloc[0]['row_count']}\"\n\n# Run all validation tests\nInLaw.run_all(engine)\n```\n\n\n## Basic Usage\n\nFor simpler use cases, you can use individual components:\n\n```python\nimport plainerflow\n\n# Just get a database connection\nengine = plainerflow.CredentialFinder.detect_config()\n\n# Define a table reference\nmy_table = plainerflow.DBTable(database='mydb', table='users')\nprint(f\"Table reference: {my_table}\")  # Output: mydb.users\n\n# Create frozen configuration\nconfig = plainerflow.FrostDict({'query': f'SELECT * FROM {my_table}'})\n\n# Execute SQL\nplainerflow.SQLoopcicle.run_sql_loop(config, engine)\n```\n\n## Dependencies\n\n- SQLAlchemy \u003e= 1.4.0\n\n## Development\n\n### Setting up development environment\n\n1. Clone the repository:\n```bash\ngit clone https://github.com/ftrotter/plainerflow.git\ncd plainerflow\n```\n\n2. Create a virtual environment:\n```bash\npython -m venv venv\nsource venv/bin/activate  # On Windows: venv\\Scripts\\activate\n```\n\n3. Install development dependencies:\n```bash\npip install -e \".[dev]\"\n```\n\n### Building the package\n\n```bash\npython -m build\n```\n\n### Uploading to PyPI\n\n```bash\npython -m twine upload dist/*\n```\n\n## License\n\nThis project is licensed under the CC0 1.0 Universal License - see the [LICENSE](LICENSE) file for details.\n\n## Contributing\n\nContributions are welcome! 
Please feel free to submit a Pull Request.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdsacms%2Fnpd_plainerflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdsacms%2Fnpd_plainerflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdsacms%2Fnpd_plainerflow/lists"}