# Insurance Data Warehouse Optimization Project

## Overview

This project optimizes a large-scale insurance data warehouse to improve query performance through strategic use of indexing and partitioning. In the insurance industry, fast access to accurate data is crucial for risk assessment, claims processing, and regulatory reporting. By applying these optimization techniques, we aim to reduce query execution times, make data retrieval more efficient, and improve overall system performance, so that insurance professionals can make faster, data-driven decisions.

## Business Context

Our insurance data warehouse holds large volumes of data, including:

- Policy information
- Claims history
- Customer demographics
- Risk assessments
- Financial transactions
- Regulatory compliance data

The optimization project addresses several key business challenges:

1. Slow response times for complex actuarial queries
2. Delays in real-time risk assessment during policy underwriting
3. Performance bottlenecks during end-of-month reporting cycles
4. Inefficient access to historical claims data for trend analysis

By optimizing the data warehouse, we aim to:

- Reduce policy underwriting time by 40%
- Decrease monthly report generation time by 60%
- Improve customer service response times through faster access to policy and claims information
- Strengthen real-time fraud detection capabilities

## Features

- Comprehensive analysis of existing insurance-specific query patterns
- Indexing strategies suited to insurance data models
- Table partitioning to speed up queries on large insurance datasets
- Query optimization and rewriting for common insurance analytics scenarios
- Performance benchmarking and reporting tailored to insurance KPIs

## Technologies Used

- Database: Snowflake
- ETL Tool: Apache NiFi
- Monitoring: Datadog
- Version Control: Git
- Scripting: Python, SQL

## Project Structure

```
dw-optimization-insurance/
│
├── scripts/
│   ├── analysis/
│   │   ├── policy_query_pattern_analysis.py
│   │   ├── claims_data_distribution_analysis.py
│   │   └── regulatory_reporting_query_analysis.py
│   ├── indexing/
│   │   ├── create_policy_indexes.sql
│   │   ├── create_claims_indexes.sql
│   │   └── create_customer_indexes.sql
│   ├── partitioning/
│   │   ├── partition_policy_tables.sql
│   │   ├── partition_claims_history.sql
│   │   └── repartition_financial_data.py
│   └── performance/
│       ├── benchmark_underwriting_queries.py
│       ├── benchmark_claims_processing.py
│       └── generate_optimization_report.py
│
├── config/
│   ├── snowflake_config.yaml
│   ├── nifi_config.yaml
│   └── optimization_config.yaml
│
├── nifi/
│   ├── templates/
│   │   ├── policy_data_ingestion.xml
│   │   ├── claims_data_processing.xml
│   │   └── regulatory_report_generation.xml
│   └── scripts/
│       ├── custom_processors/
│       └── nifi_api_interactions.py
│
├── docs/
│   ├── insurance_data_model.md
│   ├── indexing_strategy.md
│   ├── partitioning_scheme.md
│   ├── nifi_workflow.md
│   └── performance_results.md
│
├── tests/
│   ├── test_policy_data_retrieval.py
│   ├── test_claims_processing_speed.py
│   └── test_regulatory_reporting_queries.py
│
├── requirements.txt
├── .gitignore
└── README.md
```

## Installation

1. Clone the repository:
   ```
   git clone https://github.com/BayoAdejare/dw-optimization-insurance.git
   cd dw-optimization-insurance
   ```

2. Install the required dependencies:
   ```
   pip install -r requirements.txt
   ```

3. Configure the Snowflake connection in `config/snowflake_config.yaml`.

4. Configure Apache NiFi in `config/nifi_config.yaml`.

5. Review and adjust the optimization settings in `config/optimization_config.yaml`.

## Usage

1. Run the analysis scripts to identify optimization opportunities:
   ```
   python scripts/analysis/policy_query_pattern_analysis.py
   python scripts/analysis/claims_data_distribution_analysis.py
   python scripts/analysis/regulatory_reporting_query_analysis.py
   ```

2. Create indexes based on the analysis results. These are SQL scripts, so run them with a Snowflake client such as SnowSQL rather than with `python`:
   ```
   snowsql -f scripts/indexing/create_policy_indexes.sql
   snowsql -f scripts/indexing/create_claims_indexes.sql
   snowsql -f scripts/indexing/create_customer_indexes.sql
   ```

3. Implement table partitioning (SQL scripts through SnowSQL, the Python script through `python`):
   ```
   snowsql -f scripts/partitioning/partition_policy_tables.sql
   snowsql -f scripts/partitioning/partition_claims_history.sql
   python scripts/partitioning/repartition_financial_data.py
   ```

4. Set up the Apache NiFi data flows:
   - Import the templates from `nifi/templates/` into your NiFi instance.
   - Configure the processors for your data sources and Snowflake connection details.

5. Benchmark the performance improvements:
   ```
   python scripts/performance/benchmark_underwriting_queries.py
   python scripts/performance/benchmark_claims_processing.py
   ```

6. Generate a performance report:
   ```
   python scripts/performance/generate_optimization_report.py
   ```

## Optimization Techniques

### Indexing

We employ the following indexing strategies, tailored to insurance data:

- B-tree indexes for high-cardinality columns such as policy numbers and claim IDs
- Bitmap indexes for low-cardinality columns such as policy type and claim status
- Covering indexes for frequently accessed columns in the policy and customer tables
- Partial indexes for active policies and open claims

Standard (non-hybrid) Snowflake tables do not support user-defined B-tree, bitmap, covering, or partial indexes; on those tables, the closest equivalents are clustering keys and the Search Optimization Service. The detailed indexing strategy is documented in `docs/indexing_strategy.md`.

### Partitioning

Our partitioning scheme includes:

- Range partitioning on policy effective dates and claim dates for date-based queries
- List partitioning for categorical data such as policy types or risk categories
- Hash partitioning on customer IDs to spread rows evenly across partitions

Snowflake divides tables into micro-partitions automatically and does not offer declarative range, list, or hash partitioning, so on Snowflake this scheme is typically expressed through clustering keys (for example, clustering claims tables on claim date). For more information, see `docs/partitioning_scheme.md`.

### ETL Optimization

We use Apache NiFi to build efficient, scalable data integration workflows:

- Parallel ingestion of policy and claims data
- Real-time data validation and cleansing
- Automated regulatory report generation

The NiFi workflow is documented in `docs/nifi_workflow.md`.

## Performance Metrics

We track the following metrics to measure optimization effectiveness:

- Query execution time for common insurance operations (e.g., policy lookup, claims processing)
- I/O operations during peak underwriting periods
- CPU utilization for complex actuarial calculations
- Memory usage for large-scale data analytics tasks
- Data skew in policy and claims distributions
- ETL job completion times and throughput

Detailed performance results are available in `docs/performance_results.md`.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
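## Example: Stable Hash Buckets for Customer IDs

Hash partitioning on customer IDs assigns each row to a bucket derived from a hash of its key, so rows spread evenly regardless of how the IDs are formatted. The sketch below is illustrative only and is not taken from this repository's scripts; the helper name `hash_bucket`, the `CUST-` ID format, and the bucket count of 8 are hypothetical. It uses MD5 from `hashlib` rather than Python's built-in `hash()`, because `hash()` on strings is randomized per process and would not give stable bucket assignments across runs.

```python
import hashlib
from collections import Counter


def hash_bucket(customer_id: str, num_buckets: int) -> int:
    """Map a customer ID to a stable bucket in [0, num_buckets).

    Hypothetical helper for illustration: MD5 is used only as a fast,
    deterministic hash (not for security), so the same ID always lands
    in the same bucket across runs and machines.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    digest = hashlib.md5(customer_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then take the modulus.
    return int.from_bytes(digest[:8], "big") % num_buckets


if __name__ == "__main__":
    # Hypothetical customer IDs; real IDs would come from the customer table.
    ids = [f"CUST-{n:06d}" for n in range(10_000)]
    counts = Counter(hash_bucket(cid, 8) for cid in ids)
    for bucket in sorted(counts):
        print(bucket, counts[bucket])
```

With a well-mixed hash, each of the 8 buckets should receive roughly 1,250 of the 10,000 IDs, which is the even spread that hash partitioning is meant to provide.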
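## Example: Measuring Partition Skew

One simple way to quantify data skew in policy and claims distributions is the ratio of the largest partition's row count to the mean row count: 1.0 means perfectly even, and larger values mean a few partitions hold disproportionately many rows. This is an assumed metric for illustration, not necessarily the one reported in `docs/performance_results.md`; the function name `skew_ratio` and the monthly claim counts are hypothetical.

```python
from collections.abc import Mapping
from statistics import mean


def skew_ratio(rows_per_partition: Mapping[str, int]) -> float:
    """Return max/mean of per-partition row counts (1.0 = perfectly even)."""
    if not rows_per_partition:
        raise ValueError("need at least one partition")
    counts = list(rows_per_partition.values())
    avg = mean(counts)
    if avg == 0:
        return 1.0  # every partition is empty, so nothing is skewed
    return max(counts) / avg


if __name__ == "__main__":
    # Hypothetical monthly claim counts for one year of date-range partitions.
    claims_by_month = {f"2024-{m:02d}": 10_000 for m in range(1, 13)}
    claims_by_month["2024-01"] = 40_000  # hypothetical spike in one month
    print(f"claims skew ratio: {skew_ratio(claims_by_month):.2f}")
    # prints: claims skew ratio: 3.20
```

A max-to-mean ratio is easy to read but sensitive to a single outlier partition; a coefficient of variation over all partitions is a common alternative when the whole distribution matters.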
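## Example: Timing Benchmark Queries

Benchmark scripts such as those in `scripts/performance/` usually run each query several times and report a robust statistic, because the first run can be distorted by cold caches or connection setup. The sketch below is a minimal, hypothetical harness, not the repository's implementation: `benchmark` and `BenchmarkResult` are illustrative names, and `benchmark` accepts any zero-argument callable, so in practice you would pass a function that executes a query through your Snowflake connection. A stand-in callable keeps the example self-contained.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class BenchmarkResult:
    name: str
    runs: int
    median_s: float
    best_s: float


def benchmark(name: str, run_query: Callable[[], object],
              repeats: int = 5, warmup: int = 1) -> BenchmarkResult:
    """Time `run_query` `repeats` times after `warmup` untimed calls."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        run_query()  # untimed: warms connections and caches
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return BenchmarkResult(name, repeats, statistics.median(timings), min(timings))


if __name__ == "__main__":
    # Hypothetical stand-in for a policy-lookup query; replace it with a
    # call that executes SQL through your Snowflake connection.
    def fake_policy_lookup() -> int:
        return sum(range(100_000))

    result = benchmark("policy_lookup", fake_policy_lookup, repeats=5)
    print(f"{result.name}: median {result.median_s * 1000:.2f} ms, "
          f"best {result.best_s * 1000:.2f} ms over {result.runs} runs")
```

When benchmarking on Snowflake, repeated identical queries can be answered from the result cache; running `ALTER SESSION SET USE_CACHED_RESULT = FALSE` before timing keeps the measurements focused on query execution rather than cache hits.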