{"id":31635106,"url":"https://github.com/mtholahan/guided-capstone-project","last_synced_at":"2026-05-08T14:03:03.222Z","repository":{"id":306108899,"uuid":"1025039175","full_name":"mtholahan/guided-capstone-project","owner":"mtholahan","description":"Build an end-to-end pipeline for high-frequency equity market data. Designed database schemas, ingested daily trade and quote records from CSV/JSON into Spark, implemented EOD batch loads with deduplication, and engineered ETL jobs to calculate trade indicators, moving averages, and bid/ask movements for market analysis.","archived":false,"fork":false,"pushed_at":"2025-09-15T04:58:06.000Z","size":451,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-15T05:38:12.794Z","etag":null,"topics":["azure","big-data","bootcamp","csv","data-engineering","data-pipeline","etl","finance","json","parquet","pyspark","spark","springboard","stock-market"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mtholahan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-23T16:14:56.000Z","updated_at":"2025-09-15T04:58:10.000Z","dependencies_parsed_at":"2025-09-15T05:49:44.657Z","dependency_job_id":null,"html_url":"https://github.com/mtholahan/guided-capstone-project","commit_stats":null,"previous_names":["mtholahan/equity_market_data_analysis","mtholahan/guided-capstone-p
roject"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/mtholahan/guided-capstone-project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mtholahan%2Fguided-capstone-project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mtholahan%2Fguided-capstone-project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mtholahan%2Fguided-capstone-project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mtholahan%2Fguided-capstone-project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mtholahan","download_url":"https://codeload.github.com/mtholahan/guided-capstone-project/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mtholahan%2Fguided-capstone-project/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278703580,"owners_count":26031205,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-06T02:00:05.630Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure","big-data","bootcamp","csv","data-engineering","data-pipeline","etl","finance","json","parquet","pyspark","spark","springboard","stock-market"],"created_at":"2025-10-07T00:48:16.542Z","updated_at":"2025-10-07T00:48:19.336Z","avatar_url":"https://github
.com/mtholahan.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Guided Capstone Project\n\n\n## 📖 Abstract\nThis guided capstone focuses on designing and implementing an end-to-end data pipeline to process and analyze high-frequency equity market data. The simulated client, Spring Capital, is an investment bank that depends on real-time analytics for trade and quote data across multiple exchanges. The engineering goal is to create a scalable platform that ingests raw trade and quote records, applies daily ETL processes, and generates key financial indicators to support decision-making.\n\nThe pipeline design spans multiple stages:\n\n* Schema design: normalized trade and quote tables with composite keys for efficient querying.\n* Data ingestion: parsing semi-structured daily exchange files (CSV and JSON) to extract valid records and discard malformed ones.\n* Batch load: an end-of-day process that consolidates daily submissions, resolves late-arriving corrections, and ensures only the most current records persist.\n* Analytical ETL: deriving business-critical metrics, including latest trade price, rolling 30-minute average, and bid/ask price movements relative to prior day close.\n* Orchestration: scheduling jobs with retry logic and status tracking to guarantee operational reliability.\n\nBy the end of the project, the platform demonstrates scalable, fault-tolerant data engineering practices, combining database design, PySpark data ingestion, and workflow orchestration. 
This project bridges foundational design skills with applied big data engineering in a realistic financial services context.\n\n*Generated automatically via Python + Jinja2 + SQL Server table `tblMiniProjectProgress` on 09-15-2025 18:04:08*","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmtholahan%2Fguided-capstone-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmtholahan%2Fguided-capstone-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmtholahan%2Fguided-capstone-project/lists"}