{"id":18377413,"url":"https://github.com/digitalghost-dev/stock-data-pipeline","last_synced_at":"2025-04-06T21:31:33.778Z","repository":{"id":44757693,"uuid":"512990163","full_name":"digitalghost-dev/stock-data-pipeline","owner":"digitalghost-dev","description":"Code Repository for my 1st Data Project.","archived":true,"fork":false,"pushed_at":"2023-03-31T04:14:31.000Z","size":35,"stargazers_count":22,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-01T18:31:13.472Z","etag":null,"topics":["bigquery","google-cloud-platform","python"],"latest_commit_sha":null,"homepage":"https://www.digitalghost.dev/stock-data-pipeline","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/digitalghost-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-07-12T03:49:54.000Z","updated_at":"2025-02-12T01:36:01.000Z","dependencies_parsed_at":"2023-02-12T17:15:39.898Z","dependency_job_id":null,"html_url":"https://github.com/digitalghost-dev/stock-data-pipeline","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digitalghost-dev%2Fstock-data-pipeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digitalghost-dev%2Fstock-data-pipeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digitalghost-dev%2Fstock-data-pipeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
digitalghost-dev%2Fstock-data-pipeline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/digitalghost-dev","download_url":"https://codeload.github.com/digitalghost-dev/stock-data-pipeline/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247555141,"owners_count":20957715,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigquery","google-cloud-platform","python"],"created_at":"2024-11-06T00:28:03.128Z","updated_at":"2025-04-06T21:31:33.514Z","avatar_url":"https://github.com/digitalghost-dev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Stock Data Pipeline with Python and Google Cloud\n\n\u003e \u003cpicture\u003e\n\u003e   \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"https://storage.googleapis.com/website-storage-bucket/icons/danger.svg\"\u003e\n\u003e   \u003cimg alt=\"Tip\" src=\"https://storage.googleapis.com/website-storage-bucket/icons/danger.svg\"\u003e\n\u003e \u003c/picture\u003e\u003cbr\u003e\n\u003e This project is now archived. The visualization still works but has stopped being updated as of March 30th, 2022. 
It was archived because I no longer wanted to pay for API usage.\n\n\n\u003e \u003cpicture\u003e\n\u003e   \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"https://storage.googleapis.com/website-storage-bucket/icons/warning.svg\"\u003e\n\u003e   \u003cimg alt=\"Tip\" src=\"https://storage.googleapis.com/website-storage-bucket/icons/warning.svg\"\u003e\n\u003e \u003c/picture\u003e\u003cbr\u003e\n\u003e Any data in this project or on my website is for informational purposes only and should not be taken as investment advice.\n\n\u003cbr\u003e\n\n\u003cdiv\u003e\n    \u003cimg alt=\"Version\" src=\"https://img.shields.io/badge/Project Number-1-orange.svg?cacheSeconds=2592000\" /\u003e\n\u003c/div\u003e\n\n## Overview\n* Extracts and transforms S\u0026P 500 stock data with Python from a financial API.\n* Data is loaded into Cloud Storage, then transferred to BigQuery and rendered on my webpage.\n* The Python code runs as a scheduled cron job on a GCP Compute Engine virtual machine.\n\n### Important Links\n* [Visualization](https://www.digitalghost.dev/stock-data-pipeline)\n* [Documentation](https://github.com/digitalghost-dev/stock-data-pipeline/wiki/Stock-Data-Pipeline-Documentation)\n\n## How the Pipeline Works\n\n### Data Pipeline\n1. A cron job triggers `main.py` to run.\n2. `main.py` calls the IEX Cloud API.\n3. The data is processed and cleaned by removing commas, hyphens, and/or other extra characters from the **company name** column.\n4. `main.py` creates a `csv` file with the prepared data.\n5. `load.py` copies the `csv` file to a Cloud Storage bucket.\n6. The `csv` file is loaded into BigQuery.\n7. 
When the [webpage](https://www.digitalghost.dev/projects/stock-data-pipeline) is loaded, the data is queried through the [BigQuery API](https://cloud.google.com/bigquery/docs/quickstarts/quickstart-client-libraries) and then displayed.\n\n### CI/CD\n* None\n\n### Notes:\n* The file that connects to BigQuery to pull the data when the page loads is located in my [website repository](https://github.com/digitalghost-dev/website/) because that repository renders the frontend.\n* The pipeline does not account for holidays.\n\n### Pipeline Flowchart\n![stock-data-flowchart](https://storage.googleapis.com/pipeline-flowcharts/stock-data-pipeline-flowchart.png)\n\n## Services Used\n* **APIs:** [IEX Cloud](https://www.iexcloud.io)\n* **Google Cloud Services:**\n    * **Virtual Machine:** [Compute Engine](https://cloud.google.com/compute)\n    * **Object Storage:** [Cloud Storage](https://cloud.google.com/storage)\n    * **Data Warehouse:** [BigQuery](https://cloud.google.com/bigquery/)\n* **Scheduler:** [cron](https://en.wikipedia.org/wiki/Cron)\n* **Visualization:** [Flask](https://flask.palletsprojects.com/en/2.2.x/) and HTML\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdigitalghost-dev%2Fstock-data-pipeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdigitalghost-dev%2Fstock-data-pipeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdigitalghost-dev%2Fstock-data-pipeline/lists"}