# Data Ingestion from Various Sources

This project showcases workflows for ingesting data from various sources, including REST APIs, the Kaggle Client Library (KCL), SQL databases (PostgreSQL), a NoSQL database (MongoDB), object storage (AWS S3), and file systems (AWS EFS).

**Important:** list all config files in `.gitignore` so that API keys and access tokens are never published.

1. **Ingestion from a REST API and saving to a Blob container**: ingests CO₂ and coal emissions data for the US from the EIA REST API.

   Steps:

   a) Create an API key at EIA and store it in a config file.

   b) Create a SAS (shared access signature) key for the Blob storage and save it in a config file.

   c) Explore the EIA API pages to find the specific data you want and note its endpoint.

   d) Run the ingestion.

2. **World Population (Kaggle)**: examines global population trends using datasets ingested with the Kaggle Client Library (KCL).

   ```python
   from kaggle.api.kaggle_api_extended import KaggleApi

   # Instantiate the Kaggle API client
   api = KaggleApi()

   # Authentication reads the credentials file from its default location
   # (~/.kaggle/kaggle.json) unless configured otherwise.
   api.authenticate()

   # Example: list datasets matching a search term
   datasets = api.dataset_list(search="world population")
   for dataset in datasets:
       print(dataset)
   ```

3. **SQL Database Source - PostgreSQL**: [TBD]

4. **NoSQL Database Source - MongoDB**: [TBD]

5. **Object Storage Source - AWS S3**: [TBD]

6. **File System Source - AWS EFS**: [TBD]

## Sources

- **REST API Source**: energy and emissions data ingested programmatically from the EIA API.
- **KCL (Kaggle) Source**: the `global population` dataset, ingested with the Kaggle Client Library (KCL).

## Data Analysis

- **Energy Emissions**:
  - Identifies historical trends in U.S. CO₂ and coal emissions.
  - Highlights emission-heavy sectors and their contributions.

- **World Population**:
  - Explores global population distribution and growth.
  - Visualizes regional trends and urbanization patterns.

---

## Technologies Used

- **APIs**: EIA REST API, Kaggle Client Library (KCL).
- **Data Processing**: Python (`pandas`, `numpy`).
- **Visualization**: `matplotlib`, `seaborn`.
- **Tool**: Jupyter Notebook

---

## Insights: US CO₂ Emissions

<img src="Data/CO2/Visuals_BubblePlot/StatesRank_TotalCO2emission.png" alt="US states ranked by total CO₂ emissions" style="width:100%; height:auto;">

<img src="Data/CO2/Visuals_BubblePlot/StatesRank_CO2_emission_fromCoal.png" alt="US states ranked by CO₂ emissions from coal" style="width:100%; height:auto;">
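The EIA workflow in step 1 (read the key from a config file, request the chosen endpoint, ingest the rows) can be sketched with the standard library alone. This is a minimal sketch, not the project's notebook code. It assumes the config file is JSON with a hypothetical `eia_api_key` field, that the endpoint is an EIA API v2 route under `https://api.eia.gov/v2/` returning rows in `response.data`, and that results are paged with `offset` and `length` at an assumed limit of 5000 rows per request.

```python
import json
import urllib.parse
import urllib.request

# Assumption: EIA API v2 base URL. The route you pass in is whatever
# endpoint you found while exploring the EIA API pages.
EIA_BASE_URL = "https://api.eia.gov/v2/"
PAGE_SIZE = 5000  # assumed maximum rows per request


def load_api_key(config_path: str, key_name: str = "eia_api_key") -> str:
    """Read the API key from a JSON config file (hypothetical layout)."""
    with open(config_path, encoding="utf-8") as fh:
        return json.load(fh)[key_name]


def build_eia_url(route: str, api_key: str, offset: int = 0,
                  length: int = PAGE_SIZE, extra_params=None) -> str:
    """Build a /data/ request URL for an EIA v2 route, with paging params."""
    params = [("api_key", api_key), ("offset", offset), ("length", length)]
    params.extend(extra_params or [])  # e.g. frequency or data columns
    query = urllib.parse.urlencode(params)
    return f"{EIA_BASE_URL}{route.strip('/')}/data/?{query}"


def fetch_all_rows(route: str, api_key: str, extra_params=None) -> list:
    """Request page after page until a short page signals the end."""
    rows, offset = [], 0
    while True:
        url = build_eia_url(route, api_key, offset, PAGE_SIZE, extra_params)
        with urllib.request.urlopen(url, timeout=30) as resp:
            payload = json.load(resp)
        page = payload["response"]["data"]
        rows.extend(page)
        if len(page) < PAGE_SIZE:
            return rows
        offset += PAGE_SIZE
```

For example, `fetch_all_rows("co2-emissions/co2-emissions-aggregates", load_api_key("config.json"), [("frequency", "annual"), ("data[0]", "value")])` would page through an annual series; treat that route and those parameters as placeholders to check against the EIA API browser. Keeping URL construction in its own function makes the request easy to inspect before any network call is made.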
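For the "save to a Blob container" part of step 1: a SAS key is Azure Blob Storage's mechanism for delegated access, so this sketch assumes Azure and uploads the ingested rows as a single JSON blob through the Blob service's Put Blob REST operation, using only the standard library. The container URL, blob name, and JSON-array format are hypothetical choices, not the project's confirmed layout. If you prefer the SDK, the `azure-storage-blob` package's `BlobClient.from_blob_url(...)` followed by `upload_blob(...)` is a common alternative.

```python
import json
import urllib.parse
import urllib.request


def build_blob_url(container_url: str, blob_name: str, sas_token: str) -> str:
    """Join container URL, URL-quoted blob name, and SAS token."""
    sas = sas_token.lstrip("?")  # tokens are often copied with a leading '?'
    return f"{container_url.rstrip('/')}/{urllib.parse.quote(blob_name)}?{sas}"


def make_put_blob_request(blob_url: str, data: bytes,
                          content_type: str = "application/json") -> urllib.request.Request:
    """Build a Put Blob request that creates (or overwrites) a block blob."""
    return urllib.request.Request(
        blob_url,
        data=data,
        method="PUT",
        headers={
            "x-ms-blob-type": "BlockBlob",  # required by Put Blob
            "Content-Type": content_type,
        },
    )


def upload_rows(container_url: str, sas_token: str, blob_name: str, rows: list) -> int:
    """Serialize rows to JSON, upload them, and return the HTTP status."""
    body = json.dumps(rows).encode("utf-8")
    blob_url = build_blob_url(container_url, blob_name, sas_token)
    req = make_put_blob_request(blob_url, body)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.status  # 201 Created on success; errors raise HTTPError
```

Because the SAS token travels in the query string, the resulting URL is itself a credential: keep it in the git-ignored config file and avoid logging it.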
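A per-state ranking like the one in the bubble plots can be approximated by summing emissions per state and sorting. This sketch does not reproduce the project's plotting code. It assumes rows shaped like EIA API v2 records with hypothetical `stateId` and `value` fields, and it treats `US` as a national aggregate to exclude; adjust the keys to match the columns in your ingested data.

```python
from collections import defaultdict


def rank_states(rows, state_key="stateId", value_key="value", exclude=("US",)):
    """Sum emission values per state and return (state, total) pairs, largest first."""
    totals = defaultdict(float)
    for row in rows:
        state, value = row.get(state_key), row.get(value_key)
        # Skip national aggregates and rows with missing values.
        if state is None or state in exclude or value in (None, ""):
            continue
        totals[state] += float(value)  # EIA values may arrive as strings
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Converting with `float` lets the same function handle numeric values and values delivered as strings, which is common in JSON API responses.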