Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Tokyo Olympic Azure Data Engineering Project
https://github.com/ayanhussain81/olympics-data-etl
azure databricks datafactory python synapseanalytics
Last synced: 22 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/ayanhussain81/olympics-data-etl
- Owner: ayanhussain81
- Created: 2024-02-05T11:55:08.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-09-15T06:49:29.000Z (4 months ago)
- Last Synced: 2024-09-15T10:15:05.105Z (4 months ago)
- Topics: azure, databricks, datafactory, python, synapseanalytics
- Language: Jupyter Notebook
- Homepage:
- Size: 734 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Tokyo Olympic Azure Data Engineering Project
## Overview
This project builds an end-to-end data engineering pipeline for Tokyo Olympic data, leveraging Azure services such as Data Lake Gen2, Data Factory, Databricks, and Synapse Analytics. The pipeline handles data ingestion, transformation, and analysis to surface insights about the Olympic events.

## Technologies Used
- **Azure Data Lake Gen2**: Storage for raw and processed data.
- **Azure Data Factory**: Orchestration and automation of data workflows.
- **Azure Databricks**: Advanced analytics and data transformation.
- **Azure Synapse Analytics**: Data warehousing and analytics.

## Project Structure
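As a toy stand-in for the ingestion stage described below, the following sketch lands raw CSV records in a local directory layout that mirrors a lake's raw zone. All paths, file names, and records here are illustrative only, not part of the project:

```python
import csv
import os

def ingest_raw(records, lake_root, dataset):
    """Write raw records as CSV into the lake's 'raw' zone.

    A local stand-in for an upload to Azure Data Lake Gen2;
    the directory layout is illustrative.
    """
    raw_dir = os.path.join(lake_root, "raw", dataset)
    os.makedirs(raw_dir, exist_ok=True)
    path = os.path.join(raw_dir, f"{dataset}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)
    return path

# Invented sample dataset landing in the raw zone.
athletes = [
    {"name": "A. Runner", "country": "JPN", "discipline": "Athletics"},
    {"name": "B. Swimmer", "country": "USA", "discipline": "Swimming"},
]
path = ingest_raw(athletes, "lake", "athletes")
```

In the real pipeline this step is an upload into Data Lake Gen2, typically triggered by a Data Factory copy activity rather than local file writes.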
1. **Data Ingestion**: Raw data from various sources is ingested into Data Lake Gen2.
2. **ETL Pipeline**: Data is processed and transformed using Azure Data Factory, leading to curated datasets.
3. **Advanced Analytics**: Complex analytics and transformations are performed in Azure Databricks.
4. **Data Warehousing**: Synapse Analytics is utilized for scalable data warehousing and efficient querying.

## Setup Instructions
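The steps below assume the project's settings are collected in one configuration file. A minimal sketch of loading and validating such a file — the file format and every key name here are hypothetical, not the project's actual configuration:

```python
import json

# Keys this sketch expects; the real project may use different names.
REQUIRED_KEYS = {"storage_account", "data_factory",
                 "databricks_workspace", "synapse_workspace"}

def load_config(path):
    """Load project settings from JSON and fail fast on missing keys."""
    with open(path) as f:
        config = json.load(f)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise KeyError(f"missing config keys: {sorted(missing)}")
    return config
```

Failing fast on a missing key keeps a misconfigured credential or resource name from surfacing later as an opaque pipeline error.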
1. **Azure Account**: Ensure you have an active Azure account.
2. **Azure Resources**: Create necessary Azure resources - Data Lake Gen2, Data Factory, Databricks, and Synapse Analytics.
3. **Configuration**: Update configuration files with your Azure credentials and project-specific details.
4. **Run Pipelines**: Execute Data Factory pipelines for ETL, monitor Databricks jobs, and use Synapse Analytics for querying and reporting.

## Usage
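The kind of aggregation the Databricks transformation step performs can be illustrated in plain Python, so it runs without a cluster. The medal records and column names below are invented for the example; the project itself operates on the Tokyo Olympic datasets in the lake:

```python
from collections import Counter

def medal_totals(medals):
    """Count total medals per country, most-decorated first.

    A plain-Python stand-in for the groupBy/count done in Databricks.
    """
    totals = Counter(m["country"] for m in medals)
    return totals.most_common()

# Invented sample records, one row per medal awarded.
medals = [
    {"country": "JPN", "medal": "Gold"},
    {"country": "USA", "medal": "Silver"},
    {"country": "JPN", "medal": "Bronze"},
]
print(medal_totals(medals))  # [('JPN', 2), ('USA', 1)]
```

In Databricks the same shape of query would be a Spark `groupBy("country").count()` over a DataFrame read from the lake, with the result written to the curated zone for Synapse to query.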
- Follow the documentation provided in the 'docs' directory for detailed instructions on setting up, running, and maintaining the project.
- For any issues or inquiries, refer to the 'issues' section of this repository.

## Contribution
Contributions are welcome! Please follow the guidelines in the 'CONTRIBUTING.md' file.

## License
This project is licensed under the [MIT License](LICENSE).

---
Feel free to reach out for any questions or clarifications.
Happy coding!