Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Tokyo Olympic Azure Data Engineering Project
https://github.com/ayanhussain81/olympics-data-etl
azure databricks datafactory python synapseanalytics
Last synced: 22 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/ayanhussain81/olympics-data-etl
- Owner: ayanhussain81
- Created: 2024-02-05T11:55:08.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-09-15T06:49:29.000Z (4 months ago)
- Last Synced: 2024-09-15T10:15:05.105Z (4 months ago)
- Topics: azure, databricks, datafactory, python, synapseanalytics
- Language: Jupyter Notebook
- Homepage:
- Size: 734 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Tokyo Olympic Azure Data Engineering Project
## Overview
This project builds an end-to-end data engineering pipeline for Tokyo Olympic data, leveraging Azure services such as Data Lake Gen2, Data Factory, Databricks, and Synapse Analytics. The pipeline handles data ingestion, transformation, and analysis to surface insights about the Olympic events.

## Technologies Used
- **Azure Data Lake Gen2**: Storage for raw and processed data.
- **Azure Data Factory**: Orchestration and automation of data workflows.
- **Azure Databricks**: Advanced analytics and data transformation.
- **Azure Synapse Analytics**: Data warehousing and analytics.

## Project Structure
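As a toy stand-in for the ingestion stage described below, the following sketch lands raw CSV records in a local directory layout that mirrors a lake's raw zone. All paths, file names, and records here are illustrative only, not part of the project:

```python
import csv
import os

def ingest_raw(records, lake_root, dataset):
    """Write raw records as CSV into the lake's 'raw' zone.

    A local stand-in for an upload to Azure Data Lake Gen2;
    the directory layout is illustrative.
    """
    raw_dir = os.path.join(lake_root, "raw", dataset)
    os.makedirs(raw_dir, exist_ok=True)
    path = os.path.join(raw_dir, f"{dataset}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)
    return path

# Invented sample dataset landing in the raw zone.
athletes = [
    {"name": "A. Runner", "country": "JPN", "discipline": "Athletics"},
    {"name": "B. Swimmer", "country": "USA", "discipline": "Swimming"},
]
path = ingest_raw(athletes, "lake", "athletes")
```

In the real pipeline this step is an upload into Data Lake Gen2, typically triggered by a Data Factory copy activity rather than local file writes.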
1. **Data Ingestion**: Raw data from various sources is ingested into Data Lake Gen2.
2. **ETL Pipeline**: Data is processed and transformed using Azure Data Factory, leading to curated datasets.
3. **Advanced Analytics**: Complex analytics and transformations are performed in Azure Databricks.
4. **Data Warehousing**: Synapse Analytics is utilized for scalable data warehousing and efficient querying.

## Setup Instructions
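The steps below assume the project's settings are collected in one configuration file. A minimal sketch of loading and validating such a file — the file format and every key name here are hypothetical, not the project's actual configuration:

```python
import json

# Keys this sketch expects; the real project may use different names.
REQUIRED_KEYS = {"storage_account", "data_factory",
                 "databricks_workspace", "synapse_workspace"}

def load_config(path):
    """Load project settings from JSON and fail fast on missing keys."""
    with open(path) as f:
        config = json.load(f)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise KeyError(f"missing config keys: {sorted(missing)}")
    return config
```

Failing fast on a missing key keeps a misconfigured credential or resource name from surfacing later as an opaque pipeline error.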
1. **Azure Account**: Ensure you have an active Azure account.
2. **Azure Resources**: Create necessary Azure resources - Data Lake Gen2, Data Factory, Databricks, and Synapse Analytics.
3. **Configuration**: Update configuration files with your Azure credentials and project-specific details.
4. **Run Pipelines**: Execute Data Factory pipelines for ETL, monitor Databricks jobs, and use Synapse Analytics for querying and reporting.

## Usage
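The kind of aggregation the Databricks transformation step performs can be illustrated in plain Python, so it runs without a cluster. The medal records and column names below are invented for the example; the project itself operates on the Tokyo Olympic datasets in the lake:

```python
from collections import Counter

def medal_totals(medals):
    """Count total medals per country, most-decorated first.

    A plain-Python stand-in for the groupBy/count done in Databricks.
    """
    totals = Counter(m["country"] for m in medals)
    return totals.most_common()

# Invented sample records, one row per medal awarded.
medals = [
    {"country": "JPN", "medal": "Gold"},
    {"country": "USA", "medal": "Silver"},
    {"country": "JPN", "medal": "Bronze"},
]
print(medal_totals(medals))  # [('JPN', 2), ('USA', 1)]
```

In Databricks the same shape of query would be a Spark `groupBy("country").count()` over a DataFrame read from the lake, with the result written to the curated zone for Synapse to query.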
- Follow the documentation provided in the 'docs' directory for detailed instructions on setting up, running, and maintaining the project.
- For any issues or inquiries, refer to the 'issues' section of this repository.

## Contribution
Contributions are welcome! Please follow the guidelines in the 'CONTRIBUTING.md' file.

## License
This project is licensed under the [MIT License](LICENSE).

---
Feel free to reach out for any questions or clarifications.
Happy coding!