https://github.com/mehassanhmood/bigdata-analytics

Retrieving data from different resources and bringing the preprocessed data to PowerBI for Visualizations

azuresql dataware elt etl-pipeline powerbi

Host: GitHub
URL: https://github.com/mehassanhmood/bigdata-analytics
Owner: mehassanhmood
Created: 2024-02-22T16:15:05.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-03-17T09:33:32.000Z (about 2 years ago)
Last Synced: 2025-03-27T04:43:21.010Z (about 1 year ago)
Topics: azuresql, dataware, elt, etl-pipeline, powerbi
Language: Jupyter Notebook
Homepage:
Size: 5.9 MB
Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Data Extraction and Pipeline Project

This repository contains the code and documentation for a data extraction and pipeline project. The project extracts data from various sources, transforms it, and loads it into several databases.

## Overview
- Extracted data from different sources such as APIs, CSV files, and JSON files.
- Saved the extracted data into different databases:
  - Structured data was stored in SQL databases, including SQL Server Express and MySQL.
  - Semi-structured data was stored in MongoDB.
- Built a pipeline that retrieves the data from these databases, applies transformations (such as sentiment analysis on news data using a pretrained model), and loads the results into a local staging database.
- Used PostgreSQL as the local staging database for the transformed data.
- Used PySpark to design and implement the data processing pipeline.
- Moved the data from the local data warehouse to Azure SQL, a cloud-based service.
- Used Power BI to build visualizations and dashboards for analyzing the data.
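
The extract, transform, and load stages listed above can be illustrated with a small standard-library sketch. This is not the repository's code: the project uses PySpark, a pretrained sentiment model, and PostgreSQL, whereas this example substitutes plain Python, a toy keyword scorer, and an in-memory SQLite database so that it runs without any setup. The sample records, the word lists, and the `news_staging` table name are all hypothetical.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample inputs standing in for the CSV and JSON sources.
CSV_TEXT = "id,headline\n1,Markets rally on strong earnings\n2,Storm causes severe damage\n"
JSON_TEXT = '[{"id": 3, "headline": "Council meets on Tuesday"}]'

# Toy word lists; the project uses a pretrained model instead.
POSITIVE = {"rally", "strong", "gain", "win"}
NEGATIVE = {"storm", "severe", "damage", "loss"}


def extract():
    """Read rows from both sources into one list of dicts."""
    rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
    rows += json.loads(JSON_TEXT)
    return [{"id": int(r["id"]), "headline": r["headline"]} for r in rows]


def score_sentiment(text):
    """Placeholder scorer: positive word count minus negative word count."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def transform(rows):
    """Attach a sentiment label to every row."""
    out = []
    for row in rows:
        score = score_sentiment(row["headline"])
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        out.append({**row, "sentiment": label})
    return out


def load(rows, conn):
    """Write transformed rows to the staging table (SQLite stands in for PostgreSQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news_staging "
        "(id INTEGER PRIMARY KEY, headline TEXT, sentiment TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO news_staging VALUES (:id, :headline, :sentiment)", rows
    )
    conn.commit()


def run_pipeline(conn):
    """Run all three stages and return the staged (id, sentiment) pairs."""
    load(transform(extract()), conn)
    return conn.execute("SELECT id, sentiment FROM news_staging ORDER BY id").fetchall()


if __name__ == "__main__":
    print(run_pipeline(sqlite3.connect(":memory:")))
```

Keeping each stage in its own function means the scorer or the target database can be swapped without touching the other stages.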

## Data Flow Diagram
![Data flow diagram](DataFlow.png)

## Project Structure
The project is structured as follows:
- `models/`: Contains pretrained models used for sentiment analysis.
- `docs/`: Contains project documentation.
- `visualizations/`: Contains visualizations and dashboards created using Power BI.

## Usage
To run the data extraction and pipeline:
1. Install the required dependencies specified in `requirements.txt`.
2. Update the configuration in `conf.yaml` to match your environment.
3. Run the main script to execute different components of the pipeline.
4. Use Power BI to open and explore the visualizations and dashboards in the `visualizations/` directory.
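
For step 2, it can help to validate the configuration before the pipeline starts, so that a missing setting fails early with a clear message. The sketch below assumes `conf.yaml` has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`) and that it holds one section per database. The section names are hypothetical and are not taken from the repository.

```python
# Hypothetical section names; the real keys in conf.yaml may differ.
REQUIRED_SECTIONS = ("mysql", "mongodb", "postgres", "azure_sql")


def missing_sections(config):
    """Return the required sections that are absent or empty in the config."""
    return [name for name in REQUIRED_SECTIONS if not config.get(name)]


def check_config(config):
    """Raise a clear error before the pipeline starts if settings are missing."""
    missing = missing_sections(config)
    if missing:
        raise ValueError("conf.yaml is missing settings for: " + ", ".join(missing))
    return config


if __name__ == "__main__":
    # Example dictionary standing in for a parsed conf.yaml.
    example = {"mysql": {"host": "localhost"}, "postgres": {"host": "localhost"}}
    print(missing_sections(example))
```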

Feel free to contribute by submitting bug fixes, enhancements, or additional features.

## License
This project is licensed under the [MIT License](LICENSE).
