Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sadafasad/banks-mc-etl-pipeline
Banks' market capital ETL data pipeline
apache-airflow beautifulsoup docker numpy pandas python requests sqlite
- Host: GitHub
- URL: https://github.com/sadafasad/banks-mc-etl-pipeline
- Owner: SadafAsad
- Created: 2024-01-25T17:02:39.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-01-27T15:10:04.000Z (12 months ago)
- Last Synced: 2024-11-12T03:05:30.272Z (2 months ago)
- Topics: apache-airflow, beautifulsoup, docker, numpy, pandas, python, requests, sqlite
- Language: Python
- Homepage:
- Size: 112 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Banks MC ETL Pipeline
The Banks MC ETL Pipeline is a Python project that automates the extraction, transformation, and loading (ETL) of banks' market capital data. It scrapes the data with Requests and BeautifulSoup, transforms it with Pandas and NumPy, stores the result in SQLite, and orchestrates the tasks with Apache Airflow. Docker is used to containerize Airflow, which simplifies deployment.
## ETL Pipeline
- Extract: Requests and BeautifulSoup scrape banks' market capital data from an archived Wikipedia page.
- Transform: Pandas and NumPy transform the raw data using conversion rates defined in a predefined CSV file, calculating each bank's market capital in other currencies.
- Load: The transformed data is written to both a CSV file and an SQLite database.
- Apache Airflow orchestrates these tasks, ensuring a systematic and automated execution of the entire ETL process.
## Tools & Libraries
- Python
- Airflow
- SQLite
- Requests
- BeautifulSoup
- Pandas
- NumPy
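## Transform and Load Sketch
The transform and load steps described under ETL Pipeline could look roughly like the sketch below. It is not the project's actual code: the column names (`Name`, `MC_USD_Billion`), the table name `banks_mc`, the output file name, and the rates in `RATES` are all hypothetical placeholders. The real pipeline reads its conversion rates from a CSV file, and the extraction step is replaced here by a small hand-built DataFrame so the example runs without network access.

```python
import os
import sqlite3
import tempfile

import numpy as np
import pandas as pd

# Placeholder conversion rates (assumption); the project loads its rates from a CSV file.
RATES = {"EUR": 0.9, "GBP": 0.8}


def transform(df: pd.DataFrame, rates: dict) -> pd.DataFrame:
    """Return a copy of df with one market-capital column per target currency."""
    out = df.copy()
    for currency, rate in rates.items():
        # Convert the USD figure and round to two decimal places.
        out[f"MC_{currency}_Billion"] = np.round(out["MC_USD_Billion"] * rate, 2)
    return out


def load(df: pd.DataFrame, csv_path: str, conn: sqlite3.Connection, table: str) -> None:
    """Write the DataFrame to a CSV file and to an SQLite table."""
    df.to_csv(csv_path, index=False)
    df.to_sql(table, conn, if_exists="replace", index=False)


# Stand-in for the scraped data (assumption: made-up banks and values).
raw = pd.DataFrame({"Name": ["Bank A", "Bank B"], "MC_USD_Billion": [100.0, 50.0]})

transformed = transform(raw, RATES)

csv_path = os.path.join(tempfile.mkdtemp(), "banks_mc.csv")
conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
load(transformed, csv_path, conn, "banks_mc")

rows = conn.execute("SELECT Name, MC_EUR_Billion FROM banks_mc").fetchall()
```

In an Airflow setup, `transform` and `load` would typically each be wrapped in a task so the scheduler can run them in order after the extraction task.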