https://github.com/vasanthakumar70/project_stockmarket

This project uses PySpark to create an ETL pipeline. It extracts stock market data from the Alpha Vantage API, transforms it, and then loads it into a SQL Server database for analysis.
Host: GitHub
URL: https://github.com/vasanthakumar70/project_stockmarket
Owner: vasanthakumar70
Created: 2024-10-16T17:27:19.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-10-17T10:37:22.000Z (over 1 year ago)
Last Synced: 2025-02-05T11:37:29.067Z (over 1 year ago)
Topics: etl-pipeline, json, mssql, pyspark, python
Language: Python
Homepage:
Size: 28.3 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Project_stockmarket

This project shows how to fetch stock market data from the Alpha Vantage API, process it with PySpark, and save the processed data in a SQL Server database.

## Project Summary

The aim of this project is to simplify gathering stock data for many companies and storing it in a relational database, where it is easier to analyze. This solution uses:

- **PySpark** for parallel data processing
- **Alpha Vantage API** for stock market data
- **SQL Server** for storing and managing structured data

The project is designed to run on a set schedule (e.g., daily) to stay up-to-date with the latest stock market data.

## Highlights

- **Data Extraction**: Gets stock prices and volumes for multiple companies using the Alpha Vantage API.
- **Data Transformation**: Converts the raw JSON data into structured records containing the date, opening price, closing price, highest price, lowest price, and volume.
- **Data Loading**: Saves the processed data into the SQL Server database.
- **Logging**: Logs each step of the process to track successes and failures.
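
The transformation step above can be illustrated with a minimal sketch in plain Python. It assumes the response follows the shape of Alpha Vantage's `TIME_SERIES_DAILY` endpoint (a `"Time Series (Daily)"` object keyed by date, with fields such as `"1. open"`). The function name `flatten_daily` and the sample values are hypothetical and are not taken from `etl_process.py`, which may do this work directly in PySpark.

```python
from datetime import date

# Sample payload shaped like a TIME_SERIES_DAILY response.
# The prices and volumes are made up for illustration.
SAMPLE = {
    "Meta Data": {"2. Symbol": "IBM"},
    "Time Series (Daily)": {
        "2024-10-16": {
            "1. open": "233.50", "2. high": "235.00", "3. low": "232.10",
            "4. close": "234.20", "5. volume": "3100000",
        },
        "2024-10-15": {
            "1. open": "231.00", "2. high": "234.00", "3. low": "230.50",
            "4. close": "233.40", "5. volume": "2900000",
        },
    },
}


def flatten_daily(payload, symbol):
    """Turn the nested daily time series into a list of flat, typed rows."""
    series = payload.get("Time Series (Daily)", {})
    rows = []
    # ISO date strings sort chronologically, so sorting the keys orders the rows by day.
    for day, values in sorted(series.items()):
        rows.append({
            "symbol": symbol,
            "date": date.fromisoformat(day),
            "open": float(values["1. open"]),
            "high": float(values["2. high"]),
            "low": float(values["3. low"]),
            "close": float(values["4. close"]),
            "volume": int(values["5. volume"]),
        })
    return rows


rows = flatten_daily(SAMPLE, "IBM")
```

A list of dictionaries like `rows` could then be handed to `spark.createDataFrame(rows)` to continue the pipeline in PySpark.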

### Process:
![Process Flow](https://github.com/vasanthakumar70/Project_stockmarket/blob/ce232c40bcb0f2626fcc37e952a6425dd98306c2/Process%20Diagram.svg)

## Project Structure

```
.
├── etl_process.py # Main Python script
├── .env # Environment variables (API key, database credentials, etc.)
├── etl_process.log # Log file
├── requirements.txt # Required Python packages
├── README.md # This readme file
└── sqljdbc_12.8/ # JDBC driver for connecting PySpark to SQL Server
```
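
As a rough illustration of how the `.env` values and the JDBC driver might fit together, the sketch below builds the connection options for a Spark JDBC write from environment variables. The variable names (`DB_HOST`, `DB_PORT`, `DB_NAME`, `DB_USER`, `DB_PASSWORD`) and the function `jdbc_options` are assumptions for this example; the actual keys in the project's `.env` file are not documented here.

```python
import os


def jdbc_options(env=os.environ):
    """Build Spark JDBC options for SQL Server from environment-style settings.

    The key names are hypothetical; adjust them to match the real .env file.
    """
    host = env["DB_HOST"]
    port = env.get("DB_PORT", "1433")  # 1433 is SQL Server's default port
    database = env["DB_NAME"]
    return {
        # trustServerCertificate=true suits local development only.
        "url": (
            f"jdbc:sqlserver://{host}:{port};databaseName={database};"
            "encrypt=true;trustServerCertificate=true"
        ),
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
        # Driver class shipped in Microsoft's JDBC driver for SQL Server.
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


# Example with an explicit mapping instead of the real environment.
opts = jdbc_options({
    "DB_HOST": "localhost",
    "DB_NAME": "stocks",
    "DB_USER": "etl_user",
    "DB_PASSWORD": "secret",
})
```

With a DataFrame `df`, these options could be used as `df.write.format("jdbc").options(**opts).option("dbtable", "stock_prices").mode("append").save()`, where the table name `stock_prices` is again only a placeholder.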

### References & Downloads

- **Alpha Vantage API**: [RapidAPI](https://rapidapi.com/alphavantage/api/alpha-vantage/playground/)
- **Apache Spark**: [Spark Downloads](https://spark.apache.org/downloads.html)
- **Java Development Kit (JDK)**: [Oracle JDK 11 Downloads](https://www.oracle.com/java/technologies/javase/jdk11-archive-downloads.html)
- **Hadoop Winutils**: [Winutils GitHub Repository](https://github.com/cdarlint/winutils)
