https://github.com/phanchenh/datawrangling_pythonproject
Data Wrangling Project - Securities Upload for Strong Oak Security Management
- Host: GitHub
- URL: https://github.com/phanchenh/datawrangling_pythonproject
- Owner: PhanChenh
- Created: 2025-02-12T08:54:47.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-02-20T02:04:31.000Z (3 months ago)
- Last Synced: 2025-02-20T03:21:59.888Z (3 months ago)
- Topics: data-wrangling, os, pandas, pathlib, processing, python, validation
- Language: Python
- Homepage:
- Size: 771 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Project Title: Data Wrangling Project - Securities Upload for Strong Oak Security Management
## Table of Contents
- [Overview](#overview)
- [Dataset](#dataset)
- [Objective](#objective)
- [Analysis Approach](#analysis-approach)
- [Key Findings](#key-findings)
- [How to run code](#how-to-run-code)
- [Technologies Used](#technologies-used)
- [Results](#results)
- [Recommendation](#recommendation)
- [Contact](#contact)

## Overview
Strong Oak Security Management has outsourced their security management to Euler, a financial data platform. Euler’s data engineering team is responsible for loading, processing, and validating security data from Strong Oak’s security master file.
## Dataset
The project uses the following datasets:
- [attributes.data](Data/attributes.data): Includes various security attributes.
- [exchange.data](Data/exchange.data): Provides exchange details like name and location.
- [stock.data](Data/stock.data): Contains stock-related information, including RequestId, Symbol, QUEUESIP, and MIC.
- [strong_oak_security_master.csv](Data/strong_oak_security_master.csv): Contains the raw list of securities.

These datasets contain information about securities, attributes, and exchanges, necessary for the validation and upload process.
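
Before any processing, it can help to confirm that the four files listed above are in place. The following is a minimal sketch, not part of the project script; the helper name `find_missing_inputs` is hypothetical, and the `Data` directory name is assumed from the links above.

```python
from pathlib import Path

# File names come from the dataset list above.
REQUIRED_FILES = (
    "attributes.data",
    "exchange.data",
    "stock.data",
    "strong_oak_security_master.csv",
)


def find_missing_inputs(data_dir):
    """Return the required file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Assumption: the script is run from the repository root.
    missing = find_missing_inputs("Data")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All input files found.")
```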
## Objective
The primary goal of this project is to clean, validate, and transform the security data to fit Euler’s proprietary security management platform. The process involves:
1. Identifying and filtering valid securities.
2. Uploading attributes in the required format.
3. Minimizing missing data where possible.

## Analysis Approach
1. Loading Securities into the Platform:
- Extract relevant security identifiers: MIC, QUEUESIP, Symbol, RequestId.
- Assign a unique EulerId to each security.
- Ensure at least one of QUEUESIP or Symbol is populated.
- Minimize null values in QUEUESIP and Symbol.
- Save the cleaned securities data in {firstName}_{lastName}_section1.csv.
2. Uploading Attributes:
- Extract security attributes from multiple sources.
- Convert data to a long format with columns: EulerId, AttributeName, AttributeValue.
- Ensure no null values exist in AttributeValue.
- Save the attributes dataset in {firstName}_{lastName}_section2.csv.

**Note:** For detailed steps, see the [detail file](Detail_Steps_README.md).
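
The two steps above can be sketched in pandas roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the function names are hypothetical, missing identifiers are assumed to be stored as NaN/None, and the EulerId scheme (a simple 1-based counter) is a guess, since the README does not say how IDs are generated.

```python
import pandas as pd

# Identifier columns named in step 1.
ID_COLUMNS = ["MIC", "QUEUESIP", "Symbol", "RequestId"]


def build_securities(stock: pd.DataFrame) -> pd.DataFrame:
    """Keep rows with at least one of QUEUESIP or Symbol and assign an EulerId."""
    securities = stock[ID_COLUMNS].copy()
    has_identifier = securities["QUEUESIP"].notna() | securities["Symbol"].notna()
    securities = securities[has_identifier].reset_index(drop=True)
    # Assumption: EulerId is a 1-based running counter.
    securities.insert(0, "EulerId", list(range(1, len(securities) + 1)))
    return securities


def to_long_attributes(wide: pd.DataFrame) -> pd.DataFrame:
    """Reshape one-column-per-attribute data into EulerId, AttributeName, AttributeValue."""
    long = wide.melt(
        id_vars=["EulerId"],
        var_name="AttributeName",
        value_name="AttributeValue",
    )
    # Step 2 requires that AttributeValue contains no nulls.
    return long.dropna(subset=["AttributeValue"]).reset_index(drop=True)
```

Each result could then be written out with `DataFrame.to_csv(..., index=False)` using the section 1 and section 2 file names given above.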
## Key Findings
- Some securities in the master file were invalid due to missing identifiers or being inactive.
- Merging data from multiple sources introduced inconsistencies that needed filtering.
- Some false-positive matches were identified and removed.
- The exchange.data file provided necessary location data for exchange names.

## How to run code
1. Ensure all required datasets are stored in the `Data/` directory.
2. Install dependencies: `pip install pandas`. The `os` and `pathlib` modules are part of the Python standard library and do not need to be installed.
3. Run the Python script that processes the data:
```
python firstName_lastName_data_solutions.py
```
- The output CSV files will be saved in the specified directory.

## Technologies Used
- Programming Language: Python
- Libraries: pandas, os, pathlib
- File Formats: .csv, .data
- Data Cleaning & Transformation: Pandas DataFrame operations

## Results
- Breakdown of missing data before and after processing.
Figure 1: Validation and Breakdown Before Processing
Missing data is present in the stock data (RequestId, Symbol, QUEUESIP, MIC), in strong_oak_security_master (Ticker, QUEUESIP, Strong Oak Identifier), and in the attributes data (Asset Class, Inception Date, Return Since Inception).
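
A breakdown like this can be produced with a small pandas helper. The sketch below is illustrative only; the function names are hypothetical and it assumes missing values are represented as NaN/None once the files are loaded. The same helpers can be rerun on the processed output to compare before and after.

```python
import pandas as pd


def missing_breakdown(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column count and percentage of missing values."""
    return pd.DataFrame({
        "missing": df.isna().sum(),
        "percent": (df.isna().mean() * 100).round(1),
    })


def duplicate_count(df: pd.DataFrame, column: str) -> int:
    """Number of non-null values in `column` that repeat an earlier row."""
    return int(df[column].dropna().duplicated().sum())
```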

Figure 2: Validation and Breakdown After Processing
- No missing values for certain columns (e.g., EulerId, MIC).
- No duplicate EulerId or RequestId after processing.
- No invalid MIC values, indicating that only valid MICs are in the dataset.
- No issues with Exchange Location format or invalid AttributeName.

## Recommendation
- Automate the validation process to improve efficiency.
- Implement a robust error-handling mechanism for invalid securities.
- Maintain a historical record of uploaded securities for future reference.

## Contact
📧 Email: [email protected]
🔗 [LinkedIn](https://www.linkedin.com/in/phan-chenh-6a7ba127a/) | Portfolio