https://github.com/jcaperella29/financial-data-scraper

Financial Data Scraper is a Python-based web scraping tool using Selenium to extract financial data from Stock Analysis. It scrapes Income Statement, Balance Sheet, Cash Flow, and Ratios for multiple companies and saves them as CSV files.

automation data-analysis finance financial-statements investment python selenium stock-market web-scraping

Last synced: 11 months ago

📜 Updated README for Financial Data Scraper 🚀💰
📌 Overview
This project is a multi-company web scraping utility built with Python and Selenium to extract financial data from Stock Analysis. For each company, it navigates through the financial tabs (Income Statement, Balance Sheet, Cash Flow, and Ratios) and saves the extracted data as CSV files.

🔥 New Feature:

Users can now provide stock tickers & URLs in a CSV file (tickers.csv), eliminating the need to modify the script manually.
A sample tickers.csv file is included to help users get started quickly.
⚡ Features
✅ Supports multiple stock symbols (e.g., GM, AAPL, TSLA).
✅ Reads stock tickers & URLs from tickers.csv (no need to edit the script).
✅ Scrapes key financial data from multiple tabs:

📄 Income Statement (default tab)
📊 Balance Sheet
💰 Cash Flow
📈 Ratios
✅ Saves extracted data into CSV files (e.g., AAPL_income_statement.csv).
✅ Logs progress & handles errors gracefully 🛑
✅ Captures debugging screenshot (page_debug.png) 🖼️
🔧 Requirements
📌 Python 3.7+
📌 Selenium 4+
📌 Firefox Browser
📌 GeckoDriver (ensure it's in your PATH or provide the full path).

📥 Installation
1️⃣ Clone this repository:

```bash
git clone https://github.com/jcaperella29/financial-data-scraper.git
cd financial-data-scraper
```

2️⃣ Install dependencies:

```bash
pip install -r requirements.txt
```

3️⃣ Download and install:

🔹 Firefox Browser
🔹 GeckoDriver
🚀 Usage
1️⃣ Edit tickers.csv to add companies
Instead of modifying Python code, simply edit tickers.csv with stock symbols and their corresponding URLs.

📄 Example tickers.csv (included in the repo):

```csv
ticker,url
GM,https://stockanalysis.com/stocks/gm/financials/
AAPL,https://stockanalysis.com/stocks/aapl/financials/
TSLA,https://stockanalysis.com/stocks/tsla/financials/
MSFT,https://stockanalysis.com/stocks/msft/financials/
```

2️⃣ Run the script

```bash
python scraper.py
```

3️⃣ View results
Extracted CSV files will be saved inside the financial_data/ folder.
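
For readers curious how a script might consume the tickers.csv format from step 1, here is a minimal sketch using only the standard library. The function name `load_tickers` and its handling of blank or incomplete rows are assumptions made for illustration; the actual logic in scraper.py may differ.

```python
import csv


def load_tickers(csv_path="tickers.csv"):
    """Return (ticker, url) pairs from a CSV with 'ticker' and 'url' columns."""
    pairs = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Missing cells come back as None, so normalise them to "".
            ticker = (row.get("ticker") or "").strip().upper()
            url = (row.get("url") or "").strip()
            if ticker and url:  # skip incomplete rows (assumed behaviour)
                pairs.append((ticker, url))
    return pairs
```

Calling `load_tickers()` on the sample file would yield one pair per company, which a scraper could then loop over.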

📂 File Structure

```text
financial-data-scraper/
├── financial_data/              # Output folder for CSV files
│   ├── GM_income_statement.csv  # GM Income Statement
│   ├── AAPL_balance_sheet.csv   # AAPL Balance Sheet
│   ├── TSLA_cash_flow.csv       # TSLA Cash Flow
│   └── ... (more files)
├── tickers.csv                  # User-provided ticker/URL input (new feature!)
├── scraper.py                   # Main scraper script
├── requirements.txt             # Python dependencies
├── README.md                    # Project documentation
└── page_debug.png               # Debugging screenshot
```
⚠ Notes
🔹 The Income Statement tab is already displayed when the page loads, so no click is needed for it.
🔹 JavaScript fallback clicking is used if normal Selenium clicks fail.
🔹 If the website structure changes, the XPath selectors in the script may need to be adjusted.
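
The first two notes can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in scraper.py: the names `TABS`, `click_with_js_fallback`, and `open_tab`, and the `find_tab` locator callback, are hypothetical. The only real Selenium calls relied on are `element.click()` and `driver.execute_script(...)`. The sketch catches `Exception` by default so that it runs without Selenium installed; real code would more likely pass specific classes such as `ElementClickInterceptedException` from `selenium.common.exceptions`.

```python
# (tab label, needs_click): the Income Statement tab is shown on page load.
TABS = [
    ("Income Statement", False),
    ("Balance Sheet", True),
    ("Cash Flow", True),
    ("Ratios", True),
]


def click_with_js_fallback(driver, element, click_errors=(Exception,)):
    """Try a normal click; if it raises, click through JavaScript instead."""
    try:
        element.click()
        return "native"
    except click_errors:
        # execute_script hands `element` to the page as arguments[0].
        driver.execute_script("arguments[0].click();", element)
        return "javascript"


def open_tab(driver, tab_label, needs_click, find_tab):
    """Open one tab; `find_tab(driver, label)` is a hypothetical locator."""
    if not needs_click:
        return "preloaded"  # nothing to click for the default tab
    return click_with_js_fallback(driver, find_tab(driver, tab_label))
```

Returning a short status string makes it easy to log which click path was taken for each tab.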

🛠 Troubleshooting
❌ Browser Not Found → Verify FIREFOX_BINARY_PATH in the script.
❌ Version Mismatch → Ensure Firefox & GeckoDriver versions are compatible.
❌ Timeout Errors → Increase Selenium WebDriverWait timeout values.

🎯 Future Enhancements
πŸ”œ Automatic detection of new stock listings
πŸ”œ Multi-threading for faster data extraction
πŸ”œ Database integration to store financial data

πŸŽ‰ Now users can easily scrape financials by just updating a CSV! πŸš€πŸ“ŠπŸ’°