https://github.com/jcaperella29/financial-data-scraper
Financial Data Scraper is a Python-based web scraping tool using Selenium to extract financial data from Stock Analysis. It scrapes Income Statement, Balance Sheet, Cash Flow, and Ratios for multiple companies and saves them as CSV files.
- Host: GitHub
- URL: https://github.com/jcaperella29/financial-data-scraper
- Owner: jcaperella29
- Created: 2025-02-03T15:57:15.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-04T17:52:09.000Z (over 1 year ago)
- Last Synced: 2025-03-02T01:37:29.578Z (over 1 year ago)
- Topics: automation, data-analysis, finance, financial-statements, investment, python, selenium, stock-market, web-scraping
- Language: Python
- Homepage:
- Size: 15.6 KB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Updated README for Financial Data Scraper
Overview
This project is a multi-company web scraping utility built with Python and Selenium to extract financial data from Stock Analysis. For each company it navigates through the financial tabs (Income Statement, Balance Sheet, Cash Flow, and Ratios) and saves the extracted data as CSV files.
New Feature:
Users can now provide stock tickers & URLs in a CSV file (tickers.csv), eliminating the need to modify the script manually.
A sample tickers.csv file is included to help users get started quickly.
Features
- Supports multiple stock symbols (e.g., GM, AAPL, TSLA).
- Reads stock tickers & URLs from tickers.csv (no need to edit the script).
- Scrapes key financial data from multiple tabs:
  - Income Statement (default tab)
  - Balance Sheet
  - Cash Flow
  - Ratios
- Saves extracted data into CSV files (e.g., AAPL_income_statement.csv).
- Logs progress and handles errors gracefully.
- Captures a debugging screenshot (page_debug.png).
Requirements
- Python 3.7+
- Selenium 4+
- Firefox Browser
- GeckoDriver (ensure it's in your PATH or provide the full path).
Installation
1. Clone this repository:
git clone https://github.com/jcaperella29/financial-data-scraper.git
cd financial-data-scraper
2. Install dependencies:
pip install -r requirements.txt
3. Download and install:
- Firefox Browser
- GeckoDriver
Usage
1. Edit tickers.csv to add companies
Instead of modifying Python code, simply edit tickers.csv with stock symbols and their corresponding URLs.
Example tickers.csv (included in the repo):
ticker,url
GM,https://stockanalysis.com/stocks/gm/financials/
AAPL,https://stockanalysis.com/stocks/aapl/financials/
TSLA,https://stockanalysis.com/stocks/tsla/financials/
MSFT,https://stockanalysis.com/stocks/msft/financials/
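A file in this format can be read with Python's standard csv module. The following is a minimal sketch, not the code in scraper.py: the function name load_tickers is hypothetical, and it assumes the header row is exactly ticker,url as in the example above.

```python
import csv
from pathlib import Path


def load_tickers(csv_path="tickers.csv"):
    """Return a list of (ticker, url) pairs from a CSV with 'ticker' and 'url' columns."""
    pairs = []
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Missing cells come back as None, so normalise them to "".
            ticker = (row.get("ticker") or "").strip().upper()
            url = (row.get("url") or "").strip()
            if ticker and url:  # skip blank or incomplete rows
                pairs.append((ticker, url))
    return pairs
```

Calling load_tickers() from the repository root would return one pair per company, which a scraping loop can then iterate over.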
2. Run the script
python scraper.py
3. View results
Extracted CSV files will be saved inside the financial_data/ folder.
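The naming pattern seen in the examples (AAPL_income_statement.csv, TSLA_cash_flow.csv) could be produced along these lines. This is a sketch under assumptions: output_path and save_table are hypothetical helpers, and rows is assumed to be a list of lists already extracted from the page, with the header row first.

```python
import csv
from pathlib import Path


def output_path(ticker, statement, out_dir="financial_data"):
    """Build a path such as financial_data/AAPL_income_statement.csv."""
    slug = statement.strip().lower().replace(" ", "_")
    return Path(out_dir) / f"{ticker.upper()}_{slug}.csv"


def save_table(rows, ticker, statement, out_dir="financial_data"):
    """Write rows (a list of lists) to the CSV file for one ticker and statement."""
    path = output_path(ticker, statement, out_dir)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    with path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return path
```

Keeping the path logic in its own function makes the file names easy to check without launching a browser.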
File Structure
financial-data-scraper/
├── financial_data/                # Output folder for CSV files
│   ├── GM_income_statement.csv    # GM Income Statement
│   ├── AAPL_balance_sheet.csv     # AAPL Balance Sheet
│   ├── TSLA_cash_flow.csv         # TSLA Cash Flow
│   └── ... (more files)
├── tickers.csv                    # User-provided ticker/URL input (new feature!)
├── scraper.py                     # Main scraper script
├── requirements.txt               # Python dependencies
├── README.md                      # Project documentation
└── page_debug.png                 # Debugging screenshot
Notes
- The Income Statement is already displayed when the page loads, so no click is needed for it.
- JavaScript fallback clicking is used if normal Selenium clicks fail.
- If the website structure changes, XPath adjustments may be needed.
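The JavaScript fallback mentioned in the notes generally follows the pattern below. This is an illustrative sketch rather than the script's actual code: click_with_fallback is a hypothetical name, and the import guard exists only so the example can be read and run where Selenium is not installed.

```python
try:
    from selenium.common.exceptions import WebDriverException
except ImportError:  # keeps the sketch importable where Selenium is absent
    WebDriverException = Exception


def click_with_fallback(driver, element):
    """Try a normal click; fall back to a JavaScript click if it fails.

    Returns "native" or "javascript" so callers can log which path was used.
    """
    try:
        element.click()
        return "native"
    except WebDriverException:
        # A JavaScript click bypasses overlays that intercept the native click.
        driver.execute_script("arguments[0].click();", element)
        return "javascript"
```

Returning the path taken is a design choice for logging; a tab that always needs the fallback may indicate an overlay or a changed page layout.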
Troubleshooting
- Browser Not Found: Verify FIREFOX_BINARY_PATH in the script.
- Version Mismatch: Ensure Firefox & GeckoDriver versions are compatible.
- Timeout Errors: Increase Selenium WebDriverWait timeout values.
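For the first and third items, the relevant Selenium 4 settings look roughly like this. It is a sketch under assumptions: the function names, the headless default, and the 20-second timeout are hypothetical, and waiting for a table element is only a guess at how the page presents its data. The script's own locators and values may differ.

```python
def build_firefox_driver(firefox_binary_path=None, geckodriver_path=None, headless=True):
    """Create a Firefox WebDriver, optionally pinning the browser and driver paths."""
    # Imported lazily so this module loads even where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.firefox.service import Service

    options = Options()
    if headless:
        options.add_argument("-headless")
    if firefox_binary_path:
        # Explicit browser location, the usual fix for "Browser Not Found".
        options.binary_location = firefox_binary_path
    service = Service(executable_path=geckodriver_path) if geckodriver_path else Service()
    return webdriver.Firefox(service=service, options=options)


def wait_for_table(driver, timeout=20):
    """Block until a <table> element is present, or raise TimeoutException."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumption: the financial data is rendered inside a <table> element.
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.TAG_NAME, "table"))
    )
```

Raising timeout is the simplest response to timeout errors on slow connections; repeated timeouts on a fast connection more likely point to a changed page structure.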
Future Enhancements
- Automatic detection of new stock listings
- Multi-threading for faster data extraction
- Database integration to store financial data
Users can now scrape financials simply by updating a CSV file.