https://github.com/caesarw0/lancaster-property-tax-scraper
Automated Python scraper for extracting delinquent tax data from Lancaster County, PA's public parcel viewer. Accepts parcel list input, extracts only relevant data, and outputs clean CSV files. Built for large-scale use with request throttling and error handling.
- Host: GitHub
- URL: https://github.com/caesarw0/lancaster-property-tax-scraper
- Owner: caesarw0
- License: mit
- Created: 2025-06-05T07:04:55.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-16T23:19:49.000Z (about 1 year ago)
- Last Synced: 2025-06-17T00:25:01.269Z (about 1 year ago)
- Topics: csv-export, playwright, python, python-scraping, real-estate-data, scrapping-python, web-scraper, webscraping
- Language: Python
- Homepage:
- Size: 19.9 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Lancaster County Tax Delinquency Scraper
## Demo
Watch the scraper in action:

*Automated scraping process demonstration*
## Overview
An automated Python script that extracts delinquent tax information from Lancaster County, PA's public parcel viewer.
Parcel pages on the Lancaster County Property Tax portal follow this URL pattern:
```text
https://lancasterpa.devnetwedge.com/parcel/view/{parcel_number}/{tax_year}
```
Example URL:
[https://lancasterpa.devnetwedge.com/parcel/view/5408465600000/2025](https://lancasterpa.devnetwedge.com/parcel/view/5408465600000/2025)
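The URL pattern above can be filled in programmatically. The following is a minimal sketch; `BASE_URL` and `build_parcel_url` are hypothetical names chosen for illustration and are not taken from the repository.

```python
# Base of the parcel view URL shown in this README.
BASE_URL = "https://lancasterpa.devnetwedge.com/parcel/view"


def build_parcel_url(parcel_number: str, tax_year: int) -> str:
    """Fill the {parcel_number}/{tax_year} pattern.

    Hypothetical helper; the repository's own code may build URLs differently.
    """
    # Keep the parcel number as a string so any leading zeros survive.
    return f"{BASE_URL}/{parcel_number}/{tax_year}"


print(build_parcel_url("5408465600000", 2025))
```

Passing the parcel number and year from the example reproduces the example URL.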
### Web Interface


*Screenshot of the Lancaster County Parcel Viewer interface where data is extracted from*
## Sample Output
### CSV Output Format

The script generates a structured CSV file containing delinquent tax information:
```csv
parcel_number,address,owner,scrape_date,tax_year,amount_due,amount_paid,total_due
5408465600000,123 MAIN ST LANCASTER PA,JOHN DOE,2024-03-20,2023,1500.00,0.00,1500.00
1200794700000,456 ELM ST LANCASTER PA,JANE SMITH,2024-03-20,2022,2000.00,500.00,1500.00
```
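To work with the output downstream, the CSV can be read back with the standard library. This sketch parses the two sample rows above; `load_rows` is a hypothetical helper, and the check that `total_due` equals `amount_due` minus `amount_paid` is only an observation about the sample rows, not a documented rule of the portal.

```python
import csv
import io
from decimal import Decimal

SAMPLE = """\
parcel_number,address,owner,scrape_date,tax_year,amount_due,amount_paid,total_due
5408465600000,123 MAIN ST LANCASTER PA,JOHN DOE,2024-03-20,2023,1500.00,0.00,1500.00
1200794700000,456 ELM ST LANCASTER PA,JANE SMITH,2024-03-20,2022,2000.00,500.00,1500.00
"""


def load_rows(text):
    """Parse CSV text into dicts with a typed year and Decimal money fields."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = dict(raw)  # parcel_number stays a string
        row["tax_year"] = int(raw["tax_year"])
        for field in ("amount_due", "amount_paid", "total_due"):
            # Decimal avoids floating-point rounding on currency values.
            row[field] = Decimal(raw[field])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE)
for row in rows:
    # Holds for the sample rows; treat it as an assumption elsewhere.
    balanced = row["total_due"] == row["amount_due"] - row["amount_paid"]
    print(row["parcel_number"], row["tax_year"], row["total_due"], balanced)
```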
## Project Structure
```text
lancaster-property-tax-scraper/
├── src/
│ └── property_scraper.py # Main scraper implementation
├── output/
│ └── delinquent_taxes.csv # Generated output file
├── img/ # Documentation images
├── requirements.txt # Python dependencies
└── README.md # Documentation
```
## How It Works
### Overall Workflow
```mermaid
graph LR
A["Input Parcel List"] --> B["Initialize Scraper"]
B --> C["Process Each Parcel"]
C --> D["Check for Delinquent Taxes"]
D --> E{"Has Delinquent Taxes?"}
E -->|"Yes"| F["Extract Data"]
E -->|"No"| G["Skip Parcel"]
F --> H["Add to Results"]
G --> C
H --> C
C --> I["Export to CSV"]
```
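The workflow in the diagram can be sketched as a simple loop. This is an illustration only: `collect_delinquent` and the `lookup` callback are hypothetical, and the stand-in data (including the made-up parcel number `0000000000000`) exists so the example runs without network access.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical contract: a lookup returns a record dict for a delinquent
# parcel, or None when the parcel has no delinquent taxes.
Lookup = Callable[[str], Optional[Dict[str, str]]]


def collect_delinquent(parcels: Iterable[str], lookup: Lookup) -> List[Dict[str, str]]:
    """Process each parcel and keep only those with delinquent taxes."""
    results = []
    for parcel in parcels:
        record = lookup(parcel)   # "Check for Delinquent Taxes"
        if record is None:        # "No" branch: skip the parcel
            continue
        results.append(record)    # "Yes" branch: add to results
    return results                # the caller would export these to CSV


# Stand-in lookup table used instead of a real page request.
fake_data = {
    "5408465600000": {"parcel_number": "5408465600000", "total_due": "1500.00"},
}
found = collect_delinquent(["5408465600000", "0000000000000"], fake_data.get)
print(found)
```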
### Data Extraction Process
```mermaid
graph TD
A["Parcel Page"] --> B["Basic Info"]
A --> C["Tax Info"]
B --> D["Parcel Number"]
B --> E["Property Address"]
B --> F["Owner Details"]
C --> G["Tax Year 2022-2024"]
C --> H["Amount Due"]
C --> I["Amount Paid"]
C --> J["Total Due"]
D & E & F & G & H & I & J --> K["CSV Record"]
```
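One way to model the extraction diagram is to keep basic info and tax info as separate structures and merge them into one record per tax year. The class and function names below are hypothetical, and amounts are kept as strings for simplicity.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class BasicInfo:
    parcel_number: str
    address: str
    owner: str


@dataclass
class TaxInfo:
    tax_year: int
    amount_due: str
    amount_paid: str
    total_due: str


def to_csv_records(basic: BasicInfo, taxes: List[TaxInfo], scrape_date: str) -> List[Dict[str, object]]:
    """Combine one parcel's basic info with each of its tax-year rows."""
    records = []
    for tax in taxes:
        if not 2022 <= tax.tax_year <= 2024:  # years named in this README
            continue
        record = asdict(basic)
        record["scrape_date"] = scrape_date
        record.update(asdict(tax))  # key order now matches the CSV header
        records.append(record)
    return records


basic = BasicInfo("5408465600000", "123 MAIN ST LANCASTER PA", "JOHN DOE")
taxes = [
    TaxInfo(2023, "1500.00", "0.00", "1500.00"),
    TaxInfo(2021, "900.00", "900.00", "0.00"),  # made-up row outside the range
]
records = to_csv_records(basic, taxes, "2024-03-20")
print(records)
```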
### Error Handling & Rate Limiting
```mermaid
sequenceDiagram
participant S as Scraper
participant W as Web Server
participant D as CSV Output
S->>W: Request Parcel Page
Note over S,W: 2-5 second delay
W->>S: Return Page
S->>S: Extract Data
alt Success
S->>D: Store Results
else Network Timeout
S->>S: Retry Request
else No Data Found
S->>S: Log & Skip
end
```
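The retry branch of the diagram could look roughly like the sketch below. The attempt count and wait time are assumptions, as is the `fetch_with_retry` helper itself; the README does not say how many times the script retries. The example catches the built-in `TimeoutError`, while code built on Playwright would more likely catch `playwright.sync_api.TimeoutError`.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[], str],
    max_attempts: int = 3,        # assumed value, not from the repository
    wait_seconds: float = 2.0,    # assumed value, not from the repository
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(), retrying on TimeoutError; return None if every attempt fails."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except TimeoutError:
            print(f"attempt {attempt} timed out")
            if attempt < max_attempts:
                sleep(wait_seconds)  # pause before the next attempt
    return None


# Stand-in fetch that times out twice, then succeeds.
calls = {"n": 0}


def flaky_fetch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "<html>parcel page</html>"


page = fetch_with_retry(flaky_fetch, sleep=lambda seconds: None)
print(page)
```

Injecting `sleep` keeps the example fast to run and makes the retry logic easy to test.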
## Features
- Automated scraping of delinquent tax data from Lancaster County's parcel viewer
- Handles multiple parcel numbers in batch
- Extracts data for tax years 2022-2024
- Collects property address and owner information
- Outputs results to CSV format
- Built-in rate limiting to prevent server overload
- Only captures parcels with actual delinquent taxes
## Data Extracted
For each parcel with delinquent taxes, the script collects:
- Parcel number
- Property address
- Owner information
- Tax year (2022-2024)
- Amount due
- Amount paid
- Total due
- Scrape date
## Prerequisites
- Python 3.7+
- Playwright
- Pandas
## Installation
1. Clone this repository:
```bash
git clone https://github.com/caesarw0/lancaster-property-tax-scraper.git
cd lancaster-property-tax-scraper
```
2. Install required packages:
```bash
pip install -r requirements.txt
```
3. Install Playwright browsers:
```bash
playwright install
```
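After installation, a short smoke check can confirm that Playwright and its browsers work. This sketch uses Playwright's synchronous API to open a page in headless Chromium and return its title; `fetch_page_title` is a hypothetical helper and is not part of the repository. The call that contacts the portal is left commented out because it needs network access.

```python
def fetch_page_title(url: str, timeout_ms: int = 30000) -> str:
    """Open url in headless Chromium and return the page title."""
    # Imported inside the function so the module can be loaded
    # even before Playwright is installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, timeout=timeout_ms)
            return page.title()
        finally:
            browser.close()  # always release the browser process


EXAMPLE_URL = "https://lancasterpa.devnetwedge.com/parcel/view/5408465600000/2025"
# Uncomment to run the check (requires network access):
# print(fetch_page_title(EXAMPLE_URL))
```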
## Usage
1. Prepare a list of parcel numbers in the script or import them from a file.
2. Run the script:
```bash
python src/property_scraper.py
```
The script will:
- Process each parcel number
- Extract delinquent tax information if available
- Save results to `output/delinquent_taxes.csv`
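For step 1, importing parcel numbers from a file might look like the sketch below. It assumes a plain-text file with one parcel number per line; that format and the `load_parcel_numbers` helper are illustrative assumptions, since the README does not specify an input format.

```python
import tempfile
from pathlib import Path


def load_parcel_numbers(path):
    """Read one parcel number per line, skipping blanks, comments, and duplicates."""
    seen = set()
    parcels = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        value = line.strip()
        if not value or value.startswith("#"):
            continue
        if value not in seen:  # keep first occurrence, preserve order
            seen.add(value)
            parcels.append(value)
    return parcels


# Demo with a temporary file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    parcel_file = Path(tmp) / "parcels.txt"
    parcel_file.write_text(
        "5408465600000\n\n# a comment line\n1200794700000\n5408465600000\n",
        encoding="utf-8",
    )
    parcel_numbers = load_parcel_numbers(parcel_file)

print(parcel_numbers)
```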
### Example Code
```python
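# Run from the src/ directory (or add src/ to PYTHONPATH) so this import resolves.
from property_scraper import scrape_multiple_parcels

# Parcel numbers are passed as strings.
parcel_numbers = [
    "5408465600000",
    "1200794700000",
]

# Scrape every parcel in the list; the result is assigned to df.
df = scrape_multiple_parcels(parcel_numbers)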
```
## Rate Limiting
The script includes built-in delays between requests (2-5 seconds) to avoid overwhelming the server. This helps ensure:
- Ethical scraping practices
- Reduced likelihood of IP blocking
- Server resource conservation
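A delay in the 2-5 second range can be implemented with a randomized pause between requests. Whether the script picks a random or fixed delay is not stated here, so treat this as one possible approach; `polite_delay` is a hypothetical name.

```python
import random
import time


def polite_delay(min_seconds=2.0, max_seconds=5.0, sleep=time.sleep):
    """Sleep for a random interval between min_seconds and max_seconds.

    Returns the chosen delay so callers can log it.
    """
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


# Demo: replace the real sleep with a no-op so the example returns immediately.
waited = polite_delay(sleep=lambda seconds: None)
print(f"would wait {waited:.2f}s before the next request")
```

In a real run, calling `polite_delay()` with its defaults between page requests would produce the pause.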
## Output Format
The script generates a CSV file with the following columns:
- parcel_number
- address
- owner
- scrape_date
- tax_year
- amount_due
- amount_paid
- total_due
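Writing records in this column order can be done with the standard library's `csv.DictWriter`, as sketched below. The project lists Pandas as a dependency and may well use it for export instead; `COLUMNS` and `write_records` are hypothetical names.

```python
import csv
import io

# Column order as listed in this README.
COLUMNS = [
    "parcel_number", "address", "owner", "scrape_date",
    "tax_year", "amount_due", "amount_paid", "total_due",
]


def write_records(records, handle):
    """Write a header row followed by one CSV row per record dict."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(records)


# Demo: write to an in-memory buffer instead of output/delinquent_taxes.csv.
buffer = io.StringIO()
write_records(
    [{
        "parcel_number": "5408465600000",
        "address": "123 MAIN ST LANCASTER PA",
        "owner": "JOHN DOE",
        "scrape_date": "2024-03-20",
        "tax_year": "2023",
        "amount_due": "1500.00",
        "amount_paid": "0.00",
        "total_due": "1500.00",
    }],
    buffer,
)
print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends.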
## Error Handling
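The script handles the following failure cases; per the sequence diagram above, timed-out requests are retried and parcels with no data are logged and skipped: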
- Network timeouts
- Missing data
- Invalid parcel numbers
- Server errors
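Invalid parcel numbers can also be filtered out before any request is made. The sketch below assumes parcel numbers are exactly 13 digits, which is inferred only from the two examples in this README and may not hold for every parcel; `split_valid_invalid` is a hypothetical helper.

```python
import re

# Assumption: 13 ASCII digits, based solely on the examples in this README.
PARCEL_PATTERN = re.compile(r"[0-9]{13}")


def split_valid_invalid(parcel_numbers):
    """Separate inputs that match the assumed format from those that do not."""
    valid, invalid = [], []
    for raw in parcel_numbers:
        value = str(raw).strip()
        if PARCEL_PATTERN.fullmatch(value):
            valid.append(value)
        else:
            invalid.append(value)  # candidates for logging and skipping
    return valid, invalid


valid, invalid = split_valid_invalid(["5408465600000", "12007947", "abc"])
print("valid:", valid)
print("invalid:", invalid)
```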
## Legal Notice
This tool is designed for legitimate data collection from publicly available information. Users should:
- Review and comply with Lancaster County's terms of service
- Use reasonable request rates
- Respect the public resource
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
[MIT License](LICENSE)