https://github.com/ahmadalsharef994/apartment-data-parser
A legacy parser for real estate apartment listings, focused on cleaning and transforming XML data into structured formats for ETL workflows.
- Host: GitHub
- URL: https://github.com/ahmadalsharef994/apartment-data-parser
- Owner: ahmadalsharef994
- Created: 2025-04-22T18:05:11.000Z
- Default Branch: main
- Last Pushed: 2025-06-26T12:11:36.000Z
- Last Synced: 2025-06-29T20:39:16.102Z
- Topics: data-cleaning, etl, parsing, python, real-estate, xml
- Language: Python
- Homepage:
- Size: 38.5 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Apartment Data Parser
## Overview
This Python project, originally a legacy codebase, parses apartment listing data from XML files and cleans it. It extracts attributes such as location and applies cleaning steps such as coordinate correction and fuzzy matching of area names.
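Fuzzy matching of area names can be done with the standard library's `difflib`. This is a minimal sketch, not the repository's code: the canonical list `KNOWN_AREAS`, the helper `normalize_area`, and the 0.8 similarity cutoff are hypothetical choices for illustration.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical canonical area names; the project's real list is not shown in this README.
KNOWN_AREAS = ["Downtown", "Old Town", "Riverside", "Westgate"]


def normalize_area(raw: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a noisy area name to its closest canonical spelling, or None if nothing is close enough."""
    # Collapse repeated whitespace and normalize capitalization before comparing.
    cleaned = " ".join(raw.split()).title()
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio (0.0 to 1.0).
    matches = get_close_matches(cleaned, KNOWN_AREAS, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For example, a misspelled `"  downtwn "` resolves to `"Downtown"`, while an unrelated string returns `None` so it can be flagged for manual review instead of being silently mapped.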
## Features
- Parses apartment data from XML sources
- Cleans and standardizes coordinates and area names
- Modular design for easy extension
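Coordinate cleaning might look like the minimal sketch below. The helpers `parse_coord` and `clean_coordinates` are hypothetical, and the rules they apply (accepting a comma as the decimal separator, swapping latitude and longitude when the latitude is out of range) are assumed examples of "coordinate correction", not the project's documented behaviour.

```python
from typing import Optional, Tuple


def parse_coord(value: str) -> Optional[float]:
    """Parse a coordinate string, accepting a comma as the decimal separator."""
    try:
        return float(value.strip().replace(",", "."))
    except (AttributeError, ValueError):
        # AttributeError covers a missing (None) value; ValueError covers unparseable text.
        return None


def clean_coordinates(lat_raw: str, lon_raw: str) -> Optional[Tuple[float, float]]:
    """Return a validated (lat, lon) pair, or None if the input cannot be repaired."""
    lat, lon = parse_coord(lat_raw), parse_coord(lon_raw)
    if lat is None or lon is None:
        return None
    # Assumed heuristic: a latitude outside [-90, 90] paired with a longitude that
    # would be a valid latitude suggests the two fields were swapped in the source.
    if abs(lat) > 90 and abs(lon) <= 90:
        lat, lon = lon, lat
    # Reject anything still outside the valid ranges.
    if abs(lat) > 90 or abs(lon) > 180:
        return None
    # Six decimal places is roughly 0.1 m precision, plenty for listings.
    return round(lat, 6), round(lon, 6)
```

The swap check can only catch swaps where the longitude exceeds 90 in magnitude; for regions where both values fall within [-90, 90], a swap would need a region-specific bounding box to detect.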
## Installation
1. Clone the repository: `git clone https://github.com/ahmadalsharef994/apartment-data-parser.git`, then `cd apartment-data-parser`
2. Install dependencies: `pip install -r requirements.txt`
## Usage
1. Place raw XML data in the `data/` directory.
2. Run the parser: `python parse_apartments.py`
3. Find cleaned outputs in the `output/` directory.
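A rough sketch of the `data/` to `output/` workflow, using `xml.etree.ElementTree` from the standard library. The XML schema (`<apartment id="...">` elements with `area`, `lat` and `lon` children), the JSON output format, and the function names `parse_listings` and `run` are assumptions; the real `parse_apartments.py` may read different tags and write a different format.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_listings(xml_path: Path) -> list[dict]:
    """Extract one dict per <apartment> element (hypothetical schema)."""
    root = ET.parse(xml_path).getroot()
    records = []
    # iter() finds <apartment> elements at any depth below the root.
    for apt in root.iter("apartment"):
        records.append({
            "id": apt.get("id"),
            "area": (apt.findtext("area") or "").strip(),
            "lat": apt.findtext("lat"),
            "lon": apt.findtext("lon"),
        })
    return records


def run(data_dir: Path = Path("data"), output_dir: Path = Path("output")) -> list[Path]:
    """Parse every XML file in data_dir and write a matching JSON file to output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for xml_path in sorted(data_dir.glob("*.xml")):
        out_path = output_dir / f"{xml_path.stem}.json"
        out_path.write_text(json.dumps(parse_listings(xml_path), indent=2), encoding="utf-8")
        written.append(out_path)
    return written


if __name__ == "__main__":
    run()
```

Keeping raw coordinate strings at this stage leaves validation to a separate cleaning step, which matches the split between `parsers/` and `actions/` described in the project structure.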
## Project Structure
- `parsers/`: XML parsing scripts
- `actions/`: Data cleaning utilities
- `data/`: Input data directory
- `output/`: Processed data directory
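The separation of `actions/` from `parsers/` suggests cleaning steps that can be composed. A minimal sketch of one way to do that, assuming each action is a function that takes a record dict and returns a cleaned copy; `strip_strings`, `drop_empty`, and `apply_actions` are hypothetical names, not modules in the repository.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Action = Callable[[Record], Record]


def strip_strings(record: Record) -> Record:
    """Trim surrounding whitespace from every string field."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def drop_empty(record: Record) -> Record:
    """Remove fields whose value is None or an empty string."""
    return {k: v for k, v in record.items() if v not in (None, "")}


def apply_actions(records: Iterable[Record], actions: List[Action]) -> List[Record]:
    """Run each record through the actions in order and collect the results."""
    cleaned = []
    for record in records:
        for action in actions:
            record = action(record)
        cleaned.append(record)
    return cleaned
```

Adding a cleaning step then means writing one more function and appending it to the list. Order matters: running `strip_strings` before `drop_empty` removes whitespace-only fields, while the reverse order leaves them behind as empty strings.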
## License
This project is licensed under the MIT License. See [LICENSE](LICENSE) for details.