https://github.com/jamro/genai-masterclass-home-match
GenAI Masterclass Project submission
Last synced: 8 months ago
- Host: GitHub
- URL: https://github.com/jamro/genai-masterclass-home-match
- Owner: jamro
- Created: 2025-10-05T08:43:19.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-10-05T11:02:45.000Z (9 months ago)
- Last Synced: 2025-10-05T11:32:19.625Z (9 months ago)
- Language: Jupyter Notebook
- Size: 5.86 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# genai-masterclass-home-match
GenAI Masterclass Project submission for Udacity's "Building Generative AI Solutions" training
## Quick Start
```bash
make setup # Create virtual environment and install dependencies
make check-env # Verify .env file is properly configured
make run # Start Jupyter Lab
```
## Project Overview
This implementation demonstrates a **GenAI-powered real estate matching system** that showcases the following AI concepts:
1. **Synthetic Data Generation**: Using LLMs to generate 500 realistic real estate listings across Polish cities
2. **Vector Database Creation**: Storing and organizing listing embeddings for semantic search with ChromaDB
3. **Semantic Search**: Finding relevant properties based on natural language buyer preferences
4. **Augmented Response Generation**: Personalizing listings using LLM-generated descriptions
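The semantic search step above boils down to ranking listing vectors by their similarity to a query vector. The following is a minimal sketch of that idea in plain Python, not the notebook's actual code: the toy 3-dimensional vectors, the listing ids, and the helper names `cosine_similarity` and `top_k` are all assumptions made for illustration. In the project itself the vectors would come from the embedding model and the ranking would be handled by ChromaDB.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as "not similar to anything"
    return dot / (norm_a * norm_b)


def top_k(query_vec, listing_vecs, k=2):
    """Return the ids of the k listings most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), listing_id)
        for listing_id, vec in listing_vecs.items()
    ]
    scored.sort(reverse=True)
    return [listing_id for _, listing_id in scored[:k]]


# Toy vectors standing in for real embeddings (hypothetical data).
toy_vectors = {
    "krakow-loft": [0.9, 0.1, 0.0],
    "gdansk-seaside": [0.1, 0.9, 0.2],
    "warsaw-family": [0.7, 0.2, 0.6],
}
query = [1.0, 0.0, 0.1]

print(top_k(query, toy_vectors))
```

With these toy vectors the two listings pointing in roughly the same direction as the query are ranked ahead of the third.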
### Key Features
- **500 AI-generated listings** across 20+ Polish cities with realistic pricing (300K-2.5M PLN)
- **Semantic search** using OpenAI embeddings and ChromaDB vector database
- **Cross-encoder reranking** for improved relevance scoring
- **Personalized descriptions** that emphasize buyer-relevant features while preserving facts
- **Metadata filtering** for precise property matching (bedrooms, location, price, etc.)
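Metadata filtering narrows the candidate set with hard constraints before (or alongside) similarity ranking. Below is a small sketch of that idea in plain Python. The field names (`city`, `bedrooms`, `price_pln`), the sample records, and the `matches` helper are hypothetical and are not taken from the project's data. In ChromaDB, comparable conditions are usually passed as a `where` filter to a collection query rather than applied in application code.

```python
def matches(metadata, city=None, min_bedrooms=None, max_price_pln=None):
    """Return True if a listing's metadata satisfies every given constraint.

    Constraints left as None are ignored.
    """
    if city is not None and metadata["city"] != city:
        return False
    if min_bedrooms is not None and metadata["bedrooms"] < min_bedrooms:
        return False
    if max_price_pln is not None and metadata["price_pln"] > max_price_pln:
        return False
    return True


# Hypothetical sample records, for illustration only.
listings = [
    {"id": "a1", "city": "Lublin", "bedrooms": 3, "price_pln": 950_000},
    {"id": "b2", "city": "Lublin", "bedrooms": 1, "price_pln": 420_000},
    {"id": "c3", "city": "Gdynia", "bedrooms": 3, "price_pln": 1_200_000},
]

selected = [
    item["id"]
    for item in listings
    if matches(item, city="Lublin", min_bedrooms=2, max_price_pln=1_000_000)
]
print(selected)
```

Only listings that pass every constraint would then be handed to the similarity search and reranking stages.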
### Tech Stack
- **LLMs**: OpenAI GPT-4.1 for generation and personalization
- **Vector DB**: ChromaDB with cosine similarity indexing
- **Embeddings**: OpenAI text-embedding-3-small (1536 dimensions)
- **Reranking**: BAAI/bge-reranker-base cross-encoder
- **Framework**: LangChain with Pydantic structured output
- **Data**: 500 listings in JSON format with rich metadata
For detailed project requirements and evaluation criteria, see [rubric.md](rubric.md).
## Getting Started
The project consists of three main Jupyter notebooks:
1. **`generate_listings.ipynb`** - Creates 500 realistic real estate listings using GPT-4.1
2. **`create_vector_db.ipynb`** - Builds ChromaDB vector database with embeddings
3. **`search.ipynb`** - Demonstrates semantic search and personalized recommendations
Run them in order, or skip straight to `search.ipynb` if you already have the data and vector database (both are included in the project).
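The personalization performed in `search.ipynb` depends on instructing the model to re-emphasize existing details rather than invent new ones. The sketch below shows one way such a prompt could be assembled. The function name `build_personalization_prompt` and the prompt wording are assumptions, not the notebook's actual prompt, and the resulting string would still need to be sent to the LLM.

```python
def build_personalization_prompt(listing_text, buyer_preferences):
    """Compose an instruction that asks the model to re-emphasize, not invent."""
    return (
        "Rewrite the real estate listing below for this buyer.\n"
        "Emphasize the features that match the buyer's preferences.\n"
        "Do not add, remove, or change any facts "
        "(price, size, location, number of rooms).\n\n"
        f"Buyer preferences:\n{buyer_preferences}\n\n"
        f"Original listing:\n{listing_text}\n"
    )


# Hypothetical inputs, for illustration only.
prompt = build_personalization_prompt(
    listing_text="2-bedroom apartment with a balcony, close to a park.",
    buyer_preferences="Quiet neighborhood, green spaces, works from home.",
)
print(prompt)
```

Keeping the fact-preservation rule in the prompt itself makes it easier to spot-check outputs against the original listing.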
## Available Commands
Run `make help` to see all available commands:
- `make setup` - Create virtual environment and install dependencies
- `make install` - Install dependencies from requirements.txt
- `make run` - Start Jupyter Lab
- `make freeze` - Update requirements.txt with current dependencies
- `make check-env` - Check if .env file exists and has OPENAI_API_KEY
- `make clean` - Remove virtual environment and cache files
## Manual Setup (Alternative)
If you prefer manual setup:
```bash
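python -m venv venv               # create the virtual environment
source venv/bin/activate          # activate it on macOS / Linux
# On Windows, run this instead of the line above:
#   venv\Scripts\activate
pip install -r requirements.txt   # install dependencies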
```
Create a `.env` file and set `OPENAI_API_KEY=your_api_key_here` in it, replacing the placeholder with your own key.
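To confirm the key is in place without the Makefile, a check along the following lines can be run from Python. This is a sketch only and is not the implementation behind `make check-env`; the helper names `read_env_file` and `has_api_key` are hypothetical, and the parser handles only simple `KEY=VALUE` lines.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(path=".env"):
    """True if OPENAI_API_KEY is set to something other than the placeholder."""
    key = read_env_file(path).get("OPENAI_API_KEY", "")
    return bool(key) and key != "your_api_key_here"


print("OPENAI_API_KEY configured:", has_api_key())
```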
## Data Structure
- **`data/raw/`** - 500 JSON files containing generated real estate listings
- **`data/embeddings/`** - Cached embeddings for each listing (auto-generated)
- **`data/.chroma_db/`** - ChromaDB vector database (auto-created)
Each listing includes structured fields such as bedrooms, bathrooms, price, location, features, and lifestyle benefits.
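As an illustration of how one of these JSON files might be loaded into a typed object, here is a minimal sketch using only the standard library. The exact schema is not shown in this README, so the field names (`city`, `price_pln`, `bedrooms`, `bathrooms`, `features`, `lifestyle_benefits`), the `Listing` class, and the sample record are assumptions. The project itself uses Pydantic for structured output, and a plain dataclass is swapped in here to keep the example dependency-free.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Listing:
    """Hypothetical shape of one listing; real field names may differ."""
    city: str
    price_pln: int
    bedrooms: int
    bathrooms: int
    features: list = field(default_factory=list)
    lifestyle_benefits: list = field(default_factory=list)


def parse_listing(raw_json):
    """Build a Listing from a JSON string, tolerating missing list fields."""
    data = json.loads(raw_json)
    return Listing(
        city=data["city"],
        price_pln=int(data["price_pln"]),
        bedrooms=int(data["bedrooms"]),
        bathrooms=int(data["bathrooms"]),
        features=list(data.get("features", [])),
        lifestyle_benefits=list(data.get("lifestyle_benefits", [])),
    )


# Made-up sample record, for illustration only.
sample = (
    '{"city": "Lublin", "price_pln": 780000, "bedrooms": 2, "bathrooms": 1, '
    '"features": ["balcony"], "lifestyle_benefits": ["near tram stop"]}'
)
listing = parse_listing(sample)
print(listing.city, listing.bedrooms, listing.features)
```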