{"id":31665380,"url":"https://github.com/jamro/genai-masterclass-home-match","last_synced_at":"2025-10-18T00:48:37.298Z","repository":{"id":318133914,"uuid":"1070089983","full_name":"jamro/genai-masterclass-home-match","owner":"jamro","description":"GenAI Masterclass Project submission","archived":false,"fork":false,"pushed_at":"2025-10-05T11:02:45.000Z","size":6,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-05T11:32:19.625Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jamro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-05T08:43:19.000Z","updated_at":"2025-10-05T11:02:48.000Z","dependencies_parsed_at":"2025-10-05T11:32:37.283Z","dependency_job_id":"9020dfc0-26a6-415f-8a59-3b1f0705e689","html_url":"https://github.com/jamro/genai-masterclass-home-match","commit_stats":null,"previous_names":["jamro/genai-masterclass-home-match"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/jamro/genai-masterclass-home-match","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamro%2Fgenai-masterclass-home-match","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamro%2Fgenai-masterclass-home-match/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamro%2Fge
nai-masterclass-home-match/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamro%2Fgenai-masterclass-home-match/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jamro","download_url":"https://codeload.github.com/jamro/genai-masterclass-home-match/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamro%2Fgenai-masterclass-home-match/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279445424,"owners_count":26171513,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-17T02:00:07.504Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-07T21:56:43.446Z","updated_at":"2025-10-18T00:48:37.267Z","avatar_url":"https://github.com/jamro.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# genai-masterclass-home-match\nGenAI Masterclass Project submission for Udacity's \"Building Generative AI Solutions\" training\n\n## Quick Start\n\n```bash\nmake setup          # Create virtual environment and install dependencies\nmake check-env      # Verify .env file is properly configured\nmake run            # Start Jupyter Lab\n```\n\n## Project Overview\n\nThis implementation demonstrates a **GenAI-powered real estate matching system** that showcases the following 
AI concepts:\n\n1. **Synthetic Data Generation**: Using LLMs to generate 500 realistic real estate listings across Polish cities\n2. **Vector Database Creation**: Storing and organizing listing embeddings for semantic search with ChromaDB\n3. **Semantic Search**: Finding relevant properties based on natural language buyer preferences\n4. **Augmented Response Generation**: Personalizing listings using LLM-generated descriptions\n\n### Key Features\n\n- **500 AI-generated listings** across 20+ Polish cities with realistic pricing (300K-2.5M PLN)\n- **Semantic search** using OpenAI embeddings and ChromaDB vector database\n- **Cross-encoder reranking** for improved relevance scoring\n- **Personalized descriptions** that emphasize buyer-relevant features while preserving facts\n- **Metadata filtering** for precise property matching (bedrooms, location, price, etc.)\n\n### Tech Stack\n\n- **LLMs**: OpenAI GPT-4.1 for generation and personalization\n- **Vector DB**: ChromaDB with cosine similarity indexing\n- **Embeddings**: OpenAI text-embedding-3-small (1536 dimensions)\n- **Reranking**: BAAI/bge-reranker-base cross-encoder\n- **Framework**: LangChain with Pydantic structured output\n- **Data**: 500 listings in JSON format with rich metadata\n\nFor detailed project requirements and evaluation criteria, see [rubric.md](rubric.md).\n\n## Getting Started\n\nThe project consists of three main Jupyter notebooks:\n\n1. **`generate_listings.ipynb`** - Creates 500 realistic real estate listings using GPT-4.1\n2. **`create_vector_db.ipynb`** - Builds ChromaDB vector database with embeddings\n3. 
**`search.ipynb`** - Demonstrates semantic search and personalized recommendations\n\nRun them in order, or skip to `search.ipynb` if you already have the data and vector database (included in the project).\n\n## Available Commands\n\nRun `make help` to see all available commands:\n\n- `make setup` - Create virtual environment and install dependencies\n- `make install` - Install dependencies from requirements.txt\n- `make run` - Start Jupyter Lab\n- `make freeze` - Update requirements.txt with current dependencies\n- `make check-env` - Check if .env file exists and has OPENAI_API_KEY\n- `make clean` - Remove virtual environment and cache files\n\n## Manual Setup (Alternative)\n\nIf you prefer manual setup:\n\n```bash\npython -m venv venv\nsource venv/bin/activate      # macOS / Linux\n# or:\nvenv\\Scripts\\activate         # Windows\n\npip install -r requirements.txt\n```\n\nCreate a `.env` file and set `OPENAI_API_KEY=your_api_key_here`.\n\n## Data Structure\n\n- **`data/raw/`** - 500 JSON files containing generated real estate listings\n- **`data/embeddings/`** - Cached embeddings for each listing (auto-generated)\n- **`data/.chroma_db/`** - ChromaDB vector database (auto-created)\n\nEach listing includes structured data like bedrooms, bathrooms, price, location, features, and lifestyle benefits.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjamro%2Fgenai-masterclass-home-match","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjamro%2Fgenai-masterclass-home-match","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjamro%2Fgenai-masterclass-home-match/lists"}