{"id":21418456,"url":"https://github.com/bsovs/winhacks-ml-pipeline","last_synced_at":"2025-06-20T19:33:28.395Z","repository":{"id":103642987,"uuid":"351923040","full_name":"bsovs/winhacks-ml-pipeline","owner":"bsovs","description":"recommendation encoder for home data using Spotify's annoy package","archived":false,"fork":false,"pushed_at":"2021-03-28T01:19:29.000Z","size":18,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-13T09:02:38.506Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bsovs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-26T22:04:53.000Z","updated_at":"2021-03-28T02:48:24.000Z","dependencies_parsed_at":"2023-05-24T04:45:46.814Z","dependency_job_id":null,"html_url":"https://github.com/bsovs/winhacks-ml-pipeline","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bsovs%2Fwinhacks-ml-pipeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bsovs%2Fwinhacks-ml-pipeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bsovs%2Fwinhacks-ml-pipeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bsovs%2Fwinhacks-ml-pipeline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bsovs","download_url":"https://codeload.
github.com/bsovs/winhacks-ml-pipeline/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243918641,"owners_count":20368745,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-22T19:21:31.813Z","updated_at":"2025-03-16T19:24:40.294Z","avatar_url":"https://github.com/bsovs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Annoy](https://badgen.net/badge/Powered%20by/Annoy/blue)](https://github.com/spotify/annoy)\n\n# winhacks ml pipeline\nThis is a reference encoder for a home recommendation system created for winhacks 2021.\n\n## About\n### Encoder\n- Pulls all home data stored in BigQuery and loads it into a dataframe\n- Each text-based attribute is assigned a label encoder (saved to pkl files so queries can reuse it later)\n- Each row is converted to a vector and added to the Annoy index\n- X trees are built from these vectors so the n nearest neighbours can be looked up very quickly in memory\n\n## FastAPI App\n### Endpoints Available\n- `/run/model/update` (update the Annoy model for all homes)\n- `/run/model/fit` (update collaborative rating in BigQuery; needs a paid deployment)\n- `/run/profile/update` (return an average vector for a user based on the homes they like)\n- `/query/by-embed` (query the n best-fit homes for a vector)\n- `/query/by-attributes` (query the n best-fit homes by given attributes)\n\n### BigQuery\n- example query to insert homes:\n```angular2\nINSERT INTO `winhacks-308216.homes_data.sample` (\n        id,\n        
created_on,\n        operation,\n        property_type,\n        place_name,\n        place_with_parent_names,\n        country_name,\n        state_name,\n        geonames_id,\n        lat_lon,\n        lat,\n        lon,\n        price,\n        currency,\n        price_aprox_local_currency,\n        price_aprox_usd,\n        surface_total_in_m2,\n        surface_covered_in_m2,\n        price_usd_per_m2,\n        price_per_m2,\n        floor,\n        rooms,\n        expenses,\n        properati_url,\n        description,\n        title,\n        images\n)\n VALUES \n    (GENERATE_UUID(),\n     CURRENT_DATE(),\n     'sell',\n        'house',\n        '67 Shelborne Ave',\n        null,\n        'Canada',\n        'Ontario',\n        null,\n        '43.7675433, -79.2738102',\n        43.7675433,\n        -79.2738102,\n        4199000,\n        'CAD',\n        4199000,\n        3311111,\n        48009,\n        48009,\n        null,\n        null,\n        null,\n        6,\n        null,\n        'https://www.zillow.com/homedetails/67-Shelborne-Ave-Toronto-ON-M5N-1Z2/2076679623_zpid/?',\n        'A Truly Spectacular New Custom Home With High Quality Finishes Throughout. The Attention To Detail And Fine Craftsmanship Is Very Noticeable Upon Inspection. A Chef\\'s Kitchen With Two Dishwashers, Two Sinks, Make Meal Prep An Occasion. Tarion New Home Warranty Included. High Ceilings And Heated Floors In All Tiled Areas, Main Floor, Bsmt And All Bathrooms. Snow Melt System In Driveway. Built In Sound System. 
Control 4 Electronic Lighting System.(Programmable)',\n        '67 Shelborne Ave, Toronto, ON',\n        [\n        'https://photos.zillowstatic.com/fp/5538f14b835aaf729b65aa2cc27c6747-cc_ft_768.jpg',\n        'https://photos.zillowstatic.com/fp/04c6dcb90326ed32c513cc3e65058e74-uncropped_scaled_within_1536_1152.webp', \n        'https://photos.zillowstatic.com/fp/7b58101189039bef8749b80d63ecc8e3-uncropped_scaled_within_1536_1152.webp',\n        'https://photos.zillowstatic.com/fp/9ceb2ab6e26918c50abcaab346977873-uncropped_scaled_within_1536_1152.webp',\n        'https://photos.zillowstatic.com/fp/93315d75c9a3484dc704f5330e00271d-uncropped_scaled_within_1536_1152.webp'\n        ]\n)\n```\n- query by distance and filter\n```angular2\nWITH params AS (\n  SELECT ST_GeogPoint(@longitude, @latitude) AS center,\n         @limit AS maxn_homes,\n         @radius AS maxdist_km\n),\ndistance_from_center AS (\n  SELECT\n    id,\n    created_on,\n    operation,\n    property_type,\n    place_name,\n    place_with_parent_names,\n    country_name,\n    state_name,\n    geonames_id,\n    lat_lon,\n    lat,\n    lon,\n    price,\n    currency,\n    price_aprox_local_currency,\n    price_aprox_usd,\n    surface_total_in_m2,\n    surface_covered_in_m2,\n    price_usd_per_m2,\n    price_per_m2,\n    floor,\n    rooms,\n    expenses,\n    properati_url,\n    description,\n    title,\n    images,\n    ST_GeogPoint(lon, lat) AS loc,\n    ST_Distance(ST_GeogPoint(lon, lat), params.center) AS dist_meters\n  FROM\n    `winhacks-308216.homes_data.sample`,\n    params\n  WHERE ST_DWithin(ST_GeogPoint(lon, lat), params.center, params.maxdist_km*1000)\n),\nnearest_homes AS (\n  SELECT \n    *, \n    RANK() OVER (ORDER BY dist_meters ASC) AS rank\n  FROM \n    distance_from_center\n),\nfiltered_homes AS (\n  SELECT \n    station.*\n  FROM \n    nearest_homes AS station, params\n  WHERE \n    id NOT IN UNNEST(@filter)\n),\nnearest_nhomes AS (\n  SELECT \n    station.* \n  FROM \n    
filtered_homes AS station, params\n  WHERE \n    rank \u003c= params.maxn_homes\n)\nSELECT * from nearest_nhomes\n```\n- run fit on user interactions\n```angular2\nCREATE OR REPLACE MODEL `winhacks-308216.ml.model`\nOPTIONS\n (model_type='matrix_factorization',\n  feedback_type='implicit',\n  user_col='profile_id',\n  item_col='home_id',\n  rating_col='rating',\n  l2_reg=30,\n  num_factors=15) AS\nSELECT\n profile_id,\n home_id,\n rating\nFROM `winhacks-308216.profiles_data.sample`\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbsovs%2Fwinhacks-ml-pipeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbsovs%2Fwinhacks-ml-pipeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbsovs%2Fwinhacks-ml-pipeline/lists"}