{"id":31132016,"url":"https://github.com/kausmeows/clothsy","last_synced_at":"2025-09-18T04:44:24.672Z","repository":{"id":167858601,"uuid":"643485206","full_name":"kausmeows/clothsy","owner":"kausmeows","description":"Transformer based search/rec engine to fetch Amazon URLs for similar clothing items given a text description","archived":false,"fork":false,"pushed_at":"2023-06-07T12:42:40.000Z","size":87816,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-02-29T13:32:10.471Z","etag":null,"topics":["fastapi","nlp","transformer"],"latest_commit_sha":null,"homepage":"https://huggingface.co/spaces/kausmos/clothsy","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kausmeows.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-05-21T10:12:25.000Z","updated_at":"2024-02-29T13:32:11.577Z","dependencies_parsed_at":null,"dependency_job_id":"d70dcbda-feb9-4f0f-85f4-23236d8b3259","html_url":"https://github.com/kausmeows/clothsy","commit_stats":null,"previous_names":["kaustubh-s1/clothing-similarity-search","kausmeows/clothsy"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/kausmeows/clothsy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kausmeows%2Fclothsy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kausmeows%2Fclothsy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kausmeows%2Fclothsy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/kausmeows%2Fclothsy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kausmeows","download_url":"https://codeload.github.com/kausmeows/clothsy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kausmeows%2Fclothsy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275711548,"owners_count":25514201,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-18T02:00:09.552Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fastapi","nlp","transformer"],"created_at":"2025-09-18T04:44:20.699Z","updated_at":"2025-09-18T04:44:24.656Z","avatar_url":"https://github.com/kausmeows.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\ntitle: Clothsy\nemoji: 👕\ncolorFrom: purple\ncolorTo: purple\nsdk: gradio\nsdk_version: 3.24.1\napp_file: main.py\npinned: false\n---\n\n## [HF Space Demo](https://huggingface.co/spaces/kausmos/clothsy)\n![HF](assets/hf_space.png)\n\n## [Working Demo](https://youtu.be/LZ-mWgL5qx4)\n[![Watch the video](assets/demo.png)](https://youtu.be/LZ-mWgL5qx4)\n\n## Data Collection\nTo scrape quality clothing data containing proper description and url for the product I used `Apify's` [Amazon Product 
Scraper](https://blog.apify.com/step-by-step-guide-to-scraping-amazon/#step-1-go-to-amazon-product-scraper-on-apify-store).\nBy creating an account and logging into the console, we can input links to Amazon fashion categories such as `Men's Fashion -\u003e Shirts`.\n\nI downloaded all the scraped data for the various clothing categories into a CSV file with columns `url|title|description`.\n\nApify Console\n![Apify Console](assets/apify.png)\n\nThe full dataset consists of 2900 clothing products for men and women; it can be found at `data/clothing_similarity_search.csv`.\n\n## Data Cleaning\nI used `pandas` to clean and preprocess the text (removing special characters, lowercasing, etc.), possibly followed by some form of text normalization (like stemming or lemmatization).\n\n## Making Embeddings\nI used `sentence-transformers` with the `all-MiniLM-L6-v2` model to make embeddings for the cleaned data. The model card can be found [here](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). I chose this model for its small size and good accuracy, which favors API response speed.\n```py\nfrom sentence_transformers import SentenceTransformer\n\n# Example inputs; in this project these would be the cleaned product texts\nsentences = [\"This is an example sentence\", \"Each sentence is converted\"]\n\nmodel = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')\nembeddings = model.encode(sentences)  # one embedding vector per sentence\nprint(embeddings)\n```\n\nThe embeddings generated for the whole dataset have been saved to a `.npy` file at `/data/embeddings.npy`, which can be loaded and used for similarity search retrieval. This way, search runs as a vector-similarity lookup over precomputed embeddings, which is faster.\n\nI used `cosine similarity` to compare the embedding of the query with the embeddings of the products.\n\n## API\nI used `FastAPI` to create the API. 
The API has a single endpoint, `/predict`, which takes a query string and returns the top 5 most similar products as JSON.\n\nSend a request to `http://0.0.0.0:8080/predict` with a JSON payload like:\n```json\n{\n    \"query\": \"Men's winter jacket black and white\"\n}\n```\n\nThis will return:\n```json\n{\n  \"similar_urls\": [\n    \"https://www.amazon.in/dp/B082L3BGGM\",\n    \"https://www.amazon.in/dp/B08KWFRY6W\",\n    \"https://www.amazon.in/dp/B08Q3VBFPD\",\n    \"https://www.amazon.com/dp/B07S1LMK58\",\n    \"https://www.amazon.in/dp/B0B8YY38VF\"\n  ]\n}\n```\n\n## Deployment\nI used `Docker` to containerize the API. I tried to deploy the endpoint with Google Cloud Functions but ran into issues, since it was my first time using GCP:\n- I wasn't able to load the `embeddings.npy` file from Cloud Storage into the Cloud Function. Some help on this would be appreciated.\n\n## Running Locally\n- Clone the repo\n- Make a virtual environment\n- Install the dependencies: `pip install -r requirements.txt`\n- Run the server: `python main.py`","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkausmeows%2Fclothsy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkausmeows%2Fclothsy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkausmeows%2Fclothsy/lists"}