{"id":23070558,"url":"https://github.com/sametcodes/product-taxonomy","last_synced_at":"2025-08-15T13:33:07.386Z","repository":{"id":167880231,"uuid":"631645691","full_name":"sametcodes/product-taxonomy","owner":"sametcodes","description":"Classify your e-commerce products into categories of well-known e-commerce platforms. It uses OpenAI embeddings and LangChain.","archived":false,"fork":false,"pushed_at":"2024-02-08T15:19:01.000Z","size":58084,"stargazers_count":18,"open_issues_count":3,"forks_count":4,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-09T22:33:07.753Z","etag":null,"topics":["amazon","category","embeddings","langchain","openai","product","shopify","taxonomy"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sametcodes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-04-23T17:14:24.000Z","updated_at":"2025-07-14T22:09:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"83abbdc3-e647-46c1-be2b-cc4985346503","html_url":"https://github.com/sametcodes/product-taxonomy","commit_stats":{"total_commits":24,"total_committers":1,"mean_commits":24.0,"dds":0.0,"last_synced_commit":"bd21c0019486cb30161e5a3253cabb85d9a60db3"},"previous_names":["sametcodes/product-taxonomy"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sametcodes/product-taxonomy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sametcodes%2Fproduct-taxonomy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sametcodes%2Fproduct-taxonomy/tags","releases_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sametcodes%2Fproduct-taxonomy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sametcodes%2Fproduct-taxonomy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sametcodes","download_url":"https://codeload.github.com/sametcodes/product-taxonomy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sametcodes%2Fproduct-taxonomy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270578382,"owners_count":24610036,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-15T02:00:12.559Z","response_time":110,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["amazon","category","embeddings","langchain","openai","product","shopify","taxonomy"],"created_at":"2024-12-16T06:27:10.794Z","updated_at":"2025-08-15T13:33:06.676Z","avatar_url":"https://github.com/sametcodes.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"This is a simple Node.js application that classifies e-commerce products into the categories of well-known e-commerce websites such as Amazon, Shopify and Google Shopping. It supports multiple languages.\n\nIt depends on the OpenAI Embeddings API to get the embeddings of categories and products. 
It uses LangChain/HNSWLib to build a fast and scalable search index.\n\n### Endpoints\n\nThere are two sets of endpoints. Each set has one endpoint that creates vectors and one that classifies a product into a category. One set keeps the vectors in Pinecone; the other keeps them locally with HNSWLib. Both use the OpenAI Embeddings API.\n\n#### HNSW-based (recommended, faster)\n- `POST /v2/product/predict/:platform` - creates vectors of categories for the given input file\n- `POST /v2/product/:platform` - returns the most-related category for the given product name\n\n#### Pinecone-based (legacy, slower)\n- `POST /category/predict/:platform` - creates vectors of categories for the given input file\n- `POST /category/:platform` - returns the most-related category for the given product name\n\n### How to classify new platform categories\n\nFirst, prepare a list of the categories of the platform that you want to classify products into. For instance, [this file](https://help.shopify.com/txt/product_taxonomy/en.txt) is the product taxonomy file of Shopify. You can use it to create a list of categories. The input file should be in TSV format, with the ID in the first column and the category name in the second column. You can find a sample input file under the `sample` folder.\n\nYou can also find a Postman collection in the `postman.json` file.\n\n## Getting Started\n\n### Installation\n\n```bash\nnpm install\n```\n\n### Development\n\nTo run the application in development mode:\n\n```bash\nnpm start\n```\n\n### Production\n\nTo run the application in production mode:\n\n```bash\nnpm run build\nnpm run start:live\n```\n## Bonus: OpenAI ChatGPT Plugin\n\nThis repository also includes a plugin for OpenAI ChatGPT. You can use it to generate a response for a given message. You can find the plugin under the `static` folder. 
If you have access to ChatGPT Plugins, you can load it from `localhost:6006` and use it by asking a question like `What is the best category for a product with name \"iPhone 12 Pro Max\"?`. The application must be running for the plugin to work. The plugin version uses the HNSW-based endpoint (`/v2`).\n\n## Todo\n\n- [ ] Add support for different human languages at the platform level\n- [ ] Remove Pinecone-based endpoints once the HNSW-based endpoints are stable\n- [ ] Prepare ready-to-use vectors for well-known platforms\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsametcodes%2Fproduct-taxonomy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsametcodes%2Fproduct-taxonomy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsametcodes%2Fproduct-taxonomy/lists"}