https://github.com/sametcodes/product-taxonomy
Classify your e-commerce products into categories of well-known e-commerce platforms. It uses OpenAI embeddings and LangChain.
- Host: GitHub
- URL: https://github.com/sametcodes/product-taxonomy
- Owner: sametcodes
- Created: 2023-04-23T17:14:24.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-08T15:19:01.000Z (over 1 year ago)
- Last Synced: 2025-08-09T22:33:07.753Z (2 months ago)
- Topics: amazon, category, embeddings, langchain, openai, product, shopify, taxonomy
- Language: TypeScript
- Homepage:
- Size: 55.4 MB
- Stars: 18
- Watchers: 1
- Forks: 4
- Open Issues: 3
Metadata Files:
- Readme: README.md
README
This is a simple Node.js application that classifies e-commerce products into the categories of well-known e-commerce websites such as Amazon, Shopify and Google Shopping. It supports multiple languages.
It relies on the OpenAI Embeddings API to compute embeddings for categories and products, and it uses LangChain's HNSWLib vector store to build a fast, scalable search index.
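Classification comes down to a nearest-neighbour lookup: embed the product name, then find the category whose embedding is most similar. The block below is a minimal brute-force sketch of that lookup, not the repository's code. It assumes cosine similarity as the distance measure; an HNSW index answers the same query approximately without comparing against every vector. The `CategoryVector` type, the toy three-dimensional vectors and the category names are hypothetical stand-ins for real OpenAI embeddings.

```typescript
// A category paired with its precomputed embedding (hypothetical shape).
interface CategoryVector {
  id: string;
  name: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact (brute-force) nearest-neighbour search over every category.
// An HNSW index approximates this result without a full scan.
function mostRelatedCategory(
  productEmbedding: number[],
  categories: CategoryVector[],
): CategoryVector | undefined {
  let best: CategoryVector | undefined;
  let bestScore = -Infinity;
  for (const category of categories) {
    const score = cosineSimilarity(productEmbedding, category.embedding);
    if (score > bestScore) {
      bestScore = score;
      best = category;
    }
  }
  return best;
}

// Toy vectors for illustration; real OpenAI embeddings have far more dimensions.
const categories: CategoryVector[] = [
  { id: "1", name: "Electronics > Mobile Phones", embedding: [0.9, 0.1, 0.0] },
  { id: "2", name: "Apparel > Shoes", embedding: [0.0, 0.2, 0.95] },
];

console.log(mostRelatedCategory([0.8, 0.2, 0.1], categories)?.name); // "Electronics > Mobile Phones"
```

A brute-force scan is linear in the number of categories, which is the cost an approximate index such as HNSW is meant to avoid for large taxonomies.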
### Endpoints
There are two main kinds of endpoints: one creates vectors for a platform's categories, and the other classifies a product into one of those categories. Each comes in two variants: one stores the vectors in Pinecone, and the other keeps them locally with HNSWLib. Both variants use the OpenAI Embeddings API.
#### HNSW-based (recommended, faster)
- `POST /v2/product/predict/:platform` - creates vectors of categories for the given input file
- `POST /v2/product/:platform` - returns the most-related category for the given product name
#### Pinecone-based (legacy, slower)
- `POST /category/predict/:platform` - creates vectors of categories for the given input file
- `POST /category/:platform` - returns the most-related category for the given product name
### How to classify new platform categories
First, prepare a list of the categories of the platform you want to classify products into. For instance, [this file](https://help.shopify.com/txt/product_taxonomy/en.txt) is Shopify's product taxonomy, and you can use it to create such a list. The input file must be in TSV format, with the ID in the first column and the category name in the second. A sample input file is in the `sample` folder.
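To check an input file before uploading it, the TSV layout described above (ID in the first column, category name in the second) can be parsed with a few lines of TypeScript. This is a hypothetical helper, not part of the repository; it assumes there is no header row, skips blank lines and ignores any columns after the second.

```typescript
// One row of the input file: first column is the ID, second is the category name.
interface CategoryRow {
  id: string;
  name: string;
}

// Parses TSV text into category rows (hypothetical helper, no header row assumed).
function parseCategoryTsv(tsv: string): CategoryRow[] {
  const rows: CategoryRow[] = [];
  tsv.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return; // skip blank lines
    // Missing columns default to "" so they can be reported below.
    const [id = "", name = ""] = line.split("\t");
    if (id.trim() === "" || name.trim() === "") {
      throw new Error(`Line ${index + 1}: expected "<id>\\t<category name>"`);
    }
    rows.push({ id: id.trim(), name: name.trim() });
  });
  return rows;
}

// Hypothetical sample rows; the real IDs come from the platform's taxonomy.
const rows = parseCategoryTsv("1\tAnimals & Pet Supplies\n2\tApparel & Accessories\n");
console.log(rows.length); // 2
```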
A Postman collection is also available in the `postman.json` file.
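As a rough illustration of calling the HNSW-based classification endpoint from code, the sketch below only builds the request. Several details are assumptions, not taken from the repository: the base URL `http://localhost:6006` (the port the ChatGPT plugin is loaded from), the platform value `shopify`, and the JSON body field `name`. The Postman collection is the place to confirm the real request format.

```typescript
// Request pieces ready to pass to fetch(url, init).
interface ClassifyRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds a POST /v2/product/:platform request.
// The "name" body field is a hypothetical assumption.
function buildClassifyRequest(
  baseUrl: string,
  platform: string,
  productName: string,
): ClassifyRequest {
  // Strip trailing slashes and encode the platform as a single path segment.
  const url = `${baseUrl.replace(/\/+$/, "")}/v2/product/${encodeURIComponent(platform)}`;
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ name: productName }), // hypothetical field name
    },
  };
}

// Usage (Node.js 18+ provides a global fetch):
// const { url, init } = buildClassifyRequest("http://localhost:6006", "shopify", "iPhone 12 Pro Max");
// const response = await fetch(url, init);
// console.log(await response.json());
```

Keeping request construction separate from sending it makes the URL and body easy to inspect against the Postman collection before any network call is made.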
## Getting Started
### Installation
```bash
npm install
```
### Development
To run the application in development mode:
```bash
npm start
```
### Production
To run the application in production mode:
```bash
npm run build
npm run start:live
```
## Bonus: OpenAI ChatGPT Plugin
This repository also includes a plugin for OpenAI ChatGPT that generates a response to a given message. The plugin is in the `static` folder. If you have access to ChatGPT Plugins, make sure the application is running, load the plugin from `localhost:6006`, and ask a question such as `What is the best category for a product with name "iPhone 12 Pro Max"?`. The plugin uses the HNSW-based endpoints (`/v2`).
## Todo
- [ ] Add support for different human languages at the platform level
- [ ] Remove the Pinecone-based endpoints once the HNSW-based endpoints are stable
- [ ] Prepare ready-to-use vectors for well-known platforms