{"id":37776847,"url":"https://github.com/basalam/product-catalog-generator","last_synced_at":"2026-01-16T15:00:50.273Z","repository":{"id":233508955,"uuid":"783244209","full_name":"basalam/product-catalog-generator","owner":"basalam","description":"Code, datasets, and models designed to generate product catalogs using LLMs.","archived":false,"fork":false,"pushed_at":"2024-09-07T11:35:56.000Z","size":79,"stargazers_count":29,"open_issues_count":0,"forks_count":1,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-09-07T12:27:59.236Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/basalam.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-07T10:49:43.000Z","updated_at":"2024-09-07T11:35:59.000Z","dependencies_parsed_at":"2024-06-09T23:01:08.108Z","dependency_job_id":null,"html_url":"https://github.com/basalam/product-catalog-generator","commit_stats":null,"previous_names":["basalam/product-catalog-generator"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/basalam/product-catalog-generator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basalam%2Fproduct-catalog-generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basalam%2Fproduct-catalog-generator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basalam%2Fproduct-catalog-generator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/basalam%2Fproduct-catalog-generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/basalam","download_url":"https://codeload.github.com/basalam/product-catalog-generator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basalam%2Fproduct-catalog-generator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28479406,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T11:59:17.896Z","status":"ssl_error","status_checked_at":"2026-01-16T11:55:55.838Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-16T15:00:35.190Z","updated_at":"2026-01-16T15:00:50.247Z","avatar_url":"https://github.com/basalam.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Product catalog generator\n\nThis repo contains the source code for a custom LLM and VLM, fine-tuned from [Llama 2](https://huggingface.co/docs/transformers/en/model_doc/llama2) and [Llava1.5](https://huggingface.co/llava-hf/llava-1.5-7b-hf) on [Basalam](https://basalam.com/) products, that infer entities (product types) and attributes from product data. 
You can use it on any similar dataset.\n\n\n## Datasets\n\n### Dataset V1 generated using GPT-3.5\n[GPT-3.5 generated product data](https://huggingface.co/datasets/BaSalam/entity-attribute-dataset-GPT-3.5-generated-v1)\n\n### Dataset V2 generated using GPT-4\n[GPT-4 generated product data](https://huggingface.co/datasets/BaSalam/entity-attribute-sft-dataset-GPT-4.0-generated-v1)\n\n### Dataset for Vision catalog generated using GPT-4 (✅New)\n[GPT-4 generated product data](https://huggingface.co/datasets/BaSalam/vision-catalogs-llava-format-v3)\n\n## Models\n\n### SFT model version 1 based on Llama 2 and GPT-3.5 data\n[Model V1](https://huggingface.co/BaSalam/Llama2-7b-entity-attr-v1)\n\n### SFT model version 2 based on Llama 2 and GPT-4 data\n[Model V2](https://huggingface.co/BaSalam/Llama2-7b-entity-attr-v2)\n\n### SFT model based on Llava1.5 and GPT-4 data (✅New)\n[Model](https://huggingface.co/BaSalam/Llava-1.5-7b-hf-bslm-product-attributes-v0)\n\n### LoRA file for Vision Catalog\n[Model](https://huggingface.co/BaSalam/llava1.5-7b-bslm-products-vision-catalog-lora)\n\n\n## Evaluation\n\n| Model    | Train loss | Val loss | Download |\n|----------|------------|----------|----------|\n| Model V1 | 0.07       | 0.08     | [SFT model version 1 based on Llama 2 and GPT-3.5 data](https://huggingface.co/BaSalam/Llama2-7b-entity-attr-v1) |\n| Model V2 | 0.1        | 0.12     | [SFT model version 2 based on Llama 2 and GPT-4 data](https://huggingface.co/BaSalam/Llama2-7b-entity-attr-v2) |\n| Vision   | 0.11       | 0.13     | [SFT model based on Llava1.5 and GPT-4 data](https://huggingface.co/BaSalam/Llava-1.5-7b-hf-bslm-product-attributes-v0) |\n\n## Motivations\n\nProblem definition and 
roadmap to solve it (in Persian). [Virgool link](https://experience.basalam.com/%D9%85%D8%B3%D8%A7%D9%84%D9%87-%D8%AA%D8%B4%D8%AE%DB%8C%D8%B5-%D9%85%D8%AD%D8%B5%D9%88%D9%84%D8%A7%D8%AA-%D8%A8%D8%A7%D8%B3%D9%84%D8%A7%D9%85-%DB%8C%DA%A9-%D8%AA%D8%AC%D8%B1%D8%A8%D9%87-%D8%B9%D9%85%D9%84%DB%8C-%D8%A7%D8%B2-%D8%A8%D9%87-%DA%A9%D8%A7%D8%B1%DA%AF%DB%8C%D8%B1%DB%8C-llm%D9%87%D8%A7-m8sr2xt1dhdk).\n\n\n## How to use\n\n\n**Train:**\n\nTo finetune a new model, either create a new YAML configuration file with your own parameters or modify an existing one. Example configuration files are in the `src/train/` directory (**[v1 config](https://github.com/basalam/product-catalog-generator/blob/main/src/train/v1.yaml)**, **[v2 config](https://github.com/basalam/product-catalog-generator/blob/main/src/train/v2.yaml)**). The default base model is `NousResearch/Llama-2-7b-chat-hf`, but you are free to change it; if you do, it is advisable to adjust the LoRA parameters accordingly. Tailor the other parameters to the needs of your task and dataset.\n\nTo start finetuning, navigate to the `src` directory and run:\n\n    python -m train.train_wrapper --version v1\n\nHere, `--version v1` selects the finetuning configuration and should match the name of your YAML file.\n\nThe training process includes several steps:\n\n    1- Parameter Initialization: Loads parameters from the specified YAML file.\n    2- Dataset Loading:\n        - Retrieves the dataset from the Hugging Face hub using the _create_datasets_ function.\n    3- Training Execution: Handled by the _run_training_ method from the **[training_module](https://github.com/basalam/product-catalog-generator/blob/main/src/train/training.py)**:\n        - Initializes configuration settings.\n        - Prepares the model and tokenizer.\n        - Sets up training arguments.\n        - Begins the training cycle.\n        - After completing the last iteration, saves the model, merging LoRA 
configurations with the base model using peft.\n        - Uploads the trained model to the Hugging Face hub.\n        - Concludes the process!\n\nThis structured approach ensures comprehensive management and execution of the model training process.\n\n**Inference:**\n\nFor inference, we use the [vLLM](https://github.com/vllm-project/vllm) LLM inference engine.\n\nInference settings such as the model, prompt, and response templates are located in `BASE_DIR/configs/inference/`.\nStart by running `inference_wrapper` in the `inference` directory. The process is as follows:\n1.  The configuration is read from the config file.\n2.  _inference_model_ from the **[vllm_engine](https://github.com/basalam/product-catalog-generator/blob/main/inference/vllm_engine.py)** module is run:\n    - The arguments are read.\n    - The LLM inference engine is built.\n    - For each sample input (the prompt plus input values, typically product information), a response is generated.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbasalam%2Fproduct-catalog-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbasalam%2Fproduct-catalog-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbasalam%2Fproduct-catalog-generator/lists"}