{"id":16210563,"url":"https://github.com/viveckh/qarecewebcrawler","last_synced_at":"2025-03-19T09:30:31.938Z","repository":{"id":101629227,"uuid":"177428679","full_name":"Viveckh/QareceWebCrawler","owner":"Viveckh","description":"This web crawler gathers the latest details, variations, imagery and pricing informations of a catalog of products given their urls from their corresponding online stores and prepares files ready for upload to your e-commerce platfrom. It was built with the purposes of making product additions easier for e-commerce retailers.","archived":false,"fork":false,"pushed_at":"2019-10-29T01:12:31.000Z","size":2471,"stargazers_count":8,"open_issues_count":0,"forks_count":4,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-02-28T17:25:13.316Z","etag":null,"topics":["catalog","e-commerce","ecommerce-platfrom","ecommerce-retailers","find-products","imagery","kyliejenner","macys","pricing-informations","qarece-web-crawler","scrapy","sephora","startup-code","startup-resources","startup-template","startups","webscraper","woocommerce","woocommerce-extension","wordpress-plugin"],"latest_commit_sha":null,"homepage":"https://www.qarece.com","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Viveckh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-03-24T14:51:37.000Z","updated_at":"2025-01-24T12:19:26.000Z","dependencies_parsed_at":"2023-07-18T02:46:05.497Z","dependency_job_id":null,"html_url":"https://github.com/Viveckh/QareceWebCrawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Viveckh%2FQareceWebCrawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Viveckh%2FQareceWebCrawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Viveckh%2FQareceWebCrawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Viveckh%2FQareceWebCrawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Viveckh","download_url":"https://codeload.github.com/Viveckh/QareceWebCrawler/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243980241,"owners_count":20378366,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/owners"}},"keywords":["catalog","e-commerce","ecommerce-platfrom","ecommerce-retailers","find-products","imagery","kyliejenner","macys","pricing-informations","qarece-web-crawler","scrapy","sephora","startup-code","startup-resources","startup-template","startups","webscraper","woocommerce","woocommerce-extension","wordpress-plugin"],"created_at":"2024-10-10T10:39:20.554Z","updated_at":"2025-03-19T09:30:31.928Z","avatar_url":"https://github.com/Viveckh.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Qarece Web Crawler\n\n## Author: [(EJ) Vivek Pandey](https://viveckh.com) / [Sujil Maharjan](https://sujilmaharjan.com) / Arjun Bastola\n\n## Save hours spent on manually adding and updating products to your e-commerce site and do it in seconds!\n\n### This web crawler gathers the latest details, variations, imagery and pricing information for a catalog of products, given their URLs from their corresponding online stores, and prepares files ready for upload to your e-commerce platform. It was built to make product additions easier for e-commerce retailers.\n\n## Supported Ecommerce Stores\n\n- Macys\n- Sephora\n- 6pm (Future Enhancement)\n- Amazon (Future Enhancement)\n\n## Output Files Customized for Platforms\n\n- WooCommerce (Default)\n\n## Important Notes\n\n- The internal category mapping, profit rates, and shipping rates are to be adjusted by you according to your business logic in the reference files located in the `product_scraper/product_scraper/reference_data` folder. The current values are default placeholders.\n\n## Custom Enhancements for Your Business\n\nThis crawler prototype was initially built as per the needs of Qarece - New York. 
If you want a custom implementation or enhancement for your e-commerce business, reach out to the author at `viveckh@hotmail.com`.\n\nDevelopment rates are on an hourly basis or project basis depending on your needs.\n\n## How to Run\n\n- Clone the Repo (Duh)\n- Make sure you have Python, Scrapy and Pandas installed on your system\n- Add the URLs of the products you want to crawl to `/product_scraper/product_scraper/input/urls.txt`. Check the supported e-commerce stores above.\n- Navigate back to the base of the scraper, where the `scrapy.cfg` file is: `cd /product_scraper`\n- `python setup.py`\n- `scrapy crawl products-spider`\n- `python clean-scraped-results.py`\n- The output file to upload to WooCommerce is ready at `/product_scraper/product_scraper/output/Product-formatted-for-wp.csv`. Import it twice: once in new-products import mode, and again in update mode to update existing products.\n- The output file to hide products that are no longer available on the market will be ready at `/product_scraper/product_scraper/output/products_to_unpublish.csv`. Import it once in update mode to hide products that exist in your catalog but are no longer available on the market.\n- Any products that could not be crawled successfully will be listed, with their associated errors, in `/product_scraper/product_scraper/output/errors.txt`\n\n## TODO\n- An automated way to update the catalog in WooCommerce using the output files, without having to upload them manually\n- An automated script to prepare `urls.txt` by exporting the catalog from WooCommerce as CSV and building a list of each product's market URLs\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fviveckh%2Fqarecewebcrawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fviveckh%2Fqarecewebcrawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fviveckh%2Fqarecewebcrawler/lists"}