{"id":20822994,"url":"https://github.com/robmch/mindfactory_crawling","last_synced_at":"2025-05-07T16:46:02.632Z","repository":{"id":57441687,"uuid":"161233131","full_name":"RobMcH/mindfactory_crawling","owner":"RobMcH","description":"A Python 3 Crawler for Mindfactory.de","archived":false,"fork":false,"pushed_at":"2022-10-03T17:56:23.000Z","size":39,"stargazers_count":4,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-31T12:01:55.183Z","etag":null,"topics":["crawler","crawling","data","webcrawler","webcrawling"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RobMcH.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-12-10T20:33:56.000Z","updated_at":"2024-03-05T23:29:30.000Z","dependencies_parsed_at":"2022-09-26T17:20:52.773Z","dependency_job_id":null,"html_url":"https://github.com/RobMcH/mindfactory_crawling","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RobMcH%2Fmindfactory_crawling","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RobMcH%2Fmindfactory_crawling/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RobMcH%2Fmindfactory_crawling/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RobMcH%2Fmindfactory_crawling/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RobMcH","download_url":"https://codeload.github.com/RobMcH/mindfactory_crawling/tar.gz/refs/
heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252917207,"owners_count":21824903,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","crawling","data","webcrawler","webcrawling"],"created_at":"2024-11-17T22:16:50.239Z","updated_at":"2025-05-07T16:46:02.561Z","avatar_url":"https://github.com/RobMcH.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Mindfactory.de Crawler\nThis repository contains a crawler for [Mindfactory](https://www.mindfactory.de), a German e-commerce shop for computer hardware.\nThe crawler extracts the data from every product page and stores the scraped products and reviews in a SQLite database consisting of two tables.  \n\nEach product has the following properties:  \n* ID (SQLite identifier)\n* URL\n* Product name\n* Brand name\n* Category (e.g. CPU)\n* EAN\n* SKU\n* Items sold (count)\n* People watching (count)\n* RMA rate (in percent)\n* Average rating (from 1.0 to 5.0)\n* Shipping (information on availability)\n* Price (in euros)  \n\nAdditionally, all reviews for every product are collected and stored in a separate SQLite table. 
An entry in this table has the following properties:\n* Product ID (reference to the corresponding ID in the product table)\n* Stars (rating, from 1 to 5)\n* Text (raw, not tokenized or pre-processed in any way)\n* Author\n* Date (YYYY-MM-DD)\n* Verified (whether the customer actually bought the product at Mindfactory)\n\n# Prerequisites  \n* Python 3 (\u003e= 3.5)\n* scrapy (\u003e= 1.6.0)\n* SQLite3\n\n# Run the scraper  \n    scrapy crawl mindfactory_products\n    \n# Deploy the scraper\nThe scraper can be deployed using scrapyd. To do so, run [scrapyd-deploy](https://github.com/scrapy/scrapyd-client#scrapyd-deploy)\nwith the address of the server running scrapyd. Afterwards, the scraper can be used with scrapyd.\n\n    python scrapyd-deploy    \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobmch%2Fmindfactory_crawling","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frobmch%2Fmindfactory_crawling","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobmch%2Fmindfactory_crawling/lists"}