{"id":13585610,"url":"https://github.com/Crinibus/scraper","last_synced_at":"2025-04-07T10:31:16.340Z","repository":{"id":39621349,"uuid":"270408328","full_name":"Crinibus/scraper","owner":"Crinibus","description":"Web scraper for scraping, tracking and visualizing prices of products on various websites.","archived":false,"fork":false,"pushed_at":"2024-02-06T21:17:57.000Z","size":2227,"stargazers_count":110,"open_issues_count":8,"forks_count":22,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-02-14T21:13:13.570Z","etag":null,"topics":["amazon","avcables","computersalg","coolshop","ebay","elgiganten","expert","komplett","mm-vision","newegg","prices","products","proshop","python","scrape-prices","scraper","sharkgaming","shein","tech-scraper","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Crinibus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-07T19:21:29.000Z","updated_at":"2024-07-13T14:51:55.531Z","dependencies_parsed_at":"2023-10-15T11:20:50.453Z","dependency_job_id":"4ae2310d-6e13-4e38-8590-1e094c939705","html_url":"https://github.com/Crinibus/scraper","commit_stats":null,"previous_names":[],"tags_count":62,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Crinibus%2Fscraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Crinibus%2Fscraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Crinibus%2F
scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Crinibus%2Fscraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Crinibus","download_url":"https://codeload.github.com/Crinibus/scraper/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247636234,"owners_count":20970887,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["amazon","avcables","computersalg","coolshop","ebay","elgiganten","expert","komplett","mm-vision","newegg","prices","products","proshop","python","scrape-prices","scraper","sharkgaming","shein","tech-scraper","web-scraping"],"created_at":"2024-08-01T15:05:02.473Z","updated_at":"2025-04-07T10:31:16.071Z","avatar_url":"https://github.com/Crinibus.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Table of contents\n- [Intro](#intro)\n- [Contributing](#contributing)\n- [Installation](#installation)\n- [Add products](#add-products)\n    - [Websites to scrape from](#websites-to-scrape-from)\n- [Scrape products](#scrape-products)\n- [Delete data](#delete-data)\n- [User settings](#user-settings)\n- [Clean up data](#clean-up-data)\n- [View the latest datapoint of product(s)](#view-the-latest-datapoint-of-products)\n- [View all products](#view-all-products)\n- [Visualize data](#visualize-data)\n    - [Command examples](#command-examples)\n\n\u003cbr/\u003e\n\n\n## Intro \u003ca name=\"intro\"\u003e\u003c/a\u003e\nWith this program you can easily scrape and track prices on product at 
multiple [websites](#websites-to-scrape-from). \u003cbr/\u003e\nThis program can also visualize the price over time of the products being tracked. That can be helpful if you want to buy a product in the future and want to know if a discount might be around the corner.\n\n**Requires** `python 3.10+`\n\n\u003cbr/\u003e\n\n\n## Contributing \u003ca name=\"contributing\"\u003e\u003c/a\u003e\nFeel free to fork the project and create a pull request with new features or refactoring of the code. Also feel free to open issues with problems or suggestions for new features.\n\n\u003cbr/\u003e\n\n\n\u003cdetails\u003e\u003csummary\u003e\u003ch2\u003eUPDATE TO HOW DATA IS STORED IN V1.1\u003c/h2\u003e\u003c/summary\u003e\n\u003cp\u003e\n\nIn version v1.1, I have changed how data is stored in ```records.json```: ```dates``` under each product has been renamed to ```datapoints``` and is now a list of dictionaries with ```date``` and ```price``` keys. \u003cbr/\u003e\nIf you want to update your data to be compatible with version v1.1, then open an interactive Python session where this repository is located and run the following commands:\n```\n\u003e\u003e\u003e from scraper.format_to_new import Format\n\u003e\u003e\u003e Format.format_old_records_to_new()\n```\n\n\u003c/p\u003e\n\u003c/details\u003e\n\n\n\u003cdetails\u003e\u003csummary\u003e\u003ch2\u003eUPDATE TO PRODUCTS.CSV IN V2.3.0\u003c/h2\u003e\u003c/summary\u003e\n\u003cp\u003e\n\nIn version v2.3.0, I have added the column ```short_url``` to ```products.csv```. 
If you added products before v2.3.0, then run the following commands in an interactive Python session to add the new column:\n```\n\u003e\u003e\u003e from scraper.format_to_new import Format\n\u003e\u003e\u003e Format.add_short_urls_to_products_csv()\n```\n\n\u003c/p\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003e\u003ch2\u003eUPDATE TO HOW DATA IS STORED IN V3.0.0\u003c/h2\u003e\u003c/summary\u003e\n\u003cp\u003e\n\nIn version v3.0.0, I have changed where data is stored from a JSON file to a SQLite database. If you have data from before v3.0.0, then run the following commands in an interactive Python session to add the data from records.json to the database (**OBS: Pandas is required**):\n```\n\u003e\u003e\u003e from scraper.format_to_new import Format\n\u003e\u003e\u003e Format.from_json_to_db()\n```\n\n\u003cbr/\u003e\n\n**NOTE:** This will replace the content in the database with what is in records.json. That means if you have products and/or datapoints in the database but not in records.json, they will be deleted.\n\n\n\u003cbr/\u003e\n\nOBS: If you don't have Pandas installed, run this command:\n```\npip3 install pandas\n```\n\n\u003c/p\u003e\n\u003c/details\u003e\n\n\u003cbr/\u003e\n\n\n## Installation \u003ca name=\"installation\"\u003e\u003c/a\u003e\n**Requires** `python 3.10+`\n\nClone this repository and move into the repository:\n```\ngit clone https://github.com/Crinibus/scraper.git\n```\n```\ncd scraper\n```\n\nThen install the required modules by running this in the terminal:\n```\npip3 install -r requirements.txt\n```\n\n\u003cbr/\u003e\n\n\n## Add products \u003ca name=\"add-products\"\u003e\u003c/a\u003e\nTo add a single product, use the following command, where you replace ```\u003ccategory\u003e``` and ```\u003curl\u003e``` with your category and url:\n```\npython3 main.py -a -c \u003ccategory\u003e -u \u003curl\u003e\n```\n\ne.g.\n```\npython3 main.py -a -c vr -u 
https://www.komplett.dk/product/1168594/gaming/spiludstyr/vr/vr-briller/oculus-quest-2-vr-briller\n```\n\nThis adds the category (if new) and the product to the records.json file, and adds a line at the end of the products.csv file so the script can scrape the price of the new product.\n\n\u003cbr/\u003e\n\nTo add multiple products at once, just specify another category and url with ```-c \u003ccategory\u003e``` and ```-u \u003curl\u003e```. E.g. the following command adds two products:\n```\npython3 main.py -a -c \u003ccategory\u003e -u \u003curl\u003e -c \u003ccategory2\u003e -u \u003curl2\u003e\n```\nThis is equivalent to the above:\n```\npython3 main.py -a -c \u003ccategory\u003e \u003ccategory2\u003e -u \u003curl\u003e \u003curl2\u003e\n```\n\n**OBS**: The url must have a scheme like: ```https://``` or ```http://```.\u003cbr/\u003e\n**OBS**: If an error occurs when adding a product, it might be because the url has a ```\u0026``` in it. When this happens, just put quotation marks around the url. This should solve the problem. 
If this doesn't solve the problem, then submit an issue.\u003cbr/\u003e\n\n\u003cbr/\u003e\n\n\n### Websites to scrape from \u003ca name=\"websites-to-scrape-from\"\u003e\u003c/a\u003e\nThis scraper can (so far) scrape prices of products from:\n- [Amazon](https://www.amazon.com/)*\n- [eBay.com](https://www.ebay.com/)\n- [Komplett.dk](https://www.komplett.dk/)\n- [Proshop.dk](https://www.proshop.dk/)\n- [Computersalg.dk](https://www.computersalg.dk/)\n- [Elgiganten.dk](https://www.elgiganten.dk/) \u0026 [Elgiganten.se](https://www.elgiganten.se/)\n- [AvXperten.dk](https://www.avxperten.dk/)\n- [Av-Cables.dk](https://www.av-cables.dk/)\n- [Power.dk](https://www.power.dk/)\n- [Expert.dk](https://www.expert.dk/)\n- [MM-Vision.dk](https://www.mm-vision.dk/)\n- [Coolshop.dk](https://www.coolshop.dk/)\n- [Sharkgaming.dk](https://www.sharkgaming.dk/)\n- [Newegg.com](https://www.newegg.com/) \u0026 [Newegg.ca](https://www.newegg.ca/)\n- [HifiKlubben.dk](https://www.hifiklubben.dk/)\n- [Shein.com](https://www.shein.com/)\n\n****OBS these Amazon domains should work: [.com](https://www.amazon.com/), [.ca](https://www.amazon.ca/), [.es](https://www.amazon.es/), [.fr](https://www.amazon.fr/), [.de](https://www.amazon.de/) and [.it](https://www.amazon.it/)\u003cbr/\u003e\nThe listed Amazon domains are from my quick testing with one or two products from each domain.\u003cbr/\u003e\nIf you find that other Amazon domains work, or that some of the listed ones don't, please create an issue.***\n\n\u003cbr/\u003e\n\n\n## Scrape products \u003ca name=\"scrape-products\"\u003e\u003c/a\u003e\nTo scrape prices of products, run this in the terminal:\n```\npython3 main.py -s\n```\nTo scrape with threads, run the same command but with the ```--threads``` argument:\n```\npython3 main.py -s --threads\n```\n\n\u003cbr/\u003e\n\n## Activating and deactivating products\n\nWhen you add a new product, the product is activated to be scraped. 
If you wish to not scrape a product anymore, you can deactivate the product with the following command:\n```\npython3 main.py --deactivate --id \u003cid\u003e\n```\n\nYou can activate a product again with the following command:\n```\npython3 main.py --activate --id \u003cid\u003e\n```\n\n\u003cbr/\u003e\n\n## Delete data \u003ca name=\"delete-data\"\u003e\u003c/a\u003e\n\nIf you want to start from scratch with no data in the records.json and products.csv files, then just run the following command:\n```\npython3 main.py --delete --all\n```\n\nYou can also just delete some products or some categories:\n```\npython3 main.py --delete --id \u003cid\u003e\n```\n```\npython3 main.py --delete --name \u003cname\u003e\n```\n```\npython3 main.py --delete --category \u003ccategory\u003e\n```\n\n\nThen just add products like described [here](#add-products).\n\n\u003cbr/\u003e\n\nIf you just want to delete all datapoints for every product, then run this command:\n```\npython3 main.py --reset --all\n```\n\n\nYou can also just delete datapoints for some products:\n```\npython3 main.py --reset --id \u003cid\u003e\n```\n```\npython3 main.py --reset --name \u003cname\u003e\n```\n```\npython3 main.py --reset --category \u003ccategory\u003e\n```\n\n\u003cbr/\u003e\n\n\n## User settings \u003ca name=\"user-settings\"\u003e\u003c/a\u003e\nUser settings can be added and changed in the file settings.ini.\n\n#### ChangeName\nUnder the category ```ChangeName``` you can change how the script changes product names, so similar products will be placed in the same product in records.json file.\n\nWhen adding a new setting under the category ```ChangeName``` in settings.ini, there must be a line with ```key\u003cn\u003e``` and a line with ```value\u003cn\u003e```, where ```\u003cn\u003e``` is the \"link\" between keywords and valuewords. E.g. 
```value3``` is the value for ```key3```.\n\nIn ```key\u003cn\u003e``` you set the keywords (separated by commas) that the product name must contain for it to be changed to the value of ```value\u003cn\u003e```. For example, with the following user settings:\n\n```\n[ChangeName]\nkey1 = asus,3080,rog,strix,oc\nvalue1 = asus geforce rtx 3080 rog strix oc\n```\n\nThe script checks whether a product name contains all of the words in ```key1```; if it does, the name is changed to the value of ```value1```.\n\n#### Scraping\nYou can change the delay between url requests by changing the field ```request_delay``` in the file scraper/settings.ini under the ```Scraping``` section.\n\nDefault is 0 seconds, but to avoid the websites you scrape from mistaking your requests for a DDoS attack, or temporarily restricting you from scraping, set the request_delay in settings.ini to a higher number of seconds, e.g. 5 seconds.\n\n\u003cbr/\u003e\n\n\n## Clean up data \u003ca name=\"clean-up-data\"\u003e\u003c/a\u003e\nIf you want to clean up your data, meaning you want to remove unnecessary datapoints (datapoints that have the same price as the datapoint before and after it), then run the following command:\n```\npython3 main.py --clean-data\n```\n\u003cbr/\u003e\n\n\n## Search products and categories\nYou can search for product names and categories you have in your records.json by using the argument ```--search [\u003cword\u003e ...]```. The search is like a keyword search, so e.g. if you enter ```--search logitech``` all product names and categories that contain the word \"logitech\" are found. \n\nYou can search with multiple keywords, just separate them with a space: ```--search logitech corsair```. 
Here all the product names and categories that contains the words \"logitech\" or \"corsair\" are found.\n\n\u003cbr/\u003e\n\n\n## View the latest datapoint of product(s) \u003ca name=\"view-the-latest-datapoint-of-products\"\u003e\u003c/a\u003e\nIf you want to view the latest datapoint of a product, you can use the argument ```--latest-datapoint``` with ```--id``` and/or ```--name```.\n\nExample:\n```\npython3 main.py --name \"logitech z533\" --latest-datapoint\n```\n\nThe above command will show the latest datapoint for all the websites the specified product, in this case \"logitech z533\", has been scraped from and will show something like this:\n\n```\nLOGITECH Z533\n\u003e Komplett - 849816\n  - DKK 999.0\n  - 2022-09-12\n\u003e Proshop - 2511000\n  - DKK 669.0\n  - 2022-09-12\n\u003e Avxperten - 25630\n  - DKK 699.0\n  - 2022-09-12\n```\n\n\u003cbr/\u003e\n\n\n## View all products \u003ca name=\"view-all-products\"\u003e\u003c/a\u003e\nTo view all the products you have scraped, you can use the argument ```--list-products```.\n\nExample:\n```\npython3 main.py --list-products\n```\n\nThis will list all the products in the following format:\n\n```\nCATEGORY\n  \u003e PRODUCT NAME\n    - WEBSITE NAME - PRODUCT ID\n    - ✓ WEBSITE NAME - PRODUCT ID\n```\n\nThe check mark (✓) shows that the product is activated.\n\n\u003cbr/\u003e\n\n\n## Visualize data \u003ca name=\"visualize-data\"\u003e\u003c/a\u003e\nTo visualize your data, just run main.py with the ```-v``` or ```--visualize``` argument and then specify which products you want to be visualized. 
These are your options for how you want to visualize your products:\n\n- ```--all``` to visualize all your products\n- ```-c [\u003ccategory\u003e [\u003ccategory\u003e ...]]``` or ```--category [\u003ccategory\u003e [\u003ccategory\u003e ...]]``` to visualize all products in one or more categories\n- ```--id [\u003cid\u003e [\u003cid\u003e ...]]``` to visualize one or more products with the specified id(s)\n- ```-n [\u003cname\u003e [\u003cname\u003e ...]]``` or ```--name [\u003cname\u003e [\u003cname\u003e ...]]``` to visualize one or more products with the specified name(s)\n- ```--compare``` to compare two or more products with the specified id(s), name(s) and/or categories, or all products, on one graph. Use with ```--id```, ```--name```, ```--category``` and/or ```--all```\n\n### Example graph\n![](https://user-images.githubusercontent.com/57172157/171033112-908f6420-6c7a-44ef-ba67-8a4a73bbd96e.png)\n\n### Command examples \u003ca name=\"command-examples\"\u003e\u003c/a\u003e\n**Show graphs for all products**\n\nTo show graphs for all products, run the following command:\n```\npython3 main.py -v --all\n```\n\n\u003cbr/\u003e\n\n**Show graph(s) for specific products**\n\nTo show a graph for only one product, run the following command where ```\u003cid\u003e``` is the id of the product you want a graph for:\n```\npython3 main.py -v --id \u003cid\u003e\n```\n\nFor multiple products, just add another id, like so:\n```\npython3 main.py -v --id \u003cid\u003e \u003cid\u003e\n```\n\n\u003cbr/\u003e\n\n**Show graphs for products in one or more categories**\n\nTo show graphs for all products in one category, run the following command where ```\u003ccategory\u003e``` is the category you want graphs for:\n```\npython3 main.py -v -c \u003ccategory\u003e\n```\n\nFor multiple categories, just add another category, like so:\n```\npython3 main.py -v -c \u003ccategory\u003e \u003ccategory\u003e\n```\n\n\u003cbr/\u003e\n\n**Show graphs for products with a specific name**\n\nTo show graphs for 
product(s) with a specific name, run the following command where ```\u003cname\u003e``` is the name of the product(s) you want graphs for:\n```\npython3 main.py -v --name \u003cname\u003e\n```\n\nFor multiple products with different names, just add another name, like so:\n```\npython3 main.py -v --name \u003cname\u003e \u003cname2\u003e\n```\n\nIf the name of a product has multiple words in it, then just add quotation marks around the name.\n\n\u003cbr/\u003e\n\n**Only show graph for products that are up to date**\n\nTo only show graphs for the products that are up to date, use the flag ```--up-to-date``` or ```-utd```, like so:\n```\npython3 main.py -v --all -utd\n```\nThe use of the flag ```-utd``` is only implemented when visualizing all products like the example above or when visualizing all products in a category:\n```\npython3 main.py -v -c \u003ccategory\u003e -utd\n```\n\n\u003cbr/\u003e\n\n**Compare two products**\n\nTo compare two products on one graph, use the flag ```--compare``` with flag ```--id```, ```--name```, ```--category``` and/or ```--all```, like so:\n```\npython3 main.py -v --compare --id \u003cid\u003e\n```\n```\npython3 main.py -v --compare --name \u003cname\u003e\n```\n```\npython3 main.py -v --compare --category \u003ccategory\u003e\n```\n```\npython3 main.py -v --compare --id \u003cid\u003e --name \u003cname\u003e --category \u003ccategory\u003e\n```\n```\npython3 main.py -v --compare --all\n```\n\n***OBS** when using ```--name``` or ```--category``` multiple products can be visualized*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCrinibus%2Fscraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FCrinibus%2Fscraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCrinibus%2Fscraper/lists"}