{"id":23991303,"url":"https://github.com/loglux/scan_co_uk","last_synced_at":"2026-06-12T13:30:57.660Z","repository":{"id":188278181,"uuid":"678399707","full_name":"loglux/Scan_co_uk","owner":"loglux","description":"This repository contains a Scrapy spider designed to scrape product information from Scan.co.uk based on provided search terms and filters.","archived":false,"fork":false,"pushed_at":"2023-08-29T13:03:05.000Z","size":15,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-25T05:17:13.589Z","etag":null,"topics":["filtering-data","python","scrapy","scrapy-spider","search","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/loglux.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-14T13:15:28.000Z","updated_at":"2023-09-12T23:48:04.000Z","dependencies_parsed_at":"2025-01-07T19:39:08.359Z","dependency_job_id":"7408f664-71d5-48ec-97d0-3486dca289a7","html_url":"https://github.com/loglux/Scan_co_uk","commit_stats":null,"previous_names":["loglux/scan_co_uk"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/loglux/Scan_co_uk","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/loglux%2FScan_co_uk","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/loglux%2FScan_co_uk/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/loglux%2FScan_co_uk/releases","manifests_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/loglux%2FScan_co_uk/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/loglux","download_url":"https://codeload.github.com/loglux/Scan_co_uk/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/loglux%2FScan_co_uk/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34247460,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["filtering-data","python","scrapy","scrapy-spider","search","webscraping"],"created_at":"2025-01-07T19:39:05.135Z","updated_at":"2026-06-12T13:30:57.639Z","avatar_url":"https://github.com/loglux.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Scan.co.uk Scrapy Spider\nThis repository contains a Scrapy spider designed to scrape product information from Scan.co.uk based on provided search terms and filters.\n\n## Features\n- **Search Products on Scan.co.uk**: Directly search for products based on your terms.\n- **Filter Results Using Multiple Keywords**: For instance, you can search for \"RTX 3080\" and filter the results by terms like \"Gigabyte, 10GB, OC\". 
The spider supports filtering by multiple comma-separated keywords.\n- **Exception Keywords Filter**: Exclude products containing specific keywords from the scraped results. For example, you can exclude products containing the word \"refurbished\".\n- **Out of Stock Suppression**: By default, products that are out of stock are suppressed. An additional argument can include them in the results.\n- **Save results to a CSV file**.\n\n## Setup and Installation\n1. Clone this repository.\n```bash\ngit clone https://github.com/loglux/Scan_co_uk.git\ncd Scan_co_uk/scan_uk\n```\n2. Set up a virtual environment.\n```bash\npython3 -m venv venv\nsource venv/bin/activate\n# On Windows, use: venv\\Scripts\\activate.bat instead\n```\n3. Install dependencies.\n```bash\npip install scrapy\n```\n4. Run the spider.\n```bash\nscrapy crawl scan_search -a search=\"RTX 3080\" -a filter_words=\"Gigabyte,10GB\" -a exception_keywords=\"Dual\" -o output.csv -t csv\n```\n\n## Parameters\n- **search**: The search term you want to use (e.g., \"RTX 3080\").\n- **filter_words**: Comma-separated list of words used to filter search results. By default, only results containing all of these words are returned; pass -a filter_mode=\"any\" to change this behaviour. Default is an empty string.\n- **filter_mode**: By default, results must contain all of the specified filter words. Set it to \"any\" to keep results containing any of the specified filter words.\n- **exception_keywords**: Comma-separated list of words that act as negative filters. Results containing any of these words will be excluded. Default is an empty string.\n- **include_out_of_stock**: By default, out of stock products are suppressed. If you want to include them, pass -a include_out_of_stock=True.\n\n## Added scan_gpu_spider Functionality\n\n### Features\n- **Search for GPUs**: The spider now crawls multiple GPU pages to fetch the latest GeForce Graphics Cards details. 
It covers the RTX 4070 series all the way down to the RTX 3060 series.\n- **Model Number**: The spider extracts the model number of each GPU.\n- **Dimensions**: The spider also extracts the dimensions of the GPU, providing a better understanding of the physical specs.\n- **Chipset**: The spider also fetches the chipset information for each product, to offer more in-depth details about the card.\n\n### How to Run scan_gpu_spider\nNavigate to the project directory and run:\n```bash\nscrapy crawl scan_gpu_spider -o output.csv\n```\nThis will start the spider, and the scraped data will be stored in your desired format (e.g., .json, .csv, etc.).\n\n## Required Twisted Version\n\nThis project was created and tested with a specific version of the Twisted library to ensure compatibility and proper functioning with the Scrapy spider. The required Twisted version for this project is **Twisted 22.10.0**.\n\n### Scrapy Version and Compatibility\n\nAt the time this project was created, the latest available version of Scrapy was **Scrapy 2.10.0**. During development and testing, it was confirmed that this version of Scrapy worked seamlessly with Twisted 22.10.0, providing a stable and reliable environment for scraping.\n\n### Compatibility Issue with Newer Twisted Versions\n\nSince software libraries like Scrapy evolve over time, new versions are released to introduce features, improvements, and bug fixes. However, these updates can sometimes lead to compatibility issues with other libraries that the software relies on.\n\nIt has been observed that versions of Twisted newer than 22.10.0, such as **Twisted 23.8.0**, can cause compatibility problems with Scrapy 2.10.0. 
As a result, it is recommended to maintain the specified Twisted version to ensure that the Scrapy spider works as intended.\n\n### Downgrading Twisted for Compatibility\n\nTo mitigate the compatibility issue and ensure a smooth experience, it is advised to downgrade Twisted to the required version. You can achieve this by running the following command:\n\n```bash\npip install Twisted==22.10.0\n```\n\n## Contributing\nPull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.\n\n## License\nMIT","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Floglux%2Fscan_co_uk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Floglux%2Fscan_co_uk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Floglux%2Fscan_co_uk/lists"}