{"id":21300607,"url":"https://github.com/callaginn/kroger-sweeper","last_synced_at":"2026-05-01T08:31:46.068Z","repository":{"id":263953520,"uuid":"865150241","full_name":"callaginn/kroger-sweeper","owner":"callaginn","description":"Download and categorize receipts from Kroger.com by scraping a user's purchases with puppeteer and playwright.","archived":false,"fork":false,"pushed_at":"2024-11-26T18:50:24.000Z","size":61,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-13T07:01:45.891Z","etag":null,"topics":["kroger","kroger-api","nodejs","orders","playwright","puppeteer","purchases","receipt","receipt-parser","receipts","scrape","scraper"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/callaginn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-30T04:15:41.000Z","updated_at":"2024-11-26T18:50:28.000Z","dependencies_parsed_at":"2024-11-21T07:31:52.992Z","dependency_job_id":"f8b1d8e8-69b7-448c-b2d1-2bda7de12e6a","html_url":"https://github.com/callaginn/kroger-sweeper","commit_stats":null,"previous_names":["callaginn/kroger-sweeper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/callaginn/kroger-sweeper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/callaginn%2Fkroger-sweeper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/callaginn%2Fkroger-sweeper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/callaginn%2Fkroger-sweeper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/callaginn%2Fkroger-sweeper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/callaginn","download_url":"https://codeload.github.com/callaginn/kroger-sweeper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/callaginn%2Fkroger-sweeper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32490810,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-30T13:12:12.517Z","status":"online","status_checked_at":"2026-05-01T02:00:05.856Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["kroger","kroger-api","nodejs","orders","playwright","puppeteer","purchases","receipt","receipt-parser","receipts","scrape","scraper"],"created_at":"2024-11-21T15:27:46.501Z","updated_at":"2026-05-01T08:31:46.054Z","avatar_url":"https://github.com/callaginn.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Kroger Receipt Sweeper\nDownloads itemized receipts from a Kroger account with Puppeteer and categorizes them using the Kroger API and custom data cleanup scripts.\n\n```\nbrew install 1password-cli\nyarn install\n```\n\n## How to Use\n\n### 1. 
Scrape Kroger Data (Two Methods)\n\u003e _At the moment, the output of both methods has to be manually copied and pasted into a JSON file, such as `src/data/receipts.json`._\n\n#### a. Scrape with `yarn sweep`\nThis experimental script logs into Kroger and scrapes all the receipt data automatically.\n\n#### b. Scrape with console scripts\nIf you run into issues with the automated script above, you can fall back to a more manual process to collect the same data. This process involves logging in, navigating to the \"/mypurchases\" page, opening the Chrome console, and pasting in code from the \"scrape*.md\" files.\n\nSince this process involves you controlling a regular version of Chrome, it's the least likely to be flagged as a bot. And unlike the automated script, it allows you to intervene if it runs into issues. We'll use **scrape2b.md** as an example.\n\nThis script is able to get a list of all the purchases, but often fails after collecting a few itemized receipts. I believe this is because bot protection is being triggered, but I haven't found an automated way to bypass that.\n\nIf it fails when fetching a batch of receipts, you can do the following:\n- Scroll/click around the Kroger interface a bit so that you are reflagged as a human. Make sure you stay on the \"/mypurchases\" page though. There are some tabs at the top that stay on the same page.\n- In the console, manually rerun the batch that failed and the ones after it. For example, if it fails while grabbing the fourth batch, you'll need to run `processBatch(batches, 3)` and then the subsequent batches.\n- Repeat the steps above if additional batches fail.\n\n### 2. Cleanup with `yarn cleanup`\n\nThis script grabs a simplified array of products from `receipts.json` and exports them to `products.json`.\n\n### 3. Get product categories with `yarn lookup`\nThis script loads `products.json` and uses the Kroger API to request information about all the products. These categorized products are saved to `src/data/categories.json`.\n\n### 4. 
Categorize items with `yarn categorize`\nThis script matches the products from `products.json` to `receipts.json`. The easiest approach would be to loop through `categories.json` and assign each category based on the `upc` key.\n\n## Development Info and Scripts\nThe `src/dev/docs` folder contains Markdown files that explain how the process works.\n\nRun `yarn dev bot` to get a feel for how bot detectors like [Akamai's Bot Manager](https://www.akamai.com/products/bot-manager) (used by Kroger) detect bots and identify your device by its browser.\n\n## Useful References\n- [How to scrape the web without getting blocked (Zyte.com)](https://www.zyte.com/blog/how-to-scrape-the-web-without-getting-blocked/)\n- [Detecting Headless Chrome’s Puppeteer Extra Stealth Plugin with JavaScript Browser Fingerprinting](https://datadome.co/bot-management-protection/detecting-headless-chrome-puppeteer-extra-plugin-stealth/)\n- [How To Make Puppeteer Undetectable](https://scrapeops.io/puppeteer-web-scraping-playbook/nodejs-puppeteer-make-puppeteer-undetectable/)\n- [Can a website detect when you are using Selenium with chromedriver?](https://stackoverflow.com/questions/33225947/can-a-website-detect-when-you-are-using-selenium-with-chromedriver/41220267#41220267)\n- [How to set User-Agent header with Puppeteer JS and not fail](https://filipvitas.medium.com/how-to-set-user-agent-header-with-puppeteer-js-and-not-fail-28c7a02165da)\n- [THE LAB #22: Scraping Akamai protected websites](https://substack.thewebscraping.club/p/scraping-akamai-protected-website)\n- [THE LAB #30: How to bypass Akamai protected website when nothing else works](https://substack.thewebscraping.club/p/the-lab-30-how-to-bypass-akamai-protected)\n\n## Bot Tests\n- [Your HTTP headers](https://deviceandbrowserinfo.com/http_headers)\n- [BrowserScan Device Info](https://www.browserscan.net)\n- [BrowserScan Bot Detection](https://www.browserscan.net/bot-detection)\n- [SannySoft Bot Fingerprint 
Scanner](https://bot.sannysoft.com/)\n- [Canvas Fingerprinting](https://browserleaks.com/canvas)\n- [Calculate reCAPTCHA 3 Score](https://antcpt.com/eng/information/demo-form/recaptcha-3-test-score.html)\n- [Rebrowser Bot Detector](https://bot-detector.rebrowser.net/)\n- [FingerprintJS Bot Detector](https://fingerprintjs.github.io/BotD/main/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcallaginn%2Fkroger-sweeper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcallaginn%2Fkroger-sweeper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcallaginn%2Fkroger-sweeper/lists"}