{"id":16620880,"url":"https://github.com/ajmeese7/dynamic-page-retrieval","last_synced_at":"2026-04-30T03:35:18.657Z","repository":{"id":42224170,"uuid":"147258158","full_name":"ajmeese7/dynamic-page-retrieval","owner":"ajmeese7","description":"Scrape data from JS-rendered pages","archived":false,"fork":false,"pushed_at":"2024-06-18T15:17:31.000Z","size":199,"stargazers_count":1,"open_issues_count":3,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-26T08:02:22.980Z","etag":null,"topics":["dynamic-content","puppeteer","scraping"],"latest_commit_sha":null,"homepage":"https://dynamic-page-retrieval.herokuapp.com/scrape","language":"EJS","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ajmeese7.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-09-03T22:10:33.000Z","updated_at":"2023-12-08T11:55:28.000Z","dependencies_parsed_at":"2024-12-18T05:23:34.038Z","dependency_job_id":null,"html_url":"https://github.com/ajmeese7/dynamic-page-retrieval","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ajmeese7/dynamic-page-retrieval","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajmeese7%2Fdynamic-page-retrieval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajmeese7%2Fdynamic-page-retrieval/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajmeese7%2Fdynamic-page-retrieval/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajmeese7%2Fdynamic-page-retrieval/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ajmeese7","download_url":"https://codeload.github.com/ajmeese7/dynamic-page-retrieval/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajmeese7%2Fdynamic-page-retrieval/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32454089,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T22:27:22.272Z","status":"online","status_checked_at":"2026-04-30T02:00:05.929Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dynamic-content","puppeteer","scraping"],"created_at":"2024-10-12T02:45:35.689Z","updated_at":"2026-04-30T03:35:18.641Z","avatar_url":"https://github.com/ajmeese7.png","language":"EJS","funding_links":[],"categories":[],"sub_categories":[],"readme":"# dynamic-page-retrieval\n\nThe point of this project is to make web scraping easier for developers in any language.\nIt lets you send a URL as a parameter to a Heroku application via a GET request\nand receive the scraped HTML in response. 
The most helpful part of this project is that\nit returns the web page after it has been dynamically populated by JavaScript, so you\ncan scrape nearly any page.\n\n## Usage\n\nSend a GET request to `https://dynamic-page-retrieval.herokuapp.com/scrape` with a `URL` query\nparameter naming the page to scrape, formatted like so: `?URL=https://www.google.com`.\n\nFor example, to scrape `https://www.google.com` with the pre-hosted Heroku application, the full\nrequest URL would be `https://dynamic-page-retrieval.herokuapp.com/scrape?URL=https://www.google.com`.\nIf the target URL has its own query string (`?` or `\u0026`), percent-encode it first (for example\nwith `encodeURIComponent`) so it is not split into separate parameters of the outer request.\n\nAn example of how to make this GET request in JavaScript:\n```javascript\nconst Http = new XMLHttpRequest();\n// Percent-encode the target so any \"?\" or \"\u0026\" in it stays inside the URL parameter\nconst target = \"https://www.google.com\";\nconst url = \"https://dynamic-page-retrieval.herokuapp.com/scrape?URL=\" + encodeURIComponent(target);\nHttp.onreadystatechange = () =\u003e {\n  // Only read the response once the request is complete (readyState 4) and succeeded\n  if (Http.readyState === XMLHttpRequest.DONE \u0026\u0026 Http.status === 200) {\n    // Replace console.log() with what you need the HTML for,\n    // or assign it to a global variable for use elsewhere\n    console.log(Http.responseText);\n  }\n};\nHttp.open(\"GET\", url);\nHttp.send();\n```\n\n## Set up your own\n\nFirst, create a free [Heroku](https://signup.heroku.com/) account if you don't already have one.\n\nNext, make sure you have [Node.js and npm](https://nodejs.org/en/download/) installed locally.\nI built this project with Node v9.3.0 and npm v6.4.1, but the exact versions shouldn't matter\nmuch since you will be deploying to Heroku. If you run it locally, the version is more likely\nto matter.\n\nClone this project to your machine and open a terminal in its folder. With the Heroku CLI\ninstalled and logged in (`heroku login`), run the following commands in order:\n\n`heroku create`\n\n`heroku buildpacks:add https://github.com/jontewks/puppeteer-heroku-buildpack`\n\n`git push heroku master`\n\n`heroku ps:scale web=1`\n\n`heroku open`\n\nYou should now have a working copy of the project!\n\nI use [kaffeine](http://kaffeine.herokuapp.com/) to keep my dyno from sleeping, which reduces loading\ntimes. 
It is set to let the dyno sleep at 12:00 AM to conserve dyno hours, so the app will be asleep from\n12:00 AM to 6:00 AM unless someone sends a request during that window. An alternative is to add\nsomething like this to the app so it keeps itself awake:\n```javascript\n// Ping the app every 5 minutes so the dyno does not go to sleep\nconst http = require(\"http\");\nconst APP_URL = \"http://\u003cyour app name\u003e.herokuapp.com\"; // replace with your app's URL\n\nsetInterval(() =\u003e {\n  http\n    .get(APP_URL, (res) =\u003e res.resume()) // discard the body; only the request matters\n    .on(\"error\", (err) =\u003e console.error(\"Keep-alive ping failed:\", err.message)); // don't crash on network errors\n}, 5 * 60 * 1000); // every 5 minutes (300000 ms)\n```\n\n## Contributing\n\nFeel free to open a PR for any of these: README examples of the GET request in other languages,\na prettier homepage that displays the scraped page's information in a nicer format, better tests,\nbetter error handling, etc.\n\n### Ideas\n- Make an npm package that takes a URL and returns the scraped content\n- Make similar projects in other languages (even though that was a bust before)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fajmeese7%2Fdynamic-page-retrieval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fajmeese7%2Fdynamic-page-retrieval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fajmeese7%2Fdynamic-page-retrieval/lists"}