{"id":14118153,"url":"https://github.com/Page-Replica/page-replica","last_synced_at":"2025-08-02T06:30:45.130Z","repository":{"id":214912506,"uuid":"737644300","full_name":"Page-Replica/page-replica","owner":"Page-Replica","description":"Page Replica – Tool for Web Scraping, Prerendering, and SEO Boost","archived":false,"fork":false,"pushed_at":"2024-07-22T04:37:41.000Z","size":14,"stargazers_count":425,"open_issues_count":0,"forks_count":18,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-11-29T12:37:26.837Z","etag":null,"topics":["caching","frontend","prerendering","seo-optimization","ssr"],"latest_commit_sha":null,"homepage":"https://page-replica.com","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Page-Replica.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-31T22:23:04.000Z","updated_at":"2024-11-28T09:19:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"6582c636-e4b9-49cc-8883-c56aa068bcb3","html_url":"https://github.com/Page-Replica/page-replica","commit_stats":null,"previous_names":["html5-ninja/page-replica","page-replica/page-replica"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Page-Replica%2Fpage-replica","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Page-Replica%2Fpage-replica/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Page-Replica%2Fpage-replica/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/Page-Replica%2Fpage-replica/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Page-Replica","download_url":"https://codeload.github.com/Page-Replica/page-replica/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228443979,"owners_count":17920825,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["caching","frontend","prerendering","seo-optimization","ssr"],"created_at":"2024-08-14T19:01:11.172Z","updated_at":"2024-12-06T09:31:01.401Z","avatar_url":"https://github.com/Page-Replica.png","language":"JavaScript","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"readme":"\u003cimg width=\"321\" alt=\"Screenshot 2024-06-30 at 21 50 13\" src=\"https://github.com/html5-ninja/page-replica/assets/2590579/d606e994-b6ac-4235-9ff6-5ec7a76fa095\"\u003e\n\n# Good news everyone! Page Replica is Available as a Web App!\n\nIf you want to avoid the hassle of setting up your own pre-rendering tool, check out [Page Replica](https://page-replica.com). Manage and re-render your pages effortlessly!\n\n### Key Feature\n- Page Replica is free to use for up to 5,000 requests per month.\n- Unlimited sites\n- API access\n\n### Need Assistance?\nIf you have any questions or need support, we're here to help! Join our [GitHub Discussion](https://github.com/html5-ninja/page-replica/discussions/3) to get in touch with us.\n\n---\n\n# Page Replica free tool\n\n\"Page Replica\" is a versatile web scraping and caching tool built with Node.js, Express, and Puppeteer. 
It helps prerender web app (React, Angular, Vue, ...) pages, which can be served via Nginx for SEO or other purposes.\n\nThe tool lets you scrape individual web pages or entire sitemaps through an API, selectively remove JavaScript, and cache the resulting HTML.\n\nAdditionally, it features an Nginx configuration that optimally handles user and search engine bot traffic.\n\n\n## Installation\n\n1. **Clone the Repository:**\n\n   ```bash\n   git clone https://github.com/html5-ninja/page-replica.git\n   cd page-replica\n   ```\n\n2. **Install Dependencies:**\n\n   ```bash\n   npm install\n   ```\n\n3. **Settings:**\n- index.js \n   ```js\n   const CONFIG = {\n   baseUrl: \"https://example.com\",\n   removeJS: true,\n   addBaseURL: true,\n   cacheFolder: \"path_to_cache_folder\",\n   }\n   ```\n- app.js : set the port for your API\n\n4. **Start the API:**\n\n   ```bash\n   npm start\n   ```\n\n## Usage\n\nBy scraping a page or a sitemap, a copy of the prerendered page will be stored in the cache folder.\n\n### Scraping Individual Pages\n\nTo scrape a single page, make a GET request to `/page` with the `url` query parameter:\n\n```bash\ncurl \"http://localhost:8080/page?url=https://example.com\"\n```\n\n### Scraping Sitemaps\n\nTo scrape pages from a sitemap, make a GET request to `/sitemap` with the `url` query parameter:\n\n```bash\ncurl \"http://localhost:8080/sitemap?url=https://example.com/sitemap.xml\"\n```\n\n## Serve the Cached Pages to Bots with Nginx (My Recipe)\n\nIn this case, the cached pages are served using Nginx. You can adapt this configuration to your needs and your server.\n\nThe Nginx configuration, residing in `nginx_config_sample/example.com.conf`, thoughtfully manages traffic. 
\nIt efficiently routes regular users to the main application server and redirects search engine bots to a dedicated server block for cached HTML delivery.\n\nPlease review the `nginx_config_sample/example.com.conf` file to gain an understanding of its functionality.\n\n## Contribution\nWe welcome contributions! If you have ideas for new features or server/cloud configurations that could enhance this tool, feel free to:\n\n- Open an issue to discuss your ideas.\n- Fork the repository and make your changes.\n- Submit a pull request with a clear description of your changes.\n\n### Feature Requests and Suggestions\nIf you have any feature requests or suggestions for server/cloud configurations beyond Nginx, please open an issue to start a discussion.\n\n## Folder Structure\n\n- `nginx_config_sample`: Presents a sample Nginx configuration for redirecting bot traffic to the cached content server.\n- `api.js`: An Express application responsible for handling web scraping requests.\n- `index.js`: The core web scraping logic employing Puppeteer.\n- `package.json`: Node.js project configuration.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPage-Replica%2Fpage-replica","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FPage-Replica%2Fpage-replica","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPage-Replica%2Fpage-replica/lists"}