{"id":32481781,"url":"https://github.com/overbrowsing/wasteback-machine","last_synced_at":"2025-10-27T02:58:41.401Z","repository":{"id":320462903,"uuid":"1049160904","full_name":"overbrowsing/wasteback-machine","owner":"overbrowsing","description":"JavaScript library for measuring the size and composition of archived web pages.","archived":false,"fork":false,"pushed_at":"2025-10-23T22:28:59.000Z","size":153,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-24T00:11:53.403Z","etag":null,"topics":["internet-archive","sustainable-web-design","wayback-machine","web-archives","webpage-metrics","website-analysis"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/overbrowsing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-02T15:15:37.000Z","updated_at":"2025-10-23T22:36:41.000Z","dependencies_parsed_at":"2025-10-24T00:12:02.179Z","dependency_job_id":"4414b9ef-3c9b-4035-b76a-15d1b632bb1e","html_url":"https://github.com/overbrowsing/wasteback-machine","commit_stats":null,"previous_names":["overbrowsing/wasteback-machine"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/overbrowsing/wasteback-machine","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/overbrowsing%2Fwasteback-machine","tags_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/overbrowsing%2Fwasteback-machine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/overbrowsing%2Fwasteback-machine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/overbrowsing%2Fwasteback-machine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/overbrowsing","download_url":"https://codeload.github.com/overbrowsing/wasteback-machine/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/overbrowsing%2Fwasteback-machine/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281207061,"owners_count":26461324,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-27T02:00:05.855Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["internet-archive","sustainable-web-design","wayback-machine","web-archives","webpage-metrics","website-analysis"],"created_at":"2025-10-27T02:58:39.710Z","updated_at":"2025-10-27T02:58:41.389Z","avatar_url":"https://github.com/overbrowsing.png","language":"JavaScript","funding_links":[],"categories":["Software"],"sub_categories":["Command-line Clients"],"readme":"# Wasteback Machine\n\n[![NPM 
version](https://img.shields.io/npm/v/@overbrowsing/wasteback-machine.svg)](https://www.npmjs.com/package/@overbrowsing/wasteback-machine)\n[![npm](https://img.shields.io/npm/dt/@overbrowsing/wasteback-machine.svg)](https://www.npmtrends.com/@overbrowsing/wasteback-machine)\n[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://egghead.io/courses/how-to-contribute-to-an-open-source-project-on-github)\n\n## What Is Wasteback Machine?\n\nWasteback Machine is a JavaScript library for measuring the size and composition of archived web pages (mementos) from the [Internet Archive's Wayback Machine](https://web.archive.org).\n\n## Why Use Wasteback Machine?\n\nWasteback Machine retrieves mementos with high fidelity, removing archive linkage and replay-preserving modifications, and excluding replay-induced distortions, while preserving temporal coherence. The library extracts and classifies binary resources (URI-Ms) to accurately measure page size and composition.\n\nThe method overcomes the limitations of live-measurement approaches by recognising the unique nature of web archives as re-born digital objects and navigating their complexities to make them analytically tractable. 
This enables retrospective analysis of websites.\n\nIts modular design supports integration into research workflows, analytics pipelines, and sustainability assessment tools, facilitating the study of web evolution and informing interventions to measure the internet’s environmental impact.\n\n## Features\n\n- **Retrieve mementos by date or timespan:** Selects the nearest memento if the exact timestamp is missing.\n- **Analyse page composition:** Measure sizes of HTML, style sheets, scripts, images, videos, fonts, etc.\n- **Generate detailed resource (URI-M) lists:** Includes URLs, types, and sizes of all URI-Ms.\n- **Retrieval completeness score:** See what percentage of a memento was successfully retrieved.\n\n## Installation\n\n### Using NPM\n\nTo install Wasteback Machine as a dependency for your projects using NPM:\n\n```sh\nnpm install @overbrowsing/wasteback-machine\n```\n\n### Using Yarn\n\nTo install Wasteback Machine as a dependency for your projects using Yarn:\n\n```sh\nyarn add @overbrowsing/wasteback-machine\n```\n\n## Usage\n\nWasteback Machine provides two primary functions:\n\n1. Discover available mementos for a URL in a given time range.\n2. Analyse a specific memento for page size and composition.\n\n### 1. Fetch Available Mementos\n\n```javascript\nimport { getMementos } from \"@overbrowsing/wasteback-machine\";\n\n// Get all mementos for www.nytimes.com between 1996 and 2025\nconst mementos = await getMementos('https://nytimes.com', 1996, 2025);\nconsole.log(mementos);\n```\n\nExample Output:\n\n```javascript\n[\n  '19961112181513', '19961112181513', '19961112181513', '19961219002950', ...\n]\n```\n\n### 2. 
Analyse a Specific Memento\n\n```javascript\nimport { getMementoSizes } from \"@overbrowsing/wasteback-machine\";\n\n// Analyse www.nytimes.com memento from November 12, 1996\nconst mementoData = await getMementoSizes(\n  'https://nytimes.com',\n  '19961112181513',\n  { includeResources: true } // optional: include full resource list\n);\nconsole.log(mementoData);\n```\n\nExample Output:\n\n```js\n{\n  url: 'https://nytimes.com',\n  requestedMemento: '19961112181513',\n  memento: '19961112181513',\n  mementoURL: 'https://web.archive.org/web/19961112181513/https://nytimes.com',\n  sizes: {\n    html: { bytes: 1653, count: 1 },\n    stylesheet: { bytes: 0, count: 0 },\n    script: { bytes: 0, count: 0 },\n    image: { bytes: 46226, count: 2 },\n    video: { bytes: 0, count: 0 },\n    audio: { bytes: 0, count: 0 },\n    font: { bytes: 0, count: 0 },\n    flash: { bytes: 0, count: 0 },\n    plugin: { bytes: 0, count: 0 },\n    data: { bytes: 0, count: 0 },\n    other: { bytes: 0, count: 0 },\n    total: { bytes: 47879, count: 3 }\n  },\n  completeness: '100%',\n  resources: [\n    {\n      url: 'https://web.archive.org/web/19961112181513im_/http://www.nytimes.com/index.gif',\n      type: 'image',\n      size: 45259\n    },\n    {\n      url: 'https://web.archive.org/web/19961112181513im_/http://www.nytimes.com/free-images/marker.gif',\n      type: 'image',\n      size: 967\n    }\n  ]\n}\n```\n\n## Demo\n\nA demo is available in [`examples/demo.js`](examples/demo.js). 
It integrates [CO2.js](https://developers.thegreenwebfoundation.org/co2js/overview) with the 1Byte model to estimate the environmental impact of a memento.\n\n### Getting Started\n\nRun the demo with Node.js:\n\n```bash\nnode examples/demo.js \u003cURL\u003e \u003cYear YYYY\u003e [Month MM] [Day DD]\n```\n\nParameters:\n\n- `\u003cURL\u003e`: Target website to analyse\n- `\u003cYear YYYY\u003e`: Year of interest\n- `[Month MM]`: Optional month (defaults to January (01) if omitted)\n- `[Day DD]`: Optional day (defaults to 1st (01) if omitted)\n\nExample:\n\n```bash\n# Analyse www.nytimes.com memento from November 12, 1996\nnode examples/demo.js www.nytimes.com 1996 11 12\n```\n\n### Results\n\nAfter running the demo, you will receive a structured report for the desired memento:\n\n- Memento information:\n  - Retrieved memento URL\n  - Completeness of retrieval (%)\n- Page size results:\n  - Total page size (KB)\n  - Estimated equivalent emissions per page visit (g CO₂e)\n- Page composition results:\n  - Count of URI-Ms by type (images, scripts, stylesheets, etc.)\n  - Total size per type (KB) and percentage of total page size (%)\n  - Estimated equivalent emissions per type per page visit (g CO₂e)\n\nExample Output:\n\n```bash\n# Results for www.nytimes.com memento from November 12, 1996\nRetrieved Memento:\n🔗 Memento URL:     https://web.archive.org/web/19961112181513/https://www.nytimes.com\n✅ Completeness:    100%\n\nPage Size Results:\n📊 Data Transfer:   46.76 KB\n🌍 Page CO₂e:       0.014 g\n\nPage Composition Results:\n📁 HTML\n   Count:   1\n   Size:    1.61 KB (3.5%)\n   CO₂e:    0.000 g\n\n📁 IMAGE\n   Count:   2\n   Size:    45.14 KB (96.5%)\n   CO₂e:    0.013 g\n```\n\n## Methodology\n\nFor details on Wasteback Machine’s methodology, assumptions, and limitations, please refer to our working paper. 
It provides guidance on the library’s intended use, interpretive constraints, and best practices for integrating results into research or sustainability assessments.\n\nFor questions or access before publication, please contact [overbrowsing@ed.ac.uk](mailto:overbrowsing@ed.ac.uk).\n\n## Disclaimer\n\n\u003e [!IMPORTANT]\n\u003e This library is provided for informational and research purposes only. The authors make no guarantees about the accuracy of the results and disclaim any liability for their use.\n\n## Contributing\n\nContributions are welcome! Please [submit an issue](https://github.com/overbrowsing/wasteback-machine/issues) or a [pull request](https://github.com/overbrowsing/wasteback-machine/pulls).\n\n## Licenses\n\nWasteback Machine is licensed under [Apache 2.0](https://tldrlegal.com/license/apache-license-2.0-(apache-2.0)). For full licensing details, see the [LICENSE](/LICENSE) file.\n\nThe Wayback Machine, Wayback CDX Server API, and Wayback Replay API are provided by the Internet Archive and are governed by their [Terms of Use](https://archive.org/about/terms).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foverbrowsing%2Fwasteback-machine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foverbrowsing%2Fwasteback-machine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foverbrowsing%2Fwasteback-machine/lists"}