{"id":19364362,"url":"https://github.com/davidfstr/crystal-web-archiver","last_synced_at":"2026-04-03T04:03:17.505Z","repository":{"id":2303469,"uuid":"3262361","full_name":"davidfstr/Crystal-Web-Archiver","owner":"davidfstr","description":"Downloads websites for long-term archival.","archived":false,"fork":false,"pushed_at":"2026-03-29T19:57:49.000Z","size":11240,"stargazers_count":89,"open_issues_count":106,"forks_count":7,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-29T21:34:06.339Z","etag":null,"topics":["archival","digital-preservation","website-downloader"],"latest_commit_sha":null,"homepage":"http://dafoster.net/projects/crystal-web-archiver","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/davidfstr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2012-01-25T04:09:28.000Z","updated_at":"2026-03-29T19:57:24.000Z","dependencies_parsed_at":"2026-01-04T16:07:44.921Z","dependency_job_id":null,"html_url":"https://github.com/davidfstr/Crystal-Web-Archiver","commit_stats":null,"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"purl":"pkg:github/davidfstr/Crystal-Web-Archiver","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidfstr%2FCrystal-Web-Archiver","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidfstr%2FCrystal-Web-Archiver/tags","releases_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/davidfstr%2FCrystal-Web-Archiver/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidfstr%2FCrystal-Web-Archiver/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/davidfstr","download_url":"https://codeload.github.com/davidfstr/Crystal-Web-Archiver/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidfstr%2FCrystal-Web-Archiver/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31333229,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-03T03:20:36.090Z","status":"ssl_error","status_checked_at":"2026-04-03T03:20:35.133Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archival","digital-preservation","website-downloader"],"created_at":"2024-11-10T07:37:12.289Z","updated_at":"2026-04-03T04:03:17.494Z","avatar_url":"https://github.com/davidfstr.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"Crystal: A Website Archiver\n===========================\n\n\u003cimg src=\"https://raw.githubusercontent.com/davidfstr/Crystal-Web-Archiver/main/README/logo.png\" alt=\"Crystal Website Archiver icon\" align=\"right\" /\u003e\n\nCrystal is a tool that downloads high-fidelity copies of websites for 
long-term archival.\n\nIt works best on traditional websites made of distinct pages using \nlimited JavaScript (such as blogs, wikis, and other static websites)\nalthough it can also download more dynamic sites which have infinitely \nscrolling feeds of content (such as social media sites).\n\nTo get started downloading your first website with Crystal, please see the \n[Tutorial](#tutorial) below.\n\n\u003cimg src=\"https://raw.githubusercontent.com/davidfstr/Crystal-Web-Archiver/main/README/crystal-ui.png\" alt=\"Crystal's user interface\" title=\"Crystal's user interface\" /\u003e\n\n\u003ca name=\"download\"\u003e\u003c/a\u003e\n\nDownload ⬇︎\n--------\n\nEither install a binary version of Crystal:\n\n* [macOS 14 and later](https://github.com/davidfstr/Crystal-Web-Archiver/releases/download/v2.3.0/crystal-mac-2.3.0.dmg)\n* [Windows 11 and later](https://github.com/davidfstr/Crystal-Web-Archiver/releases/download/v2.3.0/crystal-win-2.3.0.exe)\n\nOr install from source, using pipx:\n\n* Install [Python] \u003e=3.13 and pip:\n    * Ubuntu/Kubuntu 22.04+: `apt-get update; apt-get install -y python3 python3-pip python3-venv`\n    * Fedora 37+: `yum update -y; yum install -y python3 python3-pip`\n* On Linux, install dependencies of wxPython from your package manager:\n    * Ubuntu/Kubuntu 22.04+: `apt-get install -y libgtk-3-dev`\n    * Fedora 37+: `yum install -y wxGTK-devel gcc gcc-c++ which python3-devel`\n* Install pipx\n    * `python3 -m pip install pipx`\n* Install Crystal with pipx\n    * `pipx install crystal-web`\n    * ⏳ On Linux the above step will take a long time (10+ minutes)\n      because wxPython, a dependency of Crystal, will need to be built\n      from source, since it does not offer precompiled wheels for Linux.\n* On Linux, install a shortcut to Crystal inside GNOME/KDE applications and on the desktop:\n    * `crystal --install-to-desktop`\n* Run Crystal:\n    * `crystal`\n\n[Python]: https://www.python.org/\n\n\n\u003ca 
name=\"tutorial\"\u003e\u003c/a\u003e\n\nTutorial ⭐\n--------\n\n\u003ca name=\"tutorial-simple-website\"\u003e\u003c/a\u003e\n\n### Download a simple website\n\n\u003e A **simple website** is created or administered by only a single person, \n\u003e may contain text and images but not video, \n\u003e and does not require logging in to view its content.\n\u003e \n\u003e There are many simple websites you can practice downloading at \u003chttps://daarchive.net/\u003e.\n\n\u003ca href=\"https://youtu.be/rNrbBfcO0rE\" target=\"_blank\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/davidfstr/Crystal-Web-Archiver/main/README/download-simple-poster-play.png\" alt=\"Video showing how to download a simple site with Crystal\" /\u003e\u003c/a\u003e\n\nSteps to download [xkcd], a simple site:\n\n* Download Crystal. See the [Download](#download) section above for specific instructions.\n* Open Crystal and press \"New Project\" to create a new untitled project.\n* Click the big \"New Root URL...\" button and type in \n  \"xkcd.daarchive.net\" for the URL. \n  Optionally type in \"Home\" for the Name.\n* Tick the \"Create Group to Download Entire Site\" checkbox.\n  The \"Download Site Immediately\" checkbox should already be ticked.\n  Press the \"New\" button to create the root URL, create the group for the site,\n  and start downloading the site.\n* The newly created \"Home\" URL at path \"/\" should already be selected.\n  Click the \"View\" button to open the downloaded home page in your default\n  web browser.\n* Within the web browser you should be able to navigate to any page in the\n  downloaded site.\n* Return to the Crystal app. 
Close the untitled window.\n  Don't worry if download tasks are still running because Crystal\n  will offer to resume any downloads later when the project is reopened.\n* You'll be prompted to save the project somewhere permanent.\n  Save it as \"Simple Tutorial\" on your desktop.\n* Find the saved \"Simple Tutorial\" project on your desktop and double-click it\n  to open it.\n* On macOS the project will open in Crystal immediately.\n  On Windows or Linux a window will appear with an \"OPEN ME\" file.\n  Double-click the \"OPEN ME\" file to open the project in Crystal.\n* The home URL should be selected. Press \"View\" to see the downloaded\n  home page again in your web browser.\n* Congratulations! You've downloaded your first simple website with Crystal!\n\n\u003ca name=\"tutorial-complex-website\"\u003e\u003c/a\u003e\n\n### Download a complex website\n\n\u003e Any website that is not simple is a **complex website**. In particular:\n\u003e \n\u003e * Sites that contain content from *multiple people* are complex. \n\u003e   Most forums, wikis (like: Wikipedia), and social media sites\n\u003e   (like: Facebook, YouTube, or X) are complex.\n\u003e * Sites that contain large assets such as video (ex: YouTube), \n\u003e   files (ex: Hugging Face), or large images are complex.\n\u003e * Sites that require *login* to view content like paid blogs (ex: Pragmatic Engineer), \n\u003e   paid news sites (ex: The New York Times), \n\u003e   and paid video sites (ex: Barre3) are complex.\n\n\u003ca href=\"https://youtu.be/1rDZVduWYlA\" target=\"_blank\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/davidfstr/Crystal-Web-Archiver/main/README/download-complex-poster-play.png\" alt=\"Video showing how to download a complex site with Crystal\" /\u003e\u003c/a\u003e\n\nWhen downloading a complex website you need to precisely define which pages you want to download because downloading the entire site would take too much space/time.\n\nEach complex website is different. 
Here is an example of downloading only\nGuido's blog posts from [Artima Weblogs], a complex site containing content from\nmultiple people:\n\n* Download Crystal. See the [Download](#download) section above for specific instructions.\n* Open Crystal and press \"New Project\" to create a new untitled project.\n* Click the big \"New Root URL...\" button and type in \n  \"artima.daarchive.net\" for the URL. \n  Optionally type in \"Home\" for the name.\n* The \"Download URL Immediately\" checkbox should already be ticked.\n  Press the \"New\" button to create the root URL and start downloading it.\n* The newly created \"Home\" URL at path \"/\" should already be selected.\n  Click the \"View\" button to open the downloaded home page in your default\n  web browser.\n* In the left navigation where it says \"Artima Blogger\", look for the link\n  \"Guido van van Rossum\" [sic]. Click it.\n* A Crystal error page appears that says \"Page Not in Archive\", because\n  the link you clicked (`https://artima.daarchive.net/index.html$/blogger=guido.html`)\n  hasn't been downloaded yet.\n* Click the \"Download\" button to individually download and view Guido's first post list page.\n* Notice at the top of the page there are links to pages 2, 3, 4, and 5\n  of Guido's post list. 
Click the page 2 link.\n* Again, a Crystal page saying \"Page Not in Archive\" appears.\n  This time though, we want to download all similar pages.\n  Tick the \"Create Group for Similar Pages\" checkbox at the bottom of the page\n  to reveal a form for creating a group.\n    * A **group** describes a collection of pages that all have the same URL pattern.\n    * Crystal automatically populates its best guess for a URL Pattern.\n      For this example that guessed pattern is:\n      `https://artima.daarchive.net/index.html$/blogger=guido\u0026start=#\u0026thRange=15.html`.\n      The \"#\" wildcard in the pattern will match any number of digits, like \"15\" or \"30\".\n      There are other wildcards like \"*\" which will match any block of text without a \"/\".\n    * Notice that the Preview Members box displays all URLs matching the\n      currently typed URL Pattern.\n    * Crystal also automatically populates its best guess for what the source of\n      the Group should be.\n        * The **source** of a group links to all or most members of the group.\n          When Crystal is asked to redownload a group it will redownload the\n          source first to see if the group has any new members.\n* For this example Crystal has guessed an appropriate URL Pattern and Source\n  for matching all of Guido's post list pages, so we don't need to change them.\n* Optionally type in \"Guido Post List, Page 2+\" for the name of the group.\n* The \"Download Group Immediately\" checkbox should already be ticked.\n  Press the \"Download\" button to create the group and start downloading it.\n* Return to the Crystal app. The top \"Root URLs and Groups\" pane displays\n  each URL and Group you discovered by navigating the downloaded site.\n  It should say:\n  * ⚓️ / - Home\n  * ⚓️ /index.html$/blogger=guido.html\n  * 📁 /index.html$/blogger=guido\u0026start=#\u0026thRange=15.html - Guido Post List, Page 2+\n* Each ⚓️ is a Root URL. 
Each 📁 is a Group.\n* Click the second ⚓️ to select it.\n* Click the \"Edit\" button.\n* Type \"Guido Post List, Page 1\" for the name of the URL.\n* Click the \"Save\" button. Now the displayed URLs and groups should be:\n  * ⚓️ / - Home\n  * ⚓️ /index.html$/blogger=guido.html - Guido Post List, Page 1\n  * 📁 /index.html$/blogger=guido\u0026start=#\u0026thRange=15.html - Guido Post List, Page 2+\n* Close the untitled window.\n  You'll be prompted to save the project somewhere permanent.\n  Save it as \"Complex Tutorial\" on your desktop.\n* Congratulations! You've downloaded your first complex website with Crystal!\n\nTips for downloading more types of complex sites are available on the wiki:\n\n* [Complex Website Download Examples](https://github.com/davidfstr/Crystal-Web-Archiver/wiki/Complex-Website-Download-Examples)\n\n[xkcd]: https://xkcd.daarchive.net/\n[Artima Weblogs]: https://artima.daarchive.net/\n\n\nHistory 📖\n-------\n\nDavid Foster wrote Crystal originally in 2011 because other website downloaders\nhe tried didn't work well for him and because he wanted to write a large\nPython program, as Python was a new language for him at the time.\n\nEvery few years he revisits Crystal to add features allowing him to archive \nmore sites that he cares about and to streamline the downloading process.\n\n\nDesign 📐\n------\n\nA few unique characteristics of Crystal:\n\n* The Crystal project file format (`*.crystalproj`) is suitable for long-term archival:\n    * Downloaded pages are stored in their original form as downloaded\n      from the web including all HTTP headers.\n    * Metadata is stored in a [SQLite database].\n\n* To download pages automatically, the user must define \"groups\" of pages with similar\n  URLs (ex: \"Blog Posts\", \"Archive Pages\") and specify rules for finding links to members\n  of the group.\n    * Once a group has been defined in this way, it is possible for the user to\n      instruct Crystal to simply download the group. 
This involves finding links to all\n      members of the group (possibly by downloading other groups) and then downloading\n      each member of the group, in parallel.\n\nThe design is intended for the future addition of the following features:\n\n* Intelligently updating the pages in websites that have already been downloaded.\n    * This would be done by defining rules on groups that specify how often their members\n      are updated. For example the set of \"Archive Pages\" on WordPress blogs is expected\n      to change monthly. And the most recently added member of the \"Archive Pages\" group\n      may change daily, whereas the other members are expected to never change.\n    * Multiple revisions per downloaded resource are supported to allow multiple\n      versions of the same resource to be tracked over time.\n\n[SQLite database]: https://sqlite.org/lts.html\n\n\nContributing ⚒\n------------\n\nCode contributions to Crystal from users are welcome, particularly if you want\nto add specialized support for downloading a site you care about that Crystal\ndoesn't already support well.\n\n**Note on Licensing:** Crystal uses a noncommercial license rather than \na traditional open source license, but this does not prevent you from \ncontributing code. Contributors retain full rights to their contributions and \ncan use their contributed code in other projects under any license they choose.\nSee the [License FAQ](https://github.com/davidfstr/Crystal-Web-Archiver/wiki/License-FAQ) \nfor more details.\n\nIf you'd like to request a feature, report a bug, or ask a question, please create\n[a new GitHub Issue](https://github.com/davidfstr/Crystal-Web-Archiver/issues/new),\nwith either the `type-feature`, `type-bug`, or `type-question` tag.\n\nIf you'd like to help work on coding new features, please see\nthe [code contributor workflow]. 
If you'd like to help moderate the community\nplease see the [maintainer workflow].\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for more information.\n\n[code contributor workflow]: https://github.com/davidfstr/Crystal-Web-Archiver/wiki/Contributor-Workflows#code-contributors\n[maintainer workflow]: https://github.com/davidfstr/Crystal-Web-Archiver/wiki/Contributor-Workflows#maintainers\n\n### Code Contributors\n\nPoetry is required for dependency management and development.\nTo install the correct version:\n\n    python -m pip install poetry==2.1.1\n\nTo **run the code locally**,\nrun `poetry install` once in Terminal (Mac) or in Command Prompt (Windows), and\n`poetry run python -m crystal` thereafter.\n\nTo **build new binaries** for Mac or Windows, follow the instructions at [COMPILING.txt](COMPILING.txt).\n\nTo **run non-UI tests**, run `poetry run pytest` in Terminal (Mac) or in Command Prompt (Windows).\n\nTo **run UI tests**, run `poetry run python -m crystal test` in Terminal (Mac) or in Command Prompt (Windows).\n\nTo **typecheck**, run `poetry run mypy` in Terminal (Mac) or in Command Prompt (Windows).\n\nTo **sort imports**, run `poetry run isort .` in Terminal (Mac) or in Command Prompt (Windows).\n\n\nRelated Projects ⎋\n----------------\n\n* [webcrystal]: An alternative website archiving tool that focuses on making it\n  easy for automated crawlers (rather than for humans) to download websites.\n\n[webcrystal]: http://dafoster.net/projects/webcrystal/\n\n\nRelease Notes ⋮\n-------------\n\nSee [RELEASE_NOTES.md](RELEASE_NOTES.md)\n\n\nLicense ⚖️\n-------\n\nCrystal is licensed under the [PolyForm Noncommercial License 1.0.0](LICENSE.txt). 
\nThis means you may use Crystal for any noncommercial purpose, \nbut commercial use requires a separate license agreement.\n\n**This license does not restrict code contributions.** \nContributors retain all rights to their contributions and may use their \ncontributed code in other projects under any license they choose.\n\nFor more information about Crystal's license please read the\n[License FAQ](https://github.com/davidfstr/Crystal-Web-Archiver/wiki/License-FAQ).\n\nFor commercial licensing inquiries, please contact \n[David Foster](https://dafoster.net/contact/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidfstr%2Fcrystal-web-archiver","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavidfstr%2Fcrystal-web-archiver","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidfstr%2Fcrystal-web-archiver/lists"}