{"id":13910645,"url":"https://github.com/alopezrivera/anchorage","last_synced_at":"2025-04-29T09:31:01.493Z","repository":{"id":50314140,"uuid":"382062254","full_name":"alopezrivera/anchorage","owner":"alopezrivera","description":"Save your bookmark collection in the Internet Archive, or locally.","archived":false,"fork":false,"pushed_at":"2022-07-05T21:25:34.000Z","size":9183,"stargazers_count":23,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-15T01:23:17.137Z","etag":null,"topics":["archiving","internet-archive","permanence","web"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alopezrivera.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-07-01T14:35:17.000Z","updated_at":"2024-01-26T21:56:25.000Z","dependencies_parsed_at":"2022-09-26T21:11:56.104Z","dependency_job_id":null,"html_url":"https://github.com/alopezrivera/anchorage","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alopezrivera%2Fanchorage","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alopezrivera%2Fanchorage/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alopezrivera%2Fanchorage/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alopezrivera%2Fanchorage/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alopezrivera","download_url":"https://codeload.github.com/alopezrivera/anchorage/tar
.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251473181,"owners_count":21595018,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archiving","internet-archive","permanence","web"],"created_at":"2024-08-07T00:01:40.123Z","updated_at":"2025-04-29T09:31:00.434Z","avatar_url":"https://github.com/alopezrivera.png","language":"Python","readme":"# Anchorage\n\n![alt text](tests/coverage/coverage.svg \".coverage available in tests/coverage/\")\n\nAnchorage consists of a Python library and CLI to save your bookmark collection in bulk, forever:\nonline in the [Internet Archive](https://archive.org/) or locally, using [ArchiveBox](https://archivebox.io/).\n\nAnchorage will automatically retrieve your bookmark collection from your browser of \nchoice, filter out duplicates, local files, and entries matching filters of your\nown making, and archive the chosen ones.\n\nRead on to get started. [The full Python API documentation is available \nhere](https://anchorage-docs.github.io/).\n\n![alt text](https://github.com/antonlopezr/anchorage/blob/master/docs/demo/gifs/run.gif \"Anchorage in action\")\n\n-----\n\n#### Table of Contents\n\n[ 1. Introduction ](#1-introduction)\n\n[ 2. Requirements \u0026 Install  ](#2-requirements--install)\n\n[ 3. Anchorage configuration  ](#3-anchorage-configuration)\n\n[ 4. Anchorage CLI  ](#4-anchorage-cli)\n\n[ 5. 
Python API  ](#5-python-api)\n\n[ _5.1 Anchorage configuration_ ](#51-anchorage-configuration)\n\n[ _5.2 Bookmark retrieval_ ](#52-bookmark-retrieval)\n\n[ _5.3 Archiving_ ](#53-archiving)\n\n---\n\n## 1. Introduction\n\nAs the internet ages, link rot takes over larger and larger swathes of it, from the tiny to the mighty, from the trivial\nto the best pieces you ever found: all lost forever. Anchorage is an attempt to make it as easy as possible for you to\nsave the little corner of it you're most fond of, for your own peace of mind and the enjoyment of us all :)\n\n## 2. Requirements \u0026 Install\nA working [Docker](https://docs.docker.com/get-docker/) install is the only requirement, beyond Python and Anchorage's \ndependencies. \n**Without Docker**: Docker is used to run [ArchiveBox](https://archivebox.io/), via a provided \n[docker-compose file](https://github.com/ArchiveBox/ArchiveBox/wiki/Docker#docker-compose). \nWithout Docker, Anchorage will not be able to archive your collection locally, but it will still be \nable to save it online in the Internet Archive.\n\nAnchorage can be installed using pip, like any Python package. Its dependencies will be downloaded automatically. \n\n    pip install anchorage\n    \n## 3. Anchorage configuration\nTo access a browser's bookmarks file, Anchorage stores its location in its configuration file:\n\n    ~/.anchorage/config.toml\n    \nThere's an [example `config.toml`](https://github.com/antonlopezr/anchorage/blob/master/example-config.toml)\nin this repo for reference. \n\nTo add a new browser, simply add a new top-level key, followed by its bookmark file paths. 
Anchorage only needs\nthe path in your operating system to work.\n\n    [\u003cbrowser name\u003e]\n    linux = \u003cpath\u003e\n    macos = \u003cpath\u003e\n    windows = \u003cpath\u003e\n\nImportantly:\n- Linux and MacOS paths are stored in **full**.\n- Windows paths are stored from the **`AppData`** directory.\n\nThe default `config.toml` contains the bookmark file paths for Google Chrome, Mozilla Firefox and Microsoft Edge\nand Edge Beta for **Windows** only. To use Anchorage in **Linux or MacOS**, add the bookmark file path of your\nbrowser of choice to your `config.toml`.\n\n#### Editing the Anchorage config file\nThe config file can be edited just like any other. \nNew browsers will automatically be listed in the CLI.\n\nImportantly:\n- Set unknown bookmark file paths to \"?\". That way the CLI will recognize those as unknown and behave appropriately.\n\n![alt text](https://github.com/antonlopezr/anchorage/blob/master/docs/demo/gifs/config.gif \"Adding the location of the Google Chrome bookmarks file to ~/.anchorage/config.toml\")\n\n## 4. Anchorage CLI\nThe CLI will guide you through retrieving your bookmarks from your browser of choice, applying\nfilters to your bookmark collection and archiving your bookmarks in the Internet Archive or locally, \nusing ArchiveBox.\n\nTo start the CLI, open your shell and type\n\n    anchorage\n\nYou will be asked whether you're ready to proceed. Once you confirm, it will check that all dependencies are present.\n\n##### 1. Config check\nIf a config file is found, you will be prompted to choose whether to \nkeep the current config or overwrite it with the default one.\n\n##### 2. Browser choice\nYou will be prompted to choose which browser to retrieve your bookmark collection from. The browser \nchoices are sourced from `config.toml`. Refer to [section 3](#3-anchorage-configuration) for \nediting it to add a missing browser or enter the path to the bookmarks file of your browser, if it's missing \n(equal to \"?\").\n\n##### 3. 
Applying filters to the collection\nFilters can be applied to your bookmark collection before archiving. \nAny or all of four filters can be chosen, one specific for URLs:\n\n- `Local files`: remove local URLs (say, PDFs stored on your computer) from the collection.\n\nand three general: \n\n- `Match string`: remove bookmark URLs, names or bookmark directories matching a provided string or any string \nin a string list.\n- `Match substring`: remove bookmark URLs, names or bookmark directories containing a provided string or any string \nin a string list.\n- `Regex`: remove bookmark URLs, names or bookmark directories matching a provided regex formula.\n\nFor each general filter, you will be prompted to choose whether to apply it to URLs, names, directories, or any combination of these.\n\n##### 4. Archive choice\nYou will then be asked to choose whether to archive your collection online or locally.\n##### _Online_\nBy default, websites will not be archived if a previous image exists in The Internet Archive. This is to save time: we rest easy as those \nsites have already been saved at some point. In case you want to save a current snapshot of the collection, you will be prompted whether to override this \nand archive all sites in the collection regardless. This may take significantly longer. Based on your choice, you will be given an estimate of the \narchive time. \n##### _Local_\nTo archive your collection locally you will be prompted for an archive directory. \n\n##### 5. Run\nAfter a last confirmation the process will begin. A progress bar will inform you of how far the process \nis from finishing, how many bookmarks have been saved and provide a dynamic estimate of the time remaining \nbefore the process is finished.\n\n## 5. 
Python API: user's guide\nThe full documentation of the Anchorage API is available in the [docs](https://anchorage-docs.github.io/) site.\n\n### 5.1 Anchorage configuration\nGenerate the Anchorage config file with the `init` command.\n\n    from anchorage import init\n    \n    init()\n\n### 5.2 Bookmark retrieval\nThree methods are relevant:\n\n- `path(\u003cbrowser\u003e)`: obtain the path to your chosen browser's bookmarks file (in your OS) from `config.toml`.\n- `load(\u003cpath\u003e)`: read your chosen browser's JSON or JSONLZ4 bookmarks file and return a Python dictionary.\n- `bookmarks(\u003cdict\u003e)`: create an instance of the `bookmarks` class.\n\nThe `bookmarks` class creates a second bookmarks dictionary more suitable for our intent, and contains methods\nto filter and loop through the collection. Filters can be applied as seen below.  \n\n    from anchorage import path, load, bookmarks\n    \n    collection = bookmarks(load(path(\u003cbrowser name\u003e)),\n                           drop_local_files= \u003cboolean\u003e,\n                           drop_dirs=        \u003cstring or list of strings\u003e,\n                           drop_names=       \u003cstring or list of strings\u003e,\n                           drop_urls=        \u003cstring or list of strings\u003e,\n                           drop_dirs_subs=   \u003cstring or list of strings\u003e,\n                           drop_names_subs=  \u003cstring or list of strings\u003e,\n                           drop_urls_subs=   \u003cstring or list of strings\u003e,\n                           drop_dirs_regex=  \u003cstring\u003e,\n                           drop_names_regex= \u003cstring\u003e,\n                           drop_urls_regex=  \u003cstring\u003e\n                           )\n\n### 5.3 Archiving\nInput: `bookmarks` instance or bookmark dictionary returned by `load`. 
\n\n#### Online\n\n    from anchorage import anchor_online\n    \n    anchor_online(bookmarks, overwrite=\u003cbool\u003e)\n    \nThe `overwrite` parameter determines whether to save snapshots of sites already present in the \nInternet Archive or not.\n    \n#### Locally\n\n    from anchorage import anchor_locally\n    \n    anchor_locally(bookmarks, archive=\u003cdir\u003e)\n    \nThe `archive` parameter specifies the directory in which to create the local archive.\n\nRunning the ArchiveBox default NGINX server \ncan be done with the following command.\n\n    from anchorage import server\n    \n    server()\n\n---\n\n[Back to top](#Anchorage)\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falopezrivera%2Fanchorage","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falopezrivera%2Fanchorage","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falopezrivera%2Fanchorage/lists"}