{"id":13587780,"url":"https://github.com/uriel1998/muna","last_synced_at":"2025-04-14T18:32:07.112Z","repository":{"id":147177309,"uuid":"261003648","full_name":"uriel1998/muna","owner":"uriel1998","description":"Clean a series of links, resolving redirects and finding Wayback results if page is gone. Originally written to aid with importing from ArchiveBox.","archived":false,"fork":false,"pushed_at":"2024-10-03T04:29:51.000Z","size":198,"stargazers_count":18,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-28T07:20:38.626Z","etag":null,"topics":["archivebox","bash","url","urls","wayback-machine"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/uriel1998.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2020-05-03T19:19:48.000Z","updated_at":"2025-03-27T10:46:33.000Z","dependencies_parsed_at":null,"dependency_job_id":"c765fc90-16c9-4bba-9205-dd411eb9ef93","html_url":"https://github.com/uriel1998/muna","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uriel1998%2Fmuna","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uriel1998%2Fmuna/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uriel1998%2Fmuna/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uriel1998%2Fmuna/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/uriel1998","download_url":"https://codeload.github.com/uriel1998/muna/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248936836,"owners_count":21186113,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archivebox","bash","url","urls","wayback-machine"],"created_at":"2024-08-01T15:06:21.527Z","updated_at":"2025-04-14T18:32:06.727Z","avatar_url":"https://github.com/uriel1998.png","language":"Shell","funding_links":[],"categories":["Shell"],"sub_categories":[],"readme":"# muna\n\nClean a series of links, resolving redirects and finding Wayback results if the page is gone\n\n![muna logo](https://raw.githubusercontent.com/uriel1998/muna/master/muna-open-graph.png \"logo\")\n\n## Contents\n 1. [About](#1-about)\n 2. [License](#2-license)\n 3. [Prerequisites](#3-prerequisites)\n 4. [Installation](#4-installation)\n 5. [Usage](#5-usage)\n 6. [TODO](#6-todo)\n\n***\n\n## 1. About\n\nOriginally, `muna` was `uniredirector` for my program [agaetr](https://github.com/uriel1998/agaetr),\nbut it grew into something quite a bit more multipurpose. (The script in `agaetr` has since been \nupdated to be identical to `muna` in everything but name.)\n\nI ended up writing this because of [ArchiveBox](https://github.com/pirate/ArchiveBox). It's \na great self-hosted archiving system, but when you throw a random list of URLs\n(or worse, different types of RSS feeds) at it, you get... *mixed* results. It \ndoes not handle redirects too well, and if a page just returns a 404, you're out of \nluck.  
So I wrote `feeds-in.sh` to preprocess inputs from both persnickety RSS \nfeeds and plain lists of URLs. It's included here as an example of how to \nuse `muna` and its bash function `unredirector`.\n\n`muna` is an Old Norse word meaning \"call to mind, remember\".\n\n## 2. License\n\nThis project is licensed under the Apache License 2.0. For the full license, see `LICENSE.md`.\n\n## 3. Prerequisites\n\n* bash\n* awk\n* curl\n* wget\n* sed\n\nOn many Linux installations these are already installed; if not, they're \nin your package manager.  (If you have to build *these* from source, you don't \nneed *me* telling you how to do that!)\n\n## 4. Installation\n\n### muna\n\nClone or download this repository. Put `muna.sh` somewhere in your `$PATH`, or \ncall/source it explicitly.\n\n### feeds-in.sh\n\nWhile this script is included here as an example, it is a fully functional \nDEATH ST... script. It's a functional script, suitable for running from a cron job \nto preprocess sources of URLs for `ArchiveBox`. Or use it as the base of a \nscript that meets your needs.\n\nOne important and *super* useful note for anyone who already has a big list of \nURLs from some other program: all you have to do is put that text file, one URL \nper line, in `RAWDIR` (which you'll configure in a second), and that list \nwill be pulled seamlessly into the workflow.\n\nIf you are using `feeds-in.sh` with `ArchiveBox`, you will need to edit \nthese lines as appropriate for your setup:\n\n```bash\n# Path to your ArchiveBox installation\nAPPDIR=\"/home/www-data/apps/ArchiveBox-Docker\"\n# Work directory for raw URL lists (you can drop your own lists here too)\nRAWDIR=\"$APPDIR/rawdata\"\n# ArchiveBox data directory\nDATADIR=\"$APPDIR/data\"\n# Load muna's functions into this script\nsource \"$APPDIR/muna.sh\"\n```\n\n`APPDIR` should point to your `ArchiveBox` installation. `RAWDIR` is a work \ndirectory where you can also put any text file containing a plain list of URLs. \n`DATADIR` should be the data directory of your `ArchiveBox` installation.\n\nThere are several example feeds (starting around line 50). 
Each one strips that \nparticular RSS feed (or XML sitemap) down to a series of URLs, one per line, \nwritten to a text file.  \n\nThe `sed` and `awk` strings are left here as examples for these particular \nkinds of feeds. Feel free to use them as a starting point, but I won't guarantee \nthey work for *your* feeds; they just work for *these* feeds.\n\nThe console output here is a progress bar unless there are errors. The text \nfile name is date- and time-stamped to avoid collisions and overwrites.\n\nThen `feeds-in.sh` calls `ArchiveBox` to import that list of URLs. Uncomment \nthe appropriate line in this section for your style of installation (and comment out the others). Note that \nthe *docker-compose* and standalone *docker* commands are quite different; \ndon't confuse them!  (I won't tell you how I know... sigh.)\n\n```bash\n###############################START PARTS TO EDIT########################\n\n# Uncomment the next line for non-docker installations\n#./archive /\"$OUTFILESHORT\"\n\n# Uncomment the next line for docker-compose installations (enabled as shown)\ndocker-compose exec archivebox /bin/archive /\"$OUTFILESHORT\"\n\n# Uncomment the next line for docker *NOT DOCKER COMPOSE* installations\n#cat \"$OUTFILE\" | docker run -i -v ~/ArchiveBox:/data nikisweeting/archivebox\n```\n\n## 5. Usage\n\n### muna\n\nIf there's a redirect, whether from a shortener or, say, to HTTPS, \n`muna` will follow it and set the variable `$url` to the final URL (or print it to STDOUT). \nIf there is any other error (including if the page is gone or \nthe server has disappeared), it checks whether the page is saved at the [Internet Archive](https://archive.org) \nand returns the latest capture instead.  If it cannot find a copy anywhere, it \nsets `$url` to an empty (NULL) string, prints nothing, \nand exits with exit code `99`.\n\n### Standalone\n\n`muna.sh [-q] URL`\n\n * -q : When run standalone with this flag, `muna` prints nothing to STDOUT except the unredirected URL. 
Some error messages may still be printed to STDERR.\n\n### As a function\n\nPut this line at the top of your script:\n\n`source path/to/muna.sh`\n\nIn your script, the variable `$url` must be set before calling the function \n`unredirector`. Afterward, if a match was found, `$url` will be \nset to the resolved (or archived) URL. If no match was found, `$url` will be set to an empty (NULL) string. For \nexample, in `feeds-in.sh`:\n\n```bash\n# Copy the current input line into $url, which unredirector reads and rewrites\nurl=$(printf \"%s\" \"$line\")\nunredirector\n# An empty $url means the link could not be resolved or found in the Wayback Machine\nif [ -n \"$url\" ]; then  # yup, that url exists\n    echo \"$url\" \u003e\u003e \"$OUTFILE\"\nfi\n```\n\n### feeds-in.sh\n\n`bash ./feeds-in.sh`\n\nSeriously, that's it. If you edited the script to match your system, you should be done.\n\n## 6. TODO\n\n\n### Roadmap:\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Furiel1998%2Fmuna","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Furiel1998%2Fmuna","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Furiel1998%2Fmuna/lists"}