{"id":25129552,"url":"https://github.com/xarantolus/collect","last_synced_at":"2025-04-23T16:26:53.313Z","repository":{"id":47349635,"uuid":"116179492","full_name":"xarantolus/Collect","owner":"xarantolus","description":"A server to collect \u0026 archive websites that also supports video downloads","archived":false,"fork":false,"pushed_at":"2023-02-11T14:15:27.000Z","size":2174,"stargazers_count":75,"open_issues_count":9,"forks_count":10,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-02-12T17:08:33.387Z","etag":null,"topics":["archive","self-hosted","video-downloader","web-archiving","webinterface","website-archive","website-scraper"],"latest_commit_sha":null,"homepage":"https://010.one/Collect/","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xarantolus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-01-03T20:34:04.000Z","updated_at":"2024-01-28T14:03:55.000Z","dependencies_parsed_at":"2024-01-13T03:01:44.483Z","dependency_job_id":"2e5ec7bc-2fcf-4cf7-b73a-0520bf7941c4","html_url":"https://github.com/xarantolus/Collect","commit_stats":null,"previous_names":[],"tags_count":60,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xarantolus%2FCollect","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xarantolus%2FCollect/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xarantolus%2FCollect/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xarantolus%2F
Collect/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xarantolus","download_url":"https://codeload.github.com/xarantolus/Collect/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250469819,"owners_count":21435716,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archive","self-hosted","video-downloader","web-archiving","webinterface","website-archive","website-scraper"],"created_at":"2025-02-08T12:17:58.252Z","updated_at":"2025-04-23T16:26:53.292Z","avatar_url":"https://github.com/xarantolus.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Collect\nCollect is a server to collect \u0026 archive websites written for NodeJS.\n\nIt does not download entire sites, but rather single pages and all content needed to display them. This means that Collect stores a static copy of the website (and its assets) on your disk. 
It also hosts these pages so you can access them over the network.\n\n\u003cdetails\u003e\u003csummary\u003eTable of contents\u003c/summary\u003e\n\n  * [Features](#features)\n  * [Screenshots](#screenshots)\n  * [Installation](#installation)\n      - [Settings](#settings)\n      - [Plugins](#plugins)\n    + [Updating](#updating)\n  * [Contributing](#contributing)\n  * [Security considerations](#security-considerations)\n  * [Warning](#warning)\n  * [Credits](#credits)\n  * [License](#license)\n\n\u003c/details\u003e\n\n## Features\n   * General\n      * Archive web pages and videos\n      * View all archived pages and videos\n   * Web interface\n      * Simply add sites and videos via their URL\n      * Browse your archive by domain\n  \t  * Manage/Delete downloaded content\n  \t  * Any change on the server side will be sent to clients in real time\n   * API\n      * Get all sites / list sites by domain\n      * Get details of saved content\n      * Add a site to the archive\n  \t  * Delete a site\n      * Edit title of a saved page\n  \t  * Download all saved pages as an archive (See [Backup](API_doc.md#backup-get))\n  \t  * For more, see [the API documentation](API_doc.md)\n\n### Screenshots\n\n##### Main Page\n  ![Main Page Screenshot](img/screenshot-main.png?raw=true)\n\n##### New Page\n  ![New Page Screenshot](img/screenshot-new.png?raw=true)\n\n##### Details Page\n  ![Details Page Screenshot](img/screenshot-details.png?raw=true)\n\n\n### Installation\nBefore installing Collect, please make sure that `git`, `node` and `npm` are installed.\n\n*Note*: This install process is tested with Node versions 12, 14 and 16. The test status can be read from the \"Test\" badge: [![Test](https://github.com/xarantolus/Collect/actions/workflows/test-serverstart.yml/badge.svg)](https://github.com/xarantolus/Collect/actions/workflows/test-serverstart.yml). 
If this is green, then everything should work!\n\nStart by cloning the repository to your computer/server:\n```\ngit clone https://github.com/xarantolus/Collect.git\n```\n\nSwitch to the `Collect` directory:\n```\ncd Collect/Collect\n```\n\nInstall dependencies:\n```\nnpm install\n```\n\nStart the server in `production` mode (recommended):\n```\nnpm start production\n```\nor \n```\nnode app production\n```\n\nExpected output:\n\n```\nPreparing integrity check...\nChecking cookie file...\nChecking if folders for ids exist...\nAll folders exist.\nChecking if ids for folders exist...\nAll entrys exist.\nFinished integrity check.\nCollect-Server(1.17.0-production) listening on port 80\n```\n\nNow open the website in your browser by visiting http://localhost:80 if running on the same computer or http://yourserver:80, where `yourserver` is the network name of your server. \n\nYou will notice that you need to authenticate with a username and password. That can be set up as shown in the next section.\n\n#### Settings\nTo change settings, edit `Collect/config.json`. There, you can set a `port`, `username`, `password`, `id_length`, `api_token`, `allow_public_view` and `allow_public_all`. Note that you need to restart the server to apply changes.\n\n\u003cdetails\u003e\u003csummary\u003eSettings documentation\u003c/summary\u003e\n\n###### Port\nThe port the server should listen on. If another program uses this port, the server will not be able to start.\n\n###### Username\nThe username that should be used to log in.\n\n###### Password\nThe password for this user. [Please don't use a password you use somewhere else.](#security-considerations)\n\n###### ID length\nThe length of the ids the server should generate. If you save **a lot** of websites from the same domain (\u003e ~1 million / 16\u003csup\u003elength\u003c/sup\u003e) you should change this number.\n\n###### API token\nIf you like to play around with the API, you can set an API token. 
It is implemented so integrating apps like [Workflow](https://workflow.is) is easy.\n\nIf you don't want to use the API, it is recommended to set the token to a long random string.\n\n###### Allow Public View\nDisable authentication for viewing sites and enable a `/public/list` URL.\n\n###### Allow Public All\nCompletely disable access control. Use at your own risk!\n\u003c/p\u003e\u003c/details\u003e\n \n#### User Guide\nAfter setting up the server, you can read the [user guide](User-Guide.md) to find out more about general usage, keyboard shortcuts and download options.\n\n#### Optional Plugins\n\u003cdetails\u003e\u003csummary\u003eThere is one plugin available for Collect. Open to get more info.\u003c/summary\u003e\n\nThe server can use [`PhantomJS`](http://phantomjs.org/) to process websites after downloading. This ensures that dynamically loaded content is also saved.\n\n**Note**: This is no longer recommended as [PhantomJS is not actively maintained](https://github.com/ariya/phantomjs/issues/15344). 
I'm not stopping you though.\n\nTo use this, install the [`node-website-scraper-phantom` module](https://github.com/website-scraper/node-website-scraper-phantom).\n```\nnpm install website-scraper-phantom\n```\n\nThis command must be run in the directory that contains the `package.json` file.\n\nAfter installing, the server should output `PhantomJS will be used to process websites` when started.\n\nIf the install fails, you cannot use the module and __*Collect*__ will fall back to the normal way of saving pages.\n\nIf you cannot save any pages after installing, remove the module by running \n```\nnpm uninstall website-scraper-phantom\n```\n\u003c/details\u003e\n\n### Updating\nIf you already have Collect installed on your computer/server and want to update to the latest version, follow these steps.\n\nGo to the directory where Collect is installed.\n```\ncd /path/to/Collect\n```\n\nYou might want to back up your settings file.\n\nWindows:\n```\nmove Collect\\config.json ..\\\n```\n\nLinux/Unix:\n```\nmv Collect/config.json ../config.json\n```\n\nDownload the latest version:\n```\ngit fetch --all\n```\n\nApply all changes (this usually overwrites your cookies file, but not the directory where your sites are saved):\n```\ngit reset --hard origin/master\n```\n\nRestore the settings file.\n\nWindows:\n```\nmove ..\\config.json Collect\\\n```\n\nLinux/Unix:\n```\nmv ../config.json Collect/config.json\n```\n\nGo to the directory that contains `package.json`.\n```\ncd Collect\n```\n\nInstall all required packages.\n```\nnpm install\n```\n\nAfter restarting your server, the new version should be up \u0026 running.\n\nIf it doesn't start, delete the `node_modules` directory and re-run `npm install`.\n\n## Contributing\nSee the [contributing file](.github/CONTRIBUTING.md).\n\n## Thanks to :heart:\n\n* [@TeoGoddet](https://github.com/TeoGoddet) for [#41](https://github.com/xarantolus/Collect/pull/41): Add public access \n\n## Security considerations\n * The login system uses 
plain text. Anyone with access to your server (e.g. `SSH` or any malicious program) can read your credentials.\n * Any site you download can read \u0026 set cookies. A downloaded website could send your login cookie to another server. If you host this software in your private network without outside access, everything should be fine even if a cookie gets stolen, but don't take my word for it.\n * By default, the connection does not use `HTTPS`.\n\n\n## Warning\nYou're using this tool at your own risk. I am not responsible for any lost data like passwords or websites.\n\n## Credits\n   [Website Scraper Module](https://github.com/website-scraper/node-website-scraper): MIT License. This server is mostly a user interface to this module and would never have been possible without their work.\n   \n   [Website Scraper Module PhantomJS Plugin](https://github.com/website-scraper/node-website-scraper-phantom): MIT License. Makes processing dynamic pages as easy as pie.\n\n   [The UIkit library](https://github.com/uikit/uikit): Copyright YOOtheme GmbH under the MIT License. I really love this UI framework.\n\n   [ArchiverJS](https://github.com/archiverjs/node-archiver): MIT License. `node-archiver` is a nice module for generating all kinds of archive files. It is used to [create backups](API_doc.md#backup-get).\n\n   [Ionicons](https://github.com/ionic-team/ionicons): MIT License. The icons are really nice. I used the `ion-ios-cloudy-outline` icon.\n\n   [Notification Sound](https://freesound.org/people/philitup321/sounds/204369/): CC0 1.0 Universal License\n\n   \n## License\nSee the [License file](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxarantolus%2Fcollect","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxarantolus%2Fcollect","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxarantolus%2Fcollect/lists"}