https://github.com/spencermountain/remote-work
script to crawl and download files from open-directories
- Host: GitHub
- URL: https://github.com/spencermountain/remote-work
- Owner: spencermountain
- License: mit
- Created: 2023-06-20T22:23:45.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-09-12T13:58:07.000Z (over 1 year ago)
- Last Synced: 2025-03-25T15:14:18.048Z (29 days ago)
- Language: JavaScript
- Homepage:
- Size: 25.4 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
remote-work
crawl and download files from an open-directory
`npm install remote-work`
Sometimes you'll open a webpage, and all it shows is a plain listing of files and folders.
This is called an **open directory**, or sometimes an **autoindexer**.
It's a server that's configured to show you all its files, which is nice. It used to be more common.
This is a tool to download all the files from a page like this, from the command-line.
```bash
npx remote-work http://us.archive.ubuntu.com/ubuntu/pool/multiverse/y
```

(you'll need to have [NodeJS installed](https://nodejs.dev/en/download/))
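
To give a sense of what crawling an open directory involves, here is a minimal sketch of reading one listing page. It is not remote-work's own code: `listEntries` is a hypothetical helper, and it assumes an Apache-style index where every entry is a plain `<a href="...">` link and folders end in a trailing slash.

```javascript
// Hypothetical helper, not part of remote-work's API.
// Splits the links on an autoindex-style page into sub-directories and files.
// Assumes `pageUrl` ends with "/" so relative links resolve inside the folder.
function listEntries(html, pageUrl) {
  const dirs = []
  const files = []
  const hrefPattern = /<a\s[^>]*href="([^"]+)"/gi
  for (const match of html.matchAll(hrefPattern)) {
    const href = match[1]
    // skip column-sort links ("?C=N;O=D") and in-page anchors
    if (href.startsWith('?') || href.startsWith('#')) continue
    const absolute = new URL(href, pageUrl).href
    // skip the parent-directory link and anything outside this folder
    if (!absolute.startsWith(pageUrl) || absolute === pageUrl) continue
    if (absolute.endsWith('/')) dirs.push(absolute)
    else files.push(absolute)
  }
  return { dirs, files }
}
```

A crawler would call something like this on each page, queue the `files` for download, and repeat for each entry in `dirs`.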
### Features
- **async** - downloads files 3 at a time, by default
- **configurable** - download only the files you'd like, using _a [glob](https://www.digitalocean.com/community/tools/glob)_
- **stoppable** - gets files _[depth-first](https://www.codecademy.com/article/tree-traversal)_
- **resumable** - doesn't re-download files that you already have
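
The "3 at a time" behaviour in the list above can be pictured as a small worker pool. The sketch below is one common way to cap concurrency in JavaScript, not necessarily how remote-work does it; `runLimited` and its arguments are hypothetical names.

```javascript
// Hypothetical helper, not remote-work's actual code.
// Runs async jobs with at most `limit` of them in flight at once.
// `jobs` is an array of functions that each return a promise.
async function runLimited(jobs, limit = 3) {
  const results = new Array(jobs.length)
  let next = 0
  // each worker keeps pulling the next unstarted job until none are left
  async function worker() {
    while (next < jobs.length) {
      const index = next++
      results[index] = await jobs[index]()
    }
  }
  const workers = Array.from({ length: Math.min(limit, jobs.length) }, worker)
  await Promise.all(workers)
  return results
}
```

Each job here would wrap a single file download. Setting `limit` to 1 gives strictly sequential downloads, which is the gentlest option for the remote server.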
### Node API
You can also use this library in a script:
`npm install remote-work`

```js
import remoteWork from 'remote-work'

const url = 'http://us.archive.ubuntu.com/ubuntu/pool/multiverse/y'
const dir = './output'
let opts = {
  n: 1, // only download one file at a time
  match: '*.mp3' // only download mp3 files
}
await remoteWork(url, dir, opts)
```

Please be considerate when downloading files from a remote server.
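
For readers unfamiliar with globs, the sketch below shows roughly what a pattern like `'*.mp3'` means when applied to a list of file URLs. How remote-work evaluates `match` internally is not documented here, so `globToRegExp` and `filterByGlob` are hypothetical helpers, and this version only understands `*` and `?`.

```javascript
// Hypothetical helper: turns a simple glob into a RegExp.
// Only `*` (any run of characters except "/") and `?` (one character) are handled.
function globToRegExp(glob) {
  // escape regex metacharacters other than * and ?
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, '\\$&')
  const pattern = escaped.replace(/\*/g, '[^/]*').replace(/\?/g, '[^/]')
  return new RegExp('^' + pattern + '$')
}

// Keeps only the URLs whose file name (last path segment) matches the glob.
function filterByGlob(urls, glob) {
  const re = globToRegExp(glob)
  return urls.filter((u) => re.test(new URL(u).pathname.split('/').pop()))
}
```

Matching in this sketch is case-sensitive, so `'*.mp3'` would not pick up a file named `track.MP3`.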
---
### See also
- [wget-wizard](https://www.whatismybrowser.com/developers/tools/wget-wizard/) - do it all w/ a CLI script
- [reddit.com/r/opendirectories](http://reddit.com/r/opendirectories)
- [directory_downloader](https://github.com/SuperVegetoo/directory_downloader) - python directory parser/crawler
- [autoindex](https://github.com/weisjohn/autoindex) - javascript open directory parser by John Weis

MIT