https://github.com/archiveteam/ftp-gov-grab
Archiving government FTPs.
- Host: GitHub
- URL: https://github.com/archiveteam/ftp-gov-grab
- Owner: ArchiveTeam
- License: unlicense
- Created: 2016-12-17T22:01:41.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2017-02-26T01:08:31.000Z (over 9 years ago)
- Last Synced: 2024-10-30T00:55:34.182Z (over 1 year ago)
- Language: Python
- Size: 20.5 KB
- Stars: 22
- Watchers: 13
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
ftp-gov-grab
=============
More information about the archiving project can be found on the ArchiveTeam wiki: [FTP GOV](http://archiveteam.org/index.php?title=FTP_GOV)
Setup instructions
=========================
Be sure to replace `YOURNICKHERE` with the nickname you want shown on the tracker. You don't need to register it; just pick a nickname you like.
In most of the cases below, there will be a web interface running at http://localhost:8001/. If you don't know or care what this is, you can just ignore it; otherwise, it gives you a fancy view of what's going on.
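If you want to confirm that the web interface is actually up, a short standalone Python 3 probe is enough. This is a sketch, not part of the project: the function name `web_interface_is_up` is hypothetical, the default URL is the one given above, and any HTTP response (even an error status) is treated as proof that something is listening.

```python
import http.client
import urllib.error
import urllib.request


def web_interface_is_up(url="http://localhost:8001/", timeout=3.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    # Bypass any proxy configured in the environment: the target is a local port.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An error status (404, 501, ...) still means a server is listening.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, a timeout, or a non-HTTP reply on that port.
        return False
```

Calling `web_interface_is_up()` with no arguments checks the default address used by the pipeline.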
**If anything goes wrong while running the commands below, please scroll down to the bottom of this page. There's troubleshooting information there.**
Running with a warrior
-------------------------
Follow the [instructions on the ArchiveTeam wiki](http://archiveteam.org/index.php?title=Warrior) for installing the Warrior, and select the "FTP GOV" project in the Warrior interface.
Running without a warrior
-------------------------
To run this outside the warrior, clone this repository, cd into its directory and run:
pip install --upgrade seesaw
pip2 install --upgrade warc
Grab a copy of Wpull 2.0.1 from https://launchpad.net/wpull/+download:
wget https://launchpad.net/wpull/trunk/v2.0.1/+download/wpull-2.0.1-linux-x86_64-3.4.3-20161230193838.zip
python -c "import zipfile; f=zipfile.ZipFile('wpull-2.0.1-linux-x86_64-3.4.3-20161230193838.zip'); f.extractall('./')"
chmod +x ./wpull
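The `python -c` one-liner works because Python's `zipfile` module does not need an external `unzip`; it also does not restore Unix permission bits on extraction, which is why the `chmod +x` step is needed. A minimal Python 3 sketch that does both steps at once, assuming the archive has already been downloaded and contains a top-level file named `wpull` (as the `chmod +x ./wpull` step implies); `extract_wpull` is a hypothetical helper name.

```python
import os
import stat
import zipfile


def extract_wpull(archive_path, dest_dir="."):
    """Extract a Wpull release zip and mark the bundled wpull file executable.

    Assumes the archive holds a top-level file named "wpull".
    Returns the path of that file.
    """
    with zipfile.ZipFile(archive_path) as zf:
        # zipfile does not restore Unix permission bits, so wpull
        # comes out non-executable.
        zf.extractall(dest_dir)

    binary = os.path.join(dest_dir, "wpull")
    # Roughly `chmod +x` (ignoring umask): add execute bits for
    # user, group and others on top of the current mode.
    mode = os.stat(binary).st_mode
    os.chmod(binary, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return binary
```

For example, `extract_wpull("wpull-2.0.1-linux-x86_64-3.4.3-20161230193838.zip")` run from the repository directory leaves an executable `./wpull` next to `pipeline.py`.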
Then start downloading with:
run-pipeline pipeline.py --concurrent 2 YOURNICKHERE
For more options, run:
run-pipeline --help
If you don't have root access and/or your version of pip is very old, you can replace "pip install --upgrade seesaw" with:
wget https://raw.github.com/pypa/pip/master/contrib/get-pip.py ; python get-pip.py --user ; ~/.local/bin/pip install --user seesaw
so that pip and seesaw are installed in your home directory, then run:
~/.local/bin/run-pipeline pipeline.py --concurrent 2 YOURNICKHERE
Running multiple instances on different IPs
-------------------------------------------
This feature requires seesaw version 0.0.16 or greater. Use `pip install --upgrade seesaw` to upgrade.
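The installed version can be read with `pip show seesaw`. Comparing version strings as plain text is error-prone, because "0.0.9" sorts after "0.0.16" alphabetically. A minimal Python 3 sketch of a numeric comparison; the helper names are hypothetical, the parser is a simplification rather than a full PEP 440 implementation, and `importlib.metadata` only sees packages installed for the interpreter that runs it (a seesaw installed with `pip2` will not be visible to Python 3).

```python
from importlib import metadata


def parse_version(text):
    """Turn a dotted version like "0.0.16" into a tuple of ints.

    Each component keeps only its leading digits, so "1rc1" becomes 1.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def version_at_least(installed, minimum):
    """Compare two dotted versions numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_seesaw_version():
    """Return the installed seesaw version string, or None if it is not installed."""
    try:
        return metadata.version("seesaw")
    except metadata.PackageNotFoundError:
        return None
```

For example, `version_at_least("0.0.9", "0.0.16")` is False, even though "0.0.9" > "0.0.16" as strings.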
Use the `--context-value` argument to pass in `bind_address=123.4.5.6` (replace the IP address with your own).
Example of running 2 threads, with no web interface and downloads bound to a specific IP address:
run-pipeline pipeline.py --concurrent 2 YOURNICKHERE --disable-web-server --context-value bind_address=123.4.5.6
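Running one instance per IP address means repeating that command with a different `bind_address`. A minimal Python 3 sketch that only builds and prints the command lines (it does not start anything); `pipeline_commands` is a hypothetical helper, and it reuses exactly the flags from the example above. Disabling the web server in every instance presumably keeps the instances from competing for the same web interface port.

```python
import shlex


def pipeline_commands(nickname, addresses, concurrent=2):
    """Build one run-pipeline argument list per local IP address."""
    commands = []
    for address in addresses:
        commands.append([
            "run-pipeline", "pipeline.py",
            "--concurrent", str(concurrent),
            nickname,
            "--disable-web-server",
            "--context-value", "bind_address=" + address,
        ])
    return commands


# Print shell-ready command lines, one per address.
for argv in pipeline_commands("YOURNICKHERE", ["123.4.5.6", "123.4.5.7"]):
    print(" ".join(shlex.quote(part) for part in argv))
```

The first printed line is identical to the example command above; each line can then be started in its own `screen` session.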
Distribution-specific setup
-------------------------
### For Debian/Ubuntu:
adduser --system --group --shell /bin/bash archiveteam
apt-get update && apt-get install -y git-core libgnutls-dev screen python-dev python-pip bzip2 zlib1g-dev unzip
pip install --upgrade seesaw
su -c "cd /home/archiveteam; git clone https://github.com/ArchiveTeam/ftp-gov-grab.git" archiveteam
su -c "cd /home/archiveteam/ftp-gov-grab/; wget https://launchpad.net/wpull/trunk/v2.0.1/+download/wpull-2.0.1-linux-x86_64-3.4.3-20161230193838.zip; unzip wpull-2.0.1-linux-x86_64-3.4.3-20161230193838.zip; chmod +x ./wpull" archiveteam
screen su -c "cd /home/archiveteam/ftp-gov-grab/; run-pipeline pipeline.py --concurrent 2 --address '127.0.0.1' YOURNICKHERE" archiveteam
[... ctrl+A D to detach ...]
### For CentOS:
Ensure that you have the CentOS equivalent of bzip2 installed as well. You might need the EPEL repository to be enabled.
yum -y install gnutls-devel python-pip zlib-devel unzip
pip install --upgrade seesaw
[... pretty much the same as above ...]
### For openSUSE:
zypper install screen python-pip libgnutls-devel bzip2 python-devel gcc make unzip
pip install --upgrade seesaw
[... pretty much the same as above ...]
### For OS X:
You need Homebrew. Ensure that you have the OS X equivalent of bzip2 installed as well.
brew install python gnutls unzip
pip install --upgrade seesaw
[... pretty much the same as above ...]
**There is a known issue with some packaged versions of rsync. If you get errors during the upload stage, ftp-gov-grab will not work with your rsync version.**
This supposedly fixes it:
alias rsync=/usr/local/bin/rsync
### For Arch Linux:
Ensure that you have the Arch equivalent of bzip2 installed as well.
1. Make sure you have `python2-pip` installed.
2. Run `pip2 install seesaw`.
3. Edit the run-pipeline script installed by seesaw so that its first line reads `#!/usr/bin/python2` instead of `#!/usr/bin/python`.
4. `useradd --system --group users --shell /bin/bash --create-home archiveteam`
5. `su -c "cd /home/archiveteam; git clone https://github.com/ArchiveTeam/ftp-gov-grab.git" archiveteam`
6. `su -c "cd /home/archiveteam/ftp-gov-grab/; wget https://launchpad.net/wpull/trunk/v2.0.1/+download/wpull-2.0.1-linux-x86_64-3.4.3-20161230193838.zip; unzip wpull-2.0.1-linux-x86_64-3.4.3-20161230193838.zip; chmod +x ./wpull" archiveteam`
7. `screen su -c "cd /home/archiveteam/ftp-gov-grab/; run-pipeline pipeline.py --concurrent 2 --address '127.0.0.1' YOURNICKHERE" archiveteam`
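Step 3 can be scripted. A minimal Python 3 sketch, assuming the script's first line is exactly `#!/usr/bin/python` (any other first line is left alone); `point_shebang_at_python2` is a hypothetical helper, and `shutil.which("run-pipeline")` is one way to locate the installed script (it returns None if the script is not on your `PATH`). It needs write access to the file and keeps a `.bak` copy first.

```python
import shutil


def point_shebang_at_python2(script_path, interpreter="/usr/bin/python2"):
    """Replace a `#!/usr/bin/python` first line with `#!<interpreter>`.

    Returns True if the file was changed, False if it was left alone.
    """
    with open(script_path, "r") as handle:
        lines = handle.readlines()
    if not lines or lines[0].strip() != "#!/usr/bin/python":
        return False
    # Keep a copy (with metadata) before rewriting the file in place.
    shutil.copy2(script_path, script_path + ".bak")
    lines[0] = "#!" + interpreter + "\n"
    # Opening an existing file with "w" keeps its permission bits.
    with open(script_path, "w") as handle:
        handle.writelines(lines)
    return True
```

Running it a second time returns False, since the first line no longer matches.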
### For FreeBSD:
Nothing specific here. If that turns out not to be the case, please let us know on IRC (irc.efnet.org #archiveteam).
Troubleshooting
=========================
Broken? These are some of the possible solutions:
### Wpull not successfully running
If you have trouble getting Wpull running, please see http://wpull.readthedocs.org/en/master/install.html.
### Problem with gnutls or openssl during building
Please ensure that gnutls-dev(el) and openssl-dev(el) are installed.
### ImportError: No module named seesaw
If you're sure that you followed the steps to install `seesaw`, permissions on your module directory may be set incorrectly. Try the following:
chmod o+rX -R /usr/local/lib/python2.7/dist-packages
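For reference, `o+rX` gives other users read access to everything, plus execute (search) access on directories and on files that are already executable by someone. A Python 3 sketch of the same recursive rule, with hypothetical function names; like the shell command, it must run as a user allowed to change those permissions (typically root), and it skips symbolic links rather than following them.

```python
import os
import stat

ANY_EXEC = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def add_other_read(path):
    """Apply the `o+rX` rule to a single path (symlinks are skipped)."""
    if os.path.islink(path):
        return
    mode = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = mode | stat.S_IROTH
    # Capital X: execute only for directories or already-executable files.
    if os.path.isdir(path) or mode & ANY_EXEC:
        new_mode |= stat.S_IXOTH
    if new_mode != mode:
        os.chmod(path, new_mode)


def chmod_other_read_recursive(root):
    """Apply `o+rX` to `root` and everything below it."""
    add_other_read(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            add_other_read(os.path.join(dirpath, name))
```

Calling `chmod_other_read_recursive("/usr/local/lib/python2.7/dist-packages")` mirrors the command above.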
### run-pipeline: command not found
Install `seesaw` using `pip2` instead of `pip`.
pip2 install seesaw
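Another common cause of `command not found` is that the `run-pipeline` script was installed somewhere that is not on your `PATH`, such as `~/.local/bin` after a `pip install --user`. A Python 3 sketch that checks both places; `find_script` is a hypothetical helper, and the per-user location comes from `site.getuserbase()`, which matches `pip install --user` on Linux but may differ on some macOS Python builds.

```python
import os
import shutil
import site


def find_script(name="run-pipeline"):
    """Return a path to an executable script, or None if it cannot be found.

    Checks PATH first, then the per-user scripts directory
    (usually ~/.local/bin on Linux).
    """
    on_path = shutil.which(name)
    if on_path:
        return on_path
    candidate = os.path.join(site.getuserbase(), "bin", name)
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None
```

If it returns a path under `~/.local/bin`, either add that directory to your `PATH` or call the script by its full path, as in the home-directory instructions above.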
### Issues in the code
If you notice a bug and want to file a bug report, please use the GitHub issues tracker.
Are you a developer? Help write code for us! Look at our [developer documentation](http://archiveteam.org/index.php?title=Dev) for details.
### Other problems
Have an issue not listed here? Join us on IRC and ask! We can be found at irc.efnet.org #cheetoflee.