https://github.com/averagesecurityguy/scrape
Extensible paste site scraper written in Golang.
- Host: GitHub
- URL: https://github.com/averagesecurityguy/scrape
- Owner: averagesecurityguy
- Created: 2017-12-20T15:54:06.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2020-12-05T03:57:32.000Z (about 5 years ago)
- Last Synced: 2024-06-20T08:18:04.333Z (over 1 year ago)
- Topics: gists, golang, hacking-tool, osint, osint-tool, pastebin, pentest, scraper
- Language: Go
- Homepage:
- Size: 80.1 KB
- Stars: 70
- Watchers: 6
- Forks: 11
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Scrape
Scrape finds interesting data in text files using keyword searches and regular expressions. Scrape pulls text files from Pastebin and Github Gists. In addition, Scrape can parse text files in a local directory. The search terms are user-configurable and are stored in the config.json file. Scrape can run in the background as a service or it can run on demand.
## Sources
### Pastebin
To use scrape without getting blacklisted at Pastebin.com you will need to get a Lifetime Pro membership and whitelist your IP address. Scrape implements Pastebin's recommended scraping logic, which is defined at https://pastebin.com/api_scraping_faq.
### Gists
To use scrape with Github Gists, you will need to create a read-only Github API key. Scrape gets the 100 most recent gists using the API endpoint described at: https://developer.github.com/v3/gists/#list-all-public-gists. At this time, no attempt is made to download truncated files or truncated content.
### Local Files
To use scrape to parse files in a local directory, define the directory in the config.json file. Scrape will parse the files in batches of 100 by default. The batch size is configurable in the config.json file. Keep in mind that after a file is processed, it will be deleted from the directory.
## Installation
You will first need to clone the Git repository with `git clone https://github.com/averagesecurityguy/scrape`. Once you have downloaded the repository, run the setup.sh script from the repository with sudo permissions. This will create a new user called scrape and install the service.sh init script. If you already have a service account you want to use on your machine, modify the setup.sh script to disable creating the new account and modify service.sh to use the account you want.
## Viewing Gathered Data
While scrape is running you can visit https://127.0.0.1:5000 to view the data that has been gathered. You will need to create a TLS certificate and key and define their locations in the config.json file. When scrape is not running you can use the view tool in the install directory to view scrape data.
### View Command Usage
```
Usage:
    view filename action [arguments]

Actions:
    buckets  Get a list of buckets.
    read     Get the value of the key in the bucket.
    keys     Get a list of keys in a bucket.
    vals     Get a list of values in a bucket.
    search   Get a list of keys from the bucket where the
             value contains the given string.
```