https://github.com/lawzava/scrape

CLI utility to scrape emails from websites
https://github.com/lawzava/scrape

cli cli-utility email golang hacktoberfest scrape-emails scraper snap snapcraft

Last synced: 6 months ago
JSON representation

CLI utility to scrape emails from websites

Host: GitHub
URL: https://github.com/lawzava/scrape
Owner: lawzava
License: mit
Created: 2019-10-30T12:01:35.000Z (almost 6 years ago)
Default Branch: main
Last Pushed: 2023-12-25T12:09:35.000Z (almost 2 years ago)
Last Synced: 2025-03-24T09:47:07.781Z (7 months ago)
Topics: cli, cli-utility, email, golang, hacktoberfest, scrape-emails, scraper, snap, snapcraft
Language: Go
Homepage: https://snapcraft.io/scrape
Size: 149 KB
Stars: 159
Watchers: 4
Forks: 26
Open Issues: 1
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE.md

Awesome Lists containing this project

README

          [![scrape](https://snapcraft.io/scrape/badge.svg)](https://snapcraft.io/scrape)

# Scrape

CLI utility to scrape emails from websites

## Features

- Asynchronous scraping

- Recursive link follow

- External link follow

- Cloudflare email obfuscation decoding

- Client side rendered pages support through headless `chromium` load awaits

- Simple, grepable output

## Install

* MacOS:

    ```bash

    brew tap lawzava/scrape https://github.com/lawzava/scrape

    brew install scrape

    ```

* Linux:

    ```bash

    sudo snap install scrape

    ```

## Usage

Sample call:

`scrape -w https://lawzava.com` 

Depends on `chromium` or `google-chrome` being available in path if `--js` is used

### Parameters:

```

      --async             Scrape website pages asynchronously (default true)

      --debug             Print debug logs

  -d, --depth int         Max depth to follow when scraping recursively (default 3)

      --follow-external   Follow external 3rd party links within website

  -h, --help              help for scrape

      --js                Enables EnableJavascript execution await

      --output string     Output type to use (default 'plain', supported: 'csv', 'json') (default "plain")

      --output-with-url   Adds URL to output with each email

      --recursively       Scrape website recursively (default true)

      --timeout int       If > 0, specify a timeout (seconds) for js execution await

  -w, --website string    Website to scrape (default "https://lawzava.com")

```

## Note about scraper package

For those that are looking for `scraper` package - this repository was intended as a cli-use only thus the scraper package was moved to [lawzava/emailscraper](https://github.com/lawzava/emailscraper).

The `scrape` utility will be maintained as a CLI implementation of `emailscraper` package.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/lawzava/scrape

Awesome Lists containing this project

README