Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/maxmindlin/scout-lang

A web crawling programming language
https://github.com/maxmindlin/scout-lang

dsl programming-language scraper scraping scraping-websites web-crawling web-scraping

Last synced: about 3 hours ago
JSON representation

A web crawling programming language

Awesome Lists containing this project

README

        



A Web Crawling Programming Language


GitHub Actions Workflow Status
GitHub Actions Workflow Status
GitHub Release
Crates.io Version
Docker version




ScoutLang is a DSL made for web scraping, focusing on a simple and expressive syntax. A powerful web crawling stack is abstracted away, allowing you to write powerful, easy to read scraping scripts.

## Why Scout?

- Gain access to powerful web scraping technology without needing expertise
- A focus on developer velocity
- Builtin debugging tools


![example](./assets/code-sample.png)

## Iterative script building

ScoutLang comes bundled with a full REPL and a powerful debugging mode, allowing you to visualize your web scraping scripts in real time.

![debug](./assets/scout.gif)

# Installation

Eventually Scout installation will come bundled with the necessary pre-reqs. For now, you will need:
- Some version of FireFox
- [Geckodriver](https://github.com/mozilla/geckodriver)

The binary can then be installed one of two ways:

1. Cargo (requires Rust)

```sh
cargo install scoutlang
```

2. Run the installer (requires Python3):

```bash
curl --proto '=https' --tlsv1.2 -LsSf https://raw.githubusercontent.com/maxmindlin/scout-lang/main/scripts/installer.py | python3
```

Both install the Scout interpreter into your path as `scout`.

# Usage

The `scout` binary ran with a filename will read and interpret a script file. Without a script will start the REPL.

Available ENV variables:
- `SCOUT_DEBUG`: Whether or not to open the debug browser. Defaults to `false`.
- `SCOUT_PORT`: Which port to run Scout on. Defaults to a random open port. Do not set if you intend to run multiple scout instances at once as ports will conflict.
- `SCOUT_PROXY`: An optional URL to proxy requests to. Defaults to none.
- `SCOUT_PATH`: A path to where Scout installs dependencies, like the standard lib. Defaults to `$HOME/scout-lang/`.

# License

Scout is dual-licensed with MIT & Apache 2.0, at your option.