https://github.com/maxmindlin/scout-lang
A web crawling programming language
https://github.com/maxmindlin/scout-lang
dsl programming-language scraper scraping scraping-websites web-crawling web-scraping
Last synced: 10 months ago
JSON representation
A web crawling programming language
- Host: GitHub
- URL: https://github.com/maxmindlin/scout-lang
- Owner: maxmindlin
- License: apache-2.0
- Created: 2024-05-20T15:24:52.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-21T17:40:17.000Z (over 1 year ago)
- Last Synced: 2025-03-30T06:06:31.792Z (10 months ago)
- Topics: dsl, programming-language, scraper, scraping, scraping-websites, web-crawling, web-scraping
- Language: Rust
- Homepage: https://scout-lang.netlify.app
- Size: 54.4 MB
- Stars: 113
- Watchers: 2
- Forks: 6
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE-APACHE
Awesome Lists containing this project
README
ScoutLang is a DSL made for web scraping, focusing on a simple and expressive syntax. A powerful web crawling stack is abstracted away, allowing you to write powerful, easy to read scraping scripts.
## Why Scout?
- Gain access to powerful web scraping technology without needing expertise
- A focus on developer velocity
- Builtin debugging tools

## Iterative script building
ScoutLang comes bundled with a full REPL and a powerful debugging mode, allowing you to visualize your web scraping scripts in real time.

# Installation
Eventually Scout installation will come bundled with the necessary pre-reqs. For now, you will need:
- Some version of FireFox
- [Geckodriver](https://github.com/mozilla/geckodriver)
The binary can then be installed one of two ways:
1. Cargo (requires Rust)
```sh
cargo install scoutlang
```
2. Run the installer (requires Python3):
```bash
curl --proto '=https' --tlsv1.2 -LsSf https://raw.githubusercontent.com/maxmindlin/scout-lang/main/scripts/installer.py | python3
```
Both install the Scout interpreter into your path as `scout`.
# Usage
The `scout` binary ran with a filename will read and interpret a script file. Without a script will start the REPL.
Available ENV variables:
- `SCOUT_DEBUG`: Whether or not to open the debug browser. Defaults to `false`.
- `SCOUT_PORT`: Which port to run Scout on. Defaults to a random open port. Do not set if you intend to run multiple scout instances at once as ports will conflict.
- `SCOUT_PROXY`: An optional URL to proxy requests to. Defaults to none.
- `SCOUT_PATH`: A path to where Scout installs dependencies, like the standard lib. Defaults to `$HOME/scout-lang/`.
# License
Scout is dual-licensed with MIT & Apache 2.0, at your option.