https://github.com/achannarasappa/locust

Distributed web data discovery and collection framework built for serverless
aws-lambda crawler locust scraping serverless


[![Build Status](https://travis-ci.com/achannarasappa/locust.svg?branch=master)](https://travis-ci.com/achannarasappa/locust) [![Coverage Status](https://coveralls.io/repos/github/achannarasappa/locust/badge.svg?branch=master)](https://coveralls.io/github/achannarasappa/locust?branch=master)




# Locust


Distributed web data discovery and collection framework

## Quick Start

```
npm install @achannarasappa/locust
```

## Features

* Configuration-driven jobs
* Distributed execution model to support serverless architectures
* Support for pages that require client-side JavaScript execution
* Data extraction using CSS selectors
* Depth-based stop condition along with support for custom stop conditions
* Robust dev tooling with [locust-cli](https://github.com/achannarasappa/locust-cli) to build and test jobs
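
To make the execution model and stop conditions listed above more concrete, here is a minimal sketch in plain JavaScript. It does not use Locust's API: the `pages` graph, `createState`, `step`, `crawl`, and the `depthLimit` and `shouldStop` options are all hypothetical names chosen for illustration, and the in-memory queue stands in for whatever shared store a real distributed deployment would use. Each call to `step` handles exactly one URL, which is roughly how a single serverless invocation could behave.

```javascript
// Hypothetical in-memory link graph standing in for real pages (not Locust's API).
const pages = {
  'https://example.com/': ['https://example.com/a', 'https://example.com/b'],
  'https://example.com/a': ['https://example.com/c'],
  'https://example.com/b': ['https://example.com/c'],
  'https://example.com/c': ['https://example.com/d'],
  'https://example.com/d': [],
};

// Shared state; a distributed setup would keep this in an external store.
function createState(startUrl) {
  return { queue: [{ url: startUrl, depth: 0 }], visited: new Set() };
}

// One stateless "invocation": take a single URL off the queue, record it,
// and enqueue its links unless a stop condition applies.
function step(state, { depthLimit, shouldStop = () => false }) {
  const item = state.queue.shift();
  if (!item || state.visited.has(item.url)) return;
  state.visited.add(item.url);

  // Depth-based stop condition: do not follow links past the limit.
  if (item.depth >= depthLimit) return;
  // Custom stop condition: caller-supplied predicate over the current state.
  if (shouldStop(state, item)) return;

  for (const url of pages[item.url] || []) {
    if (!state.visited.has(url)) {
      state.queue.push({ url, depth: item.depth + 1 });
    }
  }
}

// Driver loop; in a serverless deployment each step would be a separate invocation.
function crawl(startUrl, options) {
  const state = createState(startUrl);
  while (state.queue.length > 0) step(state, options);
  return [...state.visited];
}
```

For example, `crawl('https://example.com/', { depthLimit: 1 })` visits the start page and the two pages it links to, but does not follow their links. In this sketch both stop conditions only halt the discovery of new links; URLs already in the queue are still processed.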

## Use Cases

* Web indexing (i.e. web crawling)
* Web data extraction (i.e. web scraping)

## Reference

* Documentation
  * [Quick start guide](https://locust.dev/docs/getting_started)
  * [API](https://locust.dev/docs/api)
  * [CLI](https://locust.dev/docs/cli)
  * [Examples](https://github.com/achannarasappa/locust-examples)
* Related
  * [locust-cli](https://github.com/achannarasappa/locust-cli)
  * [locust-aws-terraform](https://github.com/achannarasappa/locust-aws-terraform)
  * [locust-website](https://github.com/achannarasappa/locust-website)