https://github.com/welaika/sputnik
Crawling since 1957
- Host: GitHub
- URL: https://github.com/welaika/sputnik
- Owner: welaika
- License: mit
- Created: 2017-12-08T04:06:15.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2021-10-11T14:15:12.000Z (over 4 years ago)
- Last Synced: 2025-06-09T12:12:29.040Z (8 months ago)
- Topics: elixir
- Language: Elixir
- Homepage: https://hexdocs.pm/sputnik/
- Size: 507 KB
- Stars: 30
- Watchers: 7
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README

# Sputnik
by weLaika
Sputnik is a website crawler written in Elixir.
It crawls a website following all internal links and makes a report of all pages' status codes.
With the `--query` flag you can pass one or more CSS selectors; Sputnik then reports how many matches each selector has across the crawled pages.
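The crawl described above comes down to two small steps: deciding whether a link is internal, and tallying the status codes of the fetched pages. The sketch below illustrates both using only the Elixir standard library. `CrawlSketch`, `internal?/2` and `status_report/1` are hypothetical names chosen for illustration, not Sputnik's actual modules, and treating "same host" as "internal" is an assumption.

```elixir
defmodule CrawlSketch do
  # Hypothetical helper, not Sputnik's real implementation.
  # Assumption: a link is "internal" when, once resolved against the
  # base URL, it points at the same host as the base URL.
  def internal?(base_url, href) do
    base = URI.parse(base_url)
    target = URI.merge(base, href)
    target.host == base.host
  end

  # Takes a list of {url, status_code} tuples and returns a map of
  # status code => number of pages that answered with that code.
  def status_report(pages) do
    pages
    |> Enum.map(fn {_url, status} -> status end)
    |> Enum.frequencies()
  end
end
```

For example, `CrawlSketch.internal?("http://example.com/", "/about")` resolves the relative link against the base URL before comparing hosts, so relative and absolute internal links are treated the same way.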
## Build
Sputnik can be built with:
```bash
mix deps.get
mix escript.build
```
## Usage
Sputnik takes the URL to crawl and optional queries to run against the crawled pages:
### Options
- query: a valid CSS selector, or several selectors separated by commas, to look for on every crawled page; repeat the flag to run several queries
- connections: max number of concurrent HTTP connections (default is 10)
```
sputnik <url> [--query <selector> --query <selector> ...] [--connections <number>]
```
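As an illustration of how flags like these could be handled, the sketch below parses a repeatable `--query` switch and an integer `--connections` switch with `OptionParser`, then uses the connection count to cap concurrency through `Task.async_stream/3`. `CliSketch`, `parse/1`, `fetch_all/3` and `fetch_fun` are hypothetical names, and nothing here is taken from Sputnik's source; it is one plausible approach under those assumptions.

```elixir
defmodule CliSketch do
  # Hypothetical module, not part of Sputnik.
  # Parses argv into the URL, the list of queries and the connection cap.
  def parse(argv) do
    {opts, args, _invalid} =
      OptionParser.parse(argv, strict: [query: :keep, connections: :integer])

    %{
      url: List.first(args),
      # :keep preserves every occurrence of --query, in order.
      queries: Keyword.get_values(opts, :query),
      # 10 mirrors the default documented in the options list.
      connections: Keyword.get(opts, :connections, 10)
    }
  end

  # Runs fetch_fun over urls with at most `connections` calls in flight.
  # fetch_fun stands in for an HTTP request (assumption); the 30 second
  # timeout is an arbitrary value for the sketch.
  def fetch_all(urls, fetch_fun, connections) do
    urls
    |> Task.async_stream(fetch_fun, max_concurrency: connections, timeout: 30_000)
    |> Enum.map(fn {:ok, result} -> result end)
  end
end
```

Passing the fetch function in as an argument keeps the sketch free of HTTP dependencies and makes the concurrency limit easy to try out with any function.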
## Examples
Running
```bash
./sputnik "http://spawnfest.github.io" --query "div" --query "a" --query "h1,h2,h3,h4,h5,h6" --connections 10
```
produces the following output:
```
#################### Pages ####################
Pages found: 19
status_code 200: 12
status_code 301: 7
#################### Queries ####################
## query `a` ##
327 result(s)
Min 18 result(s) per page
Max 57 result(s) per page
## query `div` ##
347 result(s)
Min 13 result(s) per page
Max 53 result(s) per page
## query `h1,h2,h3,h4,h5,h6` ##
95 result(s)
Min 0 result(s) per page
Max 31 result(s) per page
```
It also opens a page in the browser.
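The per-query figures in the output above (total, minimum per page, maximum per page) can be derived from a list of per-page match counts. The following is a minimal sketch of that arithmetic; `QueryStatsSketch` and `summarize/1` are hypothetical names, and the input counts used to try it out are made up rather than taken from the crawl shown above.

```elixir
defmodule QueryStatsSketch do
  # Hypothetical helper, not Sputnik's real implementation.
  # counts: one integer per crawled page, the number of elements
  # matching a given selector on that page.
  def summarize(counts) do
    %{
      total: Enum.sum(counts),
      # The zero-arity function is the fallback for an empty list,
      # so a crawl with no pages reports 0 instead of raising.
      min: Enum.min(counts, fn -> 0 end),
      max: Enum.max(counts, fn -> 0 end)
    }
  end
end
```

A page where the selector matches nothing contributes a count of 0, which is how a query can report `Min 0 result(s) per page` while still having a non-zero total.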

## Requirements
Documentation can be generated with [ExDoc](https://github.com/elixir-lang/ex_doc)
and published on [HexDocs](https://hexdocs.pm). Once published, the docs can
be found at [https://hexdocs.pm/sputnik](https://hexdocs.pm/sputnik).
## Testing
To run tests:
```bash
$ mix test --cover
```
To run credo:
```bash
$ mix credo
```
## Documentation
To generate the documentation:
```bash
$ mix docs && open doc/index.html
```
## Releasing
Bump the version in `mix.exs`, commit and push, then run `mix hex.publish`.
See [https://hex.pm/docs/publish](https://hex.pm/docs/publish) for help.