https://github.com/h4kor/koios
A web crawler written in Elixir
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/h4kor/koios
- Owner: H4kor
- License: MIT
- Created: 2022-05-18T18:06:09.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-06-01T18:32:14.000Z (over 2 years ago)
- Last Synced: 2023-03-10T23:07:28.235Z (almost 2 years ago)
- Language: Elixir
- Size: 58.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# koios
This is not fit for any usage at the moment.
Don't even think about adding this to your project; it was written by a fool without prior knowledge of Elixir.

**TODO: Add description**
## Architecture

The **RetrieverRegistry** provides **Retriever**s to be used for accessing website data.
**Retriever**s are domain-specific so that request limits can be enforced per domain.
A **Crawler** takes a URL/domain and uses the **Retriever** to download all pages in a breadth-first fashion.
A **Coordinator** can have multiple crawlers sending data to it and multiple scrapers processing that data.
A **Scraper** retrieves data (HTML) and processes it in arbitrary ways.
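
The breadth-first traversal described above can be illustrated with a small, self-contained sketch. This is not koios' implementation: `CrawlSketch`, `crawl/3`, and the `fetch_links` function are hypothetical names, and an in-memory map stands in for what a **Retriever** would download.

```elixir
defmodule CrawlSketch do
  @moduledoc "Hypothetical illustration of a breadth-first crawl; not part of koios."

  # `fetch_links` is any function url -> [url]. It stands in for a Retriever.
  # Pages at `max_depth` are visited but their links are not followed.
  def crawl(start_url, fetch_links, max_depth) do
    queue = :queue.from_list([{start_url, 0}])
    visit(queue, MapSet.new([start_url]), [], fetch_links, max_depth)
  end

  defp visit(queue, seen, visited, fetch_links, max_depth) do
    case :queue.out(queue) do
      {:empty, _} ->
        Enum.reverse(visited)

      {{:value, {url, depth}}, rest} ->
        # Only expand pages that are still above the depth limit.
        links = if depth < max_depth, do: fetch_links.(url), else: []
        new_links = links |> Enum.uniq() |> Enum.reject(&MapSet.member?(seen, &1))

        # A FIFO queue keeps the visiting order breadth-first.
        queue = Enum.reduce(new_links, rest, fn link, q -> :queue.in({link, depth + 1}, q) end)
        seen = Enum.reduce(new_links, seen, &MapSet.put(&2, &1))
        visit(queue, seen, [url | visited], fetch_links, max_depth)
    end
  end
end

# Made-up link graph used instead of real HTTP requests.
graph = %{"a" => ["b", "c"], "b" => ["d"], "c" => ["d", "a"], "d" => ["e"]}
fetch = fn url -> Map.get(graph, url, []) end

IO.inspect(CrawlSketch.crawl("a", fetch, 2))
# prints ["a", "b", "c", "d"]
```

The `seen` set prevents a page from being queued twice, which matters as soon as pages link back to each other.
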
```elixir
# Register the scrapers that will process crawled pages.
Koios.add_scraper(My.Scraper, nil)
Koios.add_scraper(Another.Scraper, nil)

# Build a crawler for the start URL, constrain it, and start it.
Koios.build_crawler("https://blog.libove.org")
|> Koios.add_constraint(Koios.DomainConstraint, "**.libove.org")
|> Koios.add_constraint(Koios.DepthConstraint, 4)
|> Koios.max_tasks(400)
|> Koios.start_crawler()
```
```
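
The README does not say how a pattern such as `"**.libove.org"` is matched. As a rough sketch only, the module below assumes that `**` stands for one or more dot-separated host labels, so subdomains match and the bare domain does not. `DomainGlobSketch` and `matches?/2` are hypothetical names and are not part of koios.

```elixir
defmodule DomainGlobSketch do
  @moduledoc "Hypothetical glob-style host matching; not koios' DomainConstraint."

  # Returns true when the URL's host matches the pattern.
  # Assumption: "**" matches one or more characters of a host name, dots included.
  def matches?(url, pattern) do
    case URI.parse(url).host do
      nil -> false
      host -> Regex.match?(to_regex(pattern), host)
    end
  end

  # Escape the literal parts of the pattern and join them with a wildcard group.
  defp to_regex(pattern) do
    source =
      pattern
      |> String.split("**")
      |> Enum.map(&Regex.escape/1)
      |> Enum.join("[a-z0-9.-]+")

    # Anchored and case-insensitive, since host names are case-insensitive.
    Regex.compile!("^" <> source <> "$", "i")
  end
end

IO.inspect(DomainGlobSketch.matches?("https://blog.libove.org/posts/1", "**.libove.org"))
# prints true
IO.inspect(DomainGlobSketch.matches?("https://libove.org.evil.com", "**.libove.org"))
# prints false
```

Anchoring the expression at both ends is what rejects look-alike hosts that merely contain the allowed domain.
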