https://github.com/sntran/gen_spider
An Erlang/Elixir behaviour to define Spiders
behaviour crawler generic interface spider
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/sntran/gen_spider
- Owner: sntran
- License: apache-2.0
- Created: 2018-09-08T05:40:52.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2018-09-18T03:20:00.000Z (over 6 years ago)
- Last Synced: 2024-11-15T19:06:54.240Z (about 2 months ago)
- Topics: behaviour, crawler, generic, interface, spider
- Language: Elixir
- Homepage: https://hex.pm/packages/gen_spider
- Size: 33.2 KB
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# GenSpider
[![Build Status](https://img.shields.io/travis/sntran/gen_spider/master.svg)](https://travis-ci.org/sntran/gen_spider)
[![Test Coverage](https://img.shields.io/coveralls/github/sntran/gen_spider.svg)](https://coveralls.io/github/sntran/gen_spider)
[![Hex Version](https://img.shields.io/hexpm/v/gen_spider.svg)](https://hex.pm/packages/gen_spider)
[![License](https://img.shields.io/github/license/sntran/gen_spider.svg)](https://choosealicense.com/licenses/apache-2.0/)

GenSpider is a behaviour for defining Spiders.
Spiders are modules which define how a certain site (or a group of sites) will
be scraped, including how to perform the crawl (i.e. follow links) and how to
extract structured data from their pages (i.e. scraping items). In other words,
Spiders are the place where you define the custom behaviour for crawling and
parsing pages for a particular site (or, in some cases, a group of sites).

## Hello World
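
To make the two responsibilities of a spider concrete (extracting items and following links), here is a minimal sketch in plain Elixir. It does not use GenSpider's callbacks: the module name `QuotesSketch`, its function names, and the simplified HTML markup are all assumptions made for illustration, and a real spider would use a proper HTML parser rather than regular expressions.

```elixir
defmodule QuotesSketch do
  @moduledoc """
  Hypothetical, dependency-free sketch of what a spider's parse step does.
  This is NOT GenSpider's API; every name here is illustrative.
  """

  # Matches one quote: the quote text followed by its author.
  # The `s` flag lets `.` match newlines between the two tags.
  defp item_regex do
    ~r{<span class="text">(.*?)</span>.*?<small class="author">(.*?)</small>}s
  end

  # Matches the pagination link that points to the next page.
  defp next_regex do
    ~r{<li class="next">\s*<a href="([^"]+)"}
  end

  @doc "Scraping: extract structured items from a page body."
  def extract_items(html) do
    item_regex()
    |> Regex.scan(html, capture: :all_but_first)
    |> Enum.map(fn [text, author] -> %{text: text, author: author} end)
  end

  @doc "Crawling: return the links that should be requested next."
  def extract_links(html) do
    case Regex.run(next_regex(), html, capture: :all_but_first) do
      [href] -> [href]
      nil -> []
    end
  end

  @doc "Run both steps on one page body."
  def parse(html) do
    %{items: extract_items(html), follow: extract_links(html)}
  end
end

# Usage with a made-up page body (assumed, simplified markup).
html = """
<div class="quote">
  <span class="text">First example quote</span>
  <small class="author">Author A</small>
</div>
<div class="quote">
  <span class="text">Second example quote</span>
  <small class="author">Author B</small>
</div>
<li class="next"><a href="/page/2/">Next</a></li>
"""

IO.inspect(QuotesSketch.parse(html))
```

Keeping item extraction and link discovery in separate functions mirrors the split described above: one decides what data to keep, the other decides where the crawl goes next.
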
The basic Quotes Spider from Scrapy is implemented with `gen_spider` in both
[Erlang](examples/quotes_spider.erl) and [Elixir](examples/quotes_spider.ex).

## Generic Spiders
GenSpider also comes with some useful generic spiders that can be found in the
[examples](examples) directory. Their aim is to provide convenient functionality
for a few common scraping cases, like following all links on a site based on
certain rules, crawling from Sitemaps, or parsing an XML/CSV feed.

## Installation
If [available in Hex](https://hex.pm/docs/publish), the package can be installed
by adding `gen_spider` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:gen_spider, "~> 0.1.0"}
  ]
end
```

Documentation can be generated with [ExDoc](https://github.com/elixir-lang/ex_doc)
and published on [HexDocs](https://hexdocs.pm). Once published, the docs can
be found at [https://hexdocs.pm/gen_spider](https://hexdocs.pm/gen_spider).

## Contributing
We welcome everyone to contribute to GenSpider and help us tackle existing issues!
Use the [issue tracker][issues] for bug reports or feature requests. Open a [pull request][pulls] when you are ready to contribute.
When submitting a pull request, do not update `CHANGELOG.md`.

## License
GenSpider source code is released under the Apache License 2.0.
Check the LICENSE file for more information.

[issues]: https://github.com/sntran/gen_spider/issues
[pulls]: https://github.com/sntran/gen_spider/pulls