Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/anonyfox/elixir-scrape
data-science elixir feed html information-retrieval readability rss scrape scraping
JSON representation
Scrape any website, article or RSS/Atom Feed with ease!
- Host: GitHub
- URL: https://github.com/anonyfox/elixir-scrape
- Owner: Anonyfox
- License: lgpl-3.0
- Created: 2015-06-29T17:25:24.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2020-07-25T15:06:34.000Z (over 4 years ago)
- Last Synced: 2024-10-30T02:38:52.855Z (14 days ago)
- Topics: data-science, elixir, feed, html, information-retrieval, readability, rss, scrape, scraping
- Language: Elixir
- Homepage: https://github.com/Anonyfox/elixir-scrape
- Size: 926 KB
- Stars: 327
- Watchers: 16
- Forks: 43
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# Scrape
[![Hex.pm](https://img.shields.io/hexpm/dt/scrape.svg)](https://hex.pm/packages/scrape)
[![Hex.pm](https://img.shields.io/hexpm/v/scrape.svg)](https://hex.pm/packages/scrape)
[![Hex.pm](https://img.shields.io/hexpm/l/scrape.svg)](https://hex.pm/packages/scrape)

Structured data extraction from common web resources, using information-retrieval techniques. See the [docs](https://hexdocs.pm/scrape/Scrape.html).
## Installation
The package can be installed by adding `scrape` to your list of dependencies in `mix.exs`:
```elixir
def deps do
[
{:scrape, "~> 3.0.0"}
]
end
```
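Once added, fetch the dependency with `mix deps.get`.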
## Known Issues
* This package uses an outdated version of `httpoison` because of `keepcosmos/readability`. You can override this in your own app with `override: true` (see the sketch after this list) and everything should work.
* The current version 3.X is a complete rewrite from scratch, so new issues may occur and the API has changed. Please include a URL to an HTML/feed document when submitting issues so the bug can be reproduced and fixed.
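For the `httpoison` override mentioned above, here is a minimal sketch of the relevant `deps` entry in `mix.exs`. The `httpoison` version constraint shown is an assumption; pin whichever version your app actually needs:

```elixir
def deps do
  [
    {:scrape, "~> 3.0.0"},
    # Force a single httpoison version across the dependency tree,
    # even though keepcosmos/readability pins an older one.
    # The version constraint below is illustrative, not prescribed.
    {:httpoison, "~> 1.8", override: true}
  ]
end
```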
## Usage
* `Scrape.domain!(url)` -> get structured data of a domain-type URL (like https://bbc.com)
* `Scrape.feed!(url)` -> get structured data of an RSS/Atom feed
* `Scrape.article!(url)` -> get structured data of an article-type URL
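For example, in an IEx session (`iex -S mix`); the URLs below are placeholders, and the exact fields of the returned data depend on the structs documented in the hexdocs:

```elixir
# Fetch and parse an RSS/Atom feed into structured data.
feed = Scrape.feed!("https://news.ycombinator.com/rss")

# Fetch an article-type page and extract its readable content.
# Replace with a real article URL.
article = Scrape.article!("https://www.bbc.com/news/world-12345678")

# Inspect the results to see which fields were extracted.
IO.inspect(feed)
IO.inspect(article)
```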
## License
LGPLv3. You can use this package any way you want (including commercially), but I want bugfixes and improvements to flow back into this package for everyone's benefit.