Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/anonyfox/elixir-scrape
data-science elixir feed html information-retrieval readability rss scrape scraping
JSON representation
Scrape any website, article or RSS/Atom Feed with ease!
- Host: GitHub
- URL: https://github.com/anonyfox/elixir-scrape
- Owner: Anonyfox
- License: lgpl-3.0
- Created: 2015-06-29T17:25:24.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2020-07-25T15:06:34.000Z (over 4 years ago)
- Last Synced: 2024-10-30T02:38:52.855Z (14 days ago)
- Topics: data-science, elixir, feed, html, information-retrieval, readability, rss, scrape, scraping
- Language: Elixir
- Homepage: https://github.com/Anonyfox/elixir-scrape
- Size: 926 KB
- Stars: 327
- Watchers: 16
- Forks: 43
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# Scrape
[![Hex.pm](https://img.shields.io/hexpm/dt/scrape.svg)](https://hex.pm/packages/scrape)
[![Hex.pm](https://img.shields.io/hexpm/v/scrape.svg)](https://hex.pm/packages/scrape)
[![Hex.pm](https://img.shields.io/hexpm/l/scrape.svg)](https://hex.pm/packages/scrape)

Structured data extraction from common web resources, using information-retrieval techniques. See the [docs](https://hexdocs.pm/scrape/Scrape.html).
## Installation
The package can be installed by adding `scrape` to your list of dependencies in `mix.exs`:
```elixir
def deps do
[
{:scrape, "~> 3.0.0"}
]
end
```
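Once added, fetch the dependency with `mix deps.get`.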
## Known Issues
* This package uses an outdated version of `httpoison` because of `keepcosmos/readability`. You can override this in your own app with `override: true` (see the sketch after this list) and everything should work.
* The current version 3.X is a complete rewrite from scratch, so new issues may occur and the API has changed. Please include a URL to an HTML/feed document when submitting issues so the bug can be reproduced and fixed.
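For the `httpoison` override mentioned above, here is a minimal sketch of the relevant `deps` entry in `mix.exs`. The `httpoison` version constraint shown is an assumption; pin whichever version your app actually needs:

```elixir
def deps do
  [
    {:scrape, "~> 3.0.0"},
    # Force a single httpoison version across the dependency tree,
    # even though keepcosmos/readability pins an older one.
    # The version constraint below is illustrative, not prescribed.
    {:httpoison, "~> 1.8", override: true}
  ]
end
```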
## Usage
* `Scrape.domain!(url)` -> get structured data of a domain-type URL (like https://bbc.com)
* `Scrape.feed!(url)` -> get structured data of an RSS/Atom feed
* `Scrape.article!(url)` -> get structured data of an article-type URL
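For example, in an IEx session (`iex -S mix`); the URLs below are placeholders, and the exact fields of the returned data depend on the structs documented in the hexdocs:

```elixir
# Fetch and parse an RSS/Atom feed into structured data.
feed = Scrape.feed!("https://news.ycombinator.com/rss")

# Fetch an article-type page and extract its readable content.
# Replace with a real article URL.
article = Scrape.article!("https://www.bbc.com/news/world-12345678")

# Inspect the results to see which fields were extracted.
IO.inspect(feed)
IO.inspect(article)
```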
## License
LGPLv3. You can use this package any way you want (including commercially), but I want bugfixes and improvements to flow back into this package for everyone's benefit.