Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mischov/meeseeks
An Elixir library for parsing and extracting data from HTML and XML with CSS or XPath selectors.
- Host: GitHub
- URL: https://github.com/mischov/meeseeks
- Owner: mischov
- License: mit
- Created: 2017-02-17T19:27:56.000Z (almost 8 years ago)
- Default Branch: main
- Last Pushed: 2023-08-10T04:41:04.000Z (over 1 year ago)
- Last Synced: 2024-10-19T16:29:02.231Z (about 2 months ago)
- Topics: css, elixir, html, parser, selectors, xml, xpath
- Language: Elixir
- Homepage:
- Size: 349 KB
- Stars: 315
- Watchers: 9
- Forks: 23
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- freaking_awesome_elixir - Elixir - A library for parsing and extracting data from HTML and XML with CSS or XPath selectors. (XML)
- fucking-awesome-elixir - meeseeks - A library for parsing and extracting data from HTML and XML with CSS or XPath selectors. (XML)
- awesome-elixir - meeseeks - A library for parsing and extracting data from HTML and XML with CSS or XPath selectors. (XML)
README
# Meeseeks
[![Hex Version](https://img.shields.io/hexpm/v/meeseeks.svg?style=flat&color=%23714a94)](https://hex.pm/packages/meeseeks)
[![Hex Docs](https://img.shields.io/badge/hex-docs-%23714a94.svg?style=flat)](https://hexdocs.pm/meeseeks)
[![License](https://img.shields.io/hexpm/l/meeseeks.svg?style=flat&color=%23714a94)](https://github.com/mischov/meeseeks/blob/main/LICENSE)
[![Total Download](https://img.shields.io/hexpm/dt/meeseeks.svg?style=flat&color=%23714a94)](https://hex.pm/packages/meeseeks)
[![CI](https://github.com/mischov/meeseeks/actions/workflows/ci.yml/badge.svg)](https://github.com/mischov/meeseeks/actions/workflows/ci.yml)

Meeseeks is an Elixir library for parsing and extracting data from HTML and XML with CSS or XPath selectors.
```elixir
import Meeseeks.CSS

html = HTTPoison.get!("https://news.ycombinator.com/").body

for story <- Meeseeks.all(html, css("tr.athing")) do
  title = Meeseeks.one(story, css(".title a"))

  %{
    title: Meeseeks.text(title),
    url: Meeseeks.attr(title, "href")
  }
end
#=> [%{title: "...", url: "..."}, %{title: "...", url: "..."}, ...]
```

## Features
- Friendly API
- Browser-grade HTML5 parser
- Permissive XML parser
- CSS and XPath selectors
- Supports custom selectors
- Helpers to extract data from selections

## Compatibility
Meeseeks requires a minimum combination of Elixir 1.12.0 and Erlang/OTP 23.0, and is tested with a maximum combination of Elixir 1.14.0 and Erlang/OTP 25.0.
## Installation
Meeseeks depends on the Rust library [`html5ever`](https://github.com/servo/html5ever) via [`meeseeks_html5ever`](https://github.com/mischov/meeseeks_html5ever), but because `meeseeks_html5ever` provides pre-compiled NIFs via [`rustler_precompiled`](https://github.com/philss/rustler_precompiled) **you do not need to have Rust installed** to use Meeseeks.
To install Meeseeks, add it to your `mix.exs`:
```elixir
defp deps do
  [
    {:meeseeks, "~> 0.17.0"}
  ]
end
```

Then run `mix deps.get`.
### Force Compilation
If you need to force compilation of the Rust NIF for some reason, see the instructions [here](https://github.com/mischov/meeseeks_html5ever#dependencies).
## Getting Started
### Parse
Start by parsing a source (HTML/XML string or [`Meeseeks.TupleTree`](https://hexdocs.pm/meeseeks/Meeseeks.TupleTree.html)) into a [`Meeseeks.Document`](https://hexdocs.pm/meeseeks/Meeseeks.Document.html) so that it can be queried.
`Meeseeks.parse/1` parses the source as HTML, but `Meeseeks.parse/2` accepts a second argument of either `:html`, `:xml`, or `:tuple_tree` that specifies how the source is parsed.
```elixir
document = Meeseeks.parse("<div id=main><p>1</p><p>2</p><p>3</p></div>")
#=> #Meeseeks.Document<{...}>
```

The selection functions accept an unparsed source and parse it as HTML, but parsing is expensive, so parse ahead of time when running multiple selections on the same document.
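As a minimal sketch of parsing once and reusing the document, the script below runs two selections against one parsed document and then tries the `:xml` and `:tuple_tree` parser options. The sample markup and variable names are illustrative assumptions, and `Mix.install/1` is only there so the snippet can run as a standalone script; inside a Mix project, rely on your `deps` instead.

```elixir
# Standalone script setup (assumption); omit inside a Mix project.
Mix.install([{:meeseeks, "~> 0.17.0"}])

import Meeseeks.CSS

# Hypothetical sample markup, used only for illustration.
html = "<div id=\"main\"><p>1</p><p>2</p><p>3</p></div>"

# Parse once, then run several selections against the same document.
document = Meeseeks.parse(html)

texts =
  document
  |> Meeseeks.all(css("#main p"))
  |> Enum.map(&Meeseeks.text/1)

first_text = document |> Meeseeks.one(css("#main p")) |> Meeseeks.text()

# The second argument to parse/2 chooses how the source is parsed.
xml_document = Meeseeks.parse("<items><item>a</item></items>", :xml)
xml_text = xml_document |> Meeseeks.one(css("item")) |> Meeseeks.text()

tree_document = Meeseeks.parse({"p", [], ["1"]}, :tuple_tree)
tree_text = tree_document |> Meeseeks.one(css("p")) |> Meeseeks.text()
```

Passing `html` directly to each selection would also work, but each call would then parse the string again.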
### Select
Next, use one of Meeseeks's selection functions - `fetch_all`, `all`, `fetch_one`, or `one` - to search for nodes.
All these functions accept a queryable (a source, a document, or a [`Meeseeks.Result`](https://hexdocs.pm/meeseeks/Meeseeks.Result.html)), one or more [`Meeseeks.Selector`](https://hexdocs.pm/meeseeks/Meeseeks.Selector.html)s, and optionally an initial context.
`all` returns a (possibly empty) list of results representing every node matching one of the provided selectors, while `one` returns a result representing the first node to match a selector (depth-first) or `nil` if there is no match.
`fetch_all` and `fetch_one` work like `all` and `one` respectively, but wrap the result in `{:ok, ...}` if there is a match or return `{:error, %Meeseeks.Error{type: :select, reason: :no_match}}` if there is not.
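The difference between the two families is easiest to see side by side. The sketch below uses the `css` macro from `Meeseeks.CSS` on a small, made-up document; the markup and variable names are assumptions for illustration, and `Mix.install/1` is included only so the snippet can run as a standalone script.

```elixir
# Standalone script setup (assumption); omit inside a Mix project.
Mix.install([{:meeseeks, "~> 0.17.0"}])

import Meeseeks.CSS

# Hypothetical sample markup, used only for illustration.
document = Meeseeks.parse("<div id=\"main\"><p>1</p><p>2</p></div>")

# `all` and `one` signal "no match" with an empty list and nil.
matches = Meeseeks.all(document, css("p"))
no_matches = Meeseeks.all(document, css("span"))
missing = Meeseeks.one(document, css("span"))

# A result is itself a queryable, so a selection can be scoped to it.
main = Meeseeks.one(document, css("#main"))
scoped_text = main |> Meeseeks.one(css("p")) |> Meeseeks.text()

# The fetch_* variants return tagged tuples that suit `case` and `with`.
found =
  case Meeseeks.fetch_one(document, css("#main p")) do
    {:ok, result} -> Meeseeks.text(result)
    {:error, %Meeseeks.Error{type: :select, reason: :no_match}} -> :no_match
  end

not_found =
  case Meeseeks.fetch_all(document, css("span")) do
    {:ok, results} -> Enum.map(results, &Meeseeks.text/1)
    {:error, %Meeseeks.Error{type: :select, reason: :no_match}} -> :no_match
  end
```

The `fetch_*` forms are a reasonable choice when a missing node should be handled explicitly rather than surfacing later as a `nil`.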
To generate selectors, use the `css` macro provided by [`Meeseeks.CSS`](https://hexdocs.pm/meeseeks/Meeseeks.CSS.html) or the `xpath` macro provided by [`Meeseeks.XPath`](https://hexdocs.pm/meeseeks/Meeseeks.XPath.html).
```elixir
import Meeseeks.CSS

result = Meeseeks.one(document, css("#main p"))
#=> #Meeseeks.Result<{ <p>1</p> }>

import Meeseeks.XPath

result = Meeseeks.one(document, xpath("//*[@id='main']//p"))
#=> #Meeseeks.Result<{ <p>1</p> }>
```

### Extract
Retrieve information from the [`Meeseeks.Result`](https://hexdocs.pm/meeseeks/Meeseeks.Result.html) with an extractor.
The included extractors are `attr`, `attrs`, `data`, `dataset`, `html`, `own_text`, `tag`, `text`, and `tree`.
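To give a feel for a few of the less obvious extractors, here is a rough sketch on a made-up fragment. The markup, the `data-post-id` attribute, and the variable names are assumptions chosen for illustration, and `Mix.install/1` is included only so the snippet can run as a standalone script.

```elixir
# Standalone script setup (assumption); omit inside a Mix project.
Mix.install([{:meeseeks, "~> 0.17.0"}])

import Meeseeks.CSS

# Hypothetical markup chosen to exercise several extractors.
document =
  Meeseeks.parse("""
  <div id="post" data-post-id="42">
    Intro <a href="/more">read more</a>
  </div>
  """)

post = Meeseeks.one(document, css("#post"))
link = Meeseeks.one(document, css("#post a"))

# attr/2 reads a single attribute and returns nil when it is absent.
href = Meeseeks.attr(link, "href")
missing_attr = Meeseeks.attr(link, "title")

# text/1 includes descendants; own_text/1 only the node's own text.
all_text = Meeseeks.text(post)
own_text = Meeseeks.own_text(post)

# dataset/1 collects data-* attributes into a map with camelCased keys.
dataset = Meeseeks.dataset(post)
```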
```elixir
Meeseeks.tag(result)
#=> "p"
Meeseeks.text(result)
#=> "1"
Meeseeks.tree(result)
#=> {"p", [], ["1"]}
```

The extractors `html` and `tree` work on [`Meeseeks.Document`](https://hexdocs.pm/meeseeks/Meeseeks.Document.html)s in addition to [`Meeseeks.Result`](https://hexdocs.pm/meeseeks/Meeseeks.Result.html)s.
```elixir
Meeseeks.html(document)
#=> "<html><head></head><body><div id=\"main\"><p>1</p><p>2</p><p>3</p></div></body></html>"
```

## Guides
- [Meeseeks vs. Floki](guides/meeseeks_vs_floki.md)
- [CSS Selectors](guides/css_selectors.md)
- [XPath Selectors](guides/xpath_selectors.md)
- [Custom Selectors](guides/custom_selectors.md)
- [Deployment](guides/deployment.md)

## Contributing
If you are interested in contributing, please read the [contribution guidelines](CONTRIBUTING.md).
## License
Meeseeks is licensed under the [MIT license](https://opensource.org/licenses/mit-license.php).