{"id":13509761,"url":"https://github.com/mischov/meeseeks","last_synced_at":"2025-05-16T12:02:47.731Z","repository":{"id":49192513,"uuid":"82330769","full_name":"mischov/meeseeks","owner":"mischov","description":"An Elixir library for parsing and extracting data from HTML and XML with CSS or XPath selectors.","archived":false,"fork":false,"pushed_at":"2023-08-10T04:41:04.000Z","size":357,"stargazers_count":317,"open_issues_count":4,"forks_count":26,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-09T07:03:17.149Z","etag":null,"topics":["css","elixir","html","parser","selectors","xml","xpath"],"latest_commit_sha":null,"homepage":"","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mischov.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-02-17T19:27:56.000Z","updated_at":"2025-03-21T20:09:01.000Z","dependencies_parsed_at":"2022-08-19T01:11:11.979Z","dependency_job_id":"530f72aa-510b-43a9-8464-6c6b79ba68ad","html_url":"https://github.com/mischov/meeseeks","commit_stats":{"total_commits":169,"total_committers":9,"mean_commits":18.77777777777778,"dds":"0.059171597633136064","last_synced_commit":"74f84010252da3298f8c74e90fdee1ab9ad6d700"},"previous_names":[],"tags_count":38,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mischov%2Fmeeseeks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mischov%2Fmeeseeks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mischov%2Fmeeseeks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mischov%2Fmeeseeks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mischov","download_url":"https://codeload.github.com/mischov/meeseeks/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253365465,"owners_count":21897187,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["css","elixir","html","parser","selectors","xml","xpath"],"created_at":"2024-08-01T02:01:12.613Z","updated_at":"2025-05-16T12:02:47.682Z","avatar_url":"https://github.com/mischov.png","language":"Elixir","funding_links":[],"categories":["XML","HTML"],"sub_categories":[],"readme":"# Meeseeks\n\n[![Hex Version](https://img.shields.io/hexpm/v/meeseeks.svg?style=flat\u0026color=%23714a94)](https://hex.pm/packages/meeseeks)\n[![Hex Docs](https://img.shields.io/badge/hex-docs-%23714a94.svg?style=flat\")](https://hexdocs.pm/meeseeks)\n[![License](https://img.shields.io/hexpm/l/meeseeks.svg?style=flat\u0026color=%23714a94)](https://github.com/mischov/meeseeks/blob/main/LICENSE)\n[![Total Download](https://img.shields.io/hexpm/dt/meeseeks.svg?style=flat\u0026color=%23714a94)](https://hex.pm/packages/meeseeks)\n[![CI](https://github.com/mischov/meeseeks/actions/workflows/ci.yml/badge.svg)](https://github.com/mischov/meeseeks/actions/workflows/ci.yml)\n\nMeeseeks is an Elixir library for parsing and extracting data from HTML and XML with CSS or XPath selectors.\n\n```elixir\nimport Meeseeks.CSS\n\nhtml = HTTPoison.get!(\"https://news.ycombinator.com/\").body\n\nfor story \u003c- Meeseeks.all(html, css(\"tr.athing\")) do\n  title = Meeseeks.one(story, css(\".title a\"))\n\n  %{\n    title: Meeseeks.text(title),\n    url: Meeseeks.attr(title, \"href\")\n  }\nend\n#=\u003e [%{title: \"...\", url: \"...\"}, %{title: \"...\", url: \"...\"}, ...]\n```\n\n## Features\n\n- Friendly API\n- Browser-grade HTML5 parser\n- Permissive XML parser\n- CSS and XPath selectors\n- Supports custom selectors\n- Helpers to extract data from selections\n\n## Compatibility\n\nMeeseeks requires a minimum combination of Elixir 1.12.0 and Erlang/OTP 23.0, and is tested with a maximum combination of Elixir 1.14.0 and Erlang/OTP 25.0.\n\n## Installation\n\nMeeseeks depends on the Rust library [`html5ever`](https://github.com/servo/html5ever) via [`meeseeks_html5ever`](https://github.com/mischov/meeseeks_html5ever), but because `meeseeks_html5ever` provides pre-compiled NIFs via [`rustler_precompiled`](https://github.com/philss/rustler_precompiled) **you do not need to have Rust installed** to use Meeseeks.\n\nTo install Meeseeks, add it to your `mix.exs`:\n\n```elixir\ndefp deps do\n  [\n    {:meeseeks, \"~\u003e 0.17.0\"}\n  ]\nend\n```\n\nThen run `mix deps.get`.\n\n### Force Compilation\n\nIf you need to force compilation of the Rust NIF for some reason, see the instructions [here](https://github.com/mischov/meeseeks_html5ever#dependencies).\n\n## Getting Started\n\n### Parse\n\nStart by parsing a source (HTML/XML string or [`Meeseeks.TupleTree`](https://hexdocs.pm/meeseeks/Meeseeks.TupleTree.html)) into a [`Meeseeks.Document`](https://hexdocs.pm/meeseeks/Meeseeks.Document.html) so that it can be queried.\n\n`Meeseeks.parse/1` parses the source as HTML, but `Meeseeks.parse/2` accepts a second argument of either `:html`, `:xml`, or `:tuple_tree` that specifies how the source is parsed.\n\n```elixir\ndocument = Meeseeks.parse(\"\u003cdiv id=main\u003e\u003cp\u003e1\u003c/p\u003e\u003cp\u003e2\u003c/p\u003e\u003cp\u003e3\u003c/p\u003e\u003c/div\u003e\")\n#=\u003e #Meeseeks.Document\u003c{...}\u003e\n```\n\nThe selection functions accept an unparsed source, parsing it as HTML, but parsing is expensive so parse ahead of time when running multiple selections on the same document.\n\n### Select\n\nNext, use one of Meeseeks's selection functions - `fetch_all`, `all`, `fetch_one`, or `one` - to search for nodes.\n\nAll these functions accept a queryable (a source, a document, or a [`Meeseeks.Result`](https://hexdocs.pm/meeseeks/Meeseeks.Result.html)), one or more [`Meeseeks.Selector`](https://hexdocs.pm/meeseeks/Meeseeks.Selector.html)s, and optionally an initial context.\n\n`all` returns a (possibly empty) list of results representing every node matching one of the provided selectors, while `one` returns a result representing the first node to match a selector (depth-first) or nil if there is no match.\n\n`fetch_all` and `fetch_one` work like `all` and `one` respectively, but wrap the result in `{:ok, ...}` if there is a match or return `{:error, %Meeseeks.Error{type: :select, reason: :no_match}}` if there is not.\n\nTo generate selectors, use the `css` macro provided by [`Meeseeks.CSS`](https://hexdocs.pm/meeseeks/Meeseeks.CSS.html) or the `xpath` macro provided by [`Meeseeks.XPath`](https://hexdocs.pm/meeseeks/Meeseeks.XPath.html).\n\n```elixir\nimport Meeseeks.CSS\nresult = Meeseeks.one(document, css(\"#main p\"))\n#=\u003e #Meeseeks.Result\u003c{ \u003cp\u003e1\u003c/p\u003e }\u003e\n\nimport Meeseeks.XPath\nresult = Meeseeks.one(document, xpath(\"//*[@id='main']//p\"))\n#=\u003e #Meeseeks.Result\u003c{ \u003cp\u003e1\u003c/p\u003e }\u003e\n```\n\n### Extract\n\nRetrieve information from the [`Meeseeks.Result`](https://hexdocs.pm/meeseeks/Meeseeks.Result.html) with an extractor.\n\nThe included extractors are `attr`, `attrs`, `data`, `dataset`, `html`, `own_text`, `tag`, `text`, `tree`.\n\n```elixir\nMeeseeks.tag(result)\n#=\u003e \"p\"\nMeeseeks.text(result)\n#=\u003e \"1\"\nMeeseeks.tree(result)\n#=\u003e {\"p\", [], [\"1\"]}\n```\n\nThe extractors `html` and `tree` work on [`Meeseeks.Document`](https://hexdocs.pm/meeseeks/Meeseeks.Document.html)s in addition to [`Meeseeks.Result`](https://hexdocs.pm/meeseeks/Meeseeks.Result.html)s.\n\n```elixir\nMeeseeks.html(document)\n#=\u003e \"\u003chtml\u003e\u003chead\u003e\u003c/head\u003e\u003cbody\u003e\u003cdiv id=\\\"main\\\"\u003e\u003cp\u003e1\u003c/p\u003e\u003cp\u003e2\u003c/p\u003e\u003cp\u003e3\u003c/p\u003e\u003c/div\u003e\u003c/body\u003e\u003c/html\u003e\"\n```\n\n## Guides\n\n- [Meeseeks vs. Floki](guides/meeseeks_vs_floki.md)\n- [CSS Selectors](guides/css_selectors.md)\n- [XPath Selectors](guides/xpath_selectors.md)\n- [Custom Selectors](guides/custom_selectors.md)\n- [Deployment](guides/deployment.md)\n\n## Contributing\n\nIf you are interested in contributing please read the [contribution guidelines](CONTRIBUTING.md).\n\n## License\n\nMeeseeks is licensed under the [MIT license](https://opensource.org/licenses/mit-license.php).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmischov%2Fmeeseeks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmischov%2Fmeeseeks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmischov%2Fmeeseeks/lists"}