{"id":13508267,"url":"https://github.com/philss/floki","last_synced_at":"2025-05-14T22:05:08.106Z","repository":{"id":22747507,"uuid":"26092799","full_name":"philss/floki","owner":"philss","description":"Floki is a simple HTML parser that enables search for nodes using CSS selectors.","archived":false,"fork":false,"pushed_at":"2025-04-17T08:12:51.000Z","size":1637,"stargazers_count":2098,"open_issues_count":24,"forks_count":156,"subscribers_count":25,"default_branch":"main","last_synced_at":"2025-05-07T21:13:57.329Z","etag":null,"topics":["css-selector","css-selectors","elixir","erlang","fast-html","floki","html-parser","html5ever"],"latest_commit_sha":null,"homepage":"https://hex.pm/packages/floki","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/philss.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":"philss"}},"created_at":"2014-11-02T20:49:15.000Z","updated_at":"2025-05-07T04:41:29.000Z","dependencies_parsed_at":"2023-01-13T22:09:13.688Z","dependency_job_id":"665aabb5-7d32-44e0-b92a-999c7b982444","html_url":"https://github.com/philss/floki","commit_stats":{"total_commits":681,"total_committers":104,"mean_commits":6.548076923076923,"dds":0.5565345080763583,"last_synced_commit":"96955f925d62989b6f0bfaf09ce6505e67e04fbb"},"previous_names":[],"tags_count":78,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philss%2Ffloki","tags_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/philss%2Ffloki/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philss%2Ffloki/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philss%2Ffloki/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/philss","download_url":"https://codeload.github.com/philss/floki/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254036999,"owners_count":22003715,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["css-selector","css-selectors","elixir","erlang","fast-html","floki","html-parser","html5ever"],"created_at":"2024-08-01T02:00:50.630Z","updated_at":"2025-05-14T22:05:08.009Z","avatar_url":"https://github.com/philss.png","language":"Elixir","readme":"[![Actions Status](https://github.com/philss/floki/workflows/CI/badge.svg?branch=main)](https://github.com/philss/floki/actions)\n[![Floki version](https://img.shields.io/hexpm/v/floki.svg)](https://hex.pm/packages/floki)\n[![Hex Docs](https://img.shields.io/badge/hex-docs-lightgreen.svg)](https://hexdocs.pm/floki/)\n[![Hex.pm](https://img.shields.io/hexpm/dt/floki.svg)](https://hex.pm/packages/floki)\n[![License](https://img.shields.io/hexpm/l/floki.svg)](https://github.com/philss/floki/blob/main/LICENSE)\n[![Last Updated](https://img.shields.io/github/last-commit/philss/floki.svg)](https://github.com/philss/floki/commits/main)\n\n\u003cimg src=\"assets/images/floki-logo-with-type.svg\" width=\"500\" alt=\"Floki logo\"\u003e\n\n**Floki is a simple HTML parser that 
enables search for nodes using CSS selectors**.\n\n[Check the documentation 📙](https://hexdocs.pm/floki).\n\n## Usage\n\nTake this HTML as an example:\n\n```html\n\u003c!doctype html\u003e\n\u003chtml\u003e\n\u003cbody\u003e\n  \u003csection id=\"content\"\u003e\n    \u003cp class=\"headline\"\u003eFloki\u003c/p\u003e\n    \u003cspan class=\"headline\"\u003eEnables search using CSS selectors\u003c/span\u003e\n    \u003ca href=\"https://github.com/philss/floki\"\u003eGithub page\u003c/a\u003e\n    \u003cspan data-model=\"user\"\u003ephilss\u003c/span\u003e\n  \u003c/section\u003e\n  \u003ca href=\"https://hex.pm/packages/floki\"\u003eHex package\u003c/a\u003e\n\u003c/body\u003e\n\u003c/html\u003e\n```\n\nHere are some queries that you can perform (with return examples):\n\n```elixir\n{:ok, document} = Floki.parse_document(html)\n\nFloki.find(document, \"p.headline\")\n# =\u003e [{\"p\", [{\"class\", \"headline\"}], [\"Floki\"]}]\n\ndocument\n|\u003e Floki.find(\"p.headline\")\n|\u003e Floki.raw_html\n# =\u003e \u003cp class=\"headline\"\u003eFloki\u003c/p\u003e\n```\n\nEach HTML node is represented by a tuple like:\n\n    {tag_name, attributes, children_nodes}\n\nExample of node:\n\n    {\"p\", [{\"class\", \"headline\"}], [\"Floki\"]}\n\nSo even if the only child node is the element text, it is represented inside a list.\n\n## Installation\n\nAdd Floki to your `mix.exs`:\n\n```elixir\ndefp deps do\n  [\n    {:floki, \"~\u003e 0.37.0\"}\n  ]\nend\n```\n\nAfter that, run `mix deps.get`.\n\nIf you are running on [Livebook](https://livebook.dev) or a script, you can install with `Mix.install/2`:\n\n```elixir\nMix.install([\n  {:floki, \"~\u003e 0.37.0\"}\n])\n```\n\nYou can check the [changelog](CHANGELOG.md) for changes.\n\n## Dependencies\n\nFloki needs the `:leex` module in order to compile.\nNormally this module is installed with Erlang in a complete installation.\n\nIf you get this [\"module :leex is not available\"](https://github.com/philss/floki/issues/35) 
error message, you need to install the `erlang-dev` and `erlang-parsetools` packages in order to get the `:leex` module. The package names may be different depending on your OS.\n\n### Alternative HTML parsers\n\nBy default Floki uses a patched version of `mochiweb_html` for parsing fragments\ndue to its ease of installation (it's written in Erlang and has no outside dependencies).\n\nHowever, one might want to use an alternative parser due to the following\nconcerns:\n\n- Performance - It can be [up to 20 times slower than the alternatives](https://hexdocs.pm/fast_html/readme.html#benchmarks) on big HTML\n  documents.\n- Correctness - in some cases `mochiweb_html` will produce different results\n  from what is specified in the [HTML5 specification](https://html.spec.whatwg.org/).\n  For example, a correct parser would parse `\u003ctitle\u003e \u003cb\u003e bold \u003c/b\u003e text \u003c/title\u003e`\n  as `{\"title\", [], [\" \u003cb\u003e bold \u003c/b\u003e text \"]}` since content inside `\u003ctitle\u003e` is\n  to be [treated as plaintext](https://html.spec.whatwg.org/#the-title-element).\n  `mochiweb_html`, however, would parse it as `{\"title\", [], [{\"b\", [], [\" bold \"]}, \" text \"]}`.\n\nFloki supports the following alternative parsers:\n\n- `fast_html` - A wrapper for [lexbor](https://github.com/lexbor/lexbor). 
A pure C HTML parser.\n- `html5ever` - A wrapper for [html5ever](https://github.com/servo/html5ever) written in Rust, developed as a part of the Servo project.\n\n`fast_html` is generally faster, according to the\n[benchmarks](https://hexdocs.pm/fast_html/readme.html#benchmarks) conducted by\nits developers.\n\nYou can perform a benchmark by running the following:\n\n```sh\nsh benchs/extract.sh\nmix run benchs/parse_document.exs\n```\n\nExtracting the files is needed only once.\n\n#### Using `html5ever` as the HTML parser\n\nThis dependency is written with a NIF using [Rustler](https://github.com/rusterlium/rustler), but\nyou don't need to install anything to compile it thanks to [RustlerPrecompiled](https://hexdocs.pm/rustler_precompiled/).\n\n```elixir\ndefp deps do\n  [\n    {:floki, \"~\u003e 0.37.0\"},\n    {:html5ever, \"~\u003e 0.15.0\"}\n  ]\nend\n```\n\nRun `mix deps.get` and compile the project with `mix compile` to make sure it works.\n\nThen you need to configure your app to use `html5ever`:\n\n```elixir\n# in config/config.exs\n\nconfig :floki, :html_parser, Floki.HTMLParser.Html5ever\n```\n\nNote that you can also pass the HTML parser as an option to `parse_document/2` and `parse_fragment/2`.\n\n#### Using `fast_html` as the HTML parser\n\nA C compiler, GNU Make and CMake need to be installed on the system in order to\ncompile lexbor.\n\nFirst, add `fast_html` to your dependencies:\n\n```elixir\ndefp deps do\n  [\n    {:floki, \"~\u003e 0.37.0\"},\n    {:fast_html, \"~\u003e 2.0\"}\n  ]\nend\n```\n\nRun `mix deps.get` and compile the project with `mix compile` to make sure it works.\n\nThen you need to configure your app to use `fast_html`:\n\n```elixir\n# in config/config.exs\n\nconfig :floki, :html_parser, Floki.HTMLParser.FastHtml\n```\n\n## More about Floki API\n\nTo parse an HTML document, try:\n\n```elixir\nhtml = \"\"\"\n  \u003chtml\u003e\n  \u003cbody\u003e\n    \u003cdiv class=\"example\"\u003e\u003c/div\u003e\n  \u003c/body\u003e\n  
\u003c/html\u003e\n\"\"\"\n\n{:ok, document} = Floki.parse_document(html)\n# =\u003e {:ok, [{\"html\", [], [{\"body\", [], [{\"div\", [{\"class\", \"example\"}], []}]}]}]}\n```\n\nTo find elements with the class `example`, try:\n\n```elixir\nFloki.find(document, \".example\")\n# =\u003e [{\"div\", [{\"class\", \"example\"}], []}]\n```\n\nTo convert your node tree back to raw HTML (spaces are ignored):\n\n```elixir\ndocument\n|\u003e Floki.find(\".example\")\n|\u003e Floki.raw_html\n# =\u003e  \u003cdiv class=\"example\"\u003e\u003c/div\u003e\n```\n\nTo fetch some attribute from elements, try:\n\n```elixir\nFloki.attribute(document, \".example\", \"class\")\n# =\u003e [\"example\"]\n```\n\nYou can get attributes from elements that you already have:\n\n```elixir\ndocument\n|\u003e Floki.find(\".example\")\n|\u003e Floki.attribute(\"class\")\n# =\u003e [\"example\"]\n```\n\nIf you want to get the text from an element, try:\n\n```elixir\ndocument\n|\u003e Floki.find(\".headline\")\n|\u003e Floki.text\n\n# =\u003e \"Floki\"\n```\n\n## Supported selectors\n\nHere you find all the [CSS selectors](https://www.w3.org/TR/selectors/#selectors) supported in the current version:\n\n| Pattern         | Description                  |\n|-----------------|------------------------------|\n| *               | any element                  |\n| E               | an element of type `E`       |\n| E[foo]          | an `E` element with a \"foo\" attribute |\n| E[foo=\"bar\"]    | an E element whose \"foo\" attribute value is exactly equal to \"bar\" |\n| E[foo~=\"bar\"]   | an E element whose \"foo\" attribute value is a list of whitespace-separated values, one of which is exactly equal to \"bar\" |\n| E[foo^=\"bar\"]   | an E element whose \"foo\" attribute value begins exactly with the string \"bar\" |\n| E[foo$=\"bar\"]   | an E element whose \"foo\" attribute value ends exactly with the string \"bar\" |\n| E[foo*=\"bar\"]   | an E element whose \"foo\" attribute value contains the 
substring \"bar\" |\n| E[foo\\|=\"en\"]    | an E element whose \"foo\" attribute has a hyphen-separated list of values beginning (from the left) with \"en\" |\n| E:nth-child(n)  | an E element, the n-th child of its parent |\n| E:nth-last-child(n)  | an E element, the n-th child of its parent, counting from the last one |\n| E:first-child   | an E element, first child of its parent |\n| E:last-child   | an E element, last child of its parent |\n| E:nth-of-type(n)  | an E element, the n-th child of its type among its siblings |\n| E:nth-last-of-type(n)  | an E element, the n-th child of its type among its siblings, counting from the last one |\n| E:first-of-type   | an E element, first child of its type among its siblings |\n| E:last-of-type   | an E element, last child of its type among its siblings |\n| E:checked       | an E element (checkbox, radio, or option) that is checked |\n| E:disabled      | an E element (button, input, select, textarea, or option) that is disabled |\n| E.warning       | an E element whose class is \"warning\" |\n| E#myid          | an E element with ID equal to \"myid\" (for ids containing periods, use `#my\\\\.id` or `[id=\"my.id\"]`) |\n| E:not(s)        | an E element that does not match simple selector s |\n| :root           | the root node or nodes (in case of fragments) of the document. Most of the time this is the `html` tag |\n| E F             | an F element descendant of an E element |\n| E \u003e F           | an F element child of an E element |\n| E + F           | an F element immediately preceded by an E element |\n| E ~ F           | an F element preceded by an E element |\n\nThere are also some selectors based on non-standard specifications. 
They are:\n\n| Pattern               | Description                                                            |\n|-----------------------|------------------------------------------------------------------------|\n| E:fl-contains('foo')  | an E element that contains \"foo\" inside a text node                    |\n| E:fl-icontains('foo') | an E element that contains \"foo\" inside a text node (case insensitive) |\n\n## Suppressing log messages\n\nFloki may log debug messages related to problems in the parsing of selectors, or parsing of the HTML tree.\nIt may also log some \"info\" messages related to deprecated APIs. If you want to suppress these log messages,\nplease consider setting the `:compile_time_purge_matching` option for `:logger` in your compile-time configuration.\n\nSee https://hexdocs.pm/logger/Logger.html#module-compile-configuration for details.\n\n## Special thanks\n\n* [@arasatasaygin](https://github.com/arasatasaygin) for Floki's logo from the [Open Logos project](http://openlogos.org/).\n\n## License\n\nCopyright (c) 2014 Philip Sampaio Silva\n\nFloki is under the MIT license. Check the [LICENSE](https://github.com/philss/floki/blob/main/LICENSE) file for more details.\n","funding_links":["https://github.com/sponsors/philss"],"categories":["HTML","Elixir","Data Ingestion \u0026 ETL","Uncategorized","前端开发框架及项目","XML","Tools"],"sub_categories":["How to Join","Uncategorized","其他_文本生成、文本对话","Mesh networks"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilss%2Ffloki","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphilss%2Ffloki","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilss%2Ffloki/lists"}