{"id":16679373,"url":"https://github.com/mischov/meeseeks_floki_bench","last_synced_at":"2025-04-09T22:02:03.504Z","repository":{"id":69248475,"uuid":"87670574","full_name":"mischov/meeseeks_floki_bench","owner":"mischov","description":"Meeseeks vs. Floki performance","archived":false,"fork":false,"pushed_at":"2020-04-29T19:45:17.000Z","size":71,"stargazers_count":8,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-09T22:01:40.620Z","etag":null,"topics":["benchmark","elixir","floki","meeseeks"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mischov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-04-08T23:34:09.000Z","updated_at":"2024-04-27T06:10:58.000Z","dependencies_parsed_at":null,"dependency_job_id":"36e334d2-13ff-470e-927e-be54f15b07fe","html_url":"https://github.com/mischov/meeseeks_floki_bench","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mischov%2Fmeeseeks_floki_bench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mischov%2Fmeeseeks_floki_bench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mischov%2Fmeeseeks_floki_bench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mischov%2Fmeeseeks_floki_bench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mischov",
"download_url":"https://codeload.github.com/mischov/meeseeks_floki_bench/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248119296,"owners_count":21050755,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","elixir","floki","meeseeks"],"created_at":"2024-10-12T13:34:57.656Z","updated_at":"2025-04-09T22:02:03.493Z","avatar_url":"https://github.com/mischov.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Meeseeks vs. Floki Performance\n\nA performance comparison between the Elixir language HTML parsing libraries [Meeseeks](https://github.com/mischov/meeseeks) and [Floki](https://github.com/philss/floki).\n\nPerformance benchmarks should always be considered with some skepticism.\n\nBenchmarking is hard to do well, and often - intentionally or not - benchmarks may favor one implementation's strengths over another in a way that makes one look better but doesn't really help users.\n\nFor these benchmarks I have tried to focus on potential real-world-type scenarios that people might find helpful, but if performance matters consider benchmarking the two for your particular problem.\n\n### Config\n\nFloki is benchmarked using the `html5ever` parser.\n\nPerformance characteristics are different for the `mochiweb_html` parser, but I strongly recommend always using the `html5ever` parser unless you're sure malformed HTML won't be a problem.\n\n### Setup\n\nYour OS is (probably) constantly changing your processor speed (to save energy and reduce heat), which leads to 
inconsistent results when benchmarking.\n\nBefore running benchmarks, set processors to some fixed speed. For Debian instructions on how to do this, see [here](https://wiki.debian.org/HowTo/CpuFrequencyScaling).\n\nThanks to [this article](https://medium.com/learn-elixir/speed-up-data-access-in-elixir-842617030514) for pointing this out.\n\n## The \"Wiki Links\" Benchmark\n\nThe scenario tested by \"Wiki Links\" is simple: select every link from a particular Wikipedia article to other Wikipedia articles.\n\nThis scenario is intended to mimic a simple crawler that is looking on each page for more links to follow.\n\nThe test data used is 99Kb and parses to ~2,700 nodes.\n\nFor XPath, I test both a naive solution that is closely related to the CSS solution and a more optimized version that avoids an early filter.\n\n```\n$ MIX_ENV=prod mix compile\n$ MIX_ENV=prod mix run bench/wiki_links.exs\nOperating System: macOS\nCPU Information: Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz\nNumber of Available Cores: 4\nAvailable memory: 8 GB\nElixir 1.10.1\nErlang 22.3.3\n\nBenchmark suite executing with the following configuration:\nwarmup: 3 s\ntime: 9 s\nmemory time: 3 s\nparallel: 1\ninputs: none specified\nEstimated total run time: 1 min\n\nBenchmarking Floki CSS...\nBenchmarking Meeseeks CSS...\nBenchmarking Meeseeks XPath naive...\nBenchmarking Meeseeks XPath optimized...\n\nName                               ips        average  deviation         median         99th %\nMeeseeks CSS                     85.04       11.76 ms     ±8.55%       11.46 ms       16.75 ms\nMeeseeks XPath optimized         78.01       12.82 ms     ±5.05%       12.63 ms       15.77 ms\nFloki CSS                        77.36       12.93 ms     ±8.32%       12.70 ms       16.45 ms\nMeeseeks XPath naive             61.23       16.33 ms     ±6.61%       16.03 ms       20.19 ms\n\nComparison: \nMeeseeks CSS                     85.04\nMeeseeks XPath optimized         78.01 - 1.09x slower +1.06 ms\nFloki CSS   
                     77.36 - 1.10x slower +1.17 ms\nMeeseeks XPath naive             61.23 - 1.39x slower +4.57 ms\n\nMemory usage statistics:\n\nName                        Memory usage\nMeeseeks CSS                     0.77 MB\nMeeseeks XPath optimized         1.09 MB - 1.41x memory usage +0.32 MB\nFloki CSS                        3.11 MB - 4.05x memory usage +2.35 MB\nMeeseeks XPath naive             2.15 MB - 2.80x memory usage +1.38 MB\n\n**All measurements for memory usage were the same**\n```\n\nIf you're going to be building a simple crawler where all you care about is searching a page for links, Meeseeks and Floki will perform similarly (though Meeseeks will probably use less memory).\n\n[Implementation](https://github.com/mischov/meeseeks_floki_bench/blob/master/lib/meeseeks_floki_bench/wiki_links.ex)\n\n## The \"Trending JS\" Benchmark\n\n\"Trending JS\" represents a simple scenario where, overwhelmed by the churn in the JS ecosystem, you want a quick way to check what JS libraries are trending on GitHub today, returning the name, total stars, and stars today for each.\n\nThis scenario mimics the use case of selecting a list of items from some HTML page and then extracting data from each of these items.\n\nThe test data used is 349Kb and parses to ~6,900 nodes.\n\n```\n$ MIX_ENV=prod mix compile\n$ MIX_ENV=prod mix run bench/trending_js.exs\nOperating System: macOS\nCPU Information: Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz\nNumber of Available Cores: 4\nAvailable memory: 8 GB\nElixir 1.10.1\nErlang 22.3.3\n\nBenchmark suite executing with the following configuration:\nwarmup: 3 s\ntime: 9 s\nmemory time: 3 s\nparallel: 1\ninputs: none specified\nEstimated total run time: 45 s\n\nBenchmarking Floki CSS...\nBenchmarking Meeseeks CSS ...\nBenchmarking Meeseeks XPath...\n\nName                     ips        average  deviation         median         99th %\nMeeseeks CSS           23.22       43.07 ms     ±2.73%       42.79 ms       47.22 ms\nMeeseeks 
XPath         19.47       51.35 ms     ±4.03%       50.77 ms       60.82 ms\nFloki CSS              14.01       71.39 ms     ±3.85%       71.31 ms       83.36 ms\n\nComparison: \nMeeseeks CSS           23.22\nMeeseeks XPath         19.47 - 1.19x slower +8.28 ms\nFloki CSS              14.01 - 1.66x slower +28.32 ms\n\nMemory usage statistics:\n\nName              Memory usage\nMeeseeks CSS           3.66 MB\nMeeseeks XPath         6.57 MB - 1.80x memory usage +2.91 MB\nFloki CSS             22.23 MB - 6.08x memory usage +18.57 MB\n\n**All measurements for memory usage were the same**\n```\n\nMeeseeks avoids some of the conversions between data formats that Floki performs, so the Meeseeks implementations tend to come out ahead of Floki in this benchmark.\n\n[Implementation](https://github.com/mischov/meeseeks_floki_bench/blob/master/lib/meeseeks_floki_bench/trending_js.ex)\n\n## Further Benchmarks\n\nIf you have an idea for a useful, real-world inspired benchmark, please open an issue.\n\nContributions are welcome.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmischov%2Fmeeseeks_floki_bench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmischov%2Fmeeseeks_floki_bench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmischov%2Fmeeseeks_floki_bench/lists"}