{"id":19316590,"url":"https://github.com/tokenmill/beagle-performance-benchmarks","last_synced_at":"2026-05-17T19:03:09.165Z","repository":{"id":138584131,"uuid":"207819218","full_name":"tokenmill/beagle-performance-benchmarks","owner":"tokenmill","description":"Performance benchmarks for the Beagle library, and comparisons with other stored-query solutions.","archived":false,"fork":false,"pushed_at":"2019-10-01T08:41:48.000Z","size":23821,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-01-06T04:12:32.791Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tokenmill.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-09-11T13:27:31.000Z","updated_at":"2019-10-01T08:41:51.000Z","dependencies_parsed_at":null,"dependency_job_id":"f816bb95-4fc0-46f5-a063-a0d895504ce2","html_url":"https://github.com/tokenmill/beagle-performance-benchmarks","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenmill%2Fbeagle-performance-benchmarks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenmill%2Fbeagle-performance-benchmarks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tokenmill%2Fbeagle-performance-benchmarks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tok
enmill%2Fbeagle-performance-benchmarks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tokenmill","download_url":"https://codeload.github.com/tokenmill/beagle-performance-benchmarks/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240420938,"owners_count":19798501,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T01:11:57.593Z","updated_at":"2026-05-17T19:03:09.067Z","avatar_url":"https://github.com/tokenmill.png","language":"Clojure","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ca href=\"http://www.tokenmill.lt\"\u003e\n      \u003cimg src=\".github/tokenmill-logo.svg\" width=\"125\" height=\"125\" align=\"right\" /\u003e\n\u003c/a\u003e\n\n# beagle-performance-benchmarks\n\nPerformance benchmarks for the Beagle library and a comparison with other stored-query solutions.\n\n## Available Benchmarks for Phrase Matching\n\nThree benchmarks are available:\n- `make run-beagle-bench`: runs the Beagle benchmark in docker-compose\n- `make run-es-bench`: runs the Elasticsearch Percolator benchmark in docker-compose\n- `run-fake-percolator-bench`: runs the Beagle benchmark in docker-compose with Beagle deployed as an HTTP server\nthat simulates the Elasticsearch Percolator API. 
\n\n## Phrases Benchmarking\n\nGiven a list of phrases, highlight those phrases in the given list of documents.\n\n```bash\nclj -m bench.phrases\n```\n\nTo see the available options, run\n\n```bash\nclojure -m bench.phrases -h\n```\n\nIt outputs something like this:\n```\n  -d, --dictionary-file DICTIONARY     resources/top-10000.csv  Path to the dictionary file\n  -o, --output OUTPUT                  vals-1569915647794.json  Path to the output file\n  -t, --texts-file TEXTS_CSV_FILE                               Path to the CSV file with texts\n  -s, --dictionary-step STEP           5000                     Step size for increase in dictionary\n  -p, --parallel PARALLEL              true                     Should the benchmark be run in parallel\n  -c, --concurrency CONCURRENCY        16                       Number of concurrent executions.\n  -k, --key KEY                        :content                 CSV header key to select\n  -i, --implementation IMPLEMENTATION  :beagle                  Highlighter implementation\n  -w, --warm-up WARM-UP                true                     Should the warm-up be run\n      --es-host ES_HOST                http://127.0.0.1:9200    Elasticsearch hostname\n      --slop SLOP                      0                        Phrase slop for dictionary entries\n      --case-sensitive CASE_SENSITIVE  true                     Should matching be case sensitive\n      --ascii-fold ASCII_FOLD          false                    Should matching be ascii folded\n      --stem STEM                      false                    Should matching be stemmed\n      --stemmer STEMMER                :english                 which stemmer should be used\n  -h, --help\n\n```\n\nThe results of the benchmark are written to the file specified with the `-o` option. 
By default, output is written to\nthe current directory, to a file named `(str \"vals-\" (System/currentTimeMillis) \".json\")`.\n\n## Preview benchmark results\n\n```bash\nclojure -m bench.view BENCHMARK_OUTPUT_FILE \n```\n\n## Download news dataset\n\nWe run the benchmark on a news dataset downloaded from [Kaggle](https://www.kaggle.com/snapcrack/all-the-news/downloads/all-the-news.zip/4).\n\n## Phrase Percolation Benchmarks\n\n### Sequential Beagle\n\nBeagle highlights phrases in a text document in as little as 1.3 ms with a dictionary size of 5000 phrases.\n\n![alt text](resources/average-per-doc.png)\n\nThe maximum time spent highlighting a text (see the red line) grows approximately linearly with the size of the dictionary.\n\nThe minimum time spent highlighting a text is as low as 0.4 ms regardless of the dictionary size.\n\n![alt text](resources/min-max-per-doc.png)\n\n### Beagle Concurrent\n\nBeagle with concurrency=16 \n\n![alt text](resources/beagle-c16.png)\n\n### Elasticsearch Percolator\n\nElasticsearch Percolator with concurrency=16 \n\n![alt text](resources/percolator-c16.png)\n\n### Beagle Faking the Elasticsearch Percolator \n\nBeagle implementing the API of Elasticsearch Percolator with concurrency=16\n\n![alt text](resources/beagle-percolator-c16.png)\n\nDuring the benchmarks, Elasticsearch Percolator was ~2x slower than Beagle deployed in an HTTP server\nwith a nearly identical API.\n\n## License\n\nCopyright \u0026copy; 2019 [TokenMill UAB](http://www.tokenmill.lt).\n\nDistributed under the Apache License, Version 2.0.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftokenmill%2Fbeagle-performance-benchmarks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftokenmill%2Fbeagle-performance-benchmarks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftokenmill%2Fbeagle-performance-benchmarks/lists"}