{"id":16844006,"url":"https://github.com/tobgu/qbench","last_synced_at":"2025-04-11T05:53:29.690Z","repository":{"id":66081763,"uuid":"129965182","full_name":"tobgu/qbench","owner":"tobgu","description":"Benchmark of qframe and other dataframes","archived":false,"fork":false,"pushed_at":"2020-07-10T20:20:01.000Z","size":7455,"stargazers_count":9,"open_issues_count":2,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-11T05:53:23.091Z","etag":null,"topics":["benchmark","data-frame","dataframe","go","golang","pandas"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tobgu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-04-17T21:07:34.000Z","updated_at":"2023-05-20T04:33:04.000Z","dependencies_parsed_at":null,"dependency_job_id":"01d8757e-a7b4-419b-833e-d7786c070357","html_url":"https://github.com/tobgu/qbench","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tobgu%2Fqbench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tobgu%2Fqbench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tobgu%2Fqbench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tobgu%2Fqbench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tobgu","download_url":"https://codeload.github.com/tobgu/qbench/tar.gz/refs/heads/master","
host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248351405,"owners_count":21089271,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","data-frame","dataframe","go","golang","pandas"],"created_at":"2024-10-13T12:54:24.078Z","updated_at":"2025-04-11T05:53:29.673Z","avatar_url":"https://github.com/tobgu.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"This repository contains a number of basic benchmarks that compare\nthe performance of [QFrame](https://github.com/tobgu/qframe) with\n[Pandas](https://pandas.pydata.org) and [Gota](https://github.com/kniren/gota) (where applicable).\n\nAs always with benchmarks: Take these results with a grain of salt and\nbenchmark your own use cases to get a proper feel for the performance.\n\n## About\nThe benchmarks in this repository have mostly been constructed with\nthe use case of [qocache](https://github.com/tobgu/qocache) in mind.\nAll optimizations done on QFrame so far have also targeted these use\ncases.\n\nIf you have ideas for new benchmarks or improvements to the existing\nones, please don't hesitate to open an issue.\n\nAll benchmarks use the [Brewer's Friend Beer Recipes](https://www.kaggle.com/jtrofe/beer-recipes)\ndataset. It contains ~75000 lines and is also checked in to this repository.\n\nMost of the benchmark operations are nonsensical from a beer brewer's\nperspective.\n\n## Environment\nThe results presented here were produced in a VirtualBox VM with two cores assigned\nto it and 4 GB of base memory. 
The host OS is Windows 10 and the guest OS is\nUbuntu Linux. Running the benchmarks in a VM like this may have some negative\neffects on repeatability and consistency. For most of the results\nhere it does not matter too much though. These benchmarks are pretty\nhigh level and the difference in performance is so great between the\ndifferent implementations that +- a couple of percent should not\nmatter for their interpretation.\n\nIt would be really interesting if someone would like to run this on\na modern bare metal machine for comparison!\n\nProcessor:\n```\n$ cat /proc/cpuinfo\n...\nmodel name\t: Intel(R) Core(TM) i7-3517U CPU @ 1.90GHz\ncache size\t: 4096 KB\n...\n```\n\nOS:\n```\n$ lsb_release -a\nNo LSB modules are available.\nDistributor ID:\tUbuntu\nDescription:\tUbuntu 14.04 LTS\nRelease:\t14.04\nCodename:\ttrusty\n```\n\nGo:\n```\n$ go version\ngo version go1.10 linux/amd64\n```\n\nPandas:\n```\n\u003e\u003e\u003e pd.show_versions()\nINSTALLED VERSIONS\n------------------\ncommit: None\npython: 3.6.3.final.0\npython-bits: 64\nOS: Linux\nOS-release: 3.13.0-83-generic\nmachine: x86_64\nprocessor: x86_64\nbyteorder: little\nLC_ALL: None\nLANG: en_US.UTF-8\nLOCALE: en_US.UTF-8\n\npandas: 0.22.0\npytest: 3.5.0\npip: 9.0.3\nsetuptools: 39.0.1\nCython: None\nnumpy: 1.14.2\nscipy: None\npyarrow: None\nxarray: None\nIPython: None\nsphinx: None\npatsy: None\ndateutil: 2.7.2\npytz: 2018.4\nblosc: None\nbottleneck: None\ntables: None\nnumexpr: None\nfeather: None\nmatplotlib: None\nopenpyxl: None\nxlrd: None\nxlwt: None\nxlsxwriter: None\nlxml: None\nbs4: None\nhtml5lib: None\nsqlalchemy: None\npymysql: None\npsycopg2: None\njinja2: None\ns3fs: None\nfastparquet: None\npandas_gbq: None\npandas_datareader: None\n```\n\n## Results\n### QFrame\n```\nBenchmarkQFrame_ReadCsv-2            \t                                    5\t 207966949 ns/op\t164317979 B/op\t    1501 allocs/op\nBenchmarkQFrame_WriteJsonRecords-2   \t                                    5\t 
230018909 ns/op\t69792521 B/op\t      74 allocs/op\nBenchmarkQFrame_Sort/UserId_-_Int-2  \t                                  100\t  10306055 ns/op\t  303152 B/op\t       3 allocs/op\nBenchmarkQFrame_Sort/Name_-__string-2         \t                           20\t  86066995 ns/op\t  303184 B/op\t       3 allocs/op\nBenchmarkQFrame_Sort/Multi_column-2           \t                           10\t 156379617 ns/op\t  303344 B/op\t       5 allocs/op\nBenchmarkQFrame_Filter/Float_gt-2             \t                         2000\t    925995 ns/op\t  196608 B/op\t       2 allocs/op\nBenchmarkQFrame_Filter/Float_custom_gt-2    \t                         2000\t   1130335 ns/op\t  196608 B/op\t       2 allocs/op\nBenchmarkQFrame_Filter/Combine_or-2           \t                         1000\t   1352508 ns/op\t  245936 B/op\t       4 allocs/op\nBenchmarkQFrame_Filter/Combine_and-2          \t                         1000\t   1197479 ns/op\t  256800 B/op\t       6 allocs/op\nBenchmarkQFrame_Filter/String_eq-2            \t                         2000\t   1076656 ns/op\t   85376 B/op\t       2 allocs/op\nBenchmarkQFrame_Filter/String_like_case_sensitive-2         \t          500\t   3777651 ns/op\t  122896 B/op\t       3 allocs/op\nBenchmarkQFrame_Filter/String_like_case_insensitive-2       \t          100\t  14342672 ns/op\t  131704 B/op\t      16 allocs/op\nBenchmarkQFrame_Filter/String_regex_case_sensitive-2        \t           20\t  67469026 ns/op\t  164712 B/op\t      58 allocs/op\nBenchmarkQFrame_Filter/String_regex_case_insensitive-2      \t           20\t  88342431 ns/op\t  172972 B/op\t      61 allocs/op\nBenchmarkQFrame_Filter/Integer_in-2                         \t          500\t   3179964 ns/op\t  304811 B/op\t      10 allocs/op\nBenchmarkQFrame_Eval/Float_abs-2         \t                             2000\t    637177 ns/op\t  612885 B/op\t      41 allocs/op\nBenchmarkQFrame_Eval/Add_columns-2       \t                             2000\t    780004 ns/op\t  612833 B/op\t   
   40 allocs/op\n\n// These are currently 10% - 20% slower than the Pandas equivalents\nBenchmarkQFrame_Aggregate/Single_col_string_single_float_mean-2    \t      100\t  12083567 ns/op\t 2475952 B/op\t    1396 allocs/op\nBenchmarkQFrame_Aggregate/Single_col_integer_single_float_mean-2   \t      200\t   6040112 ns/op\t 2535856 B/op\t    1389 allocs/op\nBenchmarkQFrame_Aggregate/Double_col_string_single_float_mean-2    \t       30\t  34800359 ns/op\t15473961 B/op\t   48456 allocs/op\nBenchmarkQFrame_Aggregate/Single_col_string_double_float_mean-2    \t      100\t  12443931 ns/op\t 2627584 B/op\t    1405 allocs/op\n```\n\n### Gota\n```\nBenchmarkGota_ReadCSV-2                                            \t       2\t 758721612 ns/op\t228591928 B/op\t 3686954 allocs/op\nBenchmarkGota_WriteJsonRecords-2                                   \t       1\t2771840823 ns/op\t482439320 B/op\t 5828275 allocs/op\nBenchmarkGota_Sort/UserId_-_Int-2                                  \t      30\t  53656268 ns/op\t42841668 B/op\t     131 allocs/op\nBenchmarkGota_Sort/Name_-__string-2                                \t      10\t 152335582 ns/op\t48951630 B/op\t     113 allocs/op\nBenchmarkGota_Sort/Multi_column-2                                  \t       5\t 285486561 ns/op\t78037472 B/op\t     241 allocs/op\nBenchmarkGota_Filter/Float_gt-2                                    \t      50\t  32328655 ns/op\t38116730 B/op\t     609 allocs/op\nBenchmarkGota_Filter/Combine_or-2                                  \t      20\t  51720372 ns/op\t60522417 B/op\t     663 allocs/op\nBenchmarkGota_Filter/Combine_and-2                                 \t      30\t  42981669 ns/op\t48312308 B/op\t    1103 allocs/op\nBenchmarkGota_Filter/String_eq-2                                   \t     200\t   9020487 ns/op\t 1087112 B/op\t     310 allocs/op\nBenchmarkGota_Filter/Integer_in-2                                  \t      10\t 189430304 ns/op\t77769508 B/op\t     779 allocs/op\n```\n\n### Pandas\n```\nName (time 
in ms)                                                         Mean              Median            StdDev        Rounds\n----------------------------------------------------------------------------------------------------------------------------------\ntest_aggregation[double col string single float mean-\u003clambda\u003e-39065]     30.4766 (5.90)     30.0730 (6.37)    1.6894 (1.62)     31\ntest_aggregation[single col int single float mean-\u003clambda\u003e-176]           5.1619 (1.0)       4.7210 (1.0)     3.3494 (3.22)    129\ntest_aggregation[single col string double float mean-\u003clambda\u003e-175]       11.0780 (2.15)     10.8296 (2.29)    1.0403 (1.0)      71\ntest_aggregation[single col string single float mean-\u003clambda\u003e-175]       10.8325 (2.10)     10.5123 (2.23)    1.2459 (1.20)     73\ntest_filter[combine and-\u003clambda\u003e-7280]                                    4.6996 (1.0)       4.5539 (1.0)     0.4474 (1.0)     192\ntest_filter[combine or-\u003clambda\u003e-39818]                                   10.2313 (2.18)     10.0739 (2.21)    0.6902 (1.54)     89\ntest_filter[contains case insensitive-\u003clambda\u003e-11912]                   100.4382 (21.37)    99.3378 (21.81)   3.4898 (7.80)     11\ntest_filter[contains case sensitive-\u003clambda\u003e-9118]                       59.0800 (12.57)    58.2704 (12.80)   2.0572 (4.60)     18\ntest_filter[integer in-\u003clambda\u003e-53514]                                   11.6831 (2.49)     11.4531 (2.51)    0.9151 (2.05)     85\ntest_filter[regex case insensitive-\u003clambda\u003e-11912]                      304.9066 (64.88)   306.2898 (67.26)   2.9705 (6.64)      5\ntest_filter[regex case sensitive-\u003clambda\u003e-9118]                         126.5499 (26.93)   125.9932 (27.67)   3.2897 (7.35)      8\ntest_filter[single float-\u003clambda\u003e-26823]                                  7.3997 (1.57)      7.1886 (1.58)    0.8638 (1.93)    113\ntest_filter[string 
eq-\u003clambda\u003e-830]                                      10.1308 (2.16)      9.9458 (2.18)    1.0313 (2.31)     92\ntest_read_csv                                                           416.8440 (81.83)   416.2372 (84.13)   2.7442 (5.98)      5\ntest_sort[columns0]                                                      22.8061 (4.48)     22.5379 (4.56)    1.0598 (2.31)     42\ntest_sort[columns1]                                                     142.2413 (27.92)   143.0750 (28.92)   2.4375 (5.31)      8\ntest_sort[columns2]                                                     184.1537 (36.15)   183.4041 (37.07)   5.8602 (12.77)     6\ntest_write_json_records                                                 317.4334 (62.31)   318.0416 (64.28)   4.9470 (10.78)     5\ntest_eval[float abs-destCol = abs(BoilSize)]                             14.4224 (1.0)      14.1708 (1.0)     2.0511 (1.45)     29\ntest_eval[float add-destCol = OG + FG]                                   15.8143 (1.10)     15.8394 (1.12)    1.4177 (1.0)      41\n----------------------------------------------------------------------------------------------------------------------------------\n```\n\n### Summary\nOverall QFrame performs well ahead of Pandas in most benchmarks. The\nonly benchmarks that it's currently slower at are those containing\nsome sort of grouping. 
Here Pandas beats QFrame by 10 - 20% in runtime.\n\nCompared to Gota, QFrame is much faster in all benchmarks.\n\n## Install Python benchmarks\n```\nvirtualenv -p python3.6 pvenv\n. ./pvenv/bin/activate\npip install -r requirements.txt\n```\n\n## Run Python benchmarks\n```\nmake pybench\n```\n\n## Run Go benchmarks\nGo dep is used for dependency management, so it needs to be installed first.\n\n```\ndep ensure\nmake gobench\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftobgu%2Fqbench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftobgu%2Fqbench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftobgu%2Fqbench/lists"}