{"id":37132045,"url":"https://github.com/vgratian/cosine","last_synced_at":"2026-01-14T15:22:32.489Z","repository":{"id":160365225,"uuid":"126174769","full_name":"vgratian/cosine","owner":"vgratian","description":"Measure performance in calculating cosine similarity: C, C++, Go, Python, Perl and Oberon2.","archived":false,"fork":false,"pushed_at":"2023-07-11T06:44:41.000Z","size":96,"stargazers_count":12,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-06-21T14:19:55.683Z","etag":null,"topics":["benchmark","cosine-similarity","cosine-similiarity","cosinesimilarity","linear-algebra","vectors"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vgratian.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-03-21T12:24:49.000Z","updated_at":"2024-05-13T02:28:42.000Z","dependencies_parsed_at":"2024-06-21T13:04:02.475Z","dependency_job_id":"e3e0dc98-a4f3-46ef-9398-f6f8bf9e0372","html_url":"https://github.com/vgratian/cosine","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/vgratian/cosine","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vgratian%2Fcosine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vgratian%2Fcosine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vgratian%2Fcosine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vgratian%2F
cosine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vgratian","download_url":"https://codeload.github.com/vgratian/cosine/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vgratian%2Fcosine/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28424291,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T13:30:50.153Z","status":"ssl_error","status_checked_at":"2026-01-14T13:29:08.907Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","cosine-similarity","cosine-similiarity","cosinesimilarity","linear-algebra","vectors"],"created_at":"2026-01-14T15:22:31.815Z","updated_at":"2026-01-14T15:22:32.484Z","avatar_url":"https://github.com/vgratian.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Cosine Benchmark v2\n\nThis framework compares computational performance of programming languages in calculating cosine similarity of random vectors. Current version includes [packages](#packages) in __C__, __C++__, __Go__, __Oberon2__, __Perl__ and a number of optimizations in __Python3__. \n\nRunning `benchmarker.sh` will create a benchmark on your own machine and plot the results (see [Usage](#usage), but check [Requirements](#requirements) first). 
An example, created on an 8GB/i5 machine:\n\n\u003ccenter\u003e\u003cimg src=example_plot.svg\u003e\u003c/center\u003e\n\nThe x-axis represents the vector size. For the y-axis, three metrics are used:\n\n- `total_cputime` (user+system) : CPU seconds spent by the package to fulfill the task, measured externally; includes the time spent reading vectors from files and converting them to floats.\n- `avg_walltime` (per calculation) : Human-experienced seconds spent on each calculation, measured by the package itself; less reliable in reflecting actual resource usage.\n- `max_rss` (kilobytes) : max memory used by the package, measured externally. \n\nAs one can see, there is a considerable disparity in performance across all three metrics. \n\n\n## Cosine Similarity\nCosine similarity is a measure of similarity between two vectors. It is widely used in machine learning where documents, words or images are treated as vectors.\n\nThe similarity value is calculated by taking the dot product of the two vectors and normalizing it by the product of their lengths:\n\u003ccenter\u003e\u003cimg src=\"cosine_similarity.svg\" width=\"50%\"\u003e\u003c/center\u003e\n\n# Requirements\nThe only requirement to run the Benchmarker is GCC (or another C compiler). Optionally, [gnuplot](http://www.gnuplot.info/) is used for plotting the results.\n\nEach individual package in [lib/](lib) might have its own requirements (see under [Packages](#packages)). You don't need to meet all package requirements; you can run the benchmark only on selected packages.\n\n# Usage\n\nRun `benchmarker.sh` with 4 positional arguments, which are respectively:\n- `min` : initial size of vectors\n- `max` : final size of vectors \n- `step` : amount by which the vector size increases after each iteration\n- `repeat` : number of times each package repeats the calculation (to increase statistical significance)\n\nUse `-s` and `-p` to save results as `.csv` files and draw plots respectively. 
Use `--libs \u003clib1,lib2...\u003e` to run the benchmarker on a subset of packages. Run `./benchmarker.sh --help` for more details.\n\n### Examples\n```bash\n$ ./benchmarker.sh -sp 10000 30000 10000 100\n```\n\nThis runs 3 iterations, with random vectors of size 10,000, 20,000 and 30,000. Each calculation is repeated 100 times. Results are saved and plotted.\n\n```bash\n$ ./benchmarker.sh -sp --libs c,go,py_numpy 10000 30000 10000 100\n```\n\nSame, but only on the packages [c](lib/c), [go](lib/go) and [py_numpy](lib/py_numpy).\n\n\n# Packages\n\n| package               | description\t         | requirement         | where to get from      |\n|-----------------------|------------------------|---------------------|------------------------|\n| [c](lib/c)            | C                      | `gcc` or any other C compiler |              | \n| [c++](lib/cpp)        | C++                    | `g++` (C++ frontend of gcc)   |              |\n| [go](lib/go)          | Go                     | `go`   | [golang.org](https://golang.org/doc/install) |\n| [oberon_voc](lib/oberon_voc) | [Oberon-2](https://en.wikipedia.org/wiki/Oberon-2) | `voc` | [Vishap Oberon Compiler](https://github.com/vishaps/voc) |\n| [perl](lib/perl)      | vanilla Perl           | `perl`                    |                        | \n| [py](lib/py)          | vanilla Python         | `python3`           |                        |\n| [py_compr](lib/py_compr) | uses list comprehension |                 |                        |\n| [py_array](lib/py_array) | uses [python arrays](https://docs.python.org/3/library/array.html) | | |\n| [py_numpy](lib/py_numpy) | uses NumPy | python3 lib `numpy`  | `pip3 install numpy` or [numpy.org](https://numpy.org/) |\n| [py_sklearn](lib/py_compr) | uses NumPy+Sklearn | python3 lib `sklearn`  | `pip3 install scikit-learn` or [scikit-learn.org](https://scikit-learn.org/) |\n\n\n# Contributing\n\nYou are more than welcome to suggest improvements for the existing packages or 
add a new package in your preferred language.\n\nA new package should be a subdirectory in [lib/](lib/). If your language is interpreted, then it should contain an executable file `main` (i.e. a script with a shebang). If it's compiled, then it should contain a Makefile that compiles a binary `main`.\n\n`main` should accept 4 CLI arguments, which are respectively:\n- repeat (int) : how many times to repeat the calculation\n- size (int) : size of the input vectors\n- filepath1 (string) : file with the first vector (line-separated double-precision floats)\n- filepath2 (string) : file with the second vector \n\n`main` should calculate cosine similarity of the two vectors `repeat` times and write to stdout two values (separated by space):\n- cosine similarity score (double-precision float)\n- average calculation time (double-precision float), this should be monotonic time (wall time)\n\nCompile your package if necessary and test it as follows:\n```\n$ ./util/randvect.py 100000 -10 10 \u003e v1\n$ ./util/randvect.py 100000 -10 10 \u003e v2\n$ ./lib/my_package/main 100000 100 v1 v2\n```\nThe output should be something like this:\n```\n\u003e 0.00262265036644376 0.00015899505716224666\n```\n\n# Why you should not trust this benchmark\nThis project is meant for educational purposes. You should not use it to make a final decision about what language to use for your project (although it might help you to make an *educated* guess). Why?\n\n- I have a very superficial knowledge of some of the languages here, so the benchmark might not reflect their best performance\n- Running this benchmark on different machines will likely yield different results\n- You should always create a benchmark for your own specific task (and maybe hardware). Here's an example: for a job project (with heavy vector-calculations) I had to choose between Python arrays and Python with numpy. 
I knew numpy should be much faster, but it turned out that the overhead outweighed the benefit, and in fact it made my project slower.\n\n# Notes on v1\n\nThe first version of this project had a number of flaws. For example, it used two statically generated vectors of 10s and -10s respectively (so the cosine similarity was always -1). This poorly reflected the computational performance of the packages; it also did not reflect real-world applications of cosine similarity (which is almost always calculated between vectors of real numbers).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvgratian%2Fcosine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvgratian%2Fcosine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvgratian%2Fcosine/lists"}