{"id":16762416,"url":"https://github.com/tlack/semantics","last_synced_at":"2025-04-10T18:12:35.267Z","repository":{"id":145865954,"uuid":"397036262","full_name":"tlack/semantics","owner":"tlack","description":"Semantic similarity via text embeddings in Elixir - powered by SentenceTransformers by SBert.net","archived":false,"fork":false,"pushed_at":"2021-08-24T19:02:09.000Z","size":27,"stargazers_count":8,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-24T15:52:22.955Z","etag":null,"topics":["elixir","natural-language-processing","natural-language-understanding","sentence-transformers","text-embeddings"],"latest_commit_sha":null,"homepage":"","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tlack.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-08-17T01:00:00.000Z","updated_at":"2025-03-19T12:30:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"eadbcc7f-708d-49c4-a92d-9c6c03884f0f","html_url":"https://github.com/tlack/semantics","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlack%2Fsemantics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlack%2Fsemantics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlack%2Fsemantics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlack%2Fsemantics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tlack","download_url":"https://codeload.github.com/tlack/semantics/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248269289,"owners_count":21075773,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elixir","natural-language-processing","natural-language-understanding","sentence-transformers","text-embeddings"],"created_at":"2024-10-13T04:44:42.367Z","updated_at":"2025-04-10T18:12:35.261Z","avatar_url":"https://github.com/tlack.png","language":"Elixir","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Semantics\n\nSemantic similarity in Elixir using text embeddings from the excellent Python library [SentenceTransformers by SBert](https://www.sbert.net/index.html#).\n\nThis is a very simple library that provides an `erlport`-based wrapper around SentenceTransformers, plus a cosine similarity helper from [Similarity](https://github.com/preciz/similarity).\n\n## Example\n\n```\niex(1)\u003e import Semantics\nSemantics\niex(2)\u003e start_link(\"paraphrase-MiniLM-L6-v2\")   # See SentenceTransformer docs for full list\npython start args: [\n  env: [{'VIRTUAL_ENV', '/home/lookpop/semantics/priv/python/semantics-venv'}],\n  python: '/home/lookpop/semantics/priv/python/semantics-venv/bin/python3',\n  python_path: '/home/lookpop/semantics/priv/python'\n]\nSEMANTICS: loading model paraphrase-MiniLM-L6-v2\n{:ok, #PID\u003c0.206.0\u003e}\niex(3)\u003e embedding(\"I like cats\")\n[0.23906055092811584, -1.1417245864868164, 0.13355520367622375,\n 0.13051727414131165, -0.6010502576828003, 0.20810797810554504,\n 0.9089261293411255, -0.001883262419141829, -0.044903531670570374,\n 0.2549824118614197, -0.5482040047645569, -0.7193037867546082,\n 0.12138155847787857, 0.24462690949440002, 0.3153916895389557,\n 0.13613221049308777, 0.7277143597602844, -0.13291320204734802,\n -0.06399975717067719, -0.28735366463661194, -0.7334134578704834,\n -0.35985904932022095, -0.1697186678647995, 0.3418505787849426,\n -0.8475354313850403, -0.1252552568912506, -0.32450196146965027,\n 0.2670220136642456, -0.28907573223114014, -0.2645415961742401,\n 0.05238057300448418, -0.29865625500679016, 0.05948035791516304,\n -0.7136659026145935, -0.3152972161769867, -0.11816924810409546,\n 0.02663307823240757, -0.20642021298408508, -0.45193952322006226,\n -0.15293395519256592, -0.2800045609474182, -0.2381720095872879,\n 0.49682706594467163, -0.07594038546085358, 0.24341261386871338,\n -0.5986779928207397, 0.011733309365808964, -0.5240899324417114,\n 0.7714636921882629, 0.7268072366714478, ...]\niex(4)\u003e similarity(embedding(\"I like cats\"), embedding(\"I like kittens\"))\n0.907135858166963\niex(5)\u003e similarity(embedding(\"I like cats\"), embedding(\"I like dogs\"))\n0.6468114092540255\niex(6)\u003e similarity(embedding(\"I like cats\"), embedding(\"I like fiduciary responsibility\"))\n0.20907087175155692\niex(7)\u003e similarity(embedding(\"I want to go horseback riding\"), embedding(\"I want to do equestrian stuff\"))\n0.7922539232253905\niex(8)\u003e similarity(embedding(\"I want to go horseback riding\"), embedding(\"Riding crops and saddles\"))\n0.5468014070228772\niex(9)\u003e similarity(embedding(\"I want to go horseback riding\"), embedding(\"Spaceship parts and electric cars\"))\n0.033702825222684064\n```\n\n## Usage notes\n\nCall `start_link()` without an argument to use the default model, `paraphrase-MiniLM-L6-v2`.\n\nRefer to [erlport's extensive documentation](http://erlport.org/docs/) for the finer points\nof wiring the BEAM into Python.\n\nThe Python side of the bridge lives in `priv/python/app.py`.\n\nSee \"Warning\" below.\n\n## Available models\n\nQuite a few; see the [SentenceTransformers pretrained models list](https://www.sbert.net/docs/pretrained_models.html).\n\n## Warning\n\nImportant: the first time Semantics starts, it tries to set up a Python virtual environment (venv) in its own `deps/semantics/priv/python` folder.\nThe required Python libraries total almost 1 GB, so expect a long start time on first initialization.\n\nIf you perform these steps by hand, the Elixir code skips the automatic installation. Here's how:\n\n```\nmy_app$ cd deps/semantics\nmy_app/deps/semantics$ cd priv/python\nmy_app/deps/semantics/priv/python$ python3 -m venv semantics-venv\nmy_app/deps/semantics/priv/python$ source semantics-venv/bin/activate\n(semantics-venv) my_app/deps/semantics/priv/python$ python3 -m pip install -r requirements.txt\n```\n\n## Installation\n\n```elixir\ndef deps do\n  [\n    {:semantics, git: \"https://github.com/tlack/semantics\"}\n  ]\nend\n```\n\n## Evaluating models\n\nA Python script is available for evaluating different models against your task.\n\nThe evaluator accepts named groups of tests. Each test consists of two pairs of texts: one pair that should land close together in the embedding space, and another pair that should be further apart.\n\nThe evaluator runs every model you've configured against your tests and reports the results.\n\nSee `priv/python/evaluate.py` and its corresponding files.\n\n## Fine-tuning by retraining models\n\nWhat if the pretrained models don't work well for your task?\n\n`priv/python/retrain.py` contains code showing how to use SentenceTransformers'\nretraining system. It requires labeled pairs of texts, where each label is a similarity score.\n\n## Credits and contact\n\nNeed help? Want to discuss NLP in Elixir? lackner@gmail.com\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftlack%2Fsemantics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftlack%2Fsemantics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftlack%2Fsemantics/lists"}