{"id":16762447,"url":"https://github.com/tlack/hairytext","last_synced_at":"2025-03-21T23:32:49.835Z","repository":{"id":44253928,"uuid":"260589344","full_name":"tlack/hairytext","owner":"tlack","description":"A data labeling and NLP tool for Elixir (uses Spacy)","archived":false,"fork":false,"pushed_at":"2023-03-04T15:42:07.000Z","size":758,"stargazers_count":25,"open_issues_count":6,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-10-14T04:44:45.912Z","etag":null,"topics":["elixir","entity-recognition","nlp","nlp-machine-learning","phoenix-live-view","spacy","text-classification"],"latest_commit_sha":null,"homepage":"","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tlack.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-02T01:03:04.000Z","updated_at":"2024-08-05T00:28:10.000Z","dependencies_parsed_at":"2023-02-05T08:46:13.347Z","dependency_job_id":null,"html_url":"https://github.com/tlack/hairytext","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlack%2Fhairytext","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlack%2Fhairytext/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlack%2Fhairytext/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tlack%2Fhairytext/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tlack","download_url":"https://codeload.github.com/tlack/hairytext/tar.gz/refs/heads/master","host":{"name":"GitH
ub","url":"https://github.com","kind":"github","repositories_count":221820595,"owners_count":16886222,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elixir","entity-recognition","nlp","nlp-machine-learning","phoenix-live-view","spacy","text-classification"],"created_at":"2024-10-13T04:44:46.712Z","updated_at":"2024-10-28T11:14:15.111Z","avatar_url":"https://github.com/tlack.png","language":"Elixir","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Hairy Text\n\n![HairyText diagram](https://i.imgur.com/bKR3zlf.png)\n\nHairy Text is a tool for natural language processing. \n\nWith Hairy Text, you can perform named entity recognition (NER) tasks using the\nworld-class [Spacy](https://spacy.io) library, and label data for training to\nimprove your model. 
All this from a normal-looking, mobile-friendly web\napplication.\n\nIt is written in Elixir + Phoenix LiveView and Python, doesn't require a\ndatabase (totally self-contained), and runs fine on a $5/mo server without a GPU.\n\n# Screenshots\n\n## List of examples for labeling\n![HairyText Examples screenshot](https://i.imgur.com/2dvaxjx.png)\n\n## Labeling interface in modal window\n![HairyText Labeler screenshot](https://i.imgur.com/tWDeB6H.png)\n\n## Testing unseen input (and adding to labeling queue)\n![HairyText TEST screenshot](https://i.imgur.com/uXdzYx9.png)\n\n# Features\n\n* **Built with the awesome Spacy NLP framework** (so I probably didn't mess it up!)\n* Easily label text fragments for machine learning / NLP experiments\n* Interactive \"test\" console lets you quickly debug your model\n* Refreshless but highly dynamic Phoenix LiveView web-based user interface (like React, without React)\n* User logins with HTTP AUTH password check\n* Export a .ZIP of your labeled examples and prediction history (both as convenient .JSON files)\n* REST JSON API for making predictions (to tie it into the rest of your project)\n* API predictions stored in a log and reviewable in the app\n* Label examples or images with a category/class\n* Label text inside examples as entities, for instance \"time reference\" or \"place name\"\n* Filter by entity tags or labels\n* Train online via the web interface, with live training progress reports (rough)\n* Support for multiple projects (some bugs)\n\n# Future\n\n* Make it into an embeddable component like LiveDashboard\n* Support for \"one at a time\" editing: a workflow of doing one labeling task after another\n* Object detection (aka classify regions inside images)\n* Assist in generating low-confidence predictions to improve the model more quickly\n* Each project should have its own DETS files\n* APIs to: label examples new and old, bulk predict, view predictions log\n* Learn from embeddings (BERT, I'm looking at you)\n* uPlot training 
graphs\n\n# Bugs\n\n* Many Elixir warnings\n* When creating a new example from the Predictions or Test screen, clicking on\nthe example text to label it will cause it to reset. This is really annoying.\nUse a two-step editing process for now.\n* Project support is broken in some ways (training and testing)\n\n# Notes\n\nHairy Text uses a custom DETS-based storage shim for training examples, logs, etc.\nIt integrates with Elixir's Ecto database framework (but only for\ntrivial parts of its functionality).\n\nThe connection to the Python-based NLP backend uses ErlPort.\n\n# Motivation\n\nI wanted [Prodigy](https://prodi.gy/) but can't afford such bourgeois things.\n\n# Requirements\n\n* Elixir 1.10+\n* Phoenix 1.5 + LiveView\n* Python 3.6+\n* Spacy NLP toolkit for Python\n\n# Installation\n\nFirst, grab the code:\n\n```\n$ git clone https://github.com/tlack/hairytext/\n```\n\nThen, install Spacy for Python. You'll need a recent version of Python; consider using a virtualenv for HairyText. The Python code lives in `priv/python`.\n\n```\n$ pip install spacy\n```\n\nNext, configure the default username/password:\n```\n$ vim config/config.exs\n```\n\nFinally, install the JavaScript and Elixir dependencies and start the server:\n\n```\n$ npm install --prefix assets\n$ mix deps.get\n$ iex -S mix phx.server\n```\n\nThen open http://localhost:4141 to start playing. 
The default username and password are `admin`:`sohairy`.\n\n# API\n\nMake a prediction against a project (the last path segment is the project ID):\n\n```\n$ curl 'http://localhost:4141/api/predict/9d00fa70-df5c-4a3a-9f0d-8c53f3345417?text=i+am+live+on+twitch' | json_pp\n{\n\t\"text\" : \"i am live on twitch\",\n\t\"label\" : \"good\",\n\t\"label_confidence\" : 0.999650120735168,\n\t\"entities\" : {\n\t\t\"service\" : \"twitch\"\n\t}\n}\n```\n\nUpload a local image as a new example in an image classification project:\n```\n$ curl https://example.com/test/someimage.jpg -o test.jpg\n$ curl -X POST -F \"image=@test.jpg\" \"http://localhost:4141/api/example/16700ec8-dab3-4d53-bcee-9b5e2ea52d3d\"\n```\n\nAdd a new example image to a project using its URL:\n```\n$ curl -X POST -F \"image=http://example.com/images/1.jpg\" \\\n\t\t\"http://localhost:4141/api/example/16700ec8-dab3-4d53-bcee-9b5e2ea52d3d\"\n```\n\nAdd a new example, with a known label, to a project:\n```\n$ curl -X POST -F \"image=http://example.com/images/1.jpg\" \\\n\t\t-F \"label=yellow\" \\\n\t\t\"http://localhost:4141/api/example/16700ec8-dab3-4d53-bcee-9b5e2ea52d3d\"\n```\n\n# Use from iex shell\n\nMake a prediction for some new text. 
This returns the raw Spacy result format.\n\n```\niex(48)\u003e Spacy.predict(\"i want to go live on whuwhuwhaaaaat at 7am\")\n%{\n\t'classification' =\u003e %{\n\t\t[] =\u003e 0.9619296193122864,\n\t\t'bad' =\u003e 0.03623370826244354,\n\t\t'good' =\u003e 0.9784072041511536\n\t},\n\t'entities' =\u003e [[\"service\", \"whuwhuwhaaaaat\"], [\"when\", \"7am\"]],\n\t'text' =\u003e \"i want to go live on whuwhuwhaaaaat at 7am\"\n}\n```\n\nInspect the raw data for an example stored in the system:\n\n```\niex(418)\u003e HT.Data.list_examples |\u003e List.last |\u003e Map.get(:id) |\u003e HT.Data.get_example!\n%HT.Data.Example{\n\t__meta__: #Ecto.Schema.Metadata\u003c:built, \"examples\"\u003e,\n\tentities: %{},\n\tid: \"e94e1954-0548-4f51-9570-e63cd298d2d7\",\n\timage: nil,\n\tinserted_at: ~U[2020-04-30 04:05:34.965387Z],\n\tlabel: \"bad\",\n\tproject: \"9d00fa70-df5c-4a3a-9f0d-8c53f3345417\",\n\tsource: nil,\n\tstatus: nil,\n\ttext: \"i hate when people start live streaming. twitch sucks.\",\n\tupdated_at: ~U[2020-05-02 06:42:11.799197Z]\n}\n```\n\nThere is a handy utility for when you have a large batch of images to label.\n\nFirst, copy the images from your examples directory into the HairyText\n`image_examples/` subdirectory named after your project's ID. 
Have your HairyText project ID\nat hand for this process (you can find it by editing the project settings).\n\n```\n$ find /tmp/my-new-examples/ -type f -name \\*png | shuf | head -250 \u003e example-list.txt\n$ cp `cat example-list.txt` ~/hairy-text-path/image_examples/16700ec8-dab3-4d53-bcee-9b5e2ea52d3d\n```\n\nThe images are now where HairyText expects them, but they still need to be imported as examples.\nHairyText provides a convenience function for this:\n\n```\niex\u003e Util.upsert_examples_from_image_folder()\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftlack%2Fhairytext","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftlack%2Fhairytext","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftlack%2Fhairytext/lists"}