https://github.com/tlack/hairytext
A data labeling and NLP tool for Elixir (uses Spacy)
elixir entity-recognition nlp nlp-machine-learning phoenix-live-view spacy text-classification
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/tlack/hairytext
- Owner: tlack
- Created: 2020-05-02T01:03:04.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-03-04T15:42:07.000Z (almost 2 years ago)
- Last Synced: 2024-10-14T04:44:45.912Z (3 months ago)
- Topics: elixir, entity-recognition, nlp, nlp-machine-learning, phoenix-live-view, spacy, text-classification
- Language: Elixir
- Homepage:
- Size: 740 KB
- Stars: 25
- Watchers: 3
- Forks: 4
- Open Issues: 6
Metadata Files:
- Readme: README.md
# Hairy Text
![HairyText diagram](https://i.imgur.com/bKR3zlf.png)
Hairy Text is a tool for natural language processing.
With Hairy Text, you can perform named entity recognition (NER) tasks using the
world-class [Spacy](https://spacy.io) library, and label data for training to
improve your model. All this from a normal-looking, mobile-friendly web
application.

It is written in Elixir + Phoenix LiveView and Python, doesn't require a
database (totally self-contained), and runs fine on a $5/mo server without a GPU.

# Screenshot
## List of examples for labeling
![HairyText Examples screenshot](https://i.imgur.com/2dvaxjx.png)

## Labeling interface in modal window
![HairyText Labeler screenshot](https://i.imgur.com/tWDeB6H.png)

## Testing unseen input (and adding to labeling queue)
![HairyText TEST screenshot](https://i.imgur.com/uXdzYx9.png)

# Features
* **Built with the awesome Spacy NLP framework** (so I probably didn't mess it up!)
* Easily label text fragments for machine learning / NLP experiments
* Interactive "test" console lets you quickly debug your model
* Refreshless but highly dynamic Phoenix LiveView web-based user interface (like React, without it)
* User logins with HTTP AUTH password check
* Export a .ZIP of your labeled examples and prediction history (both convenient .JSON files)
* REST JSON API for making predictions (to tie it into the rest of your project)
* API predictions stored in log and reviewable in app
* Label examples or images with a category/class
* Label text inside examples as entities - for instance "time reference" or "place name"
* Filter by entity tags or labels
* Train online via web interface and report live training progress (rough..)
* Support for multiple projects (some bugs)

# Future
* Make into embeddable component like LiveDashboard
* Support for "one at a time" editing that's more about a workflow of doing one labeling task after another
* Object detection (aka classify regions inside images)
* Assist in generating low-confidence predictions to more quickly improve model
* Each project should have its own DETS files
* APIs to: label examples new and old, bulk predict, view predictions log
* Learn from embeddings (BERT, I'm looking at you)
* uPlot training graphs

# Bugs
* Many Elixir warnings
* When creating a new example from the Predictions or Test screen, clicking on
the example text to label it will cause it to reset. This is really annoying.
Use a two-step editing process for now.
* Projects support broken in some ways (training and testing)

# Notes
Hairy Text uses a custom DETS-based storage shim for training examples, logs, etc.
It integrates with Elixir's Ecto database library (but only for
trivial parts of its functionality).

The connection to the Python-based NLP backend uses ErlPort.
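
As a rough illustration of what the Python side of an ErlPort bridge has to deal with, the sketch below assumes ErlPort's documented type mapping under Python 3: Elixir binaries arrive as `bytes`, and a Python `str` sent back becomes an Erlang list of integers (a charlist). The names `handle_predict` and `to_erlang_friendly` are hypothetical and are not taken from the code in `priv/python`, and the entity result is a hard-coded placeholder rather than a real Spacy call.

```python
def to_erlang_friendly(value):
    """Recursively encode str as UTF-8 bytes so Elixir receives binaries.

    Assumption: under ErlPort with Python 3, bytes map to Erlang binaries,
    while str maps to a list of integers (a charlist).
    """
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, dict):
        return {to_erlang_friendly(k): to_erlang_friendly(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(to_erlang_friendly(v) for v in value)
    return value  # numbers, None, etc. pass through unchanged


def handle_predict(text):
    """Hypothetical entry point that Elixir could invoke through ErlPort."""
    # Elixir strings are binaries, which show up here as bytes.
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    # Placeholder result; a real backend would run the Spacy model here.
    entities = [["when", "7am"]] if "7am" in text else []
    return to_erlang_friendly({"text": text, "entities": entities})
```

Encoding everything on the way out is one possible design choice: it keeps the Elixir side working with ordinary binaries instead of mixing binaries and charlists.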
# Motivation
I wanted [Prodigy](https://prodi.gy/) but can't afford such bourgeoisie things.
# Requirements
* Elixir 1.10+
* Phoenix 1.5 + LiveView
* Python 3.6+
* Spacy NLP toolkit for Python

# Installation
First, grab the code:
```
$ git clone https://github.com/tlack/hairytext/
```

Then install Spacy for Python. You'll need a recent version of Python; consider using a virtualenv for HairyText. The Python code is in `priv/python`.
```
$ pip install spacy
```

Next, configure the default username/password:
```
$ vim config/config.exs
```

Finally, install the Elixir dependencies and start the server:
```
$ npm install --prefix assets
$ mix deps.get
$ iex -S mix phx.server
```

Then open http://localhost:4141 to start playing. The default username and password are `admin`:`sohairy`.
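
To check from a script that the server came up, something like the following Python sketch may help. It assumes the "HTTP AUTH" login is standard HTTP Basic authentication and that the root page answers with status 200 once authenticated; neither detail is confirmed here, and the helper names are made up for illustration.

```python
import base64
from urllib.request import Request, urlopen


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic `Authorization` header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def server_is_up(base_url="http://localhost:4141", username="admin", password="sohairy"):
    """Return True if the server answers 200 with the given credentials.

    Assumption: the web UI accepts HTTP Basic auth on its root path.
    """
    request = Request(base_url, headers={"Authorization": basic_auth_header(username, password)})
    try:
        with urlopen(request, timeout=5) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError, and connection timeouts
        return False
```

With the server running locally and the default credentials unchanged, `server_is_up()` is the whole check.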
# API
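For callers that prefer a script to curl, here is a minimal Python sketch of a client for the predict endpoint used in the curl example in this section. The URL shape and response fields are taken from that example; the function names and the `min_confidence` threshold are hypothetical, and error handling is left out.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:4141"  # default address from the installation steps


def predict_url(project_id: str, text: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL for /api/predict/<project id>?text=..."""
    query = urlencode({"text": text})  # spaces become '+', as in the curl example
    return f"{base_url}/api/predict/{quote(project_id, safe='')}?{query}"


def summarize(payload: dict, min_confidence: float = 0.5) -> dict:
    """Reduce a predict response to the fields a caller usually needs.

    min_confidence is a hypothetical threshold, not something the API defines.
    """
    confidence = float(payload.get("label_confidence", 0.0))
    return {
        "label": payload.get("label"),
        "confident": confidence >= min_confidence,
        "entities": dict(payload.get("entities") or {}),
    }


def predict(project_id: str, text: str, base_url: str = BASE_URL) -> dict:
    """Call a running server and return the summarized prediction."""
    with urlopen(predict_url(project_id, text, base_url), timeout=10) as response:
        return summarize(json.load(response))
```

`predict_url` and `summarize` are kept separate from the network call so they can be checked without a running server.
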
```
$ curl 'http://localhost:4141/api/predict/9d00fa70-df5c-4a3a-9f0d-8c53f3345417?text=i+am+live+on+twitch' | json_pp
{
"text" : "i am live on twitch",
"label" : "good",
"label_confidence" : 0.999650120735168,
"entities" : {
"service" : "twitch"
}
}
```

Add a new example to an image classification project:
```
$ curl https://example.com/test/someimage.jpg -o test.jpg
$ curl -X POST -F "image=@test.jpg" "http://localhost:4141/api/example/16700ec8-dab3-4d53-bcee-9b5e2ea52d3d"
```

Add a new example image to a project using its URL:
```
$ curl -X POST -F "image=http://example.com/images/1.jpg" \
"http://localhost:4141/api/example/16700ec8-dab3-4d53-bcee-9b5e2ea52d3d"
```

Add a new example, with a known label, to a project:
```
$ curl -X POST -F "image=http://example.com/images/1.jpg" \
-F "label=yellow" \
"http://localhost:4141/api/example/16700ec8-dab3-4d53-bcee-9b5e2ea52d3d"
```

# Use from iex shell
Make a prediction for some new text. This returns the raw Spacy result format.
```
iex(48)> Spacy.predict("i want to go live on whuwhuwhaaaaat at 7am")
%{
'classification' => %{
[] => 0.9619296193122864,
'bad' => 0.03623370826244354,
'good' => 0.9784072041511536
},
'entities' => [["service", "whuwhuwhaaaaat"], ["when", "7am"]],
'text' => "i want to go live on whuwhuwhaaaaat at 7am"
}
```

See the raw data about an example in the system:
```
iex(418)> HT.Data.list_examples |> List.last |> Map.get(:id) |> HT.Data.get_example!
%HT.Data.Example{
__meta__: #Ecto.Schema.Metadata<:built, "examples">,
entities: %{},
id: "e94e1954-0548-4f51-9570-e63cd298d2d7",
image: nil,
inserted_at: ~U[2020-04-30 04:05:34.965387Z],
label: "bad",
project: "9d00fa70-df5c-4a3a-9f0d-8c53f3345417",
source: nil,
status: nil,
text: "i hate when people start live streaming. twitch sucks.",
updated_at: ~U[2020-05-02 06:42:11.799197Z]
}
```

There is a handy utility for when you have a batch of images to label.
First, copy images from your examples directory into the HairyText
`image_examples/` subdirectory for your project. Have your HairyText project ID
at hand for this process (you can find it by editing the project settings).

```
$ find /tmp/my-new-examples/ -type f -name \*png | shuf | head -250 > example-list.txt
$ cp `cat example-list.txt` ~/hairy-text-path/image_examples/16700ec8-dab3-4d53-bcee-9b5e2ea52d3d
```

Now the images are in the right path for HairyText to manipulate, but they still need to get into the database.
Luckily, HairyText provides a convenience function to do this.

```
iex> Util.upsert_examples_from_image_folder()
```
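
If `shuf` is not available, the random-sample-and-copy step above can be sketched in Python instead. This is an illustrative stand-in for the `find | shuf | head` pipeline, not part of HairyText; `sample_images` and its parameters are hypothetical names, and you would still run the iex convenience function afterwards.

```python
import random
import shutil
from pathlib import Path


def sample_images(source_dir, dest_dir, limit=250, pattern="*png", seed=None):
    """Copy up to `limit` randomly chosen files matching `pattern` into dest_dir."""
    source = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # rglob mirrors `find -type f -name '*png'`: it searches subdirectories too.
    candidates = sorted(p for p in source.rglob(pattern) if p.is_file())

    # A seed makes the selection repeatable; None gives a fresh shuffle each run.
    rng = random.Random(seed)
    chosen = rng.sample(candidates, min(limit, len(candidates)))

    copied = []
    for path in chosen:
        # Like `cp`, files that share a basename overwrite each other in dest.
        copied.append(Path(shutil.copy(path, dest / path.name)))
    return copied
```

For the example above, the call would be along the lines of `sample_images("/tmp/my-new-examples", "image_examples/<project id>")`, with the project ID filled in from your project settings.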