https://github.com/dwyl/image-classifier
🖼️ Classify images and extract data from or describe their contents using machine learning
- Host: GitHub
- URL: https://github.com/dwyl/image-classifier
- Owner: dwyl
- License: gpl-2.0
- Created: 2023-10-25T08:42:36.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2026-01-02T20:12:14.000Z (3 months ago)
- Last Synced: 2026-01-06T21:34:12.228Z (3 months ago)
- Topics: ai, artificial-intelligence, classification, hugging-face, huggingface, image, image-classification, image-recognition, learn, machine-learning, machinelearning, tutorial
- Language: Jupyter Notebook
- Homepage:
- Size: 23.4 MB
- Stars: 31
- Watchers: 7
- Forks: 5
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Image Captioning & Semantic Search in `Elixir`

[](https://codecov.io/github/dwyl/image-classifier?branch=main)
[](https://github.com/dwyl/image-classifier/issues)
[](https://hits.dwyl.com/dwyl/image-classifier)
Let's use `Elixir` machine learning capabilities
to build an application
that performs **image captioning**
and **semantic search**
to look for uploaded images
with your voice! 🎙️
- [Image Captioning \& Semantic Search in `Elixir`](#image-captioning--semantic-search-in-elixir)
- [Why? 🤷](#why-)
- [What? 💭](#what-)
- [Who? 👤](#who-)
- [How? 💻](#how-)
- [Prerequisites](#prerequisites)
- [🖼️ Image Captioning in `Elixir`](#-image-captioning-in-elixir)
- [0. Creating a fresh `Phoenix` project](#0-creating-a-fresh-phoenix-project)
- [1. Installing initial dependencies](#1-installing-initial-dependencies)
- [2. Adding `LiveView` capabilities to our project](#2-adding-liveview-capabilities-to-our-project)
- [3. Receiving image files](#3-receiving-image-files)
- [4. Integrating `Bumblebee`](#4-integrating-bumblebee)
- [4.1 `Nx` configuration](#41-nx-configuration)
- [4.2 `Async` processing the image for classification](#42-async-processing-the-image-for-classification)
- [4.2.1 Considerations regarding `async` processes](#421-considerations-regarding-async-processes)
- [4.2.2 Alternative for better testing](#422-alternative-for-better-testing)
- [4.3 Image pre-processing](#43-image-pre-processing)
- [4.4 Updating the view](#44-updating-the-view)
- [4.5 Check it out!](#45-check-it-out)
- [4.6 Considerations on user images](#46-considerations-on-user-images)
- [5. Final Touches](#5-final-touches)
- [5.1 Setting max file size](#51-setting-max-file-size)
- [5.2 Show errors](#52-show-errors)
- [5.3 Show image preview](#53-show-image-preview)
- [6. What about other models?](#6-what-about-other-models)
- [7. How do I deploy this thing?](#7-how-do-i-deploy-this-thing)
- [8. Showing example images](#8-showing-example-images)
- [8.1 Creating a hook in client](#81-creating-a-hook-in-client)
- [8.2 Handling the example images list event inside our LiveView](#82-handling-the-example-images-list-event-inside-our-liveview)
- [8.3 Updating the view](#83-updating-the-view)
- [8.4 Using URL of image instead of base64-encoded](#84-using-url-of-image-instead-of-base64-encoded)
- [8.5 See it running](#85-see-it-running)
- [9. Store metadata and classification info](#9-store-metadata-and-classification-info)
- [9.1 Installing dependencies](#91-installing-dependencies)
- [9.2 Adding `Postgres` configuration files](#92-adding-postgres-configuration-files)
- [9.3 Creating `Image` schema](#93-creating-image-schema)
- [9.4 Changing our LiveView to persist data](#94-changing-our-liveview-to-persist-data)
- [10. Adding double MIME type check and showing feedback to the person in case of failure](#10-adding-double-mime-type-check-and-showing-feedback-to-the-person-in-case-of-failure)
- [10.1 Showing a toast component with error](#101-showing-a-toast-component-with-error)
- [11. Benchmarking image captioning models](#11-benchmarking-image-captioning-models)
- [🔍 Semantic search](#-semantic-search)
- [0. Overview of the process](#0-overview-of-the-process)
- [0.1 Audio transcription](#01-audio-transcription)
- [0.2 Creating embeddings](#02-creating-embeddings)
- [0.3 Semantical search](#03-semantical-search)
- [1. Pre-requisites](#1-pre-requisites)
- [2. Transcribe an audio recording](#2-transcribe-an-audio-recording)
- [1.1 Adding a loading spinner](#11-adding-a-loading-spinner)
- [2.2 Defining `Javascript` hook](#22-defining-javascript-hook)
- [2.3 Handling audio upload in `LiveView`](#23-handling-audio-upload-in-liveview)
- [2.4 Serving the `Whisper` model](#24-serving-the-whisper-model)
- [2.5 Handling the model's response and updating elements in the view](#25-handling-the-models-response-and-updating-elements-in-the-view)
- [3. Embeddings and semantic search](#3-embeddings-and-semantic-search)
- [3.1 The `HNSWLib` Index (GenServer)](#31-the-hnswlib-index-genserver)
- [3.2 Saving the `HNSWLib` Index in the database](#32-saving-the-hnswlib-index-in-the-database)
- [3.2 The embeding model](#32-the-embeding-model)
- [4. Using the Index and embeddings](#4-using-the-index-and-embeddings)
- [4.0 Check the folder "hnswlib"](#40-check-the-folder-hnswlib)
- [4.1 Computing the embeddings in our app](#41-computing-the-embeddings-in-our-app)
- [4.1.1 Changing the `Image` schema so it's embeddable](#411-changing-the-image-schema-so-its-embeddable)
- [4.1.2 Using embeddings in semantic search](#412-using-embeddings-in-semantic-search)
- [4.1.2.1 Mount socket assigns](#4121-mount-socket-assigns)
- [4.1.2.2 Consuming image uploads](#4122-consuming-image-uploads)
- [4.1.2.3 Using the embeddings to semantically search images](#4123-using-the-embeddings-to-semantically-search-images)
- [4.1.2.4 Creating embeddings when uploading images](#4124-creating-embeddings-when-uploading-images)
- [4.1.2.5 Update the LiveView view](#4125-update-the-liveview-view)
- [5. Tweaking our UI](#5-tweaking-our-ui)
- [_Please_ star the repo! ⭐️](#please-star-the-repo-οΈ)
## Why? 🤷
Whilst building our [app](https://github.com/dwyl/app),
we consider `images` an _essential_ medium of communication.
We needed a fully-offline capable (no 3rd party APIs/Services) image captioning service
using state-of-the-art pre-trained image and embedding models to describe images uploaded in our
[`App`](https://github.com/dwyl/app).
By adding a way of captioning images, we make it _easy_ for people to suggest meta tags that describe images so they become **searchable**.
## What? 💭
A step-by-step tutorial building a fully functional
`Phoenix LiveView` web application that allows anyone
to upload an image and have it described
and searchable.
In addition to this,
the app will allow the person to record an audio
which describes the image they want to find.
The audio will be transcribed into text
and be semantically queryable.
We do this by encoding the image captions
as vectors and running `knn search` on them.
We'll be using three different models:
- Salesforce's BLIP model [`blip-image-captioning-large`](https://huggingface.co/Salesforce/blip-image-captioning-large)
for image captioning.
- OpenAI's speech recognition model
[`whisper-small`](https://huggingface.co/openai/whisper-small).
- [`sentence-transformers/paraphrase-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2)
embedding model.
## Who? 👤
This tutorial is aimed at `Phoenix` beginners
who want to start exploring the machine-learning capabilities
of the Elixir language within a `Phoenix` application.
We propose to use pre-trained models from Hugging Face via `Bumblebee`
and grasp how to:
- run a model, in particular an image-captioning one.
- use embeddings.
- run a semantic search using an
[Approximate Nearest Neighbour](https://towardsdatascience.com/comprehensive-guide-to-approximate-nearest-neighbors-algorithms-8b94f057d6b6)
algorithm.
If you are completely new to `Phoenix` and `LiveView`,
we recommend you follow the **`LiveView` _Counter_ Tutorial**:
[dwyl/phoenix-liveview-counter-tutorial](https://github.com/dwyl/phoenix-liveview-counter-tutorial)
## How? 💻
In these chapters, we'll go over the development process of this small application.
You'll learn how to do this _yourself_, so grab some coffee and let's get cracking!
This guide is divided into two parts.
The first goes over **image captioning**,
while the second expands the application
by adding **semantic search**.
## Prerequisites
This tutorial requires you to have `Elixir` and `Phoenix` installed.
If you don't, please see [how to install Elixir](https://github.com/dwyl/learn-elixir#installation) and [Phoenix](https://hexdocs.pm/phoenix/installation.html#phoenix).
This guide assumes you know the basics of `Phoenix`
and have _some_ knowledge of how it works.
If you don't, we _highly suggest_ you follow our other tutorials first, e.g: [github.com/dwyl/**phoenix-chat-example**](https://github.com/dwyl/phoenix-chat-example)
In addition to this, **_some_ knowledge of `AWS`** - what it is, what an `S3` bucket is/does - **is assumed**.
> [!NOTE]
> If you have questions or get stuck,
> please open an issue!
> [/dwyl/image-classifier/issues](https://github.com/dwyl/image-classifier/issues)
## 🖼️ Image Captioning in `Elixir`
> In this section, we'll start building our application
> with `Bumblebee` that supports Transformer models.
> At the end of this section,
> you'll have a fully functional application
> that receives an image,
> processes it accordingly
> and captions it.
### 0. Creating a fresh `Phoenix` project
Let's create a fresh `Phoenix` project.
Run the following command in a given folder:
```sh
mix phx.new . --app app --no-dashboard --no-ecto --no-gettext --no-mailer
```
We're running [`mix phx.new`](https://hexdocs.pm/phoenix/Mix.Tasks.Phx.New.html) to generate a new project without a dashboard and mailer (email) service,
since we don't need those features in our project.
After this, run `mix phx.server` and you should see the default `Phoenix` landing page.
We're ready to start building.
### 1. Installing initial dependencies
Now that we're ready to go, let's start by adding some dependencies.
Head over to `mix.exs` and add the following dependencies
to the `deps` section.
```elixir
{:bumblebee, "~> 0.5.0"},
{:exla, "~> 0.7.0"},
{:nx, "~> 0.7.0"},
{:hnswlib, "~> 0.1.5"},
{:vix, "~> 0.23"}, # image processing, described below (version is indicative)
```
- [**`bumblebee`**](https://github.com/elixir-nx/bumblebee) is a framework that will allow us to integrate
[`Transformer Models`](https://huggingface.co/docs/transformers/index) in `Phoenix`.
The `Transformers` (from [Hugging Face](https://huggingface.co/))
are APIs that allow us to easily download and use
[pre-trained models](https://blogs.nvidia.com/blog/2022/12/08/what-is-a-pretrained-ai-model).
The `Bumblebee` package aims to support all Transformer models, though some are not yet available.
You may check which ones are supported by visiting
`Bumblebee`'s repository or by visiting https://jonatanklosko-bumblebee-tools.hf.space/apps/repository-inspector
and checking if the model is currently supported.
- [**`Nx`**](https://hexdocs.pm/nx/Nx.html) is a library that allows us to work with
[`Numerical Elixir`](https://github.com/elixir-nx/), Elixir's way of doing [numerical computing](https://www.hilarispublisher.com/open-access/introduction-to-numerical-computing-2168-9679-1000423.pdf). It supports tensors and numerical computations (see the short sketch after this list).
- [**`EXLA`**](https://hexdocs.pm/exla/EXLA.html) is the Elixir implementation of [Google's XLA](https://www.tensorflow.org/xla/),
a compiler that provides faster linear algebra calculations
with `TensorFlow` models.
This backend compiler is needed for `Nx`.
We are installing `EXLA` because it allows us to compile models _just-in-time_ and run them on CPU and/or GPU.
- [**`Vix`**](https://hexdocs.pm/vix/readme.html) is an Elixir extension for [libvips](https://www.libvips.org/), an image processing library.
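To get a feel for what `Nx` gives us, here is a tiny, self-contained snippet you can try in `iex` (the values are arbitrary and it is not part of the app):

```elixir
# Build a 2x2 tensor and compute its mean.
t = Nx.tensor([[1, 2], [3, 4]])
Nx.mean(t)
#=> #Nx.Tensor<
#     f32
#     2.5
#   >
```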
In `config/config.exs`, let's add our `:nx` configuration
to use `EXLA`.
```elixir
config :nx, default_backend: EXLA.Backend
```
### 2. Adding `LiveView` capabilities to our project
As it stands, our project is not using `LiveView`.
Let's fix this.
This will launch a super-powered process that establishes a WebSocket connection
between the server and the browser.
In `lib/app_web/router.ex`, change the `scope "/"` to the following.
```elixir
scope "/", AppWeb do
pipe_through :browser
live "/", PageLive
end
```
Instead of using the `PageController`,
we are going to be creating `PageLive`,
a `LiveView` file.
Let's create our `LiveView` files.
Inside `lib/app_web`,
create a folder called `live`
and create the following file
`page_live.ex`.
```elixir
#/lib/app_web/live/page_live.ex
defmodule AppWeb.PageLive do
use AppWeb, :live_view
@impl true
def mount(_params, _session, socket) do
{:ok, socket}
end
end
```
This is a simple `LiveView` controller.
In the same `live` folder,
create a file called `page_live.html.heex`
and use the following code.
```html
Image Classifier
Drag your images and we'll run an AI model to caption it!
Upload a file
or drag and drop
PNG, JPG, GIF up to 5MB
```
This is a simple HTML form that uses
[`Tailwind CSS`](https://github.com/dwyl/learn-tailwind)
to enhance the presentation of the upload form.
We'll also remove the unused header of the page layout,
while we're at it.
Locate the file `lib/app_web/components/layouts/app.html.heex`
and remove the `` class.
The file should only have the following code:
```html
<.flash_group flash={@flash} /> <%= @inner_content %>
```
Now you can safely delete the `lib/app_web/controllers` folder,
which is no longer used.
If you run `mix phx.server`,
you should see the new page.
This means we've successfully added `LiveView`
and changed our view!
### 3. Receiving image files
Now, let's start by receiving some image files.
With `LiveView`,
we can easily do this by using
[`allow_upload/3`](https://hexdocs.pm/phoenix_live_view/Phoenix.LiveView.html#allow_upload/3)
when mounting our `LiveView`.
With this function, we can easily accept
file uploads with progress.
We can define file types, max number of entries,
max file size,
validate the uploaded file and much more!
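As a quick, illustrative sketch of that API (the option values here are arbitrary; the configuration we will actually use is shown a little further below):

```elixir
# Illustrative only: called inside mount/3.
socket
|> allow_upload(:image_list,
  accept: ~w(.png .jpg .jpeg .gif), # restrict the accepted file types
  max_entries: 1,                   # only one image at a time
  max_file_size: 5_000_000          # cap uploads at 5MB (value in bytes)
)
```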
Firstly,
let's make some changes to
`lib/app_web/live/page_live.html.heex`.
```html
Image Classification
Do simple captioning with this
LiveView
demo, powered by
Bumblebee.
<.live_file_input upload={@uploads.image_list}
class="hidden" /> Upload
or drag and drop
PNG, JPG, GIF up to 5MB
```
We've added a few features:
- we used
[`<.live_file_input/>`](https://hexdocs.pm/phoenix_live_view/Phoenix.Component.html#live_file_input/1)
for `LiveView` file upload.
We've wrapped this component
with an element that is annotated with the `phx-drop-target` attribute
pointing to the DOM `id` of the file input.
- because we used `<.live_file_input/>`,
we need to annotate its wrapping element
with `phx-submit` and `phx-change`,
as per
[hexdocs.pm/phoenix_live_view/uploads.html#render-reactive-elements](https://hexdocs.pm/phoenix_live_view/uploads.html#render-reactive-elements)
Because we've added these bindings,
we need to add the event handlers in
`lib/app_web/live/page_live.ex`.
Open it and update it to:
```elixir
defmodule AppWeb.PageLive do
use AppWeb, :live_view
@impl true
def mount(_params, _session, socket) do
{:ok,
socket
|> assign(label: nil, upload_running?: false, task_ref: nil)
|> allow_upload(:image_list,
accept: ~w(image/*),
auto_upload: true,
progress: &handle_progress/3,
max_entries: 1,
chunk_size: 64_000
)}
end
@impl true
def handle_event("validate", _params, socket) do
{:noreply, socket}
end
@impl true
def handle_event("remove-selected", %{"ref" => ref}, socket) do
{:noreply, cancel_upload(socket, :image_list, ref)}
end
@impl true
def handle_event("save", _params, socket) do
{:noreply, socket}
end
defp handle_progress(:image_list, entry, socket) when entry.done? do
uploaded_file =
consume_uploaded_entry(socket, entry, fn %{path: _path} = _meta ->
{:ok, entry}
end)
{:noreply, socket}
end
defp handle_progress(:image_list, _, socket), do: {:noreply, socket}
end
```
- when `mount/3`ing the LiveView,
we are creating three socket assigns:
`label` pertains to the model prediction;
`upload_running?` is a boolean referring to whether the model is running or not;
`task_ref` refers to the reference of the task that was created for image classification
(we'll delve into this further later down the line).
Additionally, we are using the `allow_upload/3` function to define our upload configuration.
The most important settings here are `auto_upload` set to `true`
and the `progress` fields.
By configuring these two properties,
we are telling `LiveView` that _whenever the person uploads a file_,
**it is processed immediately and consumed**.
- the `progress` field is handled by the `handle_progress/3` function.
It receives chunks from the client with a built-in `UploadWriter`
(as explained in the [docs](https://hexdocs.pm/phoenix_live_view/Phoenix.LiveView.UploadWriter.html)).
When the chunks are all consumed, we get the boolean `entry.done? == true`.
We consume the file in this function by using
[`consume_uploaded_entry/3`](https://hexdocs.pm/phoenix_live_view/Phoenix.LiveView.html#consume_uploaded_entry/3).
The anonymous function returns `{:ok, data}` or `{:postpone, message}`.
Whilst consuming the entry/file,
we can access its path and then use its content.
_For now_, we don't need to use it.
But we will in the future to feed our image classifier with it!
After the callback function is executed,
this function "consumes the entry",
essentially deleting the image from the temporary folder
and removing it from the uploaded files list.
- the `"validate"`, `"remove-selected"`, `"save"` event handlers
are called whenever the person uploads the image,
wants to remove it from the list of uploaded images
and when they want to submit the form,
respectively.
You may see that we're not doing much with these handlers;
we're simply replying with a `:noreply`
because we don't need to do anything with them.
And that's it!
If you run `mix phx.server`, nothing will change.
### 4. Integrating `Bumblebee`
Now here comes the fun part!
It's time to do some image captioning! π
#### 4.1 `Nx` configuration
We first need to add some initial setup in the
`lib/app/application.ex` file.
Head over there and change
the `start` function like so:
```elixir
@impl true
def start(_type, _args) do
children = [
# Start the Telemetry supervisor
AppWeb.Telemetry,
# Start the PubSub system
{Phoenix.PubSub, name: App.PubSub},
{Nx.Serving, serving: serving(), name: ImageClassifier},
# Start the Endpoint (http/https)
AppWeb.Endpoint
]
opts = [strategy: :one_for_one, name: App.Supervisor]
Supervisor.start_link(children, opts)
end
def serving do
{:ok, model_info} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})
Bumblebee.Vision.image_classification(model_info, featurizer,
top_k: 1,
compile: [batch_size: 10],
defn_options: [compiler: EXLA]
)
end
```
We are using [`Nx.Serving`](https://hexdocs.pm/nx/Nx.Serving.html), which simply allows us to encapsulate tasks; it can be networking, machine learning, data processing or any other task.
In this specific case, we are using it to **batch requests**.
This is extremely useful and important because we are using models that typically run on
[GPU](https://en.wikipedia.org/wiki/Graphics_processing_unit).
The GPU is _really good_ at **parallelizing tasks**.
Therefore, instead of sending image classification requests one by one, we can _batch_/bundle them together as much as possible and then send them over.
We can define the `batch_size` and `batch_timeout` with `Nx.Serving`.
We're going to use the default values, hence why we're not explicitly defining them.
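For illustration only, if we did want to override those defaults, we could pass them alongside the other options in the child spec (the timeout here is an arbitrary example):

```elixir
# Illustrative only: in this tutorial we keep the default batching options.
{Nx.Serving,
 serving: serving(),
 name: ImageClassifier,
 # how many requests get bundled into a single run
 batch_size: 10,
 # how long (in ms) to wait before running a partially-filled batch
 batch_timeout: 100}
```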
With `Nx.Serving`, we define a `serving/0` function
that is then used by it, which in turn is executed in the supervision tree since we declare it as a child in the Application module.
In the `serving/0` function, we are loading the [`ResNet-50`](https://huggingface.co/microsoft/resnet-50) model and its featurizer.
> [!NOTE]
> A `featurizer` can be seen as a
> [`Feature Extractor`](https://huggingface.co/docs/transformers/main_classes/feature_extractor).
> It is essentially a component that is responsible for converting input data
> into a format that can be processed by a pre-trained language model.
>
> It takes raw information and performs various transformations,
> such as
> [tokenization](https://neptune.ai/blog/tokenization-in-nlp),
> [padding](https://www.baeldung.com/cs/deep-neural-networks-padding),
> and encoding to prepare the data for model training or inference.
Lastly, this function returns a serving for image classification by calling [`image_classification/3`](https://hexdocs.pm/bumblebee/Bumblebee.Vision.html#image_classification/3), where we can define our compiler and task batch size.
We gave this serving the name `ImageClassifier`, as declared in the child spec in the Application module.
#### 4.2 `Async` processing the image for classification
Now we're ready to send the image to the model
and get a prediction of it!
Every time we upload an image,
we are going to run **async processing**.
This means that the task responsible for image classification will run in another process, i.e. asynchronously, so the LiveView _won't have to wait_ for this task to finish
before continuing its work.
For this scenario, we are going to be using the
[`Task` module](https://hexdocs.pm/elixir/1.14/Task.html) to spawn processes to complete this task.
Go to `lib/app_web/live/page_live.ex`
and change the following code.
```elixir
def handle_progress(:image_list, entry, socket) when entry.done? do
# Consume the entry and get the tensor to feed to classifier
tensor = consume_uploaded_entry(socket, entry, fn %{} = meta ->
{:ok, vimage} = Vix.Vips.Image.new_from_file(meta.path)
pre_process_image(vimage)
end)
# Create an async task to classify the image
task = Task.async(fn -> Nx.Serving.batched_run(ImageClassifier, tensor) end)
# Update socket assigns to show spinner whilst task is running
{:noreply, assign(socket, upload_running?: true, task_ref: task.ref)}
end
@impl true
def handle_info({ref, result}, %{assigns: %{task_ref: ref}} = socket) do
# This handler is called when the async task completes and sends back its result.
# We flush it here.
Process.demonitor(ref, [:flush])
# And then destructure the result from the classifier.
%{predictions: [%{label: label}]} = result
# Update the socket assigns with result and stopping spinner.
{:noreply, assign(socket, label: label, upload_running?: false)}
end
```
> [!NOTE]
> The `pre_process_image/1` function is yet to be defined.
> We'll do that in the following section.
In the `handle_progress/3` function,
whilst we are consuming the image,
we are first converting it to a
[`Vix.Vips.Image`](https://hexdocs.pm/vix/Vix.Vips.Image.html) `Struct`
using the file path.
We then feed this image to the `pre_process_image/1` function that we'll implement later.
What's important is to notice this line:
```elixir
task = Task.async(fn -> Nx.Serving.batched_run(ImageClassifier, tensor) end)
```
We are using
[`Task.async/1`](https://hexdocs.pm/elixir/1.12/Task.html#async/1)
to call the `Nx.Serving` process named `ImageClassifier` that we've defined earlier,
thus initiating a batched run with the image tensor.
While the task is spawned,
we update the socket assigns with the reference to the task (`:task_ref`)
and update the `:upload_running?` assign to `true`,
so we can show a spinner or a loading animation.
When the task is spawned using `Task.async/1`,
a couple of things happen in the background.
The new process is monitored by the caller (our `LiveView`),
which means that the caller will receive a
`{:DOWN, ref, :process, object, reason}`
message once the process it is monitoring dies.
And, a link is created between both processes.
Therefore,
we **don't need to use**
[**`Task.await/2`**](https://hexdocs.pm/elixir/1.12/Task.html#await/2).
Instead, we create a new handler to receive the aforementioned message.
That's what we're doing in the
`handle_info({ref, result}, %{assigns: %{task_ref: ref}} = socket)` function.
The received message contains a `{ref, result}` tuple,
where `ref` is the monitor's reference.
We use this reference to stop monitoring the task,
since we received the result we needed from our task
and we can discard an exit message.
In this same function, we destructure the prediction
from the model and assign it to the socket assign `:label`
and set `:upload_running?` to `false`.
Quite beautiful, isn't it?
With this, we don't have to worry if the person closes the browser tab.
The process dies (as does our `LiveView`),
and the work is automatically cancelled,
meaning no resources are spent
on a process for which nobody expects a result anymore.
##### 4.2.1 Considerations regarding `async` processes
When a task is spawned using `Task.async/1`,
**it is linked to the caller**.
Which means that they're related:
if one dies, the other does too.
We ought to take this into account when developing our application.
If we don't have control over the result of the task,
and we don't want our `LiveView` to crash if the task crashes,
we must use a different alternative to spawn our task -
[`Task.Supervisor.async_nolink/3`](https://hexdocs.pm/elixir/1.14/Task.Supervisor.html#async_nolink/3)
can be used for this effect,
meaning we can use it if we want to make sure
our `LiveView` won't die and the error is reported,
even if the task crashes.
We've chosen `Task.async/1` for this very reason.
We are doing something **that takes time/is expensive**
and we **want to stop the task if `LiveView` is closed/crashes**.
However, if you are building something
like a report that has to be generated even if the person closes the browser tab,
this is not the right solution.
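To make the trade-off concrete, here is a hypothetical sketch (not part of this tutorial's code) of how a `LiveView` could react to a crashed task when `Task.Supervisor.async_nolink/3` is used, since in that case the caller receives a `:DOWN` message instead of crashing together with the task:

```elixir
# Hypothetical sketch: this only makes sense with async_nolink/3.
# With a linked Task.async/1, the LiveView would exit before this handler runs.
@impl true
def handle_info({:DOWN, ref, :process, _pid, _reason}, %{assigns: %{task_ref: ref}} = socket) do
  # Reset the UI so the person can retry instead of seeing a crashed view.
  {:noreply, assign(socket, upload_running?: false, task_ref: nil, label: nil)}
end
```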
##### 4.2.2 Alternative for better testing
We are spawning async tasks by calling `Task.async/1`.
This is creating an **_unsupervised_ task**.
Although it's plausible for this simple app,
it's best for us to create a
[**`Supervisor`**](https://hexdocs.pm/elixir/1.15.7/Supervisor.html)
that manages their child tasks.
This gives more control over the execution
and lifetime of the child tasks.
Additionally, it's better to have these tasks supervised
because it makes it possible to create tests for our `LiveView`.
For this, we need to make a couple of changes.
First, head over to `lib/app/application.ex`
and add a supervisor to the `start/2` function children array.
```elixir
def start(_type, _args) do
children = [
AppWeb.Telemetry,
{Phoenix.PubSub, name: App.PubSub},
{Nx.Serving, serving: serving(), name: ImageClassifier},
{Task.Supervisor, name: App.TaskSupervisor}, # add this line
AppWeb.Endpoint
]
opts = [strategy: :one_for_one, name: App.Supervisor]
Supervisor.start_link(children, opts)
end
```
We are creating a [`Task.Supervisor`](https://hexdocs.pm/elixir/Task.Supervisor.html)
with the name `App.TaskSupervisor`.
Now, in `lib/app_web/live/page_live.ex`,
we create the async task like so:
```elixir
task = Task.Supervisor.async(App.TaskSupervisor, fn -> Nx.Serving.batched_run(ImageClassifier, tensor) end)
```
We are now using
[`Task.Supervisor.async`](https://hexdocs.pm/elixir/1.15.7/Task.Supervisor.html#async/3),
passing the name of the supervisor defined earlier.
And that's it!
We are creating async tasks like before,
the only difference is that they're now **supervised**.
In tests, you can create a small module that waits for the tasks to be completed.
```elixir
defmodule AppWeb.SupervisorSupport do
@moduledoc """
This is a support module helper that is meant to wait for all the children of a supervisor to complete.
If you go to `lib/app/application.ex`, you'll see that we created a `TaskSupervisor`, where async tasks are spawned.
This module helps us to wait for all the children to finish during tests.
"""
@doc """
Find all children spawned by this supervisor and wait until they finish.
"""
def wait_for_completion() do
pids = Task.Supervisor.children(App.TaskSupervisor)
Enum.each(pids, &Process.monitor/1)
wait_for_pids(pids)
end
defp wait_for_pids([]), do: nil
defp wait_for_pids(pids) do
receive do
{:DOWN, _ref, :process, pid, _reason} -> wait_for_pids(List.delete(pids, pid))
end
end
end
```
You can call `AppWeb.SupervisorSupport.wait_for_completion()`
in unit tests so they wait for the tasks to complete.
In our case,
we do that until the _prediction is made_.
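As a rough sketch of how this could look in a test (the route, setup and assertions are illustrative, not the project's actual test code):

```elixir
# Illustrative sketch: assumes the usual `use AppWeb.ConnCase` and
# `import Phoenix.LiveViewTest` setup in the test module.
test "shows a prediction after an image upload", %{conn: conn} do
  {:ok, lv, _html} = live(conn, "/")

  # ...simulate the image upload against the LiveView here...

  # Block until every task under App.TaskSupervisor has finished.
  AppWeb.SupervisorSupport.wait_for_completion()

  refute render(lv) =~ "Waiting for image input."
end
```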
#### 4.3 Image pre-processing
As we've noted before,
we need to **pre-process the image before passing it to the model**.
For this, we have three main steps:
- removing the [`alpha` channel](https://en.wikipedia.org/wiki/Alpha_compositing)
from the image, flattening it out.
- converting the image to the `sRGB` [colourspace](https://en.wikipedia.org/wiki/Color_space).
This is needed to ensure that the image is consistent
and aligns with the model's training data images.
- setting the representation of the image as a `Tensor`
with dimensions `height, width, bands`.
The image tensor will then be organized as a three-dimensional array,
where the first dimension represents the height of the image,
the second refers to the width of the image,
and the third pertains to the different
[spectral bands/channels of the image](https://en.wikipedia.org/wiki/Multispectral_imaging).
Our `pre_process_image/1` function will implement these three steps.
Let's implement it now!
In `lib/app_web/live/page_live.ex`,
add the following:
```elixir
defp pre_process_image(%Vimage{} = image) do
# If the image has an alpha channel, flatten it:
{:ok, flattened_image} = case Vix.Vips.Image.has_alpha?(image) do
true -> Vix.Vips.Operation.flatten(image)
false -> {:ok, image}
end
# Convert the image to sRGB colourspace ----------------
{:ok, srgb_image} = Vix.Vips.Operation.colourspace(flattened_image, :VIPS_INTERPRETATION_sRGB)
# Converting image to tensor ----------------
{:ok, tensor} = Vix.Vips.Image.write_to_tensor(srgb_image)
# We reshape the tensor given a specific format.
# In this case, we are using {height, width, channels/bands}.
%Vix.Tensor{data: binary, type: type, shape: {x, y, bands}} = tensor
format = [:height, :width, :bands]
shape = {x, y, bands}
final_tensor =
binary
|> Nx.from_binary(type)
|> Nx.reshape(shape, names: format)
{:ok, final_tensor}
end
```
The function receives a `Vix` image,
as detailed earlier.
We use [`flatten/2`](https://hexdocs.pm/vix/Vix.Vips.Operation.html#flatten/2)
to flatten the alpha out of the image.
The resulting image has its colourspace changed
by calling [`colourspace/3`](https://hexdocs.pm/vix/Vix.Vips.Operation.html#colourspace/3),
where we change it to `sRGB`.
The colourspace-altered image is then converted to a
[tensor](https://hexdocs.pm/vix/Vix.Tensor.html),
by calling
[`write_to_tensor/1`](https://hexdocs.pm/vix/Vix.Vips.Image.html#write_to_tensor/1).
We then
[reshape](https://hexdocs.pm/nx/Nx.html#reshape/3)
the tensor according to the format that was previously mentioned.
This function returns the processed tensor,
that is then used as input to the model.
#### 4.4 Updating the view
All that's left is updating the view
to reflect these changes we've made to the `LiveView`.
Head over to `lib/app_web/live/page_live.html.heex`
and change it to this.
```html
<.flash_group flash={@flash} />
Image Classification
Do simple classification with this
LiveView
demo, powered by
Bumblebee.
<.live_file_input upload={@uploads.image_list}
class="hidden" /> Upload
or drag and drop
PNG, JPG, GIF up to 5MB
Description:
<%= if @upload_running? do %>
<% else %> <%= if @label do %>
<%= @label %>
<% else %>
Waiting for image input.
<% end %> <% end %>
```
In these changes,
we've added the output of the model in the form of text.
We are rendering a spinner
if the `:upload_running?` socket assign is set to true.
Otherwise,
we add the `:label`, which holds the prediction made by the model.
You may have also noticed that
we've changed the `phx` event handlers
to `noop`.
This is simply to simplify the `LiveView`.
Head over to `lib/app_web/live/page_live.ex`.
You can now remove the `"validate"`, `"save"`
and `"remove-selected"` handlers,
because we're not going to be needing them.
Replace them with this handler:
```elixir
@impl true
def handle_event("noop", _params, socket) do
{:noreply, socket}
end
```
#### 4.5 Check it out!
And that's it!
Our app is now _functional_ π.
If you run the app,
you can drag and drop or select an image.
After this, a task will be spawned that will run the model
against the image that was submitted.
Once a prediction is made, it is displayed!
You can and **should** try other models.
`ResNet-50` is just one of the many that are supported by `Bumblebee`.
You can see the supported models at https://github.com/elixir-nx/bumblebee#model-support.
#### 4.6 Considerations on user images
To keep the app as simple as possible,
we are receiving the image from the person as is.
Although we are processing the image,
we are doing it so **it is processable by the model**.
We have to understand that:
- in most cases, **full-resolution images are not necessary**,
because neural networks work on much smaller inputs
(e.g. `ResNet-50` works with `224px x 224px` images).
This means that a lot of data is unnecessarily uploaded over the network,
increasing workload on the server to potentially downsize a large image.
- decoding an image requires an additional package,
meaning more work on the server.
We can avoid both of these downsides by moving this work to the client.
We can leverage the
[`Canvas API` ](https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API)
to decode and downsize this image on the client-side,
reducing server workload.
You can see an example implementation of this technique
in `Bumblebee`'s repository
at https://github.com/elixir-nx/bumblebee/blob/main/examples/phoenix/image_classification.exs
However, since we are not using `JavaScript` for anything,
we can (and _should_!) properly downsize our images
so they better fit the training dataset of the model we use.
This will allow the model to process faster
since larger images carry over more data that is ultimately unnecessary
for models to make predictions.
Open
`lib/app_web/live/page_live.ex`,
find the `handle_progress/3` function
and resize the image _before processing it_.
```elixir
file_binary = File.read!(meta.path)
# Get image and resize
# This is dependent on the resolution of the model's dataset.
# In our case, we want the width to be closer to 640, whilst maintaining aspect ratio.
width = 640
{:ok, thumbnail_vimage} = Vix.Vips.Operation.thumbnail(meta.path, width, size: :VIPS_SIZE_DOWN)
# Pre-process it
{:ok, tensor} = pre_process_image(thumbnail_vimage)
#...
```
We are using
[`Vix.Vips.Operation.thumbnail/3`](https://hexdocs.pm/vix/Vix.Vips.Operation.html#thumbnail/3)
to resize our image to a fixed width
whilst maintaining aspect ratio.
The `width` variable can be dependent on the model that you use.
For example, `ResNet-50` is trained on `224x224` pictures,
so you may want to resize the image to this width.
> **Note**: We are using the `thumbnail/3` function
> instead of `resize/3` because it's _much_ faster.
> Check
> https://github.com/libvips/libvips/wiki/HOWTO----Image-shrinking
> to know why.
### 5. Final Touches
Although our app is functional,
we can make it **better**. 🎨
#### 5.1 Setting max file size
In order to better control user input,
we should add a limit to the size of the image that is being uploaded.
It will be easier on our server and ultimately save costs.
Let's add a cap of `5MB` to our app!
Fortunately for you, this is super simple!
You just need to add the `max_file_size`
to the `allow_upload/3` function
when mounting the `LiveView`!
```elixir
def mount(_params, _session, socket) do
{:ok,
socket
|> assign(label: nil, upload_running?: false, task_ref: nil)
|> allow_upload(:image_list,
accept: ~w(image/*),
auto_upload: true,
progress: &handle_progress/3,
max_entries: 1,
chunk_size: 64_000,
max_file_size: 5_000_000 # add this
)}
end
```
And that's it!
The number is in `bytes`,
hence why we set it as `5_000_000`.
#### 5.2 Show errors
In case a person uploads an image that is too large,
we should show this feedback to the person!
For this, we can leverage the
[`upload_errors/2`](https://hexdocs.pm/phoenix_live_view/Phoenix.Component.html#upload_errors/2)
function.
This function will return the entry errors for an upload.
We need to add a handler for one of these errors to show it first.
Head over `lib/app_web/live/page_live.ex`
and add the following line.
```elixir
def error_to_string(:too_large), do: "Image too large. Upload a smaller image up to 5MB."
```
Now, add the following section below the upload form
inside `lib/app_web/live/page_live.html.heex`.
```html
<%= for entry <- @uploads.image_list.entries do %>
<%= for err <- upload_errors(@uploads.image_list, entry) do %>
<%= error_to_string(err) %>
<% end %>
<% end %>
```
We are iterating over the errors returned by `upload_errors/2`
and invoking `error_to_string/1`,
which we've just defined in our `LiveView`.
Now, if you run the app
and try to upload an image that is too large,
an error will show up.
Awesome! π
#### 5.3 Show image preview
As of now, even though our app predicts the given images,
it is not showing a preview of the image the person submitted.
Let's fix this 🛠️.
Let's add a new socket assign variable
pertaining to the [base64](https://en.wikipedia.org/wiki/Base64) representation
of the image in `lib/app_web/live/page_live.ex`.
```elixir
|> assign(label: nil, upload_running?: false, task_ref: nil, image_preview_base64: nil)
```
We've added `image_preview_base64`
as a new socket assign,
initializing it as `nil`.
Next, we need to _read the file while consuming it_,
and properly update the socket assign
so we can show it to the person.
In the same file,
change the `handle_progress/3` function to the following.
```elixir
def handle_progress(:image_list, entry, socket) when entry.done? do
# Consume the entry and get the tensor to feed to classifier
%{tensor: tensor, file_binary: file_binary} = consume_uploaded_entry(socket, entry, fn %{} = meta ->
file_binary = File.read!(meta.path)
{:ok, vimage} = Vix.Vips.Image.new_from_file(meta.path)
{:ok, tensor} = pre_process_image(vimage)
{:ok, %{tensor: tensor, file_binary: file_binary}}
end)
# Create an async task to classify the image
task = Task.Supervisor.async(App.TaskSupervisor, fn -> Nx.Serving.batched_run(ImageClassifier, tensor) end)
# Encode the image to base64
base64 = "data:image/png;base64, " <> Base.encode64(file_binary)
# Update socket assigns to show spinner whilst task is running
{:noreply, assign(socket, upload_running?: true, task_ref: task.ref, image_preview_base64: base64)}
end
```
We're using [`File.read!/1`](https://hexdocs.pm/elixir/File.html#read!/1)
to retrieve the binary representation of the image that was uploaded.
We use [`Base.encode64/2`](https://hexdocs.pm/elixir/1.12/Base.html#encode64/2)
to encode this file binary
and assign the newly created `image_preview_base64` socket assign
with this base64 representation of the image.
Now, all that's left to do
is to _render the image on our view_.
In `lib/app_web/live/page_live.html.heex`,
locate the element that renders the upload prompt
(the "Upload or drag and drop" block).
We are going to update it
to show the image with the `image_preview_base64` socket assign.
```html
<%= if @image_preview_base64 do %>
<.live_file_input upload={@uploads.image_list} class="hidden" />
<% else %>
<.live_file_input upload={@uploads.image_list} class="hidden" />
Upload
or drag and drop
PNG, JPG, GIF up to 5MB
<% end %>
```
As you can see,
we are checking if `@image_preview_base64` is defined.
If so, we simply show the image with it as `src` π.
Now, if you run the application,
you'll see that after dragging the image,
it is previewed and shown to the person!
### 6. What about other models?
Maybe you weren't happy with the results from this model.
That's fair. `ResNet-50` is a smaller, "older" model compared to other
image captioning/classification models.
What if you wanted to use others?
Well, as we've mentioned before,
`Bumblebee` uses
[**Transformer models from `HuggingFace`**](https://huggingface.co/docs/transformers/index).
To know if one is supported
(as shown in [`Bumblebee`'s docs](https://github.com/elixir-nx/bumblebee#model-support)),
we need to check the `config.json` file
in the model repository
and copy the class name under `"architectures"`
and search it on `Bumblebee`'s codebase.
For example,
here's one of the more popular image captioning models -
Salesforce's `BLIP` -
https://huggingface.co/Salesforce/blip-image-captioning-large/blob/main/config.json.
If you visit `Bumblebee`'s codebase
and search for the class name,
you'll find it is supported.
Awesome!
Now we can use it!
If you dig around `Bumblebee`'s docs as well
(https://hexdocs.pm/bumblebee/Bumblebee.Vision.html#image_to_text/5),
you'll see that we've got to use `image_to_text/5` with this model.
It needs a `tokenizer`, `featurizer` and a `generation_config`
so we can use it.
Let's do it!
Head over to `lib/app/application.ex`,
and change the `serving/0` function.
```elixir
def serving do
{:ok, model_info} = Bumblebee.load_model({:hf, "Salesforce/blip-image-captioning-base"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "Salesforce/blip-image-captioning-base"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "Salesforce/blip-image-captioning-base"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "Salesforce/blip-image-captioning-base"})
Bumblebee.Vision.image_to_text(model_info, featurizer, tokenizer, generation_config,
compile: [batch_size: 10],
defn_options: [compiler: EXLA]
)
end
```
As you can see, we're using the repository name of `BLIP`'s model
from the HuggingFace website.
If you run `mix phx.server`,
you'll see that it will download the new models,
tokenizers, featurizer and configs to run the model.
```sh
|======================================================================| 100% (989.82 MB)
[info] TfrtCpuClient created.
|======================================================================| 100% (711.39 KB)
[info] Running AppWeb.Endpoint with cowboy 2.10.0 at 127.0.0.1:4000 (http)
[info] Access AppWeb.Endpoint at http://localhost:4000
[watch] build finished, watching for changes...
```
You may think we're done here.
But we are not!
The **destructuring of the output of the model may not be the same**.
If you try to submit a photo,
you'll get this error:
```sh
no match of right hand side value:
%{results: [%{text: "a person holding a large blue ball on a beach"}]}
```
This means that we need to make some changes
when parsing the output of the model π.
Head over to `lib/app_web/live/page_live.ex`
and change the `handle_info/3` function
that is called after the async task is completed.
```elixir
def handle_info({ref, result}, %{assigns: %{task_ref: ref}} = socket) do
Process.demonitor(ref, [:flush])
%{results: [%{text: label}]} = result # change this line
{:noreply, assign(socket, label: label, upload_running?: false)}
end
```
As you can see, we are now correctly destructuring the result from the model.
And that's it!
If you run `mix phx.server`,
you'll see that we got far more accurate results!
Awesome! π
> [!NOTE]
> Be aware that `BLIP`
> is a _much_ larger model than `ResNet-50`.
> There are more accurate and even larger models out there
> (e.g:
> [`blip-image-captioning-large`](https://huggingface.co/Salesforce/blip-image-captioning-large),
> the larger version of the model we've just used).
> This is a balancing act: the larger the model, the longer a prediction may take
> and more resources your server will need to have to handle this heavier workload.
> [!WARNING]
>
> We've created a small module that allows you to have multiple models
> cached and downloaded and keep this logic contained.
>
> For this, check the [`deployment guide`](./deployment.md#5-a-better-model-management).
### 7. How do I deploy this thing?
There are a few considerations you may want to have
before considering deploying this.
Luckily for you,
we've created a small document
that will **guide you through deploying this app in `fly.io`**!
Check the [`deployment.md`](./deployment.md) file for more information.
### 8. Showing example images
> [!WARNING]
>
> This section assumes you've made the changes from the previous section.
> Therefore, you should follow the instructions in
> [`7. How do I deploy this thing?`](#7-how-do-i-deploy-this-thing)
> and come back after you're done.
We have a fully functioning application that predicts images.
Now we can add some cool touches to show the person
some examples if they are inactive.
For this,
we are going to need to make **three changes**.
- create a hook in the **client** (`Javascript`)
to send an event when there's inactivity after
a given number of seconds.
- change `page_live.ex` **LiveView** to accommodate
this new event.
- change the **view** in `page_live.html.heex`
to show these changes to the person.
Let's go over each one!
#### 8.1 Creating a hook in client
We are going to detect the inactivity of the person
with some `Javascript` code.
Head over to `assets/js/app.js`
and change it to the following.
```js
// If you want to use Phoenix channels, run `mix help phx.gen.channel`
// to get started and then uncomment the line below.
// import "./user_socket.js"
// You can include dependencies in two ways.
//
// The simplest option is to put them in assets/vendor and
// import them using relative paths:
//
// import "../vendor/some-package.js"
//
// Alternatively, you can `npm install some-package --prefix assets` and import
// them using a path starting with the package name:
//
// import "some-package"
//
// Include phoenix_html to handle method=PUT/DELETE in forms and buttons.
import "phoenix_html";
// Establish Phoenix Socket and LiveView configuration.
import { Socket } from "phoenix";
import { LiveSocket } from "phoenix_live_view";
import topbar from "../vendor/topbar";
// Hooks to track inactivity
let Hooks = {};
Hooks.ActivityTracker = {
mounted() {
// Set the inactivity duration in milliseconds
const inactivityDuration = 8000; // 8 seconds
// Set a variable to keep track of the timer and if the process to predict example image has already been sent
let inactivityTimer;
let processHasBeenSent = false;
let ctx = this;
// Function to reset the timer
function resetInactivityTimer() {
// Clear the previous timer
clearTimeout(inactivityTimer);
// Start a new timer
inactivityTimer = setTimeout(() => {
// Perform the desired action after the inactivity duration
// For example, send a message to the Elixir process using Phoenix Socket
if (!processHasBeenSent) {
processHasBeenSent = true;
ctx.pushEvent("show_examples", {});
}
}, inactivityDuration);
}
// Call the function to start the timer initially
resetInactivityTimer();
// Reset the timer whenever there is user activity
document.addEventListener("mousemove", resetInactivityTimer);
document.addEventListener("keydown", resetInactivityTimer);
},
};
let csrfToken = document
.querySelector("meta[name='csrf-token']")
.getAttribute("content");
let liveSocket = new LiveSocket("/live", Socket, {
hooks: Hooks,
params: { _csrf_token: csrfToken },
});
// Show progress bar on live navigation and form submits
topbar.config({ barColors: { 0: "#29d" }, shadowColor: "rgba(0, 0, 0, .3)" });
window.addEventListener("phx:page-loading-start", (_info) => topbar.show(300));
window.addEventListener("phx:page-loading-stop", (_info) => topbar.hide());
// connect if there are any LiveViews on the page
liveSocket.connect();
// expose liveSocket on window for web console debug logs and latency simulation:
// >> liveSocket.enableDebug()
// >> liveSocket.enableLatencySim(1000) // enabled for duration of browser session
// >> liveSocket.disableLatencySim()
window.liveSocket = liveSocket;
```
- we have added a `Hooks` variable,
with a property **`ActivityTracker`**.
This hook has the `mounted()` function
that is executed when the component that is hooked with this hook is mounted.
You can find more information at https://hexdocs.pm/phoenix_live_view/js-interop.html.
- inside the `mounted()` function,
we create a `resetInactivityTimer()` function
that is executed every time the
**mouse moves** (`mousemove` event)
and a **key is pressed** (`keydown` event).
This function resets the inactivity timer
that fires after a period of no user activity.
- if the person is inactive for _8 seconds_,
we create an event `"show_examples"`.
We will create a handler in the LiveView
to handle this event later.
- we add the `Hooks` variable to the `hooks`
property when initializing the `livesocket`.
And that's it!
For this hook to _actually be executed_,
we need to create a component that uses it
inside our view file.
For this, we can simply create a hidden component
on top of the `lib/app_web/live/page_live.html.heex` file.
Add the following hidden component.
```html
<!-- the element just needs a DOM id and the phx-hook attribute;
     the id shown here is illustrative -->
<div id="tracker_el" phx-hook="ActivityTracker" class="hidden" />
```
We use the `phx-hook` attribute
to bind the hook we've created in our `app.js` file
to the component so it's executed.
When this component is mounted,
the `mounted()` function inside the hook
is executed.
And that's it! π
Your app won't work yet because
we haven't created a handler
to handle the `"show_examples"` event.
Let's do that right now!
#### 8.2 Handling the example images list event inside our LiveView
Now that we have our client sorted,
let's head over to our LiveView
at `lib/app_web/live/page_live.ex`
and make the needed changes!
Before anything,
let's add socket assigns that we will need throughout our adventure!
On top of the file,
change the socket assigns to the following.
```elixir
socket
|> assign(
# Related to the file uploaded by the user
label: nil,
upload_running?: false,
task_ref: nil,
image_preview_base64: nil,
# Related to the list of image examples
example_list_tasks: [],
example_list: [],
display_list?: false
)
```
We've added **three new assigns**:
- **`example_list_tasks`** is a list of
the async tasks that are created for each example image.
- **`example_list`** is a list of the example images
with their respective predictions.
- **`display_list?`** is a boolean that
tells us if the list is to be shown or not.
Awesome! Let's continue.
As we've mentioned before,
we need to create a handler for our `"show_examples"` event.
Add the following function to the file.
```elixir
@image_width 640
def handle_event("show_examples", _data, socket) do
# Only run if the user hasn't uploaded anything
if(is_nil(socket.assigns.task_ref)) do
# Retrieves a random image from Picsum with a given `image_width` dimension
random_image = "https://picsum.photos/#{@image_width}/#{@image_width}"
# Spawns prediction tasks for example image from random Picsum image
tasks = for _ <- 1..2 do
{:req, body} = {:req, Req.get!(random_image).body}
predict_example_image(body)
end
# List to change `example_list` socket assign to show skeleton loading
display_example_images = Enum.map(tasks, fn obj -> %{predicting?: true, ref: obj.ref} end)
# Updates the socket assigns
{:noreply, assign(socket, example_list_tasks: tasks, example_list: display_example_images)}
else
{:noreply, socket}
end
end
```
> [!WARNING]
>
> We are using the
> [`req`](https://github.com/wojtekmach/req) package
> to download the file binary from the URL.
> Make sure to install it in the `mix.exs` file.
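>
> A minimal sketch of the dependency entry (the version shown is indicative; use a recent release):
>
> ```elixir
> {:req, "~> 0.4"},
> ```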
- we are using the [Picsum API](https://picsum.photos),
an _awesome_ image API with lots of photos!
They provide URLs that yield random photos,
and we can even define the dimensions we want!
That's what we're doing in the first line of the function.
We are using a module attribute `@image_width 640`,
so add it to the top of the file.
This matters because we preferably want to deal
with images whose resolution is close to that of the dataset the model was trained on.
- we are creating **two async tasks** that retrieve the binary of the image
and pass it on to a `predict_example_image/1` function
(we will create this function next).
- the two tasks that we've created are in an array `tasks`.
We create _another array_ `display_example_images` with the same number of elements as `tasks`,
which will have two properties:
**`predicting?`**, whether the image is still being predicted by the model;
and **`ref`**, the reference of the task.
- we assign the `tasks` array to the `example_list_tasks` socket assign
and `display_example_images` array to the `example_list` array.
So `example_list` will temporarily hold
objects with `:predicting` and `:ref` properties
whilst the model is being executed.
As we've just mentioned,
we are making use of a function called
`predict_example_image/1` to make predictions
of a given file binary.
Let's implement it now!
In the same file, add:
```elixir
def predict_example_image(body) do
with {:vix, {:ok, img_thumb}} <-
{:vix, Vix.Vips.Operation.thumbnail_buffer(body, @image_width)},
{:pre_process, {:ok, t_img}} <-
{:pre_process, pre_process_image(img_thumb)} do
# Create an async task to classify the image from Picsum
Task.Supervisor.async(App.TaskSupervisor, fn ->
Nx.Serving.batched_run(ImageClassifier, t_img)
end)
|> Map.merge(%{base64_encoded_url: "data:image/png;base64, " <> Base.encode64(body)})
else
{stage, error} -> {stage, error}
end
end
```
For the body of the image to be executed by the model,
it needs to go through some pre-processing.
- we are using [`thumbnail_buffer/3`](https://hexdocs.pm/vix/Vix.Vips.Operation.html#thumbnail_buffer/3)
to make sure it's properly resized
and then feeding the result to our own implemented
`pre_process_image/1` function
so it can be converted to a parseable tensor by the model.
- after these two operations are successfully completed,
we spawn two async tasks (like we've done before)
and feed it to the model.
We add the base64-encoded image
to the return value so it can later be shown to the person.
- if these operations fail, we return an error.
Great job! π
Our example images async tasks have successfully been created
and are on their way to the model!
Now we need to handle these newly created async tasks
once they are completed.
As we know, we are handling our async tasks completion
in the `def handle_info({ref, result}, %{assigns: %{task_ref: ref}} = socket)` function.
Let's change it like so.
```elixir
def handle_info({ref, result}, %{assigns: assigns} = socket) do
# Flush async call
Process.demonitor(ref, [:flush])
# You need to change how you destructure the output of the model depending
# on the model you've chosen for `prod` and `test` envs on `models.ex`.)
label =
case Application.get_env(:app, :use_test_models, false) do
true ->
App.Models.extract_test_label(result)
# coveralls-ignore-start
false ->
App.Models.extract_prod_label(result)
# coveralls-ignore-stop
end
cond do
# If the upload task has finished executing, we update the socket assigns.
Map.get(assigns, :task_ref) == ref ->
{:noreply, assign(socket, label: label, upload_running?: false)}
# If the example task has finished executing, we upload the socket assigns.
img = Map.get(assigns, :example_list_tasks) |> Enum.find(&(&1.ref == ref)) ->
# Update the element in the `example_list` enum to turn "predicting?" to `false`
updated_example_list = Map.get(assigns, :example_list)
|> Enum.map(fn obj ->
if obj.ref == img.ref do
Map.put(obj, :base64_encoded_url, img.base64_encoded_url)
|> Map.put(:label, label)
|> Map.put(:predicting?, false)
else
obj
end end)
{:noreply,
assign(socket,
example_list: updated_example_list,
upload_running?: false,
display_list?: true
)}
end
end
```
The only change we've made is that
we've added a [`cond`](https://hexdocs.pm/elixir/1.16/case-cond-and-if.html#cond)
flow structure.
We are essentially checking
if the task reference that has been completed
is **from the uploaded image from the person** (`:task_ref` socket assign)
or **from an example image** (inside the `:example_list_tasks` socket assign list).
If it's the latter,
we update
the `example_list` socket assign list
with the **prediction** (`:label`),
the **base64-encoded image from the task list** (`:base64_encoded_url`),
and set the `:predicting?` property to `false`.
And that's it!
Great job! 🥳
#### 8.3 Updating the view
Now that we've made all the necessary changes to our LiveView,
we need to update our view so it reflects them!
Head over to `lib/app_web/live/page_live.html.heex`
and change it to the following piece of code.
```html
🔥 LiveView
+
🐝 Bumblebee
Caption your image!
Upload your own image (up to 5MB) and perform image captioning with
Elixir
!
Powered with
HuggingFace 🤗
transformer models, you can run this project locally and perform
machine learning tasks with a handful of lines of code.
<%= if @image_preview_base64 do %>
<.live_file_input upload={@uploads.image_list}
class="hidden" />
<% else %>
<.live_file_input upload={@uploads.image_list}
class="hidden" /> Upload
or drag and drop
PNG, JPG, GIF up to 5MB
<% end %>
<%= for entry <- @uploads.image_list.entries do %>
<%= for err <- upload_errors(@uploads.image_list, entry) do %>
<%= error_to_string(err) %>
<% end %>
<% end %>
Description:
<%= if @upload_running? do %>
<% else %> <%= if @label do %>
<%= @label %>
<% else %>
Waiting for image input.
<% end %> <% end %>
<%= if @display_list? do %>
Examples
<%= for example_img <- @example_list do %>
<%= if example_img.predicting? == true do %>
Loading...
<% else %>
<%= example_img.label %>
<% end %> <% end %>
<% end %>
```
We've made two changes.
- we've added some text to better introduce our application
at the top of the page.
- added a section to show the example images list.
This section is only rendered if the
**`display_list?`** socket assign is set to `true`.
If so, we iterate over the
**`example_list`** socket assign list
and show a [loading skeleton](https://www.freecodecamp.org/news/how-to-build-skeleton-screens-using-css-for-better-user-experience/)
if the image's `:predicting?` flag is still `true`.
Otherwise, the image has already been predicted,
and we show the base64-encoded image
like we do with the image uploaded by the person.
And that's it! π
#### 8.4 Using URL of image instead of base64-encoded
While our example list is being correctly rendered,
we are using additional CPU
to base64 encode our images
so they can be shown to the person.
Initially, we did this because
`https://picsum.photos`
resolves into a different URL every time it is called.
This means that the image that was fed into the model
would be different from the one shown in the example list
if we were to use this URL in our view.
To fix this,
**we need to follow the redirection when the URL is resolved**.
> [!NOTE]
> We can do this with
> [`Finch`](https://github.com/sneako/finch),
> if we wanted to.
>
> We could do something like:
>
> ```elixir
> def rand_splash do
>   %{scheme: scheme, host: host, path: path} =
>     Finch.build(:get, "https://picsum.photos")
>     |> Finch.request!(MyFinch)
>     |> Map.get(:headers)
>     |> Enum.filter(fn {a, _b} -> a == "location" end)
>     |> List.first()
>     |> elem(1)
>     |> URI.parse()
>
>   scheme <> "://" <> host <> path
> end
> ```
>
> And then call it, like so.
>
> ```elixir
> App.rand_splash()
> # https://images.unsplash.com/photo-1694813646634-9558dc7960e3
> ```
Because we are already using
[`req`](https://github.com/wojtekmach/req),
let's make use of it
instead of adding additional dependencies.
Let's first add a function that will do this
in `lib/app_web/live/page_live.ex`.
Add the following piece of code
at the end of the file.
```elixir
defp track_redirected(url) do
# Create request
req = Req.new(url: url)
# Add tracking properties to req object
req = req
|> Req.Request.register_options([:track_redirected])
|> Req.Request.prepend_response_steps(track_redirected: &track_redirected_uri/1)
# Make request
{:ok, response} = Req.request(req)
# Return the final URI
%{url: URI.to_string(response.private.final_uri), body: response.body}
end
defp track_redirected_uri({request, response}) do
{request, %{response | private: Map.put(response.private, :final_uri, request.url)}}
end
```
This function adds properties to the request object
and tracks the redirection.
It will add a [`URI`](https://hexdocs.pm/elixir/1.12/URI.html#summary)
object inside `private.final_uri`.
This function returns
the `body` of the image
and the final `url`
it is resolved to
(the URL of the image).
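To make this concrete, here is roughly what a call looks like from inside the LiveView (it's a private function); the resolved URL and binary below are purely illustrative, since Picsum resolves to a different image every time:

```elixir
track_redirected("https://picsum.photos/300/300")
# => %{
#      url: "https://fastly.picsum.photos/id/237/300/300.jpg?hmac=...",
#      body: <<255, 216, 255, 224, ...>>
#    }
```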
Now all we need to do is use this function!
Head over to the `handle_event("show_examples"...` function
and change the loop to the following.
```elixir
tasks = for _ <- 1..2 do
%{url: url, body: body} = track_redirected(random_image)
predict_example_image(body, url)
end
```
We are making use of `track_redirected/1`,
the function we've just created.
We pass both `body` and `url` to `predict_example_image/2`,
which we will now change to accept these two arguments.
```elixir
def predict_example_image(body, url) do
with {:vix, {:ok, img_thumb}} <-
{:vix, Vix.Vips.Operation.thumbnail_buffer(body, @image_width)},
{:pre_process, {:ok, t_img}} <- {:pre_process, pre_process_image(img_thumb)} do
# Create an async task to classify the image from Picsum
Task.Supervisor.async(App.TaskSupervisor, fn ->
Nx.Serving.batched_run(ImageClassifier, t_img)
end)
|> Map.merge(%{url: url})
else
{:vix, {:error, msg}} -> {:error, msg}
{:pre_process, {:error, msg}} -> {:error, msg}
end
end
```
Instead of using `base64_encoded_url`,
we are now using the `url` we've acquired.
The last step we need to do in our LiveView
is to finally use this `url`
in `handle_info/2`.
```elixir
def handle_info({ref, result}, %{assigns: assigns} = socket) do
Process.demonitor(ref, [:flush])
label =
case Application.get_env(:app, :use_test_models, false) do
true ->
App.Models.extract_test_label(result)
false ->
App.Models.extract_prod_label(result)
end
cond do
Map.get(assigns, :task_ref) == ref ->
{:noreply, assign(socket, label: label, upload_running?: false)}
img = Map.get(assigns, :example_list_tasks) |> Enum.find(&(&1.ref == ref)) ->
updated_example_list = Map.get(assigns, :example_list)
|> Enum.map(fn obj ->
if obj.ref == img.ref do
obj
|> Map.put(:url, img.url) # change here
|> Map.put(:label, label)
|> Map.put(:predicting?, false)
else
obj
end end)
{:noreply,
assign(socket,
example_list: updated_example_list,
upload_running?: false,
display_list?: true
)}
end
end
```
And that's it!
The last thing we need to do is change our view
so it uses the `:url` parameter
instead of the obsolete `:base64_encoded_url`.
Head over to `lib/app_web/live/page_live.html.heex`
and change the `<img>` element being shown in the example list
so it uses the `:url` parameter.
```html
<!-- illustrative sketch: the example image is now rendered from its URL -->
<img src={example_img.url} alt={example_img.label} />
```
And we're done! π
We are now rendering the image on the client
through the URL the Picsum API resolves into
instead of having the LiveView server
encoding the image.
Therefore, we're saving CPU for the thing that matters the most:
_running our model_.
#### 8.5 See it running
Now let's see our application in action!
We are expecting the examples to be shown after
**8 seconds** of inactivity.
If the person is inactive for this time duration,
we fetch a random image from Picsum API
and feed it to our model!
You should see different images every time you use the app.
Isn't that cool? π
### 9. Store metadata and classification info
Our app is shaping up quite nicely!
As it stands, it's an application that does inference on images.
However, it doesn't save them.
Let's expand our application so it has a **database**
where image classification is saved/persisted!
We'll use **`Postgres`** for this.
Typically, when you create a new `Phoenix` project
with `mix phx.new`,
[a `Postgres` database will be automatically created](https://hexdocs.pm/phoenix/installation.html#postgresql).
Because we didn't do this,
we'll have to configure this ourselves.
Let's do it!
#### 9.1 Installing dependencies
We'll install all the needed dependencies first.
In `mix.exs`, add the following snippet
to the `deps` section.
```elixir
# HTTP Request
{:httpoison, "~> 2.2"},
{:mime, "~> 2.0.5"},
{:ex_image_info, "~> 0.2.4"},
# DB
{:phoenix_ecto, "~> 4.4"},
{:ecto_sql, "~> 3.10"},
{:postgrex, ">= 0.0.0"},
```
- [**`httpoison`**](https://github.com/edgurgel/httpoison),
[**`mime`**](https://hex.pm/packages/mime) and
[**`ex_image_info`**](https://hex.pm/packages/ex_image_info)
are used to make HTTP requests,
get the content type and information from an image file,
respectively.
These will be needed to upload a given image to
[`imgup`](https://github.com/dwyl/imgup),
by making multipart requests.
- [**`phoenix_ecto`**](https://github.com/phoenixframework/phoenix_ecto),
[**`ecto_sql`**](https://github.com/elixir-ecto/ecto_sql) and
[**`postgrex`**](https://github.com/elixir-ecto/postgrex)
are needed to properly configure our driver in Elixir
that will connect to a Postgres database,
in which we will persist data.
Run `mix deps.get` to install these dependencies.
#### 9.2 Adding `Postgres` configuration files
Now let's create the needed files
to properly connect to a Postgres relational database.
Start by going to `lib/app`
and create `repo.ex`.
```elixir
defmodule App.Repo do
use Ecto.Repo,
otp_app: :app,
adapter: Ecto.Adapters.Postgres
end
```
This module will be needed in our configuration files
so our app knows where the database is.
Next, in `lib/app/application.ex`,
add the following line to the `children` array
in the supervision tree.
```elixir
children = [
AppWeb.Telemetry,
App.Repo, # add this line
{Phoenix.PubSub, name: App.PubSub},
...
]
```
Awesome! π
Now let's head over to the files inside the `config` folder.
In `config/config.exs`,
add these lines.
```elixir
config :app,
ecto_repos: [App.Repo],
generators: [timestamp_type: :utc_datetime]
```
We are pointing `ecto_repos` to the module we've previously created (`App.Repo`)
so `ecto` knows where the configuration
for the database is located.
In `config/dev.exs`,
add:
```elixir
config :app, App.Repo,
username: "postgres",
password: "postgres",
hostname: "localhost",
database: "app_dev",
stacktrace: true,
show_sensitive_data_on_connection_error: true,
pool_size: 10
```
We are defining the parameters of the database
that is used during development.
In `config/runtime.exs`,
add:
```elixir
if config_env() == :prod do
database_url =
System.get_env("DATABASE_URL") ||
raise """
environment variable DATABASE_URL is missing.
For example: ecto://USER:PASS@HOST/DATABASE
"""
maybe_ipv6 = if System.get_env("ECTO_IPV6") in ~w(true 1), do: [:inet6], else: []
config :app, App.Repo,
# ssl: true,
url: database_url,
pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10"),
socket_options: maybe_ipv6
# ...
```
Here we configure the database connection used in production at runtime.
In `config/test.exs`,
add:
```elixir
config :app, App.Repo,
username: "postgres",
password: "postgres",
hostname: "localhost",
database: "app_test#{System.get_env("MIX_TEST_PARTITION")}",
pool: Ecto.Adapters.SQL.Sandbox,
pool_size: 10
```
Here we're defining the database used during testing.
Now let's create a **migration file** to create our database table.
In `priv/repo/migrations/`, create a file
called `20231204092441_create_images.exs`
(or any other timestamp string)
with the following piece of code.
```elixir
defmodule App.Repo.Migrations.CreateImages do
use Ecto.Migration
def change do
create table(:images) do
add :url, :string
add :description, :string
add :width, :integer
add :height, :integer
timestamps(type: :utc_datetime)
end
end
end
```
And that's it!
Those are the needed files that our application needs
to properly connect and persist data into the Postgres database.
You can now run `mix ecto.create` and `mix ecto.migrate`
to create the database
and the `"images"` table.
#### 9.3 Creating `Image` schema
For now, let's create a simple table `"images"`
in our database
that has the following properties:
- **`description`**: the description of the image
from the model.
- **`width`**: width of the image.
- **`height`**: height of the image.
- **`url`**: public URL where the image is publicly hosted.
With this in mind, let's create a new file!
In `lib/app/`, create a file called `image.ex`.
```elixir
defmodule App.Image do
use Ecto.Schema
alias App.{Image, Repo}
@primary_key {:id, :id, autogenerate: true}
schema "images" do
field(:description, :string)
field(:width, :integer)
field(:url, :string)
field(:height, :integer)
timestamps(type: :utc_datetime)
end
def changeset(image, params \\ %{}) do
image
|> Ecto.Changeset.cast(params, [:url, :description, :width, :height])
|> Ecto.Changeset.validate_required([:url, :description, :width, :height])
end
@doc """
Uploads the given image to S3
and adds the image information to the database.
"""
def insert(image) do
%Image{}
|> changeset(image)
|> Repo.insert!()
end
end
```
We've just created the `App.Image` schema
with the aforementioned fields.
We've created `changeset/2`, which is used to cast
and validate the properties of a given map
before interacting with the database.
`insert/1` receives an object,
runs it through the changeset
and inserts it in the database.
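As a quick sanity check, here is a hypothetical call (all values are made up) you could run in `iex -S mix` once the database is created:

```elixir
App.Image.insert(%{
  url: "https://s3.example.com/images/cat.jpg",
  description: "a cat sitting on a sofa",
  width: 640,
  height: 480
})
# => %App.Image{id: 1, url: "https://s3.example.com/images/cat.jpg", ...}
```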
#### 9.4 Changing our LiveView to persist data
Now that we have our database set up,
let's change some of our code so we persist data into it!
In this section, we'll be working in the `lib/app_web/live/page_live.ex` file.
First, let's import `App.Image`
and create an `ImageInfo` struct to hold the information
of the image throughout the process of uploading
and classifying the image.
```elixir
defmodule AppWeb.PageLive do
use AppWeb, :live_view
alias App.Image # add this import
alias Vix.Vips.Image, as: Vimage
defmodule ImageInfo do
@doc """
General information for the image that is being analysed.
This information is useful when persisting the image to the database.
"""
defstruct [:mimetype, :width, :height, :url, :file_binary]
end
# ...
```
We are going to be using `ImageInfo`
in our socket assigns.
Let's add to it when the LiveView is mounting!
```elixir
|> assign(
label: nil,
upload_running?: false,
task_ref: nil,
image_info: nil, # add this line
image_preview_base64: nil,
example_list_tasks: [],
example_list: [],
display_list?: false
)
```
When the person uploads an image,
we want to retrieve its info (namely its _height_, _width_)
and upload the image to an `S3` bucket (we're doing this through `imgup`)
so we can populate the `:url` field of the schema in the database.
We can retrieve this information _while consuming the entry_,
i.e. while uploading the image file.
For this, go to `handle_progress(:image_list, entry, socket)`
and change the function to the following.
```elixir
def handle_progress(:image_list, entry, socket) when entry.done? do
# We've changed the object that is returned from `consume_uploaded_entry/3` to return an `image_info` object.
%{tensor: tensor, image_info: image_info} =
consume_uploaded_entry(socket, entry, fn %{} = meta ->
file_binary = File.read!(meta.path)
# Add this line. It uses `ExImageInfo` to retrieve the info from the file binary.
{mimetype, width, height, _variant} = ExImageInfo.info(file_binary)
{:ok, thumbnail_vimage} =
Vix.Vips.Operation.thumbnail(meta.path, @image_width, size: :VIPS_SIZE_DOWN)
{:ok, tensor} = pre_process_image(thumbnail_vimage)
# Add this line. Uploads the image to the S3, which returns the `url` and `compressed url`.
# (we'll implement this function next)
url = Image.upload_image_to_s3(meta.path, mimetype) |> Map.get("url")
# Add this line. We are constructing the image_info object to be returned.
image_info = %ImageInfo{mimetype: mimetype, width: width, height: height, file_binary: file_binary, url: url}
# Return it
{:ok, %{tensor: tensor, image_info: image_info}}
end)
task =
Task.Supervisor.async(App.TaskSupervisor, fn ->
Nx.Serving.batched_run(ImageClassifier, tensor)
end)
base64 = "data:image/png;base64, " <> Base.encode64(image_info.file_binary)
# Change this line so `image_info` is defined when the image is uploaded
{:noreply, assign(socket, upload_running?: true, task_ref: task.ref, image_preview_base64: base64, image_info: image_info)}
#else
# {:noreply, socket}
#end
end
```
Check the comment lines for more explanation on the changes that have been made.
We are using `ExImageInfo` to fetch the information from the image
and assigning it to the `image_info` socket assign we defined earlier.
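For reference, `ExImageInfo.info/1` takes the file binary and returns a tuple with the MIME type, the dimensions and a format variant, roughly like this (the file path and values are illustrative):

```elixir
ExImageInfo.info(File.read!("example.png"))
# => {"image/png", 1024, 768, "PNG"}
```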
We are also using `Image.upload_image_to_s3/2` to upload our image to `imgup`.
Let's define this function in `lib/app/image.ex`.
```elixir
def upload_image_to_s3(file_path, mimetype) do
extension = MIME.extensions(mimetype) |> Enum.at(0)
# Upload to Imgup - https://github.com/dwyl/imgup
upload_response =
HTTPoison.post!(
"https://imgup.fly.dev/api/images",
{:multipart,
[
{
:file,
file_path,
{"form-data", [name: "image", filename: "#{Path.basename(file_path)}.#{extension}"]},
[{"Content-Type", mimetype}]
}
]},
[]
)
# Return URL
Jason.decode!(upload_response.body)
end
```
We're using `HTTPoison` to make a multipart request to the `imgup` server,
effectively uploading the image to the server.
If the upload is successful, the decoded response contains the `url` of the uploaded image.
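For context, the decoded `imgup` response body we return looks roughly like this (the URLs are placeholders, not the real bucket addresses):

```elixir
%{
  "url" => "https://<bucket>.s3.<region>.amazonaws.com/<uuid>.jpg",
  "compressed_url" => "https://<bucket>.s3.<region>.amazonaws.com/<uuid>_compressed.jpg"
}
```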
Let's go back to `lib/app_web/live/page_live.ex`.
Now that we have `image_info` in the socket assigns,
we can use it to **insert a row in the `"images"` table in the database**.
We only want to do this after the model is done running,
so simply change `handle_info/2` function
(which is called after the model is done with classifying the image).
```elixir
cond do
# If the upload task has finished executing, we update the socket assigns.
Map.get(assigns, :task_ref) == ref ->
# Insert image to database
image = %{
url: assigns.image_info.url,
width: assigns.image_info.width,
height: assigns.image_info.height,
description: label
}
Image.insert(image)
# Update socket assigns
{:noreply, assign(socket, label: label, upload_running?: false)}
# ...
```
In the `cond do` statement,
we want to change the one pertaining to the image that is uploaded,
_not the example list_ that is defined below.
We simply create an `image` variable with information
that is passed down to `Image.insert/1`,
effectively adding the row to the database.
And that's it!
Now every time a person uploads an image
and the model is executed,
we are saving its location (`:url`),
information (`:width` and `:height`)
and the result of the classifying model
(`:description`).
π₯³
> [!NOTE]
>
> If you're curious and want to see the data in your database,
> we recommend using [`DBeaver`](https://dbeaver.io/),
> an open-source database manager.
>
> You can learn more about it at https://github.com/dwyl/learn-postgresql.
### 10. Adding double MIME type check and showing feedback to the person in case of failure
Currently, we are not handling any errors
in case the upload of the image to `imgup` fails.
Although this is not critical,
it'd be better if we could show feedback to the person
in case the upload to `imgup` fails.
This is good for us as well,
because we _can monitor and locate the error faster_
if we log the errors.
For this, let's head over to `lib/app/image.ex`
and update the `upload_image_to_s3/2` function we've implemented.
```elixir
def upload_image_to_s3(file_path, mimetype) do
extension = MIME.extensions(mimetype) |> Enum.at(0)
# Upload to Imgup - https://github.com/dwyl/imgup
upload_response =
HTTPoison.post(
"https://imgup.fly.dev/api/images",
{:multipart,
[
{
:file,
file_path,
{"form-data", [name: "image", filename: "#{Path.basename(file_path)}.#{extension}"]},
[{"Content-Type", mimetype}]
}
]},
[]
)
# Process the response and return error if there was a problem uploading the image
case upload_response do
# In case it's successful
{:ok, %HTTPoison.Response{status_code: 200, body: body}} ->
%{"url" => url, "compressed_url" => _} = Jason.decode!(body)
{:ok, url}
# In case it returns HTTP 400 with specific reason it failed
{:ok, %HTTPoison.Response{status_code: 400, body: body}} ->
%{"errors" => %{"detail" => reason}} = Jason.decode!(body)
{:error, reason}
# In case the request fails for whatever other reason
{:error, %HTTPoison.Error{reason: reason}} ->
{:error, reason}
end
end
```
As you can see,
we are returning `{:error, reason}` if an error occurs,
and providing feedback alongside it.
If it's successful, we return `{:ok, url}`.
Because we've just changed this function,
we need to also update `def handle_progress(:image_list...`
inside `lib/app_web/live/page_live.ex`
to properly handle this new function output.
We are also introducing a double MIME type check to ensure that only image files are uploaded and processed.
We use [GenMagic](https://hexdocs.pm/gen_magic/readme.html), which provides supervised and customisable access to `libmagic` through an external process.
[This gist](https://gist.github.com/leommoore/f9e57ba2aa4bf197ebc5) explains that magic numbers are the first bits of a file,
which uniquely identify its type.
We run the GenMagic server as a daemon; it is started in the Application module
and referenced by its name.
When we run `perform`, we obtain a result struct and compare its MIME type with the one read by `ExImageInfo`.
If they match, we continue; otherwise, we stop the process.
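For illustration, a successful check on an uploaded JPEG looks roughly like this (the path and `content` string are just examples):

```elixir
GenMagic.Server.perform(:gen_magic, "/tmp/plug-1234/multipart-567890")
# => {:ok,
#      %GenMagic.Result{
#        mime_type: "image/jpeg",
#        encoding: "binary",
#        content: "JPEG image data, JFIF standard 1.01, ..."
#      }}
```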
For this to work locally, you need to install the `libmagic-dev` package.
> [!NOTE]
>
> Depending on your OS, you may install `libmagic` in different ways.
> A quick Google search will suffice,
> but here are a few resources nonetheless:
>
> - Mac: https://gist.github.com/eparreno/1845561
> - Windows: https://github.com/nscaife/file-windows
> - Linux: https://zoomadmin.com/HowToInstall/UbuntuPackage/libmagic-dev
>
> **Definitely read `gen_magic`'s installation section in https://github.com/evadne/gen_magic#installation**.
> You may need to perform additional steps.
You'll need to add [`gen_magic`](https://github.com/evadne/gen_magic)
to `mix.exs`.
This dependency will allow us to access `libmagic` through `Elixir`.
```elixir
def deps do
[
{:gen_magic, "~> 1.1.1"}
]
end
```
In the `Application` module, you should add the `GenMagic` daemon
(the C lib is loaded once for all and referenced by its name).
```elixir
#application.ex
children = [
...,
{GenMagic.Server, name: :gen_magic},
]
```
In the Dockerfile (needed to deploy this app), we will install the `libmagic-dev` as well:
```Dockerfile
RUN apt-get update -y && \
apt-get install -y libstdc++6 openssl libncurses5 locales ca-certificates libmagic-dev\
&& apt-get clean && rm -f /var/lib/apt/lists/*_*
```
Add the following function to the `App.Image` module:
```elixir
@doc """
Check file type via magic number. It uses a GenServer running the `C` lib "libmagic".
"""
def gen_magic_eval(path, accepted_mime) do
GenMagic.Server.perform(:gen_magic, path)
|> case do
{:error, reason} ->
{:error, reason}
{:ok,
%GenMagic.Result{
mime_type: mime,
encoding: "binary",
content: _content
}} ->
if Enum.member?(accepted_mime, mime),
do: {:ok, %{mime_type: mime}},
else: {:error, "Not accepted mime type."}
{:ok, %GenMagic.Result{} = res} ->
require Logger
Logger.warning("β οΈ MIME type error: #{inspect(res)}")
{:error, "Not acceptable."}
end
end
```
In the `page_live.ex` module, add the functions:
```elixir
@doc"""
Use the previous function and eturn the GenMagic reponse from the previous function
"""
def magic_check(path) do
App.Image.gen_magic_eval(path, @accepted_mime)
|> case do
{:ok, %{mime_type: mime}} ->
{:ok, %{mime_type: mime}}
{:error, msg} ->
{:error, msg}
end
end
@doc """
Double-checks the MIME type of uploaded file to ensure that the file
is an image and is not corrupted.
"""
def check_mime(magic_mime, info_mime) do
if magic_mime == info_mime, do: :ok, else: :error
end
```
We are now ready to double-check the file input
with `ExImageInfo` and `GenMagic` to ensure the safety of the uploads.
```elixir
def handle_progress(:image_list, entry, socket) when entry.done? do
# We consume the entry only if the entry is done uploading from the image
# and if consuming the entry was successful.
with %{tensor: tensor, image_info: image_info} <-
consume_uploaded_entry(socket, entry, fn %{path: path} ->
with {:magic, {:ok, %{mime_type: mime}}} <-
{:magic, magic_check(path)},
{:read, {:ok, file_binary}} <-
{:read, File.read(path)},
{:image_info, {mimetype, width, height, _variant}} <-
{:image_info, ExImageInfo.info(file_binary)},
{:check_mime, :ok} <-
{:check_mime, check_mime(mime, mimetype)},
# Get image and resize
{:ok, thumbnail_vimage} <-
Vix.Vips.Operation.thumbnail(path, @image_width, size: :VIPS_SIZE_DOWN),
# Pre-process it
{:ok, tensor} <-
pre_process_image(thumbnail_vimage) do
# Upload image to S3
Image.upload_image_to_s3(path, mimetype)
|> case do
{:ok, url} ->
image_info = %ImageInfo{
mimetype: mimetype,
width: width,
height: height,
file_binary: file_binary,
url: url
}
{:ok, %{tensor: tensor, image_info: image_info}}
# If S3 upload fails, we return error
{:error, reason} ->
{:ok, %{error: reason}}
end
else
{:error, reason} -> {:postpone, %{error: reason}}
end
end) do
# If consuming the entry was successful, we spawn a task to classify the image
# and update the socket assigns
task =
Task.Supervisor.async(App.TaskSupervisor, fn ->
Nx.Serving.batched_run(ImageClassifier, tensor)
end)
# Encode the image to base64
base64 = "data:image/png;base64, " <> Base.encode64(image_info.file_binary)
{:noreply,
assign(socket,
upload_running?: true,
task_ref: task.ref,
image_preview_base64: base64,
image_info: image_info
)}
# Otherwise, if there was an error uploading the image, we log the error and show it to the person.
else
%{error: reason} ->
Logger.warning("β οΈ Error uploading image. #{inspect(reason)}")
{:noreply, push_event(socket, "toast", %{message: "Image couldn't be uploaded to S3.\n#{reason}"})}
_ ->
{:noreply, socket}
end
end
```
Phew! That's a lot!
Let's go through the changes we've made.
- we are using the [`with` statement](https://www.openmymind.net/Elixirs-With-Statement/)
to only feed the image to the model for classification
in case the upload to `imgup` succeeds.
We've changed what `consume_uploaded_entry/3` returns
in case the upload fails - we return `{:ok, %{error: reason}}`.
- in case the upload fails,
we pattern match the `{:ok, %{error: reason}}` object
and push a `"toast"` event to the Javascript client
(we'll implement these changes shortly).
Because we push an event in case the upload fails,
we are going to make some changes to the Javascript client.
We are going to **show a toast with the error when the upload fails**.
#### 10.1 Showing a toast component with error
To show a [toast component](https://getbootstrap.com/docs/4.3/components/toasts/),
we are going to use
[`toastify.js`](https://apvarun.github.io/toastify-js/).
Navigate to `assets` folder
and run:
```sh
pnpm install toastify-js
```
With this installed, we need to import `toastify` styles
in `assets/css/app.css`.
```css
@import "../node_modules/toastify-js/src/toastify.css";
```
All that's left is **handle the `"toast"` event in `assets/js/app.js`**.
Add the following snippet of code to do so.
```js
// Hook to show message toast
Hooks.MessageToaster = {
mounted() {
this.handleEvent("toast", (payload) => {
Toastify({
text: payload.message,
gravity: "bottom",
position: "right",
style: {
background: "linear-gradient(to right, #f27474, #ed87b5)",
},
duration: 4000,
}).showToast();
});
},
};
```
With the `payload.message` we're receiving from the LiveView
(remember when we executed `push_event/3` in our LiveView?),
we are using it to create a `Toastify` object
that is shown in case the upload fails.
And that's it!
Quite easy, isn't it? π
If `imgup` is down, or the image that was sent is invalid, an error will be shown, like so.
### 11. Benchmarking image captioning models
You may be wondering: which model is most suitable for me?
Depending on the use case,
`Bumblebee` supports different models
for different scenarios.
To help you make up your mind,
we've created a guide
that benchmarks some of `Bumblebee`-supported models
for image captioning.
Although only a few models are supported for now,
this comparison table will grow
as more models are added.
So any contribution is more than welcome! π
You may check the guide
and all of the code
inside the
[`_comparison`](./_comparison/) folder.
## π Semantic search
> In this section, we will focus on implementing a
> **_full-text_ search query** through the captions of the images.
> At the end of this,
> you'll be able to transcribe audio,
> create embeddings from the audio transcription
> and search the closest related image.
> [!NOTE]
>
> This section was kindly implemented and documented by
> [@ndrean](https://github.com/ndrean). It is based on articles written by Sean Moriarty and published on the DockYard blog.
> Do check him out! π
We can leverage machine learning to greatly improve this search process:
we'll look for images whose captions _are close in terms of meaning_
to the search.
In this section, you'll learn how to perform
[**semantic search**](https://www.elastic.co/what-is/semantic-search)
with machine learning.
These techniques are widely used in search engines,
including in widespread tools like
[Elastic Search](https://www.elastic.co/).
### 0. Overview of the process
Let's go over the process in detail so we know what to expect.
As it stands, when images are uploaded and captioned,
the URL is saved, as well as the caption,
in our local database.
Here's an overview of how semantic search usually works
(which is exactly what we'll implement).
> Source: https://www.elastic.co/what-is/semantic-search
We will use the following toolchain:
#### 0.1 Audio transcription
We simply let the user start and stop the recording
by using a submit button in a form.
This can of course be greatly refined by using **Voice Activity Detection (VAD)**. You may find an example [here](https://github.com/ricky0123/vad).
Firstly, we will:
- record an audio with [MediaRecorder](https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder) API.
- run a **Speech-To-Text** process to produce a text transcription
from the audio.
We will use the _pre-trained_ model [openai/whisper-small](https://huggingface.co/openai/whisper-small)
from Hugging Face
and run it with the help of the [Bumblebee.Audio.speech_to_text_whisper](https://hexdocs.pm/bumblebee/Bumblebee.Audio.html#speech_to_text_whisper/5) function.
We get an `Nx.Serving` that we will use to run this model with an input.
#### 0.2 Creating embeddings
We then want to find images whose captions
approximate this text in terms of meaning.
This transcription is the `"target text"`.
This is where **embeddings** come into play:
they are **vector representations** of inputs,
which, in our case, are the image captions and the text transcription of the audio recorded by the user.
We encode each text as an embedding
and then use an approximation algorithm to find the closest neighbours.
Our next steps will be to prepare the
[symmetric semantic search](https://www.sbert.net/examples/applications/semantic-search/README.html#symmetric-vs-asymmetric-semantic-search).
We will use a
transformer model,
more specifically the pre-trained [sBert](https://www.sbert.net/docs/pretrained_models.html#sentence-embedding-models)
system available in
[Huggingface](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2).
We transform a text into a vector with the sentence-transformer model [`sentence-transformers/paraphrase-MiniLM-L6-v2` ](https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2).
> [!NOTE]
> You may find models in the [MTEB English leaderboard](https://huggingface.co/spaces/mteb/leaderboard). We looked for "small" models in terms of file size and dimensions. You may want to try and use [GTE small](https://huggingface.co/thenlper/gte-small).
We will run the model with the help of the
[Bumblebee.Text.TextEmbedding.text_embedding](https://hexdocs.pm/bumblebee/Bumblebee.Text.html#text_embedding/3) function.
This encoding is done for each image caption.
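To make this more tangible, here is a minimal sketch of what such an embedding serving can look like, assuming the model is loaded directly from Hugging Face (in the app we load it from a local cache, and the exact options may differ from what is shown here):

```elixir
repo = {:hf, "sentence-transformers/paraphrase-MiniLM-L6-v2"}

{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)

serving =
  Bumblebee.Text.text_embedding(model_info, tokenizer,
    # L2-normalise the embedding so cosine comparisons behave nicely
    embedding_processor: :l2_norm,
    defn_options: [compiler: EXLA]
  )

Nx.Serving.run(serving, "a dog playing in the snow")
# => %{embedding: #Nx.Tensor<f32[384] [...]>}
```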
#### 0.3 Semantic search
At this point, we have:
- the embedding of the text transcription of the recording made by the user
(e.g `"a dog"`).
- all the embeddings of all the images in our "image bank".
To search for the images related to `"a dog"`,
we need an algorithm that compares these embeddings!
For this, we will run a [**k-nearest neighbours (kNN)**](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) search.
There are several ways to do this.
- we can use [`pgvector`](https://github.com/pgvector/pgvector), a vector extension for Postgres. It is used to store vectors (the embeddings) and to run similarity searches.
  With `pgvector`, we can run:
  - a full exact search with the [cosine similarity](https://github.com/pgvector/pgvector#distances) operator `<=>`,
  - or an Approximate Nearest Neighbour search with indexing algorithms. The extension provides the [`IVFFlat`](https://github.com/pgvector/pgvector#ivfflat) and [`HNSW`](https://github.com/pgvector/pgvector#hnsw) algorithms. You can find explanations of both algorithms at https://tembo.io/blog/vector-indexes-in-pgvector and https://neon.tech/blog/understanding-vector-search-and-hnsw-index-with-pgvector.
> [!NOTE]
> Note that [Supabase](https://supabase.com/docs/guides/database/extensions/pgvector) can use the `pgvector` extension, and you can use [Supabase with Fly.io](https://fly.io/docs/reference/supabase/).
> [!WARNING]
> Note that you need to save the embeddings (as vectors) into the database, so the database will be intensively used. This may lead to scaling problems and potential race conditions.
- we can alternatively use the `hnswlib` library and its Elixir binding [HNSWLib](https://github.com/elixir-nx/hnswlib).
  This "externalises" the ANN search from the database, as it uses an in-memory index.
  This index needs to be persisted to a file on disk, at the expense of using the filesystem, again with potential race conditions.
  It works with an **[index struct](https://www.datastax.com/guides/what-is-a-vector-index)**: this struct will allow us to efficiently retrieve vector data.
**We will use this last option**,
mostly because we use Fly.io
and `pgvector` is hard to come by on this platform.
We will use a GenServer to wrap all the calls to `hnswlib`, so every write runs synchronously.
Additionally, you don't rely on a framework that does the heavy lifting for you.
We're here to learn, aren't we? π
We will incrementally append the computed embedding of each caption to the Index.
Each append returns an **indice**, which is simply the position of that embedding in the Index.
We then run a "knn_search" algorithm whose input is the embedding of the audio transcript.
This algorithm returns the most relevant position(s) - `indices` -
among the `Index` entries: the ones that minimise the chosen distance between the input and the existing vectors.
To look up the corresponding image(s), we need to save either:
- the indice,
- or the embedding itself,
depending on whether you append items one by one or in batches.
In our case, we append items one by one, so we use the indice to uniquely recover the nearest image whose caption is semantically closest to our audio.
Do note that the measured distance is dependent on the [similarity metric](https://www.pinecone.io/learn/vector-similarity/)
used by the embedding model.
Because the "sentence-transformer" model we've chosen was trained with **_cosine_similarity_**,
this is what we'll use.
Bumblebee may have options to correctly use this metric, but we used a normalisation process which fits our needs.
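To make the idea concrete, here is a minimal sketch of the `HNSWLib` calls involved, using the dimensions and pseudo-metric mentioned above (our app wraps these calls in a GenServer, shown later; the exact options may differ):

```elixir
# Create an index for 384-dimensional embeddings with a cosine pseudo-metric,
# pre-sized for a maximum of 200 elements.
{:ok, index} = HNSWLib.Index.new(:cosine, 384, 200)

# In the app, this tensor comes from the text-embedding serving; here we fake one.
embedding = Nx.broadcast(0.5, {1, 384})
:ok = HNSWLib.Index.add_items(index, embedding)

# Query with another embedding (here the same one) and ask for the single nearest neighbour.
{:ok, labels, _distances} = HNSWLib.Index.knn_query(index, embedding, k: 1)
# `labels` holds the indice(s) of the closest stored embedding(s)
```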
### 1. Pre-requisites
We have already installed all the dependencies that we need.
> [!WARNING]
>
> **You will also need to install [`ffmpeg`](https://ffmpeg.org/)**.
> Indeed, `Bumblebee` uses `ffmpeg` under the hood to process audio files into tensors,
> but it uses it _as an external dependency_.
And now we're ready to rock and roll! πΈ
### 2. Transcribe an audio recording
We first need to capture the audio and upload it to the server.
The process is quite similar to the image upload, except that we
use a special Javascript hook to record the audio
and upload it to the Phoenix LiveView.
We use a `live_file_input` in a form to capture the audio and use the Javascript `MediaRecorder API`.
The Javascript code is triggered by an attached hook `Audio` declared in the HTML.
We also let the user listen to their audio by adding an [embedded audio element](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/audio) (`<audio>`) to the HTML.
Its source is the audio blob as a URL object.
#### 2.1 Adding a loading spinner
We also add a spinner to display that the transcription process is running,
in the same way as we did for the captioning process.
To avoid code duplication, we introduce a Phoenix component "Spinner".
Create the file
`spinner.ex` in `lib/app_web/components/`
and create the `Spinner` component,
like so:
```elixir
# /lib/app_web/components/spinner.ex
defmodule AppWeb.Spinner do
use Phoenix.Component
attr :spin, :boolean, default: false
def spin(assigns) do
~H"""
"""
end
end
```
In `page_live.html.heex`, add the following snippet of code.
```html
# page_live.html.heex
<.live_file_input upload={@uploads.speech} class="hidden" />
Record
```
You can also use this component to display the spinner when the captioning task is running,
so this part of your code will shrink to:
```elixir
<%= if @upload_running? do %>
<% else %>
<%= if @label do %>
<%= @label %>
<% else %>
Waiting for image input.
<% end %>
<% end %>
```
#### 2.2 Defining `Javascript` hook
Currently, we provide a basic user experience:
we let the user click on a button to start and stop the recording.
When doing image captioning,
we carefully worked on the size of the image file
used for the captioning model
to optimize the app's latency.
In the same spirit,
we can downsize the original audio file
so it's easier for the model to process it.
This will have the benefit of less overhead in our application.
[Even though the `whisper` model does downsize the audio's sampling rate](https://github.com/openai/whisper/discussions/870),
we can do this on our client side to skip this step
and marginally reduce the overhead of running the model in our application.
The main parameters we're dealing with are:
- a **lower sampling rate** (the higher the rate, the more accurate the sound),
- **use mono** instead of stereo,
- and **the file type** (WAV, MP3)
Since most microphones on PC have a single channel (mono) and sample at `48kHz`,
we will focus on resampling to `16kHz`.
We will *not* make the conversion to mp3 here.
Next, we define the hook in a new JS file, located in the `assets/js` folder.
To resample to `16kHz`, we use an [AudioContext](https://developer.mozilla.org/en-US/docs/Web/API/AudioContext),
and pass the desired `sampleRate`.
We then use the method [decodeAudioData](https://developer.mozilla.org/en-US/docs/Web/API/BaseAudioContext/decodeAudioData)
which receives an [AudioBuffer](https://developer.mozilla.org/en-US/docs/Web/API/AudioBuffer).
We get one from the `Blob` method [arrayBuffer()](https://developer.mozilla.org/en-US/docs/Web/API/Blob/arrayBuffer).
The important part is the `Phoenix.js` function `upload`,
to which we pass an identifier `"speech"`:
this sends the data as `Blob` via a channel to the server.
We use an action button in the HTML,
and attach Javascript listeners to it on the `"click"`, `"dataavailable"` and `"stop"` events.
We also play with the CSS classes to modify the appearance of the action button when recording or not.
Navigate to the `assets` folder and run the following command.
We will use this package to convert the resampled audio
into the WAV format.
```bash
npm add "audiobuffer-to-wav"
```
Create a file called `assets/js/micro.js`
and use the code below.
```js
// /assets/js/micro.js
import toWav from "audiobuffer-to-wav";
export default {
mounted() {
let mediaRecorder,
audioChunks = [];
// Defining the elements and styles to be used during recording
// and shown on the HTML.
const recordButton = document.getElementById("record"),
audioElement = document.getElementById("audio"),
text = document.getElementById("text"),
blue = ["bg-blue-500", "hover:bg-blue-700"],
      pulseGreen = ["bg-green-500", "hover:bg-green-700", "animate-pulse"];
    // keep a reference to the hook instance so we can call `this.upload` inside nested callbacks
    const _this = this;
// Adding event listener for "click" event
recordButton.addEventListener("click", () => {
// Check if it's recording.
// If it is, we stop the record and update the elements.
if (mediaRecorder && mediaRecorder.state === "recording") {
mediaRecorder.stop();
text.textContent = "Record";
}
// Otherwise, it means the user wants to start recording.
else {
navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
// Instantiate MediaRecorder
mediaRecorder = new MediaRecorder(stream);
mediaRecorder.start();
// And update the elements
recordButton.classList.remove(...blue);
recordButton.classList.add(...pulseGreen);
text.textContent = "Stop";
// Add "dataavailable" event handler
mediaRecorder.addEventListener("dataavailable", (event) => {
audioChunks.push(event.data);
});
// Add "stop" event handler for when the recording stops.
mediaRecorder.addEventListener("stop", () => {
const audioBlob = new Blob(audioChunks);
// update the source of the Audio tag for the user to listen to his audio
audioElement.src = URL.createObjectURL(audioBlob);
// create an AudioContext with a sampleRate of 16000
const audioContext = new AudioContext({ sampleRate: 16000 });
// async read the Blob as ArrayBuffer to feed the "decodeAudioData"
const arrayBuffer = await audioBlob.arrayBuffer();
// decodes the ArrayBuffer into the AudioContext format
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
// converts the AudioBuffer into a WAV format
const wavBuffer = toWav(audioBuffer);
// builds a Blob to pass to the Phoenix.JS.upload
const wavBlob = new Blob([wavBuffer], { type: "audio/wav" });
// upload to the server via a chanel with the built-in Phoenix.JS.upload
_this.upload("speech", [wavBlob]);
// cleanups
audioChunks = [];
recordButton.classList.remove(...pulseGreen);
recordButton.classList.add(...blue);
});
});
}
});
},
};
```
Now let's import this file and declare our `Audio` hook in our `liveSocket` object.
In our `assets/js/app.js` file, let's do:
```js
// /assets/js/app.js
...
import Audio from "./micro.js";
...
let liveSocket = new LiveSocket("/live", Socket, {
params: { _csrf_token: csrfToken },
hooks: { Audio },
});
```
#### 2.3 Handling audio upload in `LiveView`
We now need to add some server-side code.
The uploaded audio file will be saved on disk
as a temporary file in the `/priv/static/uploads` folder.
We will also make this file _unique_ every time a user records an audio.
We append an `Ecto.UUID` string to the file name and pass it into the LiveView socket.
The LiveView `mount/3` function returns a socket.
Let's update it
and pass extra arguments -
typically booleans for the UI such as the button disabling and the spinner -
as well as another `allow_upload/3` to handle the upload process of the audio file.
In `lib/app_web/live/page_live.ex`,
we change the code like so:
```elixir
#page_live.ex
@upload_dir Application.app_dir(:app, ["priv", "static", "uploads"])
@tmp_wav Path.expand("priv/static/uploads/tmp.wav")
def mount(_,_,socket) do
socket
|> assign(
...,
transcription: nil,
mic_off?: false,
audio_running?: false,
tmp_wav: @tmp_wav
)
|> allow_upload(:speech,
accept: :any,
auto_upload: true,
progress: &handle_progress/3,
max_entries: 1
)
|> allow_upload(:image_list, ...)
end
```
We then create a specific `handle_progress` for the `:speech` event
as we did with the `:image_list` event.
It will launch a task to run the **Automatic Speech Recognition model**
on this audio file.
We named the serving `"Whisper"`.
```elixir
def handle_progress(:speech, entry, %{assigns: assigns} = socket) when entry.done? do
tmp_wav =
socket
|> consume_uploaded_entry(entry, fn %{path: path} ->
tmp_wav = assigns.tmp_wav <> Ecto.UUID.generate() <> ".wav"
:ok = File.cp!(path, tmp_wav)
{:ok, tmp_wav}
end)
audio_task =
Task.Supervisor.async(
App.TaskSupervisor,
fn ->
          Nx.Serving.batched_run(Whisper, {:file, tmp_wav})
end
)
{:noreply, socket
|> assign(
audio_ref: audio_task.ref,
mic_off?: true,
audio_running?: true,
       tmp_wav: tmp_wav
)}
end
```
And that's it for the Liveview portion!
#### 2.4 Serving the `Whisper` model
Now that we are adding several models,
let's refactor our `models.ex` module
that manages the models.
Since we're dealing with multiple models,
we want our app to shutdown if there's any problem loading them.
We now add the model `Whisper` in the
`lib/app/application.ex`
so it's available throughout the application at runtime.
```elixir
# lib/app/application.ex
def check_models_on_startup do
App.Models.verify_and_download_models()
|> case do
{:error, msg} ->
Logger.error("β οΈ #{msg}")
System.stop(0)
:ok ->
:ok
end
end
  def start(_type, _args) do
    # model check-up
    :ok = check_models_on_startup()

    children = [
      ...,
      # Nx serving for Speech-to-Text
      {Nx.Serving,
       serving:
         if Application.get_env(:app, :use_test_models) == true do
           App.Models.audio_serving_test()
         else
           App.Models.audio_serving()
         end,
       name: Whisper},
      ...
    ]
    ...
  end
```
As you can see, we're using a serving similar
to the captioning model we've implemented earlier.
For this to work, we need to make some changes to the
`models.ex` module.
Recall that this module simply manages the models that are
downloaded locally and used in our application.
To implement the functions above,
we change the `lib/app/models.ex` module so it looks like so.
```elixir
defmodule ModelInfo do
@moduledoc """
Information regarding the model being loaded.
It holds the name of the model repository and the directory it will be saved into.
  It also has booleans to load each model parameter at will - this is because some models (like BLIP) require a featurizer, a tokenizer and a generation configuration.
"""
defstruct [:name, :cache_path, :load_featurizer, :load_tokenizer, :load_generation_config]
end
defmodule App.Models do
@moduledoc """
  Manages loading the models and their locations according to the environment.
"""
require Logger
# IMPORTANT: This should be the same directory as defined in the `Dockerfile`
# where the models will be downloaded into.
@models_folder_path Application.compile_env!(:app, :models_cache_dir)
# Embedding-------
@embedding_model %ModelInfo{
name: "sentence-transformers/paraphrase-MiniLM-L6-v2",
cache_path: Path.join(@models_folder_path, "paraphrase-MiniLM-L6-v2"),
load_featurizer: false,
load_tokenizer: true,
load_generation_config: true
}
# Captioning --
@captioning_test_model %ModelInfo{
name: "microsoft/resnet-50",
cache_path: Path.join(@models_folder_path, "resnet-50"),
load_featurizer: true
}
@captioning_prod_model %ModelInfo{
name: "Salesforce/blip-image-captioning-base",
cache_path: Path.join(@models_folder_path, "blip-image-captioning-base"),
load_featurizer: true,
load_tokenizer: true,
load_generation_config: true
}
# Audio transcription --
@audio_test_model %ModelInfo{
name: "openai/whisper-small",
cache_path: Path.join(@models_folder_path, "whisper-small"),
load_featurizer: true,
load_tokenizer: true,
load_generation_config: true
}
@audio_prod_model %ModelInfo{
name: "openai/whisper-small",
cache_path: Path.join(@models_folder_path, "whisper-small"),
load_featurizer: true,
load_tokenizer: true,
load_generation_config: true
}
def extract_captioning_test_label(result) do
%{predictions: [%{label: label}]} = result
label
end
def extract_captioning_prod_label(result) do
%{results: [%{text: label}]} = result
label
end
@doc """
Verifies and downloads the models according to configuration
and if they are already cached locally or not.
The models that are downloaded are hardcoded in this function.
"""
def verify_and_download_models() do
{
Application.get_env(:app, :force_models_download, false),
Application.get_env(:app, :use_test_models, false)
}
|> case do
{true, true} ->
# Delete any cached pre-existing models
File.rm_rf!(@models_folder_path)
with :ok <- download_model(@captioning_test_model),
:ok <- download_model(@embedding_model),
:ok <- download_model(@audio_test_model) do
:ok
else
{:error, msg} -> {:error, msg}
end
{true, false} ->
# Delete any cached pre-existing models
File.rm_rf!(@models_folder_path)
with :ok <- download_model(@captioning_prod_model),
:ok <- download_model(@audio_prod_model),
:ok <- download_model(@embedding_model) do
:ok
else
{:error, msg} -> {:error, msg}
end
{false, false} ->
      # Check if the prod model cache directory exists and is not empty.
      # If not, we download the prod models.
with :ok <- check_folder_and_download(@captioning_prod_model),
:ok <- check_folder_and_download(@audio_prod_model),
:ok <- check_folder_and_download(@embedding_model) do
:ok
else
{:error, msg} -> {:error, msg}
end
{false, true} ->
      # Check if the test model cache directory exists and is not empty.
      # If not, we download the test models.
with :ok <- check_folder_and_download(@captioning_test_model),
:ok <- check_folder_and_download(@audio_test_model),
:ok <- check_folder_and_download(@embedding_model) do
:ok
else
{:error, msg} -> {:error, msg}
end
end
end
@doc """
Serving function that serves the `Bumblebee` captioning model used throughout the app.
This function is meant to be called and served by `Nx` in `lib/app/application.ex`.
This assumes the models that are being used exist locally, in the @models_folder_path.
"""
def caption_serving do
load_offline_model(@captioning_prod_model)
|> then(fn response ->
case response do
{:ok, model} ->
%Nx.Serving{} =
Bumblebee.Vision.image_to_text(
model.model_info,
model.featurizer,
model.tokenizer,
model.generation_config,
compile: [batch_size: 1],
defn_options: [compiler: EXLA],
# needed to run on `Fly.io`
preallocate_params: true
)
{:error, msg} ->
{:error, msg}
end
end)
end
@doc """
Serving function that serves the `Bumblebee` audio transcription model used throughout the app.
"""
def audio_serving do
load_offline_model(@audio_prod_model)
|> then(fn response ->
case response do
{:ok, model} ->
%Nx.Serving{} =
Bumblebee.Audio.speech_to_text_whisper(
model.model_info,
model.featurizer,
model.tokenizer,
model.generation_config,
chunk_num_seconds: 30,
task: :transcribe,
defn_options: [compiler: EXLA],
preallocate_params: true
)
{:error, msg} ->
{:error, msg}
end
end)
end
@doc """
Serving function for tests only. It uses a test audio transcription model.
"""
def audio_serving_test do
load_offline_model(@audio_test_model)
|> then(fn response ->
case response do
{:ok, model} ->
%Nx.Serving{} =
Bumblebee.Audio.speech_to_text_whisper(
model.model_info,
model.featurizer,
model.tokenizer,
model.generation_config,
chunk_num_seconds: 30,
task: :transcribe,
defn_options: [compiler: EXLA],
preallocate_params: true
)
{:error, msg} ->
{:error, msg}
end
end)
end
@doc """
Serving function for tests only. It uses a test captioning model.
This function is meant to be called and served by `Nx` in `lib/app/application.ex`.
This assumes the models that are being used exist locally, in the @models_folder_path.
"""
def caption_serving_test do
load_offline_model(@captioning_test_model)
|> then(fn response ->
case response do
{:ok, model} ->
%Nx.Serving{} =
Bumblebee.Vision.image_classification(
model.model_info,
model.featurizer,
top_k: 1,
compile: [batch_size: 10],
defn_options: [compiler: EXLA],
# needed to run on `Fly.io`
preallocate_params: true
)
{:error, msg} ->
{:error, msg}
end
end)
end
# Loads the models from the cache folder.
  # It will load the model and the respective featurizer, tokenizer and generation config if needed,
# and return a map with all of these at the end.
@spec load_offline_model(map()) ::
{:ok, map()} | {:error, String.t()}
defp load_offline_model(model) do
Logger.info("βΉοΈ Loading #{model.name}...")
# Loading model
loading_settings = {:hf, model.name, cache_dir: model.cache_path, offline: true}
Bumblebee.load_model(loading_settings)
|> case do
{:ok, model_info} ->
info = %{model_info: model_info}
# Load featurizer, tokenizer and generation config if needed
info =
if Map.get(model, :load_featurizer) do
{:ok, featurizer} = Bumblebee.load_featurizer(loading_settings)
Map.put(info, :featurizer, featurizer)
else
info
end
info =
if Map.get(model, :load_tokenizer) do
{:ok, tokenizer} = Bumblebee.load_tokenizer(loading_settings)
Map.put(info, :tokenizer, tokenizer)
else
info
end
info =
if Map.get(model, :load_generation_config) do
{:ok, generation_config} =
Bumblebee.load_generation_config(loading_settings)
Map.put(info, :generation_config, generation_config)
else
info
end
# Return a map with the model and respective parameters.
{:ok, info}
{:error, msg} ->
{:error, msg}
end
end
# Downloads the pre-trained models according to a given %ModelInfo struct.
  # It will download the model and the respective featurizer, tokenizer and generation config if needed.
@spec download_model(map()) :: {:ok, map()} | {:error, binary()}
defp download_model(model) do
Logger.info("βΉοΈ Downloading #{model.name}...")
# Download model
downloading_settings = {:hf, model.name, cache_dir: model.cache_path}
# Download featurizer, tokenizer and generation config if needed
Bumblebee.load_model(downloading_settings)
|> case do
{:ok, _} ->
if Map.get(model, :load_featurizer) do
{:ok, _} = Bumblebee.load_featurizer(downloading_settings)
end
if Map.get(model, :load_tokenizer) do
{:ok, _} = Bumblebee.load_tokenizer(downloading_settings)
end
if Map.get(model, :load_generation_config) do
{:ok, _} = Bumblebee.load_generation_config(downloading_settings)
end
:ok
{:error, msg} ->
{:error, msg}
end
end
# Checks if the folder exists and downloads the model if it doesn't.
def check_folder_and_download(model) do
:ok = File.mkdir_p!(@models_folder_path)
model_location =
Path.join(model.cache_path, "huggingface")
if File.ls(model_location) == {:error, :enoent} or File.ls(model_location) == {:ok, []} do
download_model(model)
|> case do
:ok -> :ok
{:error, msg} -> {:error, msg}
end
else
Logger.info("βΉοΈ No download needed: #{model.name}")
:ok
end
end
end
```
That's a lot! But we just need to focus on some new parts we've added:
- we've created **`audio_serving_test/0`** and
**`audio_serving/0`**, our audio serving functions
that are used in the `application.ex` file.
- added `@audio_prod_model` and `@audio_test_model`,
the `Whisper` model definitions used to download the models locally.
- refactored the image captioning model definitions to make them clearer.
Now we're successfully serving audio-to-text capabilities
in our application!
#### 2.5 Handling the model's response and updating elements in the view
We expect the response of this task to be
in the following form:
```elixir
%{
chunks:
[%{
text: "Hi there",
#^^^the text of our audio
start_timestamp_seconds: nil,
end_timestamp_seconds: nil
}]
}
```
We capture this response in a `handle_info` callback
where we simply prune the temporary audio file
and update the socket state with the result,
and update the booleans used for our UI
(the spinner element, the button availability and reset of the task once done).
```elixir
def handle_info({ref, %{chunks: [%{text: text}]} = _result}, %{assigns: assigns} = socket)
when assigns.audio_ref == ref do
Process.demonitor(ref, [:flush])
File.rm!(assigns.tmp_wav)
{:noreply,
assign(socket,
transcription: String.trim(text),
mic_off?: false,
audio_running?: false,
audio_ref: nil,
tmp_wav: @tmp_wav
)}
end
```
And that's it for this section!
Our application is now able to **record audio**
and **transcribe it**. π
### 3. Embeddings and semantic search
We want to encode every caption and the input text
into an embedding which is a vector of a specific vector space.
In other words, we encode a string into a list of numbers.
We chose the transformer `"sentence-transformers/paraphrase-MiniLM-L6-v2"` model.
This transformer uses a **`384`**-dimensional vector space.
Since this transformer was trained with a cosine metric,
we endow the embedding space with the same distance.
You can read more about [cosine_similarity here](https://en.wikipedia.org/wiki/Cosine_similarity).
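As a quick illustration of the metric itself, here is how you could compute the cosine similarity of two (toy) vectors with `Nx`:

```elixir
a = Nx.tensor([0.1, 0.3, 0.5])
b = Nx.tensor([0.2, 0.1, 0.4])

Nx.divide(
  Nx.dot(a, b),
  Nx.multiply(Nx.LinAlg.norm(a), Nx.LinAlg.norm(b))
)
# => a scalar tensor; close to 1.0 when the two vectors point in the same direction
```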
This model is loaded and served by an `Nx.Serving` started in the Application module like all other models.
#### 3.1 The `HNSWLib` Index (GenServer)
This library [`HNSWLib`](https://github.com/elixir-nx/hnswlib)
works with an **[index](https://www.datastax.com/guides/what-is-a-vector-index)**.
We instantiate the Index file in a `GenServer` which holds the index in the state.
We will use an Index file that is saved locally in our file system.
This file will be updated any time we append an embedding;
all the client calls and writes to the HNSWLib index are handled by the GenServer.
They will happen synchronously. We want to minimize the race conditions in case several users interact with the app.
This app is only meant to run **on a single node**.
It is started in the Application module (`application.ex`).
When the app starts, we either read or create this file. The file is saved in the "/priv/static/uploads" folder.
Because we are deploying with Fly.io, we need to persist the Index file in the database because the machine - thus its attached volume - is pruned when inactive.
It is crucial to save the correspondence between the `Image` table and the Index file to retrieve the correct images.
In simple terms, **the file in the `Index` table in the DB must correspond to the Index file in the system.**
We therefore prevent a user from uploading the same file several times,
as otherwise we would have several index entries for the same picture.
This is done through **SHA computation**.
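A minimal sketch of such a check, assuming we simply hash the uploaded file's binary and compare it with the hashes already stored (the exact hash function used in the repo may differ):

```elixir
file_binary = File.read!("example.png")
# Two identical files produce the same digest, so we can detect duplicate uploads.
sha = :crypto.hash(:sha256, file_binary) |> Base.encode16()
```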
Since model computations are long-running processes,
and because several users may interact with the app,
we need several steps to ensure that the information stays synchronized between the database and the index file.
We also endow the vector space with a `:cosine` pseudo-metric.
Add the following `GenServer` file:
it will load the Index file,
and also provide a client API to interact with the Index,
which is held in the state of the GenServer.
Again, this solution works for a single node _only_.
```elixir
defmodule App.KnnIndex do
use GenServer
@moduledoc """
A GenServer to load and handle the Index file for HNSWLib.
It loads the index from the FileSystem if existing or from the table HnswlibIndex.
It creates a new one if no Index file is found in the FileSystem
and if the table HnswlibIndex is empty.
It holds the index and the App.Image singleton table in the state.
"""
require Logger
@dim 384
@max_elements 200
@upload_dir Application.app_dir(:app, ["priv", "static", "uploads"])
@saved_index if Application.compile_env(:app, :knnindex_indices_test, false),
do: Path.join(@upload_dir, "indexes_test.bin"),
else: Path.join(@upload_dir, "indexes.bin")
# Client API ------------------
def start_link(args) do
:ok = File.mkdir_p!(@upload_dir)
GenServer.start_link(__MODULE__, args, name: __MODULE__)
end
def index_path do
@saved_index
end
def save_index_to_db do
GenServer.call(__MODULE__, :save_index_to_db)
end
def get_count do
GenServer.call(__MODULE__, :get_count)
end
def add_item(embedding) do
GenServer.call(__MODULE__, {:add_item, embedding})
end
def knn_search(input) do
GenServer.call(__MODULE__, {:knn_search, input})
end
def not_empty_index do
GenServer.call(__MODULE__, :not_empty)
end
# ---------------------------------------------------
@impl true
def init(args) do
# Trying to load the index file
index_path = Keyword.fetch!(args, :index)
space = Keyword.fetch!(args, :space)
case File.exists?(index_path) do
# If the index file doesn't exist, we try to load from the database.
false ->
{:ok, index, index_schema} =
App.HnswlibIndex.maybe_load_index_from_db(space, @dim, @max_elements)
{:ok, {index, index_schema, space}}
# If the index file does exist, we compare it with the table and check for inconsistencies.
true ->
Logger.info("βΉοΈ Index file found on disk. Let's compare it with the database...")
App.Repo.get_by(App.HnswlibIndex, id: 1)
|> case do
nil ->
{:stop,
{:error,
"Error comparing the index file with the one on the database. Incoherence on table."}}
schema ->
check_integrity(index_path, schema, space)
end
end
end
defp check_integrity(path, schema, space) do
# We check the count of the images in the database and the one in the index.
with db_count <-
App.Repo.all(App.Image) |> length(),
{:ok, index} <-
HNSWLib.Index.load_index(space, @dim, path),
{:ok, index_count} <-
HNSWLib.Index.get_current_count(index),
true <-
index_count == db_count do
Logger.info("βΉοΈ Integrity: β
")
{:ok, {index, schema, space}}
# If it fails, we return an error.
else
false ->
{:stop,
{:error, "Integrity error. The count of images from index differs from the database."}}
{:error, msg} ->
Logger.error("β οΈ #{msg}")
{:stop, {:error, msg}}
end
end
@impl true
def handle_call(:save_index_to_db, _, {index, index_schema, space} = state) do
# We read the index file and try to update the index on the table as well.
File.read(@saved_index)
|> case do
{:ok, file} ->
{:ok, updated_schema} =
index_schema
|> App.HnswlibIndex.changeset(%{file: file})
|> App.Repo.update()
{:reply, {:ok, updated_schema}, {index, updated_schema, space}}
{:error, msg} ->
{:reply, {:error, msg}, state}
end
end
def handle_call(:get_count, _, {index, _, _} = state) do
{:ok, count} = HNSWLib.Index.get_current_count(index)
{:reply, count, state}
end
def handle_call({:add_item, embedding}, _, {index, _, _} = state) do
# We add the new item to the index and update it.
with :ok <-
HNSWLib.Index.add_items(index, embedding),
{:ok, idx} <-
HNSWLib.Index.get_current_count(index),
:ok <-
HNSWLib.Index.save_index(index, @saved_index) do
{:reply, {:ok, idx}, state}
else
{:error, msg} ->
{:reply, {:error, msg}, state}
end
end
def handle_call({:knn_search, nil}, _, state) do
{:reply, {:error, "No index found"}, state}
end
def handle_call({:knn_search, input}, _, {index, _, _} = state) do
# We search for the nearest neighbors of the input embedding.
case HNSWLib.Index.knn_query(index, input, k: 1) do
{:ok, labels, _distances} ->
response =
labels[0]
|> Nx.to_flat_list()
|> hd()
|> then(fn idx ->
App.Repo.get_by(App.Image, %{idx: idx + 1})
end)
# TODO: add threshold on "distances"
{:reply, response, state}
{:error, msg} ->
{:reply, {:error, msg}, state}
end
end
def handle_call(:not_empty, _, {index, _, _} = state) do
case HNSWLib.Index.get_current_count(index) do
{:ok, 0} ->
Logger.warning("β οΈ Empty index.")
{:reply, :error, state}
{:ok, _} ->
{:reply, :ok, state}
end
end
end
```
Let's unpack a bit of what we are doing here.
- we first **define the module constants**.
Here, we set the dimension of the embedding vector space
(this depends on the model you choose),
so check the model you've picked to tweak this setting.
- we define the upload directory where **the Index file will be saved inside the filesystem**.
- when the GenServer is initialized (`init/1` function),
we perform several _integrity verifications_,
checking that the `Index` file in the filesystem
and the file in the `Index` table
(from now on, this table will be called `HnswlibIndex`,
under the name of the same schema)
are consistent with each other.
These validations essentially make sure the content
of both files is the same.
- the other functions provide a basic client API for
callers to add items to the Index and persist it to disk
(a quick usage sketch follows).
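Here's an illustrative sketch of how the client API above is used once the GenServer is running (the variable names are placeholders):
```elixir
# `embedding` is a normed 384-dimensional Nx tensor,
# e.g. the output of the Embedding serving (see section 3.3).
{:ok, idx} = App.KnnIndex.add_item(embedding)

# How many embeddings does the Index currently hold?
count = App.KnnIndex.get_count()

# Find the image whose caption is closest to a (normed) query embedding.
# Returns an %App.Image{} looked up in the database (or nil / an error tuple).
App.KnnIndex.knn_search(query_embedding)
```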
#### 3.2 Saving the `HNSWLib` Index in the database
As you may have seen from the previous GenServer,
we are calling functions from a module called
`App.HnswlibIndex` that we have not yet created.
This module pertains to the **schema** that will hold
information of the `HNSWLib` table.
This table will only have a single row,
with the file contents.
As we've discussed earlier,
we will compare the Index file in this row
with the one in the filesystem
to check for any inconsistencies that may arise.
Let's implement this module now!
Inside `lib/app`, create a file called `hnswlib_index.ex`
and use the following code.
```elixir
defmodule App.HnswlibIndex do
use Ecto.Schema
alias App.HnswlibIndex
require Logger
@moduledoc """
Ecto schema to save the HNSWLib Index file into a singleton table
with utility functions
"""
schema "hnswlib_index" do
field(:file, :binary)
field(:lock_version, :integer, default: 1)
end
def changeset(struct \\ %__MODULE__{}, params \\ %{}) do
struct
|> Ecto.Changeset.cast(params, [:id, :file])
|> Ecto.Changeset.optimistic_lock(:lock_version)
|> Ecto.Changeset.validate_required([:id])
end
@doc """
Tries to load index from DB.
If the table is empty, it creates a new one.
If the table is not empty but there's no file, an index is created from scratch.
If there's one, we use it and load it to be used throughout the application.
"""
def maybe_load_index_from_db(space, dim, max_elements) do
# Check if the table has an entry
App.Repo.get_by(HnswlibIndex, id: 1)
|> case do
# If the table is empty
nil ->
Logger.info("βΉοΈ No index file found in DB. Creating new one...")
create(space, dim, max_elements)
# If the table is not empty but has no file
response when response.file == nil ->
Logger.info("βΉοΈ Empty index file in DB. Recreating one...")
# Purge the table and create a new file row in it
App.Repo.delete_all(App.HnswlibIndex)
create(space, dim, max_elements)
# If the table is not empty and has a file
index_db ->
Logger.info("βΉοΈ Index file found in DB. Loading it...")
# We get the path of the index
with path <- App.KnnIndex.index_path(),
# Save the file on disk
:ok <- File.write(path, index_db.file),
# And load it
{:ok, index} <- HNSWLib.Index.load_index(space, dim, path) do
{:ok, index, index_db}
end
end
end
defp create(space, dim, max_elements) do
# Inserting the row in the table
{:ok, schema} =
HnswlibIndex.changeset(%__MODULE__{}, %{id: 1})
|> App.Repo.insert()
# Creates index
{:ok, index} =
HNSWLib.Index.new(space, dim, max_elements)
# Saves an empty index file, used only for testing
if Application.get_env(:app, :use_test_models, false) do
empty_index =
Application.app_dir(:app, ["priv", "static", "uploads"])
|> Path.join("indexes_empty.bin")
HNSWLib.Index.save_index(index, empty_index)
end
{:ok, index, schema}
end
end
```
In this module:
- we are creating **two fields**: `lock_version`,
to simply check the version of the file;
and `file`,
the binary content of the index file.
- `lock_version` will be extremely useful to
perform [**optimistic locking**](https://stackoverflow.com/questions/129329/optimistic-vs-pessimistic-locking),
which is what we do in the `changeset/2` function.
This will allow us to prevent two different people
who upload the same image at the same time
from silently overwriting each other's changes,
overcoming this race condition
and maintaining the data consistency in the Index file
(see the sketch after this list).
- `maybe_load_index_from_db/3` fetches the singleton row
on this table and checks if the file exists in the row.
If it doesn't, it creates a new one.
Otherwise, it just loads the existing one inside the row.
- `create/3` creates a new index file.
It's a private function that encapsulates creating
the Index file so it can be used in the singleton row
inside the table.
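Here's a small sketch of what the optimistic lock buys us, based on how `Ecto.Changeset.optimistic_lock/2` behaves when two processes race to update the same singleton row (the binaries are placeholders):
```elixir
# Both processes load the same row (lock_version: 1).
schema_a = App.Repo.get!(App.HnswlibIndex, 1)
schema_b = App.Repo.get!(App.HnswlibIndex, 1)

# The first update succeeds and bumps lock_version to 2.
{:ok, _} =
  schema_a
  |> App.HnswlibIndex.changeset(%{file: <<1, 2, 3>>})
  |> App.Repo.update()

# The second update still carries lock_version: 1,
# so Ecto raises Ecto.StaleEntryError instead of silently overwriting.
schema_b
|> App.HnswlibIndex.changeset(%{file: <<4, 5, 6>>})
|> App.Repo.update()
```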
And that's it!
We've added additional code to conditionally create different indexes
according to the environment
(useful for testing),
but you can safely ignore those conditional calls
if you're not interested in testing
(though you should π).
#### 3.3 The embedding model
We provide a serving for the embedding model in the `App.Models` module.
It should look like this:
```elixir
#App.Models
@embedding_model %ModelInfo{
name: "sentence-transformers/paraphrase-MiniLM-L6-v2",
cache_path: Path.join(@models_folder_path, "paraphrase-MiniLM-L6-v2"),
load_featurizer: false,
load_tokenizer: true,
load_generation_config: true
}
def embedding() do
load_offline_model(@embedding_model)
|> then(fn response ->
case response do
{:ok, model} ->
# return an %Nx.Serving{} struct
%Nx.Serving{} =
Bumblebee.Text.TextEmbedding.text_embedding(
model.model_info,
model.tokenizer,
defn_options: [compiler: EXLA],
preallocate_params: true
)
{:error, msg} ->
{:error, msg}
end
end)
end
def verify_and_download_models() do
force_models_download = Application.get_env(:app, :force_models_download, false)
use_test_models = Application.get_env(:app, :use_test_models, false)
case {force_models_download, use_test_models} do
{true, true} ->
File.rm_rf!(@models_folder_path)
download_model(@captioning_test_model)
download_model(@audio_test_model)
{true, false} ->
File.rm_rf!(@models_folder_path)
download_model(@embedding_model)
# ^^^ newly added: download the embedding model
download_model(@captioning_prod_model)
download_model(@audio_prod_model)
{false, false} ->
check_folder_and_download(@embedding_model)
# ^^ newly added: check and download the embedding model
check_folder_and_download(@captioning_prod_model)
check_folder_and_download(@audio_prod_model)
{false, true} ->
check_folder_and_download(@captioning_test_model)
check_folder_and_download(@audio_test_model)
end
end
```
You then add the `Nx.Serving` for the embeddings:
```elixir
#application.ex
children = [
...,
{Nx.Serving,
serving: App.Models.embedding(),
name: Embedding,
batch_size: 1
},
...
]
```
Your `application.ex` file should look like so:
```elixir
defmodule App.Application do
# See https://hexdocs.pm/elixir/Application.html
# for more information on OTP Applications
@moduledoc false
require Logger
use Application
@upload_dir Application.app_dir(:app, ["priv", "static", "uploads"])
@saved_index if Application.compile_env(:app, :knnindex_indices_test, false),
do: Path.join(@upload_dir, "indexes_test.bin"),
else: Path.join(@upload_dir, "indexes.bin")
def check_models_on_startup do
App.Models.verify_and_download_models()
|> case do
{:error, msg} ->
Logger.error("β οΈ #{msg}")
System.stop(0)
:ok ->
Logger.info("βΉοΈ Models: β
")
:ok
end
end
@impl true
def start(_type, _args) do
:ok = check_models_on_startup()
children = [
# Start the Telemetry supervisor
AppWeb.Telemetry,
# Setup DB
App.Repo,
# Start the PubSub system
{Phoenix.PubSub, name: App.PubSub},
# Nx serving for the embedding
{Nx.Serving, serving: App.Models.embedding(), name: Embedding, batch_size: 1},
# Nx serving for Speech-to-Text
{Nx.Serving,
serving:
if Application.get_env(:app, :use_test_models) == true do
App.Models.audio_serving_test()
else
App.Models.audio_serving()
end,
name: Whisper},
# Nx serving for image classifier
{Nx.Serving,
serving:
if Application.get_env(:app, :use_test_models) == true do
App.Models.caption_serving_test()
else
App.Models.caption_serving()
end,
name: ImageClassifier},
{GenMagic.Server, name: :gen_magic},
# Adding a supervisor
{Task.Supervisor, name: App.TaskSupervisor},
# Start the Endpoint (http/https)
AppWeb.Endpoint
# Start a worker by calling: App.Worker.start_link(arg)
# {App.Worker, arg}
]
# We only start the HNSWLib Index GenServer when we're not testing.
# Because this GenServer needs the database to be seeded first,
# we only add it to the supervision tree outside of tests.
# When testing, you need to spawn this process manually (it is done in the test_helper.exs file).
children =
if Application.get_env(:app, :start_genserver, true) == true do
Enum.concat(children, [{App.KnnIndex, [space: :cosine, index: @saved_index]}])
else
children
end
# See https://hexdocs.pm/elixir/Supervisor.html
# for other strategies and supported options
opts = [strategy: :one_for_one, name: App.Supervisor]
Supervisor.start_link(children, opts)
end
# Tell Phoenix to update the endpoint configuration
# whenever the application is updated.
@impl true
def config_change(changed, _new, removed) do
AppWeb.Endpoint.config_change(changed, removed)
:ok
end
end
```
> [!NOTE]
>
> We have added a few alterations to how the supervision tree
> in `application.ex` is initialized.
> This is because we _test our code_,
> so that's why you see some of these changes above.
>
> If you don't want to test the code,
> you can ignore the conditional changes that are made
> to the supervision tree according to the environment
> (which we do to check if the code is being tested or not).
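If you _do_ want to run the tests, the flags used above (`:use_test_models`, `:start_genserver`, `:knnindex_indices_test`) are plain application config. A sketch of what `config/test.exs` might contain (the values are an assumption, not the exact config of this repo):
```elixir
# config/test.exs (sketch)
import Config

# Use the smaller test models instead of the production ones.
config :app, :use_test_models, true

# Don't start App.KnnIndex in the supervision tree during tests;
# it is spawned manually in test_helper.exs once the DB is ready.
config :app, :start_genserver, false

# Point the Index at a test-specific file (read at compile time above).
config :app, :knnindex_indices_test, true
```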
### 4. Using the Index and embeddings
In this section, we'll go over how to use the Index
and the embeddings and tie everything together to
have a working application π.
If you want to better understand embeddings and
how to use `HNSWLib`,
the math behind it and see a working example
of running an embedding model,
you can check the next section.
However, _it is entirely optional_
and not necessary for our app.
#### 4.0 Check the `hnswlib` folder
For a working example of how to use the index with `hnswlib`,
you can run the `.exs` file in that folder.
#### 4.1 Computing the embeddings in our app
First, make sure the audio-related assigns (and the `:speech` upload) are set in the LiveView `mount/3`:
```elixir
@tmp_wav Path.expand("priv/static/uploads/tmp.wav")
def mount(_, _, socket) do
{:ok,
socket
|> assign(
...,
# Related to the Audio
transcription: nil,
mic_off?: false,
audio_running?: false,
audio_search_result: nil,
tmp_wav: @tmp_wav
)
|> allow_upload(:speech,...)
[...]
}
end
```
Recall that every time you upload an image,
you get back a URL from our bucket
and you compute a caption as a string.
We will now compute an embedding from this string
and save it in the Index.
This is done in the `handle_info` callback.
Update the LiveView `handle_info` callback where we handle the captioning results:
```elixir
def handle_info({ref, result}, %{assigns: assigns} = socket) do
# Flush async call
Process.demonitor(ref, [:flush])
cond do
# If the upload task has finished executing,
# we update the socket assigns.
Map.get(assigns, :task_ref) == ref ->
image =
%{
url: assigns.image_info.url,
width: assigns.image_info.width,
height: assigns.image_info.height,
description: label
}
with %{embedding: data} <-
Nx.Serving.batched_run(Embedding, label),
# compute a normed embedding (cosine case only) on the text result
normed_data <-
Nx.divide(data, Nx.LinAlg.norm(data)),
{:check_used, {:ok, _pending_image}} <-
{:check_used, App.Image.check_before_append_to_index(image.sha1)},
{:ok, idx} <-
App.KnnIndex.add_item(normed_data) do
# save the App.Image to the DB
Map.merge(image, %{idx: idx, caption: label})
|> App.Image.insert()
{:noreply,
socket
|> assign(
upload_running?: false,
task_ref: nil,
label: label
)
}
else
{:error, msg} ->
{:noreply,
socket
|> put_flash(:error, msg)
|> assign(
upload_running?: false,
task_ref: nil,
label: nil
)
}
end
[...]
end
end
```
Every time we produce an audio file, we transcribe it into text.
We then compute the embedding of this transcription and run an ANN search.
The last step should return an `%App.Image{}` struct (if a match is found) via a look-up in the database.
We then update the `"audio_search_result"` assign with it and display the transcription.
Modify the following handler:
```elixir
def handle_info({ref, %{chunks: [%{text: text}]} = result}, %{assigns: assigns} = socket)
when assigns.audio_ref == ref do
Process.demonitor(ref, [:flush])
File.rm!(assigns.tmp_wav)
# compute a normed embedding (cosine case only) on the text result
# and returns an App.Image{} as the result of a "knn_search"
with %{embedding: input_embedding} <-
Nx.Serving.batched_run(Embedding, text),
normed_input_embedding <-
Nx.divide(input_embedding, Nx.LinAlg.norm(input_embedding)),
{:not_empty_index, :ok} <-
{:not_empty_index, App.KnnIndex.not_empty_index()},
# {:not_empty_index, App.HnswlibIndex.not_empty_index(index)},
%App.Image{} = result <-
App.KnnIndex.knn_search(normed_input_embedding) do
{:noreply,
assign(socket,
transcription: String.trim(text),
mic_off?: false,
audio_running?: false,
audio_search_result: result,
audio_ref: nil,
tmp_wav: @tmp_wav
)}
else
# record without entries
{:not_empty_index, :error} ->
{:noreply,
assign(socket,
mic_off?: false,
audio_search_result: nil,
audio_running?: false,
audio_ref: nil,
tmp_wav: @tmp_wav
)}
nil ->
{:noreply,
assign(socket,
transcription: String.trim(text),
mic_off?: false,
audio_search_result: nil,
audio_running?: false,
audio_ref: nil,
tmp_wav: @tmp_wav
)}
end
end
```
We now come back to the `knn_search` function we defined in the `App.KnnIndex` GenServer.
The "approximate nearest neighbour" search uses `HNSWLib.Index.knn_query/3`.
It returns a tuple `{:ok, labels, distances}` where "labels" and "distances" are tensors.
Their length is the number of neighbours you want to find, set by the `k` option.
With `k: 1`, we ask for a single neighbour.
> [!NOTE]
>
> You may further use a cut-off distance to exclude responses that might not be meaningful.
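For instance, inside the `{:knn_search, input}` handler, a cut-off could look something like the sketch below (the `0.2` threshold is arbitrary and would need tuning for your model and data):
```elixir
case HNSWLib.Index.knn_query(index, input, k: 1) do
  {:ok, labels, distances} ->
    distance = distances[0] |> Nx.to_flat_list() |> hd()

    response =
      if distance <= 0.2 do
        # Close enough to be meaningful: look up the matching image.
        idx = labels[0] |> Nx.to_flat_list() |> hd()
        App.Repo.get_by(App.Image, %{idx: idx + 1})
      else
        # Too far away: treat it as "no match".
        nil
      end

    {:reply, response, state}

  {:error, msg} ->
    {:reply, {:error, msg}, state}
end
```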
We will now display the found image using the `url` field of the `%App.Image{}` struct, inside `page_live.html.heex`.
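The exact markup is up to you; a minimal sketch, assuming the `@audio_search_result` assign holds the found `%App.Image{}` (or `nil`), could be:
```html
<!-- Sketch only: structure and classes are illustrative. -->
<%= if @audio_search_result do %>
  <img src={@audio_search_result.url} alt={@audio_search_result.description} />
<% end %>
```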
##### 4.1.1 Changing the `Image` schema so it's embeddable
Now we need to persist the index position (`idx`) of each image.
Let's add the needed columns to the `images` table.
To do this, run a `mix` task to generate a timestamped migration file.
```bash
mix ecto.gen.migration add_idx_to_images
```
In the `"/priv/repo"` folder, open the newly created file and add:
```elixir
defmodule App.Repo.Migrations.AddIdxToImages do
use Ecto.Migration
def change do
alter table(:images) do
add(:idx, :integer, default: 0)
add(:sha1, :string)
end
end
end
```
and run the migration
by running `mix ecto.migrate`.
Modify the `App.Image` struct and the changeset:
```elixir
@primary_key {:id, :id, autogenerate: true}
schema "images" do
field(:description, :string)
field(:width, :integer)
field(:url, :string)
field(:height, :integer)
field(:idx, :integer)
field(:sha1, :string)
timestamps(type: :utc_datetime)
end
def changeset(image, params \\ %{}) do
image
|> Ecto.Changeset.cast(params, [:url, :description, :width, :height, :idx, :sha1])
|> Ecto.Changeset.validate_required([:width, :height])
|> Ecto.Changeset.unique_constraint(:sha1, name: :images_sha1_index)
|> Ecto.Changeset.unique_constraint(:idx, name: :images_idx_index)
end
```
We've added the fields `idx` and `sha1` to the image schema.
The former is the position of the image
within the `HNSWLib` Index file,
so we can look the image up after a search.
The latter is the `sha1` hash of the image contents.
This allows us to check whether two images are the same,
so we can avoid adding duplicates
and save some processing in our application.
In our `changeset/2` function,
we've also added two `unique_constraint/3` calls
to check for the uniqueness of the newly added
`idx` and `sha1` fields.
These are enforced at the database level so we don't have
duplicated images.
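Note that `unique_constraint/3` only reports errors raised by the database itself, so it relies on unique indexes existing on those columns. If your migration didn't create them, an extra migration might look like this sketch (the index names match the `:name` options used in the changeset):
```elixir
defmodule App.Repo.Migrations.AddUniqueIndexesToImages do
  use Ecto.Migration

  def change do
    # Names match the `:name` options passed to unique_constraint/3.
    create unique_index(:images, [:sha1], name: :images_sha1_index)
    create unique_index(:images, [:idx], name: :images_idx_index)
  end
end
```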
In addition to these changes,
we are going to need functions to
**calculate the `sha1` of the image**.
Add the following functions to the same file.
```elixir
def calc_sha1(file_binary) do
:crypto.hash(:sha, file_binary)
|> Base.encode16()
end
def check_sha1(sha1) when is_binary(sha1) do
App.Repo.get_by(App.Image, %{sha1: sha1})
|> case do
nil ->
nil
%App.Image{} = image ->
{:ok, image}
end
end
```
- `calc_sha1/1` uses the `:crypto` package to hash the file binary
and encode it.
- `check_sha1/1` fetches an image according to a given `sha1` code
and returns the result.
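A quick `iex` sketch of how these two functions fit together (the path is illustrative):
```elixir
# Read any image file from disk.
{:ok, file_binary} = File.read("path/to/some_image.png")

# SHA-1, hex-encoded: a 40-character string.
sha1 = App.Image.calc_sha1(file_binary)

# Returns nil if the image hasn't been uploaded yet,
# or {:ok, %App.Image{}} if it already exists in the database.
App.Image.check_sha1(sha1)
```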
And that's all we need to deal with our images!
##### 4.1.2 Using embeddings in semantic search
Now we have
- all the embedding models ready to be used,
- our Index file correctly created and maintained through
filesystem and in the database in the `hnswlib_index` schema,
- the needed `sha1` functions to check for duplicated images.
It's time to bring everything together and use all of these tools
to implement semantic search into our application.
We are going to be working inside `lib/app_web/live/page_live.ex` from now on.
###### 4.1.2.1 Mount socket assigns
First, we are going to update our socket assigns on `mount/3`.
```elixir
@image_width 640
@accepted_mime ~w(image/jpeg image/jpg image/png image/webp)
@tmp_wav Path.expand("priv/static/uploads/tmp.wav")
@impl true
def mount(_params, _session, socket) do
{:ok,
socket
|> assign(
# Related to the file uploaded by the user
label: nil,
upload_running?: false,
task_ref: nil,
image_info: nil,
image_preview_base64: nil,
# Related to the list of image examples
example_list_tasks: [],
example_list: [],
display_list?: false,
# Related to the Audio
transcription: nil,
mic_off?: false,
audio_running?: false,
audio_search_result: nil,
tmp_wav: @tmp_wav
)
|> allow_upload(:image_list,
accept: ~w(image/*),
auto_upload: true,
progress: &handle_progress/3,
max_entries: 1,
chunk_size: 64_000,
max_file_size: 5_000_000
)
|> allow_upload(:speech,
accept: :any,
auto_upload: true,
progress: &handle_progress/3,
max_entries: 1
)}
end
```
To reiterate:
- we've added a few fields related to audio.
- `transcription` will pertain to the result of the audio transcription
that will occur after transcribing the audio from the person.
- `mic_off?` is simply a toggle to visually show the person
whether the microphone is recording or not.
- `audio_running?` is a boolean to show the person
if the audio transcription and semantic search are occurring (loading).
- `audio_search_result` is the image whose caption
is semantically closest to the
transcribed audio.
- `tmp_wav` is the path of the temporary audio file
that is saved in the filesystem while the audio is being transcribed.
- additionally, we also have added
`allow_upload/3` pertaining to the audio upload
(it is tagged as `:speech` and is being handled
in the same function as the upload `:image_list`).
These are the socket assigns
that will allow us to dynamically update the person using our app
with what the app is doing.
###### 4.1.2.2 Consuming image uploads
As you can see, we are using `handle_progress/3`
with `allow_upload/3`.
As we know, `handle_progress/3` is called as an upload progresses
(whether an image or a recording of the person's voice).
We define two separate function clauses for how we want to
process `:image_list` uploads and `:speech` uploads.
Let's start with the first one.
We have added `sha1` and `idx` as fields to our image schema.
Therefore, we are going to need to make some changes
to the `handle_progress/3` of the `:image_list`.
Change it like so:
```elixir
def handle_progress(:image_list, entry, socket) when entry.done? do
# We consume the entry only if the entry is done uploading from the image
# and if consuming the entry was successful.
consume_uploaded_entry(socket, entry, fn %{path: path} ->
with {:magic, {:ok, %{mime_type: mime}}} <- {:magic, magic_check(path)},
# Check if file can be properly read
{:read, {:ok, file_binary}} <- {:read, File.read(path)},
# Check the image info
{:image_info, {mimetype, width, height, _variant}} <-
{:image_info, ExImageInfo.info(file_binary)},
# Check mime type
{:check_mime, :ok} <- {:check_mime, check_mime(mime, mimetype)},
# Get SHA1 code from the image and check it
sha1 = App.Image.calc_sha1(file_binary),
{:sha_check, nil} <- {:sha_check, App.Image.check_sha1(sha1)},
# Get image and resize
{:ok, thumbnail_vimage} <- Vops.thumbnail(path, @image_width, size: :VIPS_SIZE_DOWN),
# Pre-process the image as tensor
{:pre_process, {:ok, tensor}} <- {:pre_process, pre_process_image(thumbnail_vimage)} do
# Create image info to be saved as partial image
image_info = %{
mimetype: mimetype,
width: width,
height: height,
sha1: sha1,
description: nil,
url: nil,
# set a random big int to the "idx" field
idx: :rand.uniform(1_000_000_000_000) * 1_000
}
# Save partial image
App.Image.insert(image_info)
|> case do
{:ok, _} ->
image_info =
Map.merge(image_info, %{
file_binary: file_binary
})
{:ok, %{tensor: tensor, image_info: image_info, path: path}}
{:error, changeset} ->
{:error, changeset.errors}
end
|> handle_upload()
else
{:magic, {:error, msg}} -> {:postpone, %{error: msg}}
{:read, msg} -> {:postpone, %{error: inspect(msg)}}
{:image_info, nil} -> {:postpone, %{error: "image_info error"}}
{:check_mime, :error} -> {:postpone, %{error: "Bad mime type"}}
{:sha_check, {:ok, %App.Image{}}} -> {:postpone, %{error: "Image already uploaded"}}
{:pre_process, {:error, _msg}} -> {:postpone, %{error: "pre_processing error"}}
{:error, reason} -> {:postpone, %{error: inspect(reason)}}
end
end)
|> case do
# If consuming the entry was successful, we spawn a task to classify the image
# and update the socket assigns
%{tensor: tensor, image_info: image_info} ->
task =
Task.Supervisor.async(App.TaskSupervisor, fn ->
Nx.Serving.batched_run(ImageClassifier, tensor)
end)
# Encode the image to base64
base64 = "data:image/png;base64, " <> Base.encode64(image_info.file_binary)
{:noreply,
assign(socket,
upload_running?: true,
task_ref: task.ref,
image_preview_base64: base64,
image_info: image_info
)}
# Otherwise, if there was an error uploading the image, we log the error and show it to the person.
%{error: error} ->
Logger.warning("β οΈ Error uploading image. #{inspect(error)}")
{:noreply, push_event(socket, "toast", %{message: "Image couldn't be uploaded to S3.\n#{error}"})}
end
end
```
Let's go over these changes.
Some of this is code that was written before,
but for clarity, we'll go over it again.
- we use `consume_uploaded_entry/3` to consume the image
that the person uploads.
To consume the image successfully,
the image goes through an array of validations.
- we use `magic_check/1` to check the validity of the image's MIME type.
- we read the file contents with `File.read/1` and extract the image metadata using `ExImageInfo.info/1`.
- we check if the MIME type is consistent using `check_mime/2`.
- we calculate the `sha1` with the `App.Image.calc_sha1/1` function
we've developed earlier.
- we resize the image, scaling it down to the width
the image captioning model we've chosen was trained on
(to yield better results and to save memory bandwidth).
We use `Vix`'s `thumbnail` operation (aliased here as `Vops`) to resize the image.
- finally, we convert the resized image to a tensor using
`pre_process_image/1` so it can be consumed by our image captioning model.
- after this series of validations,
we use the image info we've obtained earlier to
**create an "early-save" of the image**.
With this, we are saving the image and associating it with
the `sha1` that was retrieved from the image contents.
We are doing this "partial image saving"
in case two identical images are being uploaded at the same time.
Because we are enforcing `sha1` to be unique at the database level,
this race condition is resolved by the database for us.
- afterwards, we call `handle_upload/1`.
This function will upload the image to the `S3` bucket.
We are going to implement this function in just a second π.
- if the upload is successful,
using the tensor and the image information from the previous steps,
we spawn the async task to run the model.
This step should be familiar to you
since we've already implemented this.
Finally, we update the socket assigns accordingly.
- we handle all possible errors in the `else` statement of the
`with` flow control statement before the image is uploaded.
Hopefully, this demystifies some of the code we've just implemented!
Because we are using `handle_upload/1` in this function
to upload the image to our `S3` bucket,
let's do it right now!
```elixir
def handle_upload({:ok, %{path: path, tensor: tensor, image_info: image_info} = map})
when is_map(map) do
# Upload the image to S3
Image.upload_image_to_s3(path, image_info.mimetype)
|> case do
# If the upload is successful, we update the socket assigns with the image info
{:ok, url} ->
image_info =
struct(
%ImageInfo{},
Map.merge(image_info, %{url: url})
)
{:ok, %{tensor: tensor, image_info: image_info}}
# If S3 upload fails, we return error
{:error, reason} ->
Logger.warning("β οΈ Error uploading image: #{inspect(reason)}")
{:postpone, %{error: "Bucket error"}}
end
end
def handle_upload({:error, error}) do
Logger.warning("β οΈ Error creating partial image: #{inspect(error)}")
{:postpone, %{error: "Error creating partial image"}}
end
```
This function is fairly easy to understand.
We upload the image by calling `Image.upload_image_to_s3/2` and,
if successful,
we add the returning URL to the image struct.
Otherwise, we handle the error and return it.
After this small detour,
let's implement the `handle_progress/3`
for the **`:speech` uploads**,
that is, the audio the person records.
```elixir
def handle_progress(:speech, entry, %{assigns: assigns} = socket) when entry.done? do
# We consume the audio file
tmp_wav =
socket
|> consume_uploaded_entry(entry, fn %{path: path} ->
tmp_wav = assigns.tmp_wav <> Ecto.UUID.generate() <> ".wav"
:ok = File.cp!(path, tmp_wav)
{:ok, tmp_wav}
end)
# After consuming the audio file, we spawn a task to transcribe the audio
audio_task =
Task.Supervisor.async(
App.TaskSupervisor,
fn ->
Nx.Serving.batched_run(Whisper, {:file, tmp_wav})
end
)
# Update the socket assigns
{:noreply,
assign(socket,
audio_ref: audio_task.ref,
mic_off?: true,
tmp_wav: tmp_wav,
audio_running?: true,
audio_search_result: nil,
transcription: nil
)}
end
```
As we know, this function is called after the upload is completed.
In the case of audio uploads,
the upload is triggered by the audio hook in `assets/js/app.js`
when the person finishes recording their voice.
Similarly to the `handle_progress/3` function of the `:image_list` uploads,
we also use `consume_uploaded_entry/3` to consume the audio file.
- we consume the audio file and save it in our filesystem
as a `.wav` file.
- we spawn the async task and use the `Whisper` audio transcription model
with the audio file we've just saved.
- we update the socket assigns accordingly.
Pretty simple, right?
###### 4.1.2.3 Using the embeddings to semantically search images
In this section, we'll finally use
our embedding model and semantically search for our images!
As you've seen in the previous section,
we've spawned the task that transcribes the audio with the `Whisper` model.
Now we need a handler!
For this scenario,
add the following function.
```elixir
@impl true
def handle_info({ref, %{chunks: [%{text: text}]} = _result}, %{assigns: assigns} = socket)
when assigns.audio_ref == ref do
Process.demonitor(ref, [:flush])
File.rm!(assigns.tmp_wav)
# Compute a normed embedding (cosine case only) on the text result
# and returns an App.Image{} as the result of a "knn_search"
with {:not_empty_index, :ok} <-
{:not_empty_index, App.KnnIndex.not_empty_index()},
%{embedding: input_embedding} <-
Nx.Serving.batched_run(Embedding, text),
%Nx.Tensor{} = normed_input_embedding <-
Nx.divide(input_embedding, Nx.LinAlg.norm(input_embedding)),
%App.Image{} = result <-
App.KnnIndex.knn_search(normed_input_embedding) do
{:noreply,
assign(socket,
transcription: String.trim(text),
mic_off?: false,
audio_running?: false,
audio_search_result: result,
audio_ref: nil,
tmp_wav: @tmp_wav
)}
else
# Stop transcription if no entries in the Index
{:not_empty_index, :error} ->
{:noreply,
socket
|> push_event("toast", %{message: "No images yet"})
|> assign(
mic_off?: false,
transcription: "!! The image bank is empty. Please upload some !!",
audio_search_result: nil,
audio_running?: false,
audio_ref: nil,
tmp_wav: @tmp_wav
)}
nil ->
{:noreply,
assign(socket,
transcription: String.trim(text),
mic_off?: false,
audio_search_result: nil,
audio_running?: false,
audio_ref: nil,
tmp_wav: @tmp_wav
)}
end
end
```
Let's break down this function:
- given the **recording text transcription**:
- we check that the Index holding the embeddings is _not empty_.
- we use the text transcription and run it
**through the embedding model** and get its result.
- with the embedding we've received from the model,
we **normalize it**.
- with the normalized embedding,
we **run it through a `knn_search`**.
For this, we call the `App.KnnIndex.knn_search/1` function
we've defined in the `App.KnnIndex` GenServer
we've implemented earlier on.
- the `knn search` returns the closest semantical image
(through the image caption)
from the audio transcription.
- upon the success of this process, we update the socket assigns.
- otherwise, we handle each error case accordingly
and update the socket assigns.
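Stripped of the LiveView plumbing, the whole search flow boils down to a few calls (a sketch, assuming the Index already has entries):
```elixir
text = "a dog playing on the beach"

# 1. Embed the transcription and norm it (cosine case).
%{embedding: embedding} = Nx.Serving.batched_run(Embedding, text)
normed = Nx.divide(embedding, Nx.LinAlg.norm(embedding))

# 2. Make sure there is something to search, then run the ANN search.
:ok = App.KnnIndex.not_empty_index()
%App.Image{} = image = App.KnnIndex.knn_search(normed)
```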
And that's it!
We just had to call, in sequence, the functions
that we've implemented before!
###### 4.1.2.4 Creating embeddings when uploading images
Now that we have _used_ the embeddings,
there's one thing we forgot:
**keeping track of the embedding of each image that is uploaded**.
These embeddings are saved in the Index file.
To fix this, we need to create an embedding of the image
after it is uploaded and captioned.
Head over to the `handle_info/2` pertaining to the image captioning,
and change it to the following piece of code:
```elixir
def handle_info({ref, result}, %{assigns: assigns} = socket) do
# Flush async call
Process.demonitor(ref, [:flush])
# You need to change how you destructure the output of the model depending
# on the model you've chosen for `prod` and `test` envs in `models.ex`.
label =
case Application.get_env(:app, :use_test_models, false) do
true ->
App.Models.extract_captioning_test_label(result)
# coveralls-ignore-start
false ->
App.Models.extract_captioning_prod_label(result)
# coveralls-ignore-stop
end
%{image_info: image_info} = assigns
cond do
# If the upload task has finished executing, we run the embedding model on the image
Map.get(assigns, :task_ref) == ref ->
image =
%{
url: image_info.url,
width: image_info.width,
height: image_info.height,
description: label,
sha1: image_info.sha1
}
# Create embedding task
with %{embedding: data} <- Nx.Serving.batched_run(Embedding, label),
# Compute a normed embedding (cosine case only) on the text result
normed_data <- Nx.divide(data, Nx.LinAlg.norm(data)),
# Check the SHA1 of the image
{:check_used, {:ok, pending_image}} <-
{:check_used, App.Image.check_sha1(image.sha1)} do
Ecto.Multi.new()
# Save updated Image to DB
|> Ecto.Multi.run(:update_image, fn _, _ ->
idx = App.KnnIndex.get_count() + 1
Ecto.Changeset.change(pending_image, %{
idx: idx,
description: image.description,
url: image.url
})
|> App.Repo.update()
end)
# Save Index file to DB
|> Ecto.Multi.run(:save_index, fn _, _ ->
{:ok, _idx} = App.KnnIndex.add_item(normed_data)
App.KnnIndex.save_index_to_db()
end)
|> App.Repo.transaction()
|> case do
{:error, :update_image, _changeset, _} ->
{:noreply,
socket
|> push_event("toast", %{message: "Invalid entry"})
|> assign(
upload_running?: false,
task_ref: nil,
label: nil
)}
{:error, :save_index, _, _} ->
{:noreply,
socket
|> push_event("toast", %{message: "Please retry"})
|> assign(
upload_running?: false,
task_ref: nil,
label: nil
)}
{:ok, _} ->
{:noreply,
socket
|> assign(
upload_running?: false,
task_ref: nil,
label: label
)}
end
else
{:check_used, nil} ->
{:noreply,
socket
|> push_event("toast", %{message: "Race condition"})
|> assign(
upload_running?: false,
task_ref: nil,
label: nil
)}
{:error, msg} ->
{:noreply,
socket
|> push_event("toast", %{message: msg})
|> assign(
upload_running?: false,
task_ref: nil,
label: nil
)}
end
# If the example task has finished executing, we update the socket assigns.
img = Map.get(assigns, :example_list_tasks) |> Enum.find(&(&1.ref == ref)) ->
# Update the element in the `example_list` enum to turn "predicting?" to `false`
updated_example_list = update_example_list(assigns, img, label)
{:noreply,
assign(socket,
example_list: updated_example_list,
upload_running?: false,
display_list?: true
)}
end
end
```
Let's go over the flow of this function:
- we extract the captioning label from the result of the image captioning model.
This code is the same as it was before.
- afterwards, we get the label
and **feed it into the embedding model**.
- the embedding model yields the embedding,
_we normalize it_ and **fetch the pending image row by its `sha1`** with `check_sha1/1`.
- if these three steps succeed,
we perform a _database transaction_ where
we **save the updated image to the database**,
**add the normed embedding to the Index (incrementing its count)**
and **save the Index file to the database**.
- we finally update the socket assigns accordingly.
- if any of the previous calls fail,
we handle these error scenarios
and update the socket assigns.
And that's it!
Our app is fully loaded with semantic search capabilities! π
###### 4.1.2.5 Update the LiveView view
All that's left is updating our view.
We are going to add basic elements
to make this transition as smooth as possible.
Head over to `lib/app_web/live/page_live.html.heex`
and update it as so:
```html
π₯ LiveView
+
π Bumblebee
Caption your image!
Upload your own image (up to 5MB) and perform image captioning with
Elixir
!
Powered with
HuggingFaceπ€
transformer models, you can run this project locally and perform
machine learning tasks with a handful lines of code.
<%= if @image_preview_base64 do %>
<%= if not @upload_running? do %>
<.live_file_input upload={@uploads.image_list} class="hidden" />
<% end %>
<% else %>
<.live_file_input upload={@uploads.image_list}
class="hidden" /> Upload
or drag and drop
PNG, JPG, GIF up to 5MB
<% end %>
<%= for entry <- @uploads.image_list.entries do %>
<%= for err <- upload_errors(@uploads.image_list, entry) do %>
<%= error_to_string(err) %>
<% end %>
<% end %>
Description:
<%= if @label do %>
<%= @label %>
<% else %>
Waiting for image input.
<% end %>
Semantic search using an audio
Please record a phrase. You can listen to your audio. It will be
transcripted automatically into a text and appear below. The semantic
search for matching images will then run automatically and the found
image appear below.
<.live_file_input upload={@uploads.speech} class="hidden" />
Record
Transcription:
<%= if @audio_running? do %>
<% else %>
<%= if @transcription do %>
<%= @transcription %>
<% else %>
Waiting for audio input.
<% end %>
<% end %>
Examples
<%= for example_img <- @example_list do %>
<%= if example_img.predicting? == true do %>
Loading...
<% else %>
<%= example_img.label %>
<% end %>
```
As you may have noticed,
we've made some changes to the Audio portion of the HTML.
- we check if the `@transcription` assign exists.
If so, we display the text to the person.
- we check if the `@audio_search_result` assign is not `nil`.
If that's the case, the image that is semantically closest
to the audio transcription is shown to the person.
And that's it!
We are simply showing the person
the results.
And with that, you've successfully added
semantic search into the application!
Pat yourself on the back! π
You've expanded your knowledge in key areas of machine learning
and artificial intelligence,
which are becoming increasingly prevalent!
### 5. Tweaking our UI
Now that we have all the features we want in our application,
let's make it prettier!
As it stands, it's responsive enough.
But we can always make it better!
We're going to show you the changes you need to make
and then explain what they mean!
Head over to `lib/app_web/live/page_live.html.heex` and change it like so:
```html
π₯ LiveView
+
π Bumblebee
Upload
Search
Caption your image!
Powered with
HuggingFaceπ€
transformer models,
you can run this project locally and perform machine learning tasks with a handful lines of code.
<%= if @image_preview_base64 do %>
<%= if not @upload_running? do %>
<.live_file_input upload={@uploads.image_list} class="hidden" />
<% end %>
<% else %>
<.live_file_input upload={@uploads.image_list} class="hidden" /> Upload
or drag and drop
PNG, JPG, GIF up to 5MB
<% end %>
<%= for entry <- @uploads.image_list.entries do %>
<%= for err <- upload_errors(@uploads.image_list, entry) do %>
<%= error_to_string(err) %>
<% end %>
<% end %>
Description:
<%= if @upload_running? do %>
<% else %>
<%= if @label do %>
<%= @label %>
<% else %>
Waiting for image input.
<% end %>
<% end %>
<%= if @display_list? do %>
Examples
<%= for example_img <- @example_list do %>
<%= if example_img.predicting? == true do %>
Loading...
<% else %>
<%= example_img.label %>
<% end %>
<% end %>
<% end %>
...or search it!
Record a phrase or some key words.
We'll detect them and semantically search it in our database of images!
After recording your audio, you can listen to it. It will be transcripted automatically into text and appear below.
Semantic search will automatically kick in and the resulting image will be shown below.
<.live_file_input upload={@uploads.speech} class="hidden" />
Record
Transcription:
<%= if @audio_running? do %>
<% else %>
<%= if @transcription do %>
<%= @transcription %>
<% else %>
Waiting for audio input.
<% end %>
<% end %>
<%= @audio_search_result.description %>
```
That may look like a lot, but we've made just a handful of changes!
- we've restructured our HTML so it's easier to read.
You just have a few key elements: the pill on top of the page,
a toggle that we've added (to show the **upload section**
and the **search section**) and two containers
with the upload and search section, respectively.
The code is practically intact for each section.
- we've made minor styling changes to each section.
On the _upload section_,
we added a small callout section.
In the _search section_,
we added a small callout section as well,
and added the description of the image
that is found after the audio transcription occurs.
And that's it!
What's important
**is that only one section is shown at a time on mobile devices**
and **both sections are shown on desktop devices**
(over `1024` pixels - [pertaining to the `lg` breakpoint of `TailwindCSS`](https://tailwindcss.com/docs/responsive-design)).
On desktop devices, the toggle button should disappear.
To accomplish these, we need to add a bit of `Javascript` and `CSS` magic. πͺ
Head over to `assets/js/app.js`
and add the following code.
```js
document.getElementById("upload_option").addEventListener("click", function () {
document.getElementById("upload_container").style.display = "block";
document.getElementById("search_container").style.display = "none";
document
.getElementById("upload_option")
.classList.replace("bg-white", "bg-blue-500");
document
.getElementById("upload_option")
.classList.replace("text-gray-900", "text-white");
document
.getElementById("upload_option")
.classList.replace("hover:bg-gray-50", "hover:bg-blue-600");
document
.getElementById("upload_option")
.getElementsByTagName("svg")[0]
.classList.replace("text-gray-400", "text-white");
document
.getElementById("search_option")
.classList.replace("bg-blue-500", "bg-white");
document
.getElementById("search_option")
.classList.replace("text-white", "text-gray-900");
document
.getElementById("search_option")
.classList.replace("hover:bg-blue-600", "hover:bg-gray-50");
document
.getElementById("search_option")
.getElementsByTagName("svg")[0]
.classList.replace("text-white", "text-gray-400");
});
document.getElementById("search_option").addEventListener("click", function () {
document.getElementById("upload_container").style.display = "none";
document.getElementById("search_container").style.display = "block";
document
.getElementById("search_option")
.classList.replace("bg-white", "bg-blue-500");
document
.getElementById("search_option")
.classList.replace("text-gray-900", "text-white");
document
.getElementById("search_option")
.classList.replace("hover:bg-gray-50", "hover:bg-blue-600");
document
.getElementById("search_option")
.getElementsByTagName("svg")[0]
.classList.replace("text-gray-400", "text-white");
document
.getElementById("upload_option")
.classList.replace("bg-blue-500", "bg-white");
document
.getElementById("upload_option")
.classList.replace("text-white", "text-gray-900");
document
.getElementById("upload_option")
.classList.replace("hover:bg-blue-600", "hover:bg-gray-50");
document
.getElementById("upload_option")
.getElementsByTagName("svg")[0]
.classList.replace("text-white", "text-gray-400");
});
```
The code is self-explanatory.
We are changing the styles of the toggle buttons
according to the button that is clicked.
The other thing we need to make sure is that
**both sections are shown on desktop devices**,
regardless of what the current section is selected.
Luckily, we can override styles by adding this piece of code
to `assets/css/app.css`.
```css
@media (min-width: 1024px) {
#upload_container,
#search_container {
display: block !important; /* Override any inline styles */
}
}
```
And that's it!
We can see our slightly refactored UI in all of its glory
by running `mix phx.server`!
## _Please_ star the repo! βοΈ
If you find this package/repo useful,
please star it on GitHub, so that we know! β
Thank you! π